I was about to write a quick post on why I’m fussy about what URLs look like (see Locator + Name = Good below), and thought there was an important distinction between URIs/URNs and URLs. Then I googled to check my terminology and discovered that not only was I wrong, but it was hard to find a good succinct definition. Here I aim to clarify the best-practice terminology for the basic building block of the web, the “URI”, and the important notions of URIs that are “names” (formerly URNs) and URIs that are “locators” (formerly URLs).
The basic story is that URI (Uniform Resource Identifier) is the general term, and that URLs (L for locator) are URIs that locate a resource: in simple language, you can paste a URL into an application such as a browser, email client or FTP client and it should know where to find the thing. One might think: well, what else could a URI be? An example is a unique way of referring to a book by its ISBN (urn:isbn:0714844403). NB: the http: or urn: at the beginning is the URI scheme (the part that tells software which format the rest of the URI follows). If something is only a name it is occasionally called a URN (N for name). However, URL and URN are now viewed as unnecessary partitionings; instead, a URI may be a locator, a name, or both.
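To make the scheme idea concrete, here is a minimal sketch using Python's standard urllib.parse module. The example.com address and the describe helper are my own illustrations, and urn:isbn: is the registered URN form of the ISBN example. Treat this as a way of poking at the structure of a URI, not as a test of whether something is "really" a name or a locator.

```python
from urllib.parse import urlsplit

def describe(uri):
    """Return (scheme, authority, path) for a URI string.

    Hypothetical helper for illustration only: it shows which parts a
    generic URI parser can see, nothing more.
    """
    parts = urlsplit(uri)
    return parts.scheme, parts.netloc, parts.path

# A locator: the scheme and host tell a client where to go looking.
print(describe("http://example.com/books/0714844403"))

# A name: there is a scheme, but no host to contact.
print(describe("urn:isbn:0714844403"))
```

The parser is equally happy with both. Nothing in the syntax says whether a URI can be dereferenced; that depends on the scheme and on whether anything answers at the other end.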
A locator can be (simplistically) a “link” (the target of a hyperlink). This makes it very tangible and graspable by most internet users.
A name can be something that uniquely identifies, represents or corresponds to something or someone. This is a more abstract concept, which in the past has mattered mainly to programmers and the digerati/webnoscenti.
The most common source for defining URIs and URLs is the W3C page Naming and Addressing: URIs, URLs, … However, the three dots are a giveaway that nothing is actually resolved on this page, and within the jumble of links and metadata it is hard to find an authoritative source. Elsewhere, for example, Tim Bray argues that, officially, “There is no such thing as a URL”. This is definitely wrong: the IETF really is the authoritative source, and RFC 3986 section 1.1.3, “URI, URL, and URN”, is the current ‘truth’. If you need more detail, there is in fact an entire RFC (RFC 3305) dedicated to the terms and to why the confusion has arisen. The word URL still means what it always did: the location of something within computer-findable space (a web page, an FTP location, an email address…). URLs are a specific subset of URIs, but the term is not that useful for definitive documentation, because the “name” and “locator” aspects of URIs are not disjoint subsets but overlapping ones.
This disposal of the terms URL and URN is useful – because it allows one to see what happens when the “locator” and “name” aspects of URIs overlap, or not…
Locator + Name = Good
However, it gets more interesting for everyone when a URI is both a name and a locator, particularly when this is the canonical form of the name. The name can then be a token which can be passed around, but also inspected for its meaning (by typing it into a web browser’s location field). This is one of the bases of many effective and lightweight web phenomena: lightweight APIs, blog permalinks, and (relatively) new phenomena like the social graph and XFN. As web users increasingly have to manage their own online identities through multiple intertwinglable social networking sites, the idea of the link that is also a name (as is implicit in XFN) will become important. URLs are great when they are names as well as locators, but are even better when their form (the words and format they use) is expressive of their “namefulness” and explains their own construction.
Locator – Name = Broken links
When you have a locator which is not a name, you basically have some arbitrary text string giving you a page, often with the site’s technology embedded in it (URIs ending in .jsp, .asp, .do or .action, or encoding data in the querystring as ?id=nnnn). The linkage is an arbitrary result of how the site was built, and it is highly likely that these old URIs will not and cannot be preserved when the site is migrated to a new technology.
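As a rough illustration of how to spot this kind of locator, here is a small sketch that flags the giveaways listed above. The function name technology_hints, the set of extensions and the example.com addresses are all my own assumptions for the sake of the example; a real check would need a much longer list and some judgement.

```python
import posixpath
from urllib.parse import urlsplit, parse_qs

# Extensions taken from the examples in the text; not an exhaustive list.
TECH_EXTENSIONS = {".jsp", ".asp", ".do", ".action"}

def technology_hints(uri):
    """List the signs that a URI exposes how the site was built.

    Hypothetical helper: it only looks for a technology-specific file
    extension and for an 'id' parameter in the query string.
    """
    parts = urlsplit(uri)
    hints = []
    extension = posixpath.splitext(parts.path)[1].lower()
    if extension in TECH_EXTENSIONS:
        hints.append("extension " + extension)
    if "id" in parse_qs(parts.query):
        hints.append("opaque id in query string")
    return hints

print(technology_hints("http://example.com/showArticle.jsp?id=1234"))
print(technology_hints("http://example.com/articles/uri-url-urn"))
```

The first URI trips both checks; the second trips neither, and would also survive a change of server technology without needing a redirect.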
Name – Locator = Confusion
The converse is scary: a URI that looks like a locator (e.g. it is an http URI) but is NOT a locator, only a name. Most of the time this doesn’t happen. A good case where it does is XML Schema namespacing, in which the namespace is identified by a URI with the http URI scheme. Nothing demands that this URI works as a locator; it is just a unique name (it could, and often does, produce a 404 when typed into a browser). However, it is much more confusing with the XML Schema schemaLocation attribute:
The schemaLocation attribute value consists of one or more pairs of URI references, separated by white space. The first member of each pair is a namespace name, and the second member of the pair is a hint describing where to find an appropriate schema document for that namespace. [W3C Schema Primer]
This means that a schemaLocation contains a http-URI that is not a locator (not a URL!) followed by a space followed by a http-URI that is a locator. Hmm. Makes sense to me now (after six or seven years of battling with XML Schemas…).
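The pairing rule quoted above can be sketched in a few lines. This is only an illustration under my own assumptions: the function name parse_schema_location and the example.com URIs are made up, and the code simply splits the attribute value on whitespace and pairs up the tokens as the Primer describes.

```python
def parse_schema_location(value):
    """Split a schemaLocation value into (namespace name, location hint) pairs.

    Hypothetical helper: the first URI of each pair is only a name, the
    second is a hint about where a schema document might be found.
    """
    tokens = value.split()
    if len(tokens) % 2 != 0:
        raise ValueError("schemaLocation must contain pairs of URI references")
    return list(zip(tokens[0::2], tokens[1::2]))

value = """
    http://www.example.com/ns/order  http://www.example.com/schemas/order.xsd
    http://www.example.com/ns/party  http://www.example.com/schemas/party.xsd
"""

for namespace_name, location_hint in parse_schema_location(value):
    print("name:   ", namespace_name)
    print("locator:", location_hint)
```

Both members of each pair look identical in kind, which is exactly the source of the confusion: only their position tells you which one is a name and which one you could actually fetch.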