00134-AppendixO RFC1738 UniformResourceLocators (URL)
T. Berners-Lee
Request for Comments: 1738 CERN
Category: Standards Track L. Masinter
Xerox Corporation
M. McCahill
University of Minnesota
Editors
December 1994
Abstract
1. Introduction
This document describes the syntax and semantics for a compact string
representation for a resource available via the Internet. These
strings are called "Uniform Resource Locators" (URLs).
This document was written by the URI working group of the Internet
Engineering Task Force. Comments may be addressed to the editors, or
to the URI-WG <[email protected]>. Discussions of the group are archived
at <URL:http://www.acl.lanl.gov/URI/archive/uri-archive.index.html>.
The generic syntax for URLs provides a framework for new schemes to
be established using protocols other than those defined in this
document.
<scheme>:<scheme-specific-part>
A URL contains the name of the scheme being used (<scheme>) followed
by a colon and then a string (the <scheme-specific-part>) whose
interpretation depends on the scheme.
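The following Python sketch illustrates the generic form only. It splits a URL at its first colon into the <scheme> and the <scheme-specific-part>. The set of characters accepted in a scheme name (letters, digits, "+", "-" and ".") and the case-insensitive treatment of scheme names come from the full text of RFC 1738, not from this excerpt. The function name split_scheme is hypothetical.

```python
import re
from typing import Tuple

# Scheme-name characters (letters, digits, "+", "-", "."), as in the full
# text of RFC 1738. ":" is not in the class, so the match stops at the
# first colon.
_SCHEME_RE = re.compile(r"^([A-Za-z0-9+.\-]+):(.*)$", re.DOTALL)


def split_scheme(url: str) -> Tuple[str, str]:
    """Split a URL into (<scheme>, <scheme-specific-part>).

    The scheme is lower-cased. The scheme-specific part is returned
    untouched because its interpretation depends on the scheme.
    """
    match = _SCHEME_RE.match(url)
    if match is None:
        raise ValueError(f"not a URL: {url!r}")
    return match.group(1).lower(), match.group(2)
```

For example, split_scheme("FTP://ds.internic.net/rfc/rfc1625.txt") returns ("ftp", "//ds.internic.net/rfc/rfc1625.txt"). Parsing the scheme-specific part is left to code written for each scheme.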
Within the parts of a URL, an octet may be represented by the
character which has that octet as its code within the US-ASCII
[20] coded character set.
URLs are written only with the graphic printable characters of the
US-ASCII coded character set. The octets 80-FF hexadecimal are not
used in US-ASCII, and the octets 00-1F and 7F hexadecimal represent
control characters; these must be encoded.
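Below is a minimal sketch of this rule. It assumes the usual encoding of an octet as "%" followed by two hexadecimal digits, which is the form of the %09 and %0D%0A sequences in the Gopher examples below. The sketch encodes the control octets, 80-FF, the space (printable but not graphic), and "%" itself, since "%" introduces an escape. It does not handle the further unsafe and reserved characters listed in the full specification. The helper names are hypothetical, and decoding uses the standard library's urllib.parse.unquote_to_bytes.

```python
from urllib.parse import unquote_to_bytes


def percent_encode(data: bytes) -> str:
    """Encode octets that have no graphic US-ASCII character.

    Octets 00-20 (controls and space), 7F and 80-FF become "%" plus two
    hexadecimal digits. "%" (25 hex) is also encoded because it starts
    an escape. Every other octet is written as its US-ASCII character.
    """
    out = []
    for octet in data:
        if octet <= 0x20 or octet >= 0x7F or octet == 0x25:
            out.append(f"%{octet:02X}")
        else:
            out.append(chr(octet))
    return "".join(out)


def percent_decode(text: str) -> bytes:
    """Turn each %XX triplet back into the octet it encodes."""
    return unquote_to_bytes(text)
```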
Unsafe:
Reserved:
Some URL schemes (such as the ftp, http, and file schemes) contain
names that can be considered hierarchical; the components of the
hierarchy are separated by "/".
3. Specific Schemes
While the syntax for the rest of the URL may vary depending on the
particular scheme selected, URL schemes that involve the direct use
of an IP-based protocol to a specified host on the Internet use a
common syntax for the scheme-specific data:
//<user>:<password>@<host>:<port>/<url-path>
user
An optional user name. Some schemes (e.g., ftp) allow the
specification of a user name.
password
An optional password. If present, it follows the user
name separated from it by a colon.
host
The fully qualified domain name of a network host, or its IP
address as a set of four decimal digit groups separated by
".". Fully qualified domain names take the form as described
in Section 3.5 of RFC 1034 [13] and Section 2.1 of RFC 1123
[5]: a sequence of domain labels separated by ".", each domain
label starting and ending with an alphanumeric character and
possibly also containing "-" characters. The rightmost domain
label never starts with a digit, which syntactically
distinguishes all domain names from IP addresses.
port
The port number to connect to. Most schemes designate
protocols that have a default port number. Another port number
may optionally be supplied, in decimal, separated from the
host by a colon. If the port is omitted, the colon is as well.
url-path
The rest of the locator consists of data specific to the
scheme, and is known as the "url-path". It supplies the
details of how the specified resource can be accessed. Note
that the "/" between the host (or port) and the url-path is
NOT part of the url-path.
The url-path syntax depends on the scheme being used, as does the
manner in which it is interpreted.
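The common syntax can be parsed as in the sketch below. This is a hypothetical helper, not a normative algorithm. It assumes that any ":", "@" or "/" inside the user name or password has been encoded, so the last "@" before the first "/" ends the login part. Components are left percent-encoded. The is_ip_address check applies the rule from the host description: a host whose rightmost label begins with a digit is an IP address.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InternetLocator:
    user: Optional[str]
    password: Optional[str]
    host: str
    port: Optional[int]
    url_path: Optional[str]  # None when "/" and the url-path are absent


def parse_common(ssp: str) -> InternetLocator:
    """Parse //<user>:<password>@<host>:<port>/<url-path> (hypothetical helper)."""
    if not ssp.startswith("//"):
        raise ValueError("expected '//' before the host")
    rest = ssp[2:]
    # The "/" after the host (or port) delimits the url-path but is not part of it.
    hostpart, slash, url_path = rest.partition("/")
    user = password = None
    if "@" in hostpart:
        login, _, hostpart = hostpart.rpartition("@")
        user, colon, pw = login.partition(":")
        password = pw if colon else None
    # If the port is omitted, so is the colon.
    host, colon, port_text = hostpart.partition(":")
    port = int(port_text) if colon else None
    return InternetLocator(user, password, host, port, url_path if slash else None)


def is_ip_address(host: str) -> bool:
    """True if the rightmost "."-separated label starts with a digit."""
    return host.rsplit(".", 1)[-1][:1].isdigit()
```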
3.2. FTP
A user name and password may be supplied; they are used in the ftp
"USER" and "PASS" commands after first making the connection to the
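FTP server. If no user name or password is supplied and one is
requested by the FTP server, the conventions for "anonymous" FTP are
to be used: the user name "anonymous" is supplied, and the password
is supplied as the Internet e-mail address of the end user accessing
the resource.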
If the URL supplies a user name but no password, and the remote
server requests a password, the program interpreting the FTP URL
should request one from the user.
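The credential rules above reduce to two small decisions, sketched below under stated assumptions. The helper names are hypothetical. The password helper should be called only when the server actually asks for a password. The anonymous defaults (user name "anonymous", with the user's Internet e-mail address as password) are the conventional anonymous FTP values. The prompt parameter stands for whatever the client uses to ask the person at the keyboard.

```python
from typing import Callable, Optional

ANONYMOUS = "anonymous"


def ftp_user(url_user: Optional[str]) -> str:
    """Argument for the FTP USER command."""
    return url_user if url_user is not None else ANONYMOUS


def ftp_password(
    url_user: Optional[str],
    url_password: Optional[str],
    user_email: str,
    prompt: Callable[[str], str],
) -> str:
    """Argument for the PASS command; call only if the server requests one."""
    if url_password is not None:
        return url_password
    if url_user is None:
        # Anonymous FTP convention: the user's e-mail address is the password.
        return user_email
    # A user name but no password: ask the person running the program.
    return prompt(f"Password for {url_user}: ")
```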
The url-path of an FTP URL has the following syntax:
<cwd1>/<cwd2>/.../<cwdN>/<name>;type=<typecode>
Where <cwd1> through <cwdN> and <name> are (possibly encoded) strings
and <typecode> is one of the characters "a", "i", or "d". The part
";type=<typecode>" may be omitted. The <cwdx> and <name> parts may be
empty. The whole url-path may be omitted, including the "/"
delimiting it from the prefix containing user, password, host, and
port.
Within a name or CWD component, the characters "/" and ";" are
reserved and must be encoded. The components are decoded prior to
their use in the FTP protocol. In particular, if the appropriate FTP
sequence to access a particular file requires supplying a string
containing a "/" as an argument to a CWD or RETR command, it is
necessary to encode each "/".
FTP URLs may also be used for other operations; for example, it is
possible to update a file on a remote file server, or infer
information about it from the directory listings. The mechanism for
doing so is not spelled out here.
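The sketch below exercises the FTP url-path syntax and the encoding rules for "/" and ";". The name parse_ftp_path is hypothetical. It splits on the literal delimiters first and decodes each component afterwards, so an encoded "%2F" survives as a "/" inside a single CWD or file-name argument.

```python
from typing import List, Optional, Tuple
from urllib.parse import unquote


def parse_ftp_path(url_path: str) -> Tuple[List[str], str, Optional[str]]:
    """Split <cwd1>/.../<cwdN>/<name>;type=<typecode>.

    Returns (decoded CWD arguments, decoded name, typecode or None).
    A literal ";" can only introduce the type parameter, because ";"
    inside a component must be encoded.
    """
    typecode = None
    if ";type=" in url_path:
        url_path, _, typecode = url_path.rpartition(";type=")
        if typecode not in ("a", "i", "d"):
            raise ValueError(f"bad typecode: {typecode!r}")
    # Split on literal "/" before decoding, so "%2F" stays in its component.
    *cwds, name = url_path.split("/")
    return [unquote(c) for c in cwds], unquote(name), typecode
```

Under the usual reading of these rules, a client would then send one CWD command per directory component and retrieve <name>. This sketch does not cover how the typecode maps to FTP commands.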
3.2.4. Hierarchy
For some file systems, the "/" used to denote the hierarchical
structure of the URL corresponds to the delimiter used to construct a
file name hierarchy, and thus, the filename will look similar to the
URL path. This does NOT mean that the URL is a Unix filename.
3.2.5. Optimization
3.3. HTTP
http://<host>:<port>/<path>?<searchpart>
Within the <path> and <searchpart> components, "/", ";", "?" are
reserved. The "/" character may be used within HTTP to designate a
hierarchical structure.
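This sketch splits an HTTP URL into the four components named in the syntax above. It assumes port 80 when :<port> is omitted. That is the well-known HTTP port, but this excerpt does not state it. HttpLocator and parse_http are hypothetical names, and the components are left encoded.

```python
from typing import NamedTuple, Optional


class HttpLocator(NamedTuple):
    host: str
    port: int
    path: str
    searchpart: Optional[str]


def parse_http(url: str) -> HttpLocator:
    """Parse http://<host>:<port>/<path>?<searchpart> (port 80 assumed if omitted)."""
    prefix = "http://"
    if not url.lower().startswith(prefix):
        raise ValueError("not an http URL")
    hostport, _, tail = url[len(prefix):].partition("/")
    host, colon, port_text = hostport.partition(":")
    port = int(port_text) if colon else 80
    # "?" is reserved: its first literal occurrence starts the searchpart.
    path, question, search = tail.partition("?")
    return HttpLocator(host, port, path, search if question else None)
```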
3.4. GOPHER
The base Gopher protocol is described in RFC 1436 and supports items
and collections of items (directories). The Gopher+ protocol is a set
of upward compatible extensions to the base Gopher protocol and is
described in [2]. Gopher+ supports associating arbitrary sets of
attributes and alternate data representations with Gopher items.
Gopher URLs accommodate both Gopher and Gopher+ items and item
attributes.
A Gopher URL takes the form:
gopher://<host>:<port>/<gopher-path>
where <gopher-path> is one of
<gophertype><selector>
<gophertype><selector>%09<search>
<gophertype><selector>%09<search>%09<gopher+_string>
Note that some Gopher <selector> strings begin with a copy of the
<gophertype> character, in which case that character will occur twice
consecutively. The Gopher selector string may be an empty string;
this is how Gopher clients refer to the top-level directory on a
Gopher server.
URLs for Gopher+ items have a second encoded tab (%09) and a Gopher+
string. Note that in this case, the %09<search> string must be
supplied, although the <search> element may be the empty string.
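The three gopher-path forms differ only in how many encoded tabs (%09) follow the selector. A parser can therefore take the first character as <gophertype> and split the remainder on at most two %09 separators before decoding. This is a sketch with hypothetical names. Because the split is limited to two, a Gopher+ string that itself contains %09 stays intact.

```python
from typing import NamedTuple, Optional
from urllib.parse import unquote


class GopherPath(NamedTuple):
    gophertype: str
    selector: str
    search: Optional[str]
    gopher_plus: Optional[str]


def parse_gopher_path(gopher_path: str) -> GopherPath:
    """Split <gophertype><selector>[%09<search>[%09<gopher+_string>]]."""
    if not gopher_path:
        raise ValueError("gopher-path needs at least a gophertype character")
    gophertype, rest = gopher_path[0], gopher_path[1:]
    # At most two splits: everything after the second %09 is the Gopher+ string.
    fields = rest.split("%09", 2)
    selector = unquote(fields[0])
    search = unquote(fields[1]) if len(fields) > 1 else None
    gopher_plus = unquote(fields[2]) if len(fields) > 2 else None
    return GopherPath(gophertype, selector, search, gopher_plus)
```

This reproduces the doubled-character case mentioned above. parse_gopher_path("00file") yields gophertype "0" and selector "0file", because the selector begins with its own copy of the type character.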
Gopher+ items which have a +ASK associated with them (i.e. Gopher+
items tagged with a "?") require the client to fetch the item’s +ASK
attribute to get the form definition, and then ask the user to fill
out the form and return the user’s responses along with the selector
string to retrieve the item. Gopher+ clients know how to do this but
depend on the "?" tag in the Gopher+ item description to know when to
handle this case. The "?" is used in the Gopher+ string to be
consistent with Gopher+ protocol’s use of this symbol.
+<view_name>%20<language_name>
+%091%0D%0A+-1%0D%0A<ask_item1_value>%0D%0A<ask_item2_value>%0D%0A.%0D%0A
<a_gopher_selector><tab>+<tab>1<cr><lf>
+-1<cr><lf>
<ask_item1_value><cr><lf>
<ask_item2_value><cr><lf>
.<cr><lf>
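The encoded string on the line beginning "+%091" decodes octet for octet into the lines that follow it: %09 is a tab, and %0D%0A is a carriage return and line feed. The sketch below builds that string from a list of +ASK answers and decodes it back into the request text. The function names and the use of urllib.parse.quote and unquote are illustrative choices, not part of the Gopher+ specification.

```python
from typing import List
from urllib.parse import quote, unquote


def ask_gopher_plus_string(answers: List[str]) -> str:
    """Encode +ASK answers as +%091%0D%0A+-1%0D%0A<answers>.%0D%0A."""
    block = "".join(answer + "\r\n" for answer in answers)
    request = "+\t1\r\n+-1\r\n" + block + ".\r\n"
    # Keep "+" literal; tabs and CR LF become %09 and %0D%0A.
    return quote(request, safe="+")


def gopher_plus_request(selector: str, plus_string: str) -> str:
    """Decode the Gopher+ string into the text sent after the selector."""
    return selector + "\t" + unquote(plus_string)
```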
3.5. MAILTO
mailto:<rfc822-addr-spec>
Note that the percent sign ("%") is commonly used within RFC 822
addresses and must be encoded.
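This sketch builds a mailto URL from an address. It assumes urllib.parse.quote is acceptable for the encoding. That function always turns "%" into "%25", and here "@" is kept literal. The name mailto_url is hypothetical.

```python
from urllib.parse import quote


def mailto_url(addr_spec: str) -> str:
    """Build mailto:<rfc822-addr-spec>, encoding "%" and other unsafe octets."""
    # "@" stays literal; quote() never leaves "%" unencoded.
    return "mailto:" + quote(addr_spec, safe="@")
```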
Unlike many URLs, the mailto scheme does not represent a data object
to be accessed directly; there is no sense in which it designates an
object. It has a different use than the message/external-body type in
MIME.
3.6. NEWS
news:<newsgroup-name>
news:<message-id>
The news URLs are unusual in that by themselves, they do not contain
sufficient information to locate a single resource, but, rather, are
location-independent.
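Both news forms share the same prefix, so a client has to tell them apart from the string itself. The sketch assumes, as the full text of RFC 1738 specifies, that a <message-id> has the form <unique>@<full_domain_name> while a newsgroup name never contains "@". The name classify_news is hypothetical.

```python
from typing import Tuple


def classify_news(url: str) -> Tuple[str, str]:
    """Return ("message-id", id) or ("newsgroup", name) for a news: URL."""
    scheme, _, rest = url.partition(":")
    if scheme.lower() != "news":
        raise ValueError("not a news URL")
    # Message-IDs contain "@"; newsgroup names do not.
    kind = "message-id" if "@" in rest else "newsgroup"
    return kind, rest
```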
3.7. NNTP
nntp://<host>:<port>/<newsgroup-name>/<article-number>
Note that while nntp: URLs specify a unique location for the article
resource, most NNTP servers on the Internet today are configured to
allow access only from local clients, so nntp URLs do not designate
globally accessible resources. The news: form of URL is therefore
preferred as a way of identifying news articles.
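Unlike the news: forms, an nntp URL carries everything needed to reach one article. The sketch assumes the standard NNTP port 119 when :<port> is omitted, which this excerpt does not state. It also requires a decimal article number. NntpLocator and parse_nntp are hypothetical names.

```python
from typing import NamedTuple


class NntpLocator(NamedTuple):
    host: str
    port: int
    newsgroup: str
    article: int


def parse_nntp(url: str) -> NntpLocator:
    """Parse nntp://<host>:<port>/<newsgroup-name>/<article-number>."""
    prefix = "nntp://"
    if not url.lower().startswith(prefix):
        raise ValueError("not an nntp URL")
    hostport, _, path = url[len(prefix):].partition("/")
    host, colon, port_text = hostport.partition(":")
    port = int(port_text) if colon else 119  # assumed NNTP default port
    group, slash, number = path.partition("/")
    if not slash or not number.isdigit():
        raise ValueError("expected <newsgroup-name>/<article-number>")
    return NntpLocator(host, port, group, int(number))
```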
3.8. TELNET
telnet://<user>:<password>@<host>:<port>/
This URL does not designate a data object, but rather an interactive
service. Remote interactive services vary widely in the means by
which they allow remote logins; in practice, the <user> and
<password> supplied are advisory only: clients accessing a telnet URL
merely advise the user of the suggested username and password.
3.9. WAIS
wais://<host>:<port>/<database>
wais://<host>:<port>/<database>?<search>
wais://<host>:<port>/<database>/<wtype>/<wpath>
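The three WAIS forms can be told apart by the presence of "?" or of two further "/" separators in the url-path. The sketch below uses hypothetical names and leaves components encoded. It assumes the usual reading that the three forms designate, respectively, a database, a search within it, and a single document identified by <wtype> and <wpath>.

```python
from typing import Dict


def classify_wais(url_path: str) -> Dict[str, str]:
    """Classify the url-path of a WAIS URL (the part after <host>:<port>/)."""
    if "?" in url_path:
        database, _, search = url_path.partition("?")
        return {"form": "search", "database": database, "search": search}
    parts = url_path.split("/", 2)
    if len(parts) == 1:
        return {"form": "database", "database": parts[0]}
    if len(parts) == 3:
        database, wtype, wpath = parts
        return {"form": "document", "database": database,
                "wtype": wtype, "wpath": wpath}
    raise ValueError("expected <database>, <database>?<search> "
                     "or <database>/<wtype>/<wpath>")
```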
3.10. FILES
file://<host>/<path>
For example, a VMS file
DISK$USER:[MY.NOTES]NOTE123456.TXT
might become
<URL:file://vms.host.edu/disk$user/my/notes/note123456.txt>
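The mapping in this VMS example can be reproduced mechanically, as in the sketch below. The rules are hypothetical and inferred from this single example: the device name loses its colon, the bracketed directory list is split on ".", and every component is lower-cased. The file name keeps the digits given in the VMS form. Real VMS file specifications (version numbers, logical names, and so on) are not handled.

```python
import re

# DEVICE:[DIR1.DIR2]NAME.EXT (hypothetical, simplified VMS file spec)
_VMS_RE = re.compile(r"^(?P<device>[^:]+):\[(?P<dirs>[^\]]*)\](?P<name>.+)$")


def vms_to_file_url(host: str, vms_path: str) -> str:
    """Map a simple VMS file spec to a file URL, as in the example above."""
    match = _VMS_RE.match(vms_path)
    if match is None:
        raise ValueError(f"not a VMS file spec: {vms_path!r}")
    dirs = [d for d in match["dirs"].split(".") if d]
    parts = [match["device"], *dirs, match["name"]]
    return f"file://{host}/" + "/".join(p.lower() for p in parts)
```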
3.11. PROSPERO
prospero://<host>:<port>/<hsoname>;<field>=<value>
Note that a slash "/" may appear in the <hsoname> without quoting and
no significance may be assumed by the application. Though slashes
may indicate hierarchical structure on the server, such structure is
not guaranteed. Note that many <hsoname>s begin with a slash, in
which case the host or port will be followed by a double slash: the
slash from the URL syntax, followed by the initial slash from the
<hsoname>. (E.g., <URL:prospero://host.dom//pros/name> designates a
<hsoname> of "/pros/name".)
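This sketch separates a prospero URL into host, <hsoname>, and field/value pairs. Only the first "/" after the host is treated as URL syntax, so a leading slash in the <hsoname> is preserved, as in the /pros/name example. The function name and the assumption that ";" never occurs unencoded inside the <hsoname> or a value are illustrative.

```python
from typing import Dict, Tuple
from urllib.parse import unquote


def parse_prospero(url: str) -> Tuple[str, str, Dict[str, str]]:
    """Return (host[:port], hsoname, fields) for a prospero: URL."""
    prefix = "prospero://"
    if not url.lower().startswith(prefix):
        raise ValueError("not a prospero URL")
    # Only the first "/" is URL syntax; any further slash is part of the hsoname.
    hostport, _, rest = url[len(prefix):].partition("/")
    hsoname, *pairs = rest.split(";")
    fields: Dict[str, str] = {}
    for pair in pairs:
        field, _, value = pair.partition("=")
        fields[unquote(field)] = unquote(value)
    return hostport, unquote(hsoname), fields
```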
URL schemes must have demonstrable utility and operability. One way
to provide such a demonstration is via a gateway which provides
objects in the new scheme for clients using an existing protocol. If
the new scheme does not locate resources that are data objects, the
properties of names in the new space must be clearly defined.
The following schemes have been proposed at various times, but this
document does not define their syntax or use at this time. It is
suggested that IANA reserve their scheme names for future definition:
6. Security Considerations
The URL scheme does not in itself pose a security threat. Users
should be aware that there is no general guarantee that a URL which
at one time points to a given object continues to do so, nor that it
will not later point to a different object because objects have moved
on servers.
7. Acknowledgements
This paper builds on the basic WWW design (RFC 1630) and much
discussion of these issues by many people on the network. The
discussion was particularly stimulated by articles by Clifford Lynch,
Brewster Kahle [10] and Wengyik Yeong [18]. Contributions from John
Curran, Clifford Neuman, Ed Vielmetti and later the IETF URL BOF and
URI working group were incorporated.
In addition, there are many occasions when URLs are included in other
kinds of text; examples include electronic mail, USENET news
messages, or printed on paper. In such cases, it is convenient to
have a separate syntactic wrapper that delimits the URL and separates
it from the rest of the text, and in particular from punctuation
marks that might be mistaken for part of the URL. For this purpose,
it is recommended that angle brackets ("<" and ">"), along with the
prefix "URL:", be used to delimit the boundaries of the URL. This
wrapper does not form part of the URL and should not be used in
contexts in which delimiters are already specified.
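Extracting wrapped URLs from text is a simple pattern match, sketched below with a hypothetical function name. The sketch assumes that any whitespace inside the wrapper comes from line breaks in running text, and it removes that whitespace. This is safe because space and the control characters are never written unencoded in a URL.

```python
import re
from typing import List

# "<URL:" ... ">" wrapper; ">" cannot appear unencoded inside the URL.
_WRAPPED_URL = re.compile(r"<URL:([^>]*)>")


def extract_wrapped_urls(text: str) -> List[str]:
    """Return the URLs delimited by <URL:...> in free text, wrapper removed."""
    return [re.sub(r"\s+", "", found) for found in _WRAPPED_URL.findall(text)]
```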
Examples:
References
[7] Davis, F., Kahle, B., Morris, H., Salem, J., Shen, T., Wang, R.,
Sui, J., and M. Grinbaum, "WAIS Interface Protocol Prototype
Functional Specification", (v1.5), Thinking Machines
Corporation, April 1990.
<URL:ftp://quake.think.com/pub/wais/doc/protspec.txt>
[17] St. Pierre, M., Fullton, J., Gamiel, K., Goldman, J., Kahle, B.,
Kunze, J., Morris, H., and F. Schiettecatte, "WAIS over
Z39.50-1988", RFC 1625, WAIS, Inc., CNIDR, Thinking Machines
Corp., UC Berkeley, FS Consulting, June 1994.
<URL:ftp://ds.internic.net/rfc/rfc1625.txt>
Editors’ Addresses
Tim Berners-Lee
World-Wide Web project
CERN,
1211 Geneva 23,
Switzerland
Larry Masinter
Xerox PARC
3333 Coyote Hill Road
Palo Alto, CA 94304
Mark McCahill
Computer and Information Services,
University of Minnesota
Room 152 Shepherd Labs
100 Union Street SE
Minneapolis, MN 55455