Platform-Specific Path Conventions
20.1 Unix and Mac OS X Path Conventions
In Unix and Mac OS X paths, a forward slash (``/'') separates elements of the path, a period (``.'') as a path element always means the directory indicated by the preceding path, and two periods (``..'') as a path element always means the parent of the directory indicated by the preceding path. A path that starts with a tilde (``~'') indicates a user's home directory; the username follows the tilde (before a slash or the end of the path), where a tilde by itself indicates the home directory of the current user. No other character or byte has a special meaning within a path. Multiple adjacent slashes (``/'') are equivalent to a single slash (i.e., they act as a single path separator).
A path root is either / or a home-directory specification starting with tilde (``~''). A relative path whose first element starts with a tilde (``~'') is encoded by prefixing the path with period-slash (``./'').
Any pathname that ends with a slash (``/'') syntactically refers to a directory, as does any path whose last element is a single period (``.'') or double period (``..''), or any path that contains only a root.
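The directory rule above can be modeled outside MzScheme. The following is a minimal Python sketch, not MzScheme's implementation; the function name is hypothetical, and it assumes that a root is either ``/'' or a home-directory specification with no slash after it.

```python
def is_syntactic_directory(path: str) -> bool:
    """Return True if a Unix path syntactically refers to a directory."""
    # Rule 1: the path ends with a slash (this also covers the root "/").
    if path.endswith("/"):
        return True
    # Rule 2: the last element is "." or "..".
    last = path.rsplit("/", 1)[-1]
    if last in (".", ".."):
        return True
    # Rule 3: the path contains only a root. A bare home-directory
    # specification such as "~" or "~user" is a root with no elements.
    return path.startswith("~") and "/" not in path
```

Under these assumptions, "a/.." and "~user" count as directories, while "./~notes" does not, because its leading period-slash marks the tilde as a literal part of a relative path element.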
A Unix or Mac OS X path is expanded by replacing a home-directory specification (starting with ~) with an absolute path, and by replacing multiple adjacent slashes with a single slash.
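The two expansion steps can be sketched in Python. This is an illustrative model only: the HOME_DIRS table is a hypothetical stand-in for the system's user database, and the function name is not part of MzScheme.

```python
import re

# Hypothetical user database: "" stands for the current user.
HOME_DIRS = {"": "/home/me", "alice": "/home/alice"}

def expand_unix_path(path: str, home_dirs=HOME_DIRS) -> str:
    """Expand a home-directory specification and collapse repeated slashes."""
    if path.startswith("~"):
        # The username runs from the tilde to the first slash or the end.
        user, sep, rest = path[1:].partition("/")
        # An unknown username raises KeyError in this sketch.
        path = home_dirs[user] + sep + rest
    # Multiple adjacent slashes act as a single separator.
    return re.sub(r"/+", "/", path)
```

With the table above, "~/docs//a" expands to "/home/me/docs/a", and a path such as "./~x" is left alone because it does not start with a tilde.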
For (bytes->path-element bytes), bytes can start with a tilde (``~''), and it is encoded as a literal part of the path element using a period-slash (``./'') prefix. The bytes argument must not contain a slash (``/''), otherwise the exn:fail:contract exception is raised.
For (path-element->bytes path) or (path-element->string path), if the bytes form of path starts with a period, slash, and tilde (``./~''), the period-slash (``./'') prefix is not included in the result.
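The tilde encoding used by bytes->path-element and path-element->bytes under Unix can be modeled as a pair of inverse functions. The Python sketch below is an assumption-laden illustration: the function names are hypothetical, and ValueError stands in for the exn:fail:contract exception.

```python
def encode_element(element: bytes) -> bytes:
    """Model of bytes->path-element: protect a leading tilde with "./"."""
    if b"/" in element:
        # Stand-in for the exn:fail:contract exception.
        raise ValueError("path element must not contain a slash")
    if element.startswith(b"~"):
        return b"./" + element
    return element

def decode_element(path: bytes) -> bytes:
    """Model of path-element->bytes: drop a protective "./" before a tilde."""
    if path.startswith(b"./~"):
        return path[2:]
    return path
```

Encoding b"~backup" gives b"./~backup", and decoding that result gives back b"~backup", so the round trip preserves the element name.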
For (build-path base-path sub-path ···), when a sub-path starts with a period, slash, and tilde (``./~''), the period and slash are removed before adding the path. This conversion is performed because an initial sequence period-slash-tilde (``./~'') is the canonical way of representing relative paths whose first element's name starts with a tilde.
For (simplify-path path use-filesystem?), if path starts with period-slash-tilde (``./~''), the leading period is the only indicator, and there are no redundant slashes, then path is returned.
For (split-path path) producing base, name, and must-be-dir?, the result name can start with period-slash-tilde (``./~'') if the result would otherwise start with tilde (``~'') and it is not the start of path. Furthermore, if path starts with period-slashes-tilde (``./~'', with any non-zero number of ``/''), then the period and slash are kept with the following element (i.e., they are not split separately).
Under Mac OS X, Finder aliases are zero-length files.
20.2 Windows Path Conventions
In general, a Windows pathname consists of an optional drive specifier and a drive-specific path. As noted in section 11.3, a Windows path can be absolute but still relative to the current drive; such paths start with a forward slash or backslash separator and are not UNC paths or paths that start with \\?\.
A path that starts with a drive specification is complete. Roughly, a drive specification is either a Roman letter followed by a colon, a UNC path of the form \\machine\volume, or a \\?\ form followed by something other than REL\element or RED\element. (Variants of \\?\ paths are described further below.)
MzScheme fails to implement the usual Windows path syntax in one way. Outside of MzScheme, a pathname C:rant.txt can be a drive-specific relative path. That is, it names a file rant.txt on drive C:, but the complete path to the file is determined by the current working directory for drive C:. MzScheme does not support drive-specific working directories (only a working directory across all drives, as reflected by the current-directory parameter; see section 7.9.1.1). Consequently, MzScheme implicitly converts a path like C:rant.txt into C:\rant.txt.
MzScheme-specific: Whenever a path starts with a drive specifier letter: that is not followed by a forward slash or backslash, a backslash is inserted as the path is expanded.
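The drive-letter rule can be illustrated with a short Python sketch. It models only this one expansion step, under the assumption that a path consisting of just letter: also receives the backslash; the function name is hypothetical.

```python
import re

def insert_drive_separator(path: str) -> str:
    """Insert a backslash after a leading letter: that lacks a separator."""
    # Match a Roman letter and colon at the start of the path, but only
    # when the next character is neither a backslash nor a forward slash.
    return re.sub(r"^([A-Za-z]:)(?![\\/])", r"\1\\", path)
```

Applied to C:rant.txt, the sketch produces C:\rant.txt; paths such as C:\rant.txt and c:/rant.txt already have a separator and pass through unchanged.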
Otherwise, MzScheme follows standard Windows path conventions, but also adds \\?\REL and \\?\RED conventions to deal with paths inexpressible in the standard conventions, plus conventions to deal with excessive backslashes in \\?\ paths.
In the following, letter stands for a Roman letter (case does not matter), machine stands for any sequence of characters that does not include backslashes or forward slashes and is not ?, volume stands for any sequence of characters that does not include backslashes or forward slashes, and element stands for any sequence of characters that does not include backslashes.
Trailing spaces and periods in a path element are ignored when the element is the last one in the path, unless the path starts with \\?\ or the element consists of only spaces and periods.
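As a rough model of the trailing-character rule, the Python sketch below computes the name that a path effectively refers to. The function name is hypothetical, and the sketch treats both slashes and backslashes as separators for paths that do not start with \\?\.

```python
def strip_ignored_trailing(path: str) -> str:
    """Drop trailing spaces and periods that Windows ignores in the last element."""
    # Paths that start with \\?\ are taken literally.
    if path.startswith("\\\\?\\"):
        return path
    # Locate the last element (after the final slash or backslash).
    cut = max(path.rfind("\\"), path.rfind("/")) + 1
    head, last = path[:cut], path[cut:]
    stripped = last.rstrip(" .")
    # An element made only of spaces and periods (e.g. "..") is kept as-is.
    if stripped == "":
        return path
    return head + stripped
```

For instance, the sketch maps C:\dir\file.txt followed by a period and a space to C:\dir\file.txt, while C:\dir\.. is returned unchanged.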
The following special ``files'', which access devices, exist in all directories, case-insensitively, and with all possible endings after a period or colon, except in pathnames that start with \\?\: NUL, CON, PRN, AUX, COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9, LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, LPT9.
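A check for these device names can be sketched in Python. The helper below is hypothetical and covers only what the paragraph states: case-insensitive matching, any ending after a period or colon, and no special treatment inside \\?\ paths (signaled here by an assumed path_is_quoted flag).

```python
import re

# The device names listed above.
DEVICE_NAMES = (
    {"NUL", "CON", "PRN", "AUX"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)

def is_device_element(element: str, path_is_quoted: bool = False) -> bool:
    """Return True if a path element names a special device 'file'."""
    # Inside a pathname that starts with \\?\, nothing is special.
    if path_is_quoted:
        return False
    # Anything after the first period or colon is ignored.
    base = re.split(r"[.:]", element, maxsplit=1)[0]
    return base.upper() in DEVICE_NAMES
```

Under this model, aux.txt and Com1 name devices, whereas auxiliary and com10 are ordinary names.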
Except for \\?\ paths, forward slashes are equivalent to backslashes. Except for \\?\ paths and the start of UNC paths, multiple adjacent slashes and backslashes count as a single backslash. In a path that starts \\?\, elements can be separated by either a single or double backslash.
A directory can be accessed with or without a trailing separator. In the case of a non-\\?\ path, the trailing separator can be any number of forward slashes and backslashes; in the case of a \\?\ path, a trailing separator must be a single backslash, except that two backslashes can follow \\?\letter:.
Except for \\?\ paths, a single period (.) as a path element means ``the current directory'', and a double period (..) as a path element means ``the parent directory.'' Up-directory path elements (i.e., ..) immediately after a drive are ignored.
A pathname that starts \\machine\volume (where a forward slash can replace any backslash) is a UNC path, and the starting \\machine\volume counts as the drive specifier.
Normally, a path element cannot contain any of the following characters:
< > : " / \ |
Except for backslash, path elements containing these characters can be accessed using a \\?\ path (assuming that the underlying filesystem allows the characters).
In a pathname that starts \\?\letter:\, the \\?\letter:\ prefix counts as the path's drive, as long as the path does not both contain non-drive elements and end with two consecutive backslashes, and as long as the path contains no sequence of three or more backslashes. Two backslashes can appear in place of the backslash before letter. Forward slashes cannot be used in place of backslashes (but forward slashes can be used in element names, though the result generally does not name an actual directory or file).
In a pathname that starts \\?\UNC\machine\volume, the \\?\UNC\machine\volume prefix counts as the path's drive, as long as the path does not end with two consecutive backslashes, and as long as the path contains no sequence of three or more backslashes. Two backslashes can appear in place of the backslash before UNC, the backslash after UNC, and/or the backslash after machine. The letters in the UNC part can be uppercase or lowercase, and forward slashes cannot be used in place of backslashes (but forward slashes can be used in element names).
MzScheme-specific: A pathname that starts \\?\REL\element or \\?\REL\\element is a relative path, as long as the path does not end with two consecutive backslashes, and as long as the path contains no sequence of three or more backslashes. This MzScheme-specific path form supports relative paths with elements that are not normally expressible in Windows paths (e.g., a final element that ends in a space). The REL part must be exactly the three uppercase letters, and forward slashes cannot be used in place of backslashes. If the path starts \\?\REL\.. then for as long as the path continues with repetitions of \.., each element counts as an up-directory element; a single backslash must be used to separate the up-directory elements. As soon as a second backslash is used to separate the elements, or as soon as a non-.. element is encountered, the remaining elements are all literals (never up-directory elements). When a \\?\REL path value is converted to a string (or when the path value is written or displayed), the string does not contain the starting \\?\REL or the immediately following backslashes; converting a path value to a byte string preserves the \\?\REL prefix.
MzScheme-specific: A pathname that starts \\?\RED\element or \\?\RED\\element is a drive-relative path, as long as the path does not end with two consecutive backslashes, and as long as the path contains no sequence of three or more backslashes. This MzScheme-specific path form supports drive-relative paths (i.e., absolute given a drive) with elements that are not normally expressible in Windows paths. The RED part must be exactly the three uppercase letters, and forward slashes cannot be used in place of backslashes. Unlike \\?\REL paths, a .. element is always a literal path element. When a \\?\RED path value is converted to a string (or when the path value is written or displayed), the string does not contain the starting \\?\RED and it contains a single starting backslash; converting a path value to a byte string preserves the \\?\RED prefix.
Three additional MzScheme-specific rules provide meanings to character sequences that are otherwise ill-formed as Windows paths:
MzScheme-specific: In a pathname of the form \\?\any\\ where any is any non-empty sequence of characters other than letter: or \letter:, the entire path counts as the path's (non-existent) drive.
MzScheme-specific: In a pathname of the form \\?\any\\\elements, where any is any non-empty sequence of characters and elements is any sequence that does not start with a backslash, does not end with two backslashes, and does not contain a sequence of three backslashes, then \\?\any\\ counts as the path's (non-existent) drive.
MzScheme-specific: In a pathname that starts \\?\ and does not match any of the patterns from the preceding bullets, \\?\ counts as the path's (non-existent) drive.
Outside of MzScheme, except for \\?\ paths, pathnames are typically limited to 259 characters. MzScheme internally converts pathnames to \\?\ form as needed to avoid this limit. The operating system cannot access files through \\?\ paths that are longer than 32,000 characters or so.
Where the above descriptions say ``character,'' substitute ``byte'' for interpreting byte strings as paths. The encoding of Windows paths into bytes preserves ASCII characters, and all special characters mentioned above are ASCII, so all of the rules are the same.
Beware that the backslash path separator is an escape character in MzScheme strings. Thus, the path \\?\REL\..\\.. as a string must be written "\\\\?\\REL\\..\\\\..".
A path that ends with a directory separator syntactically refers to a directory. In addition, a path syntactically refers to a directory if its last element is a same-directory or up-directory indicator (not quoted by a \\?\ form), or if it refers to a root.
Windows paths are expanded as follows: In paths that start \\?\, redundant backslashes are removed, an extra backslash is added in a \\?\REL if an extra one is not already present to separate up-directory indicators from literal path elements, and an extra backslash is similarly added after \\?\RED if an extra one is not already present. When \\?\ acts as the root and the path contains elements, two additional slashes (which might otherwise be redundant) are included after the root. For other paths, multiple slashes are converted to single slashes (except at the beginning of a shared folder name), and a slash is inserted after the colon in a drive specification if it is missing.
For (bytes->path-element bytes), forward slashes, colons, trailing dots, trailing whitespace, and special device names (e.g., ``aux'') in bytes are encoded as a literal part of the path element by using a \\?\REL prefix. The bytes argument must not contain a backslash (``\''), otherwise the exn:fail:contract exception is raised.
For (path-element->bytes path) or (path-element->string path), if the byte-string form of path starts with a \\?\REL, the prefix is not included in the result.
For (build-path base-path sub-path ···), trailing spaces and periods are removed from the last element of base-path and all but the last sub-path (unless the element consists of only spaces and periods), except for those that start with \\?\. If base-path starts \\?\, then after each non-\\?\REL\ and non-\\?\RED\ sub-path is added, all slashes in the addition are converted to backslashes, multiple consecutive backslashes are converted to a single backslash, added . elements are removed, and added .. elements are removed along with the preceding element; these conversions are not performed on the original base-path part of the result or on any \\?\REL\ or \\?\RED\ sub-path. If a \\?\REL\ or \\?\RED\ sub-path is added to a non-\\?\ base-path, the base-path (with any additions up to the \\?\REL\ or \\?\RED\ sub-path) is simplified and converted to a \\?\ path. In other cases, a backslash may be added or removed before combining paths to avoid changing the root meaning of the path (e.g., combining //x and y produces /x/y, because //x/y would be a UNC path instead of a drive-relative path).
For (simplify-path path use-filesystem?), path is expanded, and if path does not start with \\?\, trailing spaces and periods are removed, a slash is inserted after the colon in a drive specification if it is missing, and a backslash is inserted after \\?\ as a root if there are elements and no extra backslash already. Otherwise, if no indicators or redundant separators are in path, then path is returned.
For (split-path path) producing base, name, and must-be-dir?, splitting a path that does not start with \\?\ can produce parts that start with \\?\. For example, splitting C:/x /aux/ produces \\?\C:\x \ and \\?\REL\\aux; the \\?\ is needed in these cases to preserve a trailing space after x and to avoid referring to the AUX device instead of an aux file.