Platform-Specific Path Conventions

20.1  Unix and Mac OS X Path Conventions

In Unix and Mac OS X paths, a forward slash (``/'') separates elements of the path, a period (``.'') as a path element always means the directory indicated by the preceding path, and two periods (``..'') as a path element always means the parent of the directory indicated by the preceding path. A path that starts with a tilde (``~'') indicates a user's home directory; the username follows the tilde (before a slash or the end of the path), and a tilde by itself indicates the home directory of the current user. No other character or byte has a special meaning within a path. Multiple adjacent slashes (``/'') are equivalent to a single slash (i.e., they act as a single path separator).

A path root is either / or a home-directory specification starting with tilde (``~''). A relative path whose first element starts with a tilde (``~'') is encoded by prefixing the path with period-slash (``./'').

Any pathname that ends with a slash (``/'') syntactically refers to a directory, as does any path whose last element is a single period (``.'') or double period (``..''), or any path that contains only a root.
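The syntactic directory test above can be modeled in a few lines. The following Python sketch only illustrates the three rules as stated; it is not MzScheme code, and the function name is hypothetical.

```python
def syntactically_directory(path: str) -> bool:
    """Model of the rule: trailing slash, final . or .., or a root by itself."""
    if path.endswith("/"):
        return True  # also covers the root "/"
    last = path.split("/")[-1]
    if last in (".", ".."):
        return True
    # A path that is only a home-directory root, such as "~" or "~alice".
    return path.startswith("~") and "/" not in path
```

Under this model, "/usr/", "a/..", and "~alice" refer syntactically to directories, while "/usr/bin" and "./~x" do not.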

A Unix or Mac OS X path is expanded by replacing a home-directory specification (starting with ~) with an absolute path, and by replacing multiple adjacent slashes with a single slash.
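As a rough illustration of expansion, the Python sketch below applies the two steps in order. It is not MzScheme's implementation: the function name, the homes table (a mapping from usernames to home directories), and the current_user argument are assumptions introduced so that the example does not depend on the host system.

```python
import re

def expand_unix_path(path: str, homes: dict, current_user: str) -> str:
    """Replace a leading ~ or ~user with a home directory, then collapse slashes."""
    match = re.match(r"~([^/]*)", path)
    if match:
        # A bare tilde means the current user; otherwise the name follows the tilde.
        user = match.group(1) or current_user
        path = homes[user] + path[match.end():]
    # Multiple adjacent slashes act as a single separator.
    return re.sub(r"/{2,}", "/", path)
```

With homes mapping alice to /home/alice, expanding ~/notes//a.txt for the current user alice gives /home/alice/notes/a.txt. A path such as ./~bob//x is relative, so only its slashes are collapsed. The sketch raises KeyError for an unknown user, which is a simplification.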

For (bytes->path-element bytes), bytes can start with a tilde (``~''), and it is encoded as a literal part of the path element using a period-slash (``./'') prefix. The bytes argument must not contain a slash (``/''); otherwise, the exn:fail:contract exception is raised.

For (path-element->bytes path) or (path-element->string path), if the bytes form of path starts with a period, slash, and tilde (``./~''), the period-slash (``./'') prefix is not included in the result.
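The period-slash encoding used by bytes->path-element, and its removal by path-element->bytes, can be pictured with the following Python sketch. It models only the byte-level convention described above. The function names are hypothetical, and ValueError stands in for the exn:fail:contract exception.

```python
def encode_element(raw: bytes) -> bytes:
    """Model of bytes->path-element: protect a leading tilde with "./"."""
    if b"/" in raw:
        raise ValueError("a path element must not contain a slash")
    return b"./" + raw if raw.startswith(b"~") else raw

def decode_element(path: bytes) -> bytes:
    """Model of path-element->bytes: drop the "./" that protects a tilde."""
    return path[2:] if path.startswith(b"./~") else path
```

In this model the two functions are inverses: encoding ~backup gives ./~backup, and decoding that result gives ~backup again.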

For (build-path base-path sub-path ···), when a sub-path starts with a period, slash, and tilde (``./~''), the period and slash are removed before adding the path. This conversion is performed because an initial sequence period-slash-tilde (``./~'') is the canonical way of representing relative paths whose first element's name starts with a tilde.
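A minimal Python sketch of this conversion follows. It models only the removal of the protective prefix and the joining with a slash; the function name is hypothetical, and the checks that build-path itself performs (for example, rejecting an absolute sub-path) are omitted.

```python
def build_unix_path(base: str, *subs: str) -> str:
    """Join sub-paths onto base, dropping a protective "./" before a tilde."""
    result = base
    for sub in subs:
        if sub.startswith("./~"):
            sub = sub[2:]  # the tilde is no longer first, so it needs no protection
        if not result.endswith("/"):
            result += "/"
        result += sub
    return result
```

Joining /home/alice and ./~backup this way gives /home/alice/~backup, where the tilde is an ordinary character because it does not start the path.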

For (simplify-path path use-filesystem?), if path starts with period-slash-tilde (``./~''), the leading period is the only same-directory or up-directory indicator, and there are no redundant slashes, then path is returned.

For (split-path path) producing base, name, and must-be-dir?, the result name can start with period-slash-tilde (``./~'') if the result would otherwise start with tilde (``~'') and it is not the start of path. Furthermore, if path starts with period-slashes-tilde (``./~'', with any non-zero number of ``/''), then the period and slash are kept with the following element (i.e., they are not split separately).

Under Mac OS X, Finder aliases are zero-length files.

20.2  Windows Path Conventions

In general, a Windows pathname consists of an optional drive specifier and a drive-specific path. As noted in section 11.3, a Windows path can be absolute but still relative to the current drive; such paths start with a forward slash or backslash separator and are not UNC paths or paths that start with \\?\.

A path that starts with a drive specification is complete. Roughly, a drive specification is either a Roman letter followed by a colon, a UNC path of the form \\machine\volume, or a \\?\ form followed by something other than REL\element or RED\element. (Variants of \\?\ paths are described further below.)
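The rough definition of a drive specification can be approximated with regular expressions. The Python sketch below follows only the informal wording of this paragraph; the names are hypothetical, UNC paths are recognized only when written with backslashes, and one or more backslashes are allowed after REL or RED (an assumption based on the \\?\REL\\aux form that appears later in this section).

```python
import re

LETTER_DRIVE = re.compile(r"[A-Za-z]:")
UNC_DRIVE = re.compile(r"\\\\[^\\/]+\\[^\\/]+")
QUOTED_RELATIVE = re.compile(r"\\\\\?\\RE[LD]\\+[^\\]")

def starts_with_drive(path: str) -> bool:
    """Approximate test for a path that begins with a drive specification."""
    if path.startswith("\\\\?\\"):
        # A \\?\ form counts unless it is a \\?\REL or \\?\RED path.
        return QUOTED_RELATIVE.match(path) is None
    return bool(LETTER_DRIVE.match(path) or UNC_DRIVE.match(path))
```

Under this approximation, C:\rant.txt, \\server\share\file, and \\?\C:\x start with a drive specification, while \dir\file and \\?\REL\aux do not.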

MzScheme fails to implement the usual Windows path syntax in one way. Outside of MzScheme, a pathname C:rant.txt can be a drive-specific relative path. That is, it names a file rant.txt on drive C:, but the complete path to the file is determined by the current working directory for drive C:. MzScheme does not support drive-specific working directories (only a working directory across all drives, as reflected by the current-directory parameter; see section 7.9.1.1). Consequently, MzScheme implicitly converts a path like C:rant.txt into C:\rant.txt.
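The implicit conversion of a drive-relative path can be pictured as a single substitution. This Python sketch models the rule only; it is not how MzScheme implements it, and the function name is hypothetical.

```python
import re

def make_drive_path_absolute(path: str) -> str:
    """Insert a backslash after a drive's colon when no separator follows it."""
    return re.sub(r"^([A-Za-z]:)(?![\\/])", r"\1\\", path)
```

Applied to C:rant.txt, the sketch produces C:\rant.txt. Paths that already have a separator after the colon, and paths with no drive letter, are left unchanged.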

Otherwise, MzScheme follows standard Windows path conventions, but also adds \\?\REL and \\?\RED conventions to deal with paths inexpressible in the standard conventions, plus conventions to deal with excessive backslashes in \\?\ paths.

In the following, letter stands for a Roman letter (case does not matter), machine stands for any sequence of characters that does not include backslashes or forward slashes and is not ?, volume stands for any sequence of characters that does not include backslashes or forward slashes, and element stands for any sequence of characters that does not include backslashes.

Three additional MzScheme-specific rules provide meanings to character sequences that are otherwise ill-formed as Windows paths:

Outside of MzScheme, except for \\?\ paths, pathnames are typically limited to 259 characters. MzScheme internally converts pathnames to \\?\ form as needed to avoid this limit. The operating system cannot access files through \\?\ paths that are longer than 32,000 characters or so.

Where the above descriptions say ``character,'' substitute ``byte'' for interpreting byte strings as paths. The encoding of Windows paths into bytes preserves ASCII characters, and all special characters mentioned above are ASCII, so all of the rules are the same.

Beware that the backslash path separator is an escape character in MzScheme strings. Thus, the path \\?\REL\..\\.. as a string must be written "\\\\?\\REL\\..\\\\..".
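The doubling can be checked mechanically. Ordinary Python string literals use the same backslash escape, so the short Python sketch below (not MzScheme code) compares the escaped spelling with a raw literal in which backslashes stand for themselves. The variable names are illustrative only.

```python
# Escaped spelling: every backslash in the path is written twice.
escaped = "\\\\?\\REL\\..\\\\.."

# Raw literal: backslashes are taken literally, so this is the path itself.
raw = r"\\?\REL\..\\.."

same = (escaped == raw)   # True: both spell the same 14-character path
length = len(escaped)     # 14 characters, although the escaped literal has 20
```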

A path that ends with a directory separator syntactically refers to a directory. In addition, a path syntactically refers to a directory if its last element is a same-directory or up-directory indicator (not quoted by a \\?\ form), or if it refers to a root.

Windows paths are expanded as follows: In paths that start \\?\, redundant backslashes are removed, an extra backslash is added in a \\?\REL if an extra one is not already present to separate up-directory indicators from literal path elements, and an extra backslash is similarly added after \\?\RED if an extra one is not already present. When \\?\ acts as the root and the path contains elements, two additional backslashes (which might otherwise be redundant) are included after the root. For other paths, multiple slashes are converted to single slashes (except at the beginning of a shared folder name), and a slash is inserted after the colon in a drive specification if it is missing.

For (bytes->path-element bytes), forward slashes, colons, trailing dots, trailing whitespace, and special device names (e.g., ``aux'') in bytes are encoded as a literal part of the path element by using a \\?\REL prefix. The bytes argument must not contain a backslash (``\''); otherwise, the exn:fail:contract exception is raised.
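The conditions that call for a \\?\REL prefix can be sketched as a predicate. The Python model below is an assumption-laden illustration, not MzScheme's implementation: the names are hypothetical, the device-name set is the commonly documented Windows list (the text above names only ``aux''), whitespace is taken to mean spaces and tabs, and ValueError stands in for the exn:fail:contract exception.

```python
# Commonly documented Windows device names; MzScheme's own list may differ.
RESERVED_DEVICE_NAMES = {b"con", b"prn", b"aux", b"nul"} | {
    (prefix + str(n)).encode("ascii")
    for prefix in ("com", "lpt")
    for n in range(1, 10)
}

def needs_rel_prefix(element: bytes) -> bool:
    """Report whether an element would need \\\\?\\REL quoting under this model."""
    if b"\\" in element:
        raise ValueError("a path element must not contain a backslash")
    if b"/" in element or b":" in element:
        return True
    if element != element.rstrip(b". \t"):
        return True  # trailing dots or whitespace
    return element.lower() in RESERVED_DEVICE_NAMES
```

In this model, aux, name. (with a trailing dot), and a/b need quoting, while notes.txt does not.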

For (path-element->bytes path) or (path-element->string path), if the byte-string form of path starts with a \\?\REL, the prefix is not included in the result.

For (build-path base-path sub-path ···), trailing spaces and periods are removed from the last element of base-path and all but the last sub-path (unless the element consists of only spaces and periods), except for those that start with \\?\. If base-path starts with \\?\, then after each non-\\?\REL\ and non-\\?\RED\ sub-path is added, all slashes in the addition are converted to backslashes, multiple consecutive backslashes are converted to a single backslash, added . elements are removed, and added .. elements are removed along with the preceding element; these conversions are not performed on the original base-path part of the result or on any \\?\REL\ or \\?\RED\ sub-path. If a \\?\REL\ or \\?\RED\ sub-path is added to a non-\\?\ base-path, the base-path (with any additions up to the \\?\REL\ or \\?\RED\ sub-path) is simplified and converted to a \\?\ path. In other cases, a backslash may be added or removed before combining paths to avoid changing the root meaning of the path (e.g., combining //x and y produces /x/y, because //x/y would be a UNC path instead of a drive-relative path).

For (simplify-path path use-filesystem?), path is expanded, and if path does not start with \\?\, trailing spaces and periods are removed, a slash is inserted after the colon in a drive specification if it is missing, and a backslash is inserted after \\?\ as a root if there are elements and no extra backslash already. Otherwise, if no indicators or redundant separators are in path, then path is returned.

For (split-path path) producing base, name, and must-be-dir?, splitting a path that does not start with \\?\ can produce parts that start with \\?\. For example, splitting C:/x /aux/ produces \\?\C:\x \ and \\?\REL\\aux; the \\?\ is needed in these cases to preserve a trailing space after x and to avoid referring to the AUX device instead of an aux file.