XML: Parsing and Writing
(require xml)
The xml library provides functions for parsing and generating XML. XML can be represented as an instance of the document structure type, or as a kind of S-expression that is called an X-expression.
The xml library does not provide Document Type Declaration (DTD) processing, including preservation of DTDs in read documents, or validation. It also does not expand user-defined entities or read user-defined entities in attributes. It does not interpret namespaces either.
1 Datatypes
(struct location (line char offset))
  line : exact-nonnegative-integer?
  char : exact-nonnegative-integer?
  offset : exact-nonnegative-integer?
(struct source (start stop))
  start : location/c
  stop : location/c
When XML is generated from an input stream by read-xml, locations are represented by location instances. When XML structures are generated by xexpr->xml, then locations are symbols.
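The difference between the two kinds of locations can be observed directly. The following is a minimal sketch, assuming that the `source-start` accessor applies to `element` instances (since `element` is a substructure of `source`); the sample markup is made up for illustration.

```racket
#lang racket/base
(require xml)

;; An element parsed from a port carries location instances as its source.
(define parsed
  (document-element
   (read-xml (open-input-string "<doc><bold>hi</bold></doc>"))))

;; An element built from an X-expression carries symbols instead.
(define built (xexpr->xml '(doc (bold "hi"))))

(define parsed-start (source-start parsed))
(define built-start (source-start built))

(printf "parsed start is a location: ~s\n" (location? parsed-start))
(printf "built start is a symbol:    ~s\n" (symbol? built-start))
```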
(struct document-type (name external inlined))
  name : symbol?
  external : external-dtd?
  inlined : false/c
(struct prolog (misc dtd misc2))
  misc : (listof misc/c)
  dtd : (or/c document-type false/c)
  misc2 : (listof misc/c)
(struct (element source) (name attributes content))
  name : symbol?
  attributes : (listof attribute?)
  content : (listof content/c)
(struct (entity source) (text))
  text : (or/c symbol? exact-nonnegative-integer?)
The string field of a cdata instance is assumed to be of the form <![CDATA[‹content›]]> with proper quoting of ‹content›. Otherwise, write-xml generates incorrect output.
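As a hedged illustration of that requirement, the sketch below builds a `cdata` value whose string already contains the full `<![CDATA[...]]>` wrapper. It assumes the `make-cdata` constructor takes the two inherited source fields first, and passes `#f` for both because the value was not read from a port; the script text is invented for the example.

```racket
#lang racket/base
(require xml)

;; The string includes the <![CDATA[ ... ]]> wrapper itself; the library
;; writes it out verbatim rather than adding the wrapper.
(define raw (make-cdata #f #f "<![CDATA[if (a < b) run();]]>"))

;; A cdata instance may appear directly inside an X-expression.
(define script-text (xexpr->string (list 'script raw)))

(displayln script-text)
```

Because the string is emitted unchanged, leaving out the wrapper would put the raw `<` into the output unescaped.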
(struct (exn:xml exn:fail:read) ())
The following grammar describes expressions that create X-expressions:
  xexpr = string
        | (list symbol (list (list symbol string) ...) xexpr ...)
        | (cons symbol (list xexpr ...))
        | symbol
        | exact-nonnegative-integer
        | cdata
        | misc
A string is literal data. When converted to an XML stream, the characters of the data will be escaped as necessary.
A pair represents an element, optionally with attributes. Each attribute’s name is represented by a symbol, and its value is represented by a string.
A symbol represents a symbolic entity. For example, 'nbsp represents &nbsp;.
An exact-nonnegative-integer represents a numeric entity. For example, #x20 represents &#x20;.
A cdata is an instance of the cdata structure type, and a misc is an instance of the comment or p-i structure types.
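The forms of the grammar can be combined in a single value. The following is a small sketch, with made-up content, that uses an attribute list, a string that needs escaping, a symbolic entity, a numeric entity, and a nested element, and then renders the result with `xexpr->string`.

```racket
#lang racket/base
(require xml)

(define page
  (list 'p
        '((class "note"))      ; attribute list: symbol name, string value
        "1 < 2"                ; string: literal data, escaped on output
        'nbsp                  ; symbol: symbolic entity
        #x20                   ; integer: numeric entity
        (list 'em "done")))    ; nested element without attributes

(define rendered (xexpr->string page))

(displayln rendered)
```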
2 Reading and Writing XML
(read-xml [in]) → document?
  in : input-port? = (current-input-port)
Malformed XML is reported with source locations in the form ‹l›.‹c›/‹o›, where ‹l›, ‹c›, and ‹o› are the line number, column number, and next port position, respectively, as returned by port-next-location.
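One way to see such a report is to catch the exception and inspect its message. The sketch below is an assumption-laden illustration: `try-parse` is a hypothetical helper, and it installs the deliberately broad `exn:fail?` handler rather than relying on a specific exception type.

```racket
#lang racket/base
(require xml)

;; Returns the parsed X-expression, or the error message when parsing fails.
;; try-parse is a hypothetical helper, not part of the xml library.
(define (try-parse str)
  (with-handlers ([exn:fail? exn-message])
    (xml->xexpr (document-element (read-xml (open-input-string str))))))

(displayln (try-parse "<a><b>ok</b></a>"))
;; The closing tag does not match the open <b>, so a message is printed.
(displayln (try-parse "<a><b>oops</a>"))
```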
Any non-characters other than eof read from the input-port appear in the document content. Such special values may appear only where XML content may. See make-input-port for information about creating ports that return non-character values.
Example:
> (xml->xexpr (document-element
               (read-xml (open-input-string
                          "<doc><bold>hi</bold> there!</doc>"))))
(doc () (bold () "hi") " there!")
(read-xml/element [in]) → element?
  in : input-port? = (current-input-port)
(syntax:read-xml [in]) → syntax?
  in : input-port? = (current-input-port)
(syntax:read-xml/element [in]) → syntax?
  in : input-port? = (current-input-port)
(write-xml doc [out]) → void?
  doc : document?
  out : output-port? = (current-output-port)
(write-xml/content content [out]) → void?
  content : content/c
  out : output-port? = (current-output-port)
(display-xml doc [out]) → void?
  doc : document?
  out : output-port? = (current-output-port)
(display-xml/content content [out]) → void?
  content : content/c
  out : output-port? = (current-output-port)
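The signatures above do not say how the write and display variants differ. The sketch below captures the output of each into a string for comparison, on the assumption that the display variants add newlines and indentation for readability while the write variants do not. `render` is a hypothetical helper introduced only for this example.

```racket
#lang racket/base
(require xml)

(define doc
  (read-xml (open-input-string "<doc><bold>hi</bold> there!</doc>")))

;; Capture what a writer procedure sends to its output port.
;; render is a hypothetical helper, not part of the xml library.
(define (render writer v)
  (define out (open-output-string))
  (writer v out)
  (get-output-string out))

(define compact (render write-xml doc))
(define element-only (render write-xml/content (document-element doc)))
(define indented (render display-xml doc))

(displayln compact)
(displayln indented)
```

If whitespace is significant in the document, the write variants are the safer choice, since added indentation changes the text content that a later parse sees.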
3 XML and X-expression Conversions
(permissive-xexprs) → boolean?
(permissive-xexprs v) → void?
  v : any/c
(xml->xexpr content) → xexpr/c
  content : content/c
(xexpr->xml xexpr) → content/c
  xexpr : xexpr/c
(xexpr->string xexpr) → string?
  xexpr : xexpr/c
(string->xexpr str) → xexpr/c
  str : string?
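A short round trip shows how the four conversion functions relate. This is a sketch using the same sample markup as the read-xml example; the variable names are illustrative.

```racket
#lang racket/base
(require xml)

(define text "<doc><bold>hi</bold> there!</doc>")

;; string -> X-expression
(define x (string->xexpr text))

;; X-expression -> XML structure -> X-expression
(define content (xexpr->xml x))
(define back (xml->xexpr content))

;; X-expression -> string
(define text-again (xexpr->string x))

(printf "~s\n~s\n" x text-again)
```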
((eliminate-whitespace tags choose) elem) → element?
  tags : (listof symbol?)
  choose : (boolean? . -> . boolean?)
  elem : element?
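The curried shape of eliminate-whitespace is easier to follow with an example. The sketch below assumes the usual reading of the arguments: `choose` receives `#t` when an element's name is in `tags`, and whitespace-only text is removed from the elements for which it returns a true value. The sample document is invented.

```racket
#lang racket/base
(require xml)

(define elem
  (document-element
   (read-xml
    (open-input-string "<doc>\n  <item>a</item>\n  <item>b</item>\n</doc>"))))

;; Passing the identity function as choose filters only the listed tags.
(define strip-doc (eliminate-whitespace '(doc) (lambda (in-tags?) in-tags?)))

(define before (xml->xexpr elem))
(define after (xml->xexpr (strip-doc elem)))

(printf "before: ~s\nafter:  ~s\n" before after)
```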
(validate-xexpr v) → (one-of/c #t)
  v : any/c
(correct-xexpr? v success-k fail-k) → any/c
  v : any/c
  success-k : (-> any/c)
  fail-k : (exn:invalid-xexpr? . -> . any/c)
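The two checkers differ in how they report failure: going by the contracts above, validate-xexpr can only return `#t`, so a failure is assumed to surface as an `exn:invalid-xexpr` exception, while correct-xexpr? calls one of two continuations. The sketch below exercises both; `classify` and `validate-or-message` are hypothetical helpers, and the inexact number is used as an example of a value outside the X-expression grammar.

```racket
#lang racket/base
(require xml)

(define good '(p ((class "x")) "text"))
(define bad '(p 3.5)) ; 3.5 is not a string, symbol, or exact integer

;; Continuation style: success-k takes no arguments, fail-k takes the exception.
(define (classify v)
  (correct-xexpr? v
                  (lambda () 'valid)
                  (lambda (exn) 'invalid)))

;; Exception style: returns #t, or the error message for an invalid value.
(define (validate-or-message v)
  (with-handlers ([exn:invalid-xexpr? exn-message])
    (validate-xexpr v)))

(printf "~s ~s\n" (classify good) (classify bad))
(printf "~s\n" (validate-or-message bad))
```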
4 Parameters
(empty-tag-shorthand)
  → (or/c (one-of/c 'always 'never) (listof symbol?))
(empty-tag-shorthand shorthand) → void?
  shorthand : (or/c (one-of/c 'always 'never) (listof symbol?))
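This parameter controls whether output functions use the abbreviated notation <tag /> instead of <tag></tag> for elements that have no content. When the parameter is set to 'always, the abbreviated notation is always used. When set to 'never, the abbreviated notation is never generated. When set to a list of symbols, only tags with names in the list are abbreviated. The default is 'always.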
The abbreviated form is the preferred XML notation. However, most browsers designed for HTML will only properly render XHTML if the document uses a mixture of the two formats. The html-empty-tags constant contains the W3 consortium’s recommended list of XHTML tags that should use the shorthand.
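Example:
> (parameterize ([empty-tag-shorthand html-empty-tags])
    (write-xml/content (xexpr->xml `(html
                                      (body ((bgcolor "red"))
                                        "Hi!" (br) "Bye!")))))
<html><body bgcolor="red">Hi!<br />Bye!</body></html>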
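To compare the three kinds of settings side by side, the following sketch renders one X-expression under each. `render` is a hypothetical helper and the snippet is invented; the exact spacing inside the abbreviated tag is left to the library and not assumed here.

```racket
#lang racket/base
(require xml)

;; Render an X-expression under a given empty-tag-shorthand setting.
;; render is a hypothetical helper, not part of the xml library.
(define (render shorthand x)
  (parameterize ([empty-tag-shorthand shorthand])
    (xexpr->string x)))

(define snippet '(p "Hi!" (br) (span)))

(displayln (render 'always snippet)) ; both empty elements abbreviated
(displayln (render 'never snippet))  ; neither abbreviated
(displayln (render '(br) snippet))   ; only br abbreviated
```

Setting the parameter with `parameterize` keeps the change local to one rendering call, so other output in the same program is unaffected.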
(collapse-whitespace) → boolean?
(collapse-whitespace collapse?) → void?
  collapse? : any/c
(read-comments) → boolean?
(read-comments preserve?) → void?
  preserve? : any/c
(xexpr-drop-empty-attributes) → boolean?
(xexpr-drop-empty-attributes drop?) → void?
  drop? : any/c
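The signatures above carry no descriptions, so the following sketch rests on assumptions drawn from the parameter names: collapse-whitespace replaces runs of whitespace in text with a single space, read-comments keeps comment nodes when reading, and xexpr-drop-empty-attributes omits the empty attribute list from generated X-expressions. All three are assumed to be off by default.

```racket
#lang racket/base
(require xml)

;; collapse-whitespace: a run of whitespace in text becomes one space.
(define collapsed
  (parameterize ([collapse-whitespace #t])
    (string->xexpr "<p>a   \n  b</p>")))

;; read-comments: keep comment nodes instead of discarding them.
(define with-comment
  (parameterize ([read-comments #t])
    (string->xexpr "<p><!--note-->x</p>")))
(define without-comment (string->xexpr "<p><!--note-->x</p>"))

;; xexpr-drop-empty-attributes: omit the () placeholder.
(define compact
  (parameterize ([xexpr-drop-empty-attributes #t])
    (string->xexpr "<doc><bold>hi</bold> there!</doc>")))

(printf "~s\n~s\n~s\n" collapsed without-comment compact)
```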
5 PList Library
(require xml/plist)
The xml/plist library provides the ability to read and write XML documents that conform to the plist DTD, which is used to store dictionaries of string–value associations. This format is used by Mac OS X (both the operating system and its applications) to store all kinds of data.
A plist dictionary is a value that could be created by an expression matching the following dict-expr grammar:
  dict-expr  = (list 'dict assoc-pair ...)
  assoc-pair = (list 'assoc-pair string pl-value)
  pl-value   = string
             | (list 'true)
             | (list 'false)
             | (list 'integer integer)
             | (list 'real real)
             | dict-expr
             | (list 'array pl-value ...)
(plist-dict? v) → boolean?
  v : any/c
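As a quick check of the grammar, the sketch below tests one value that follows it and one that does not. The keys and values are invented; the second value is expected to be rejected because a bare number is not a pl-value and would need to be wrapped as `(integer 5)`.

```racket
#lang racket/base
(require xml/plist)

;; Follows the dict-expr grammar: every value is a string or a tagged list.
(define settings
  '(dict (assoc-pair "name" "demo")
         (assoc-pair "enabled" (true))
         (assoc-pair "sizes" (array (integer 1) (real 2.5)))
         (assoc-pair "nested" (dict))))

;; Does not follow the grammar: the bare 5 is not wrapped as (integer 5).
(define not-a-plist '(dict (assoc-pair "count" 5)))

(printf "~s ~s\n" (plist-dict? settings) (plist-dict? not-a-plist))
```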
(read-plist in) → plist-dict?
  in : input-port?
(write-plist dict out) → void?
  dict : plist-dict?
  out : output-port?
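Examples:
> (define my-dict
    `(dict (assoc-pair "first-key" "just a string with some whitespace")
           (assoc-pair "second-key" (false))
           (assoc-pair "third-key" (dict))
           (assoc-pair "fourth-key" (dict (assoc-pair "inner-key" (real 3.432))))
           (assoc-pair "fifth-key" (array (integer 14) "another string" (true)))
           (assoc-pair "sixth-key" (array))))
> (define-values (in out) (make-pipe))
> (write-plist my-dict out)
> (close-output-port out)
> (define new-dict (read-plist in))
> (equal? my-dict new-dict)
#t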
The XML generated by write-plist in the above example looks like the following, once re-formatted for readability:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist SYSTEM
  "file://localhost/System/Library/DTDs/PropertyList.dtd">
<plist version="0.9">
  <dict>
    <key>first-key</key>
    <string>just a string with some whitespace</string>
    <key>second-key</key>
    <false />
    <key>third-key</key>
    <dict />
    <key>fourth-key</key>
    <dict>
      <key>inner-key</key>
      <real>3.432</real>
    </dict>
    <key>fifth-key</key>
    <array>
      <integer>14</integer>
      <string>another string</string>
      <true />
    </array>
    <key>sixth-key</key>
    <array />
  </dict>
</plist>
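To inspect the generated text directly rather than passing it through a pipe, a string port can be used instead. The following is a minimal sketch with an invented dictionary; it writes the plist into a string, prints it, and reads it back.

```racket
#lang racket/base
(require xml/plist)

(define prefs
  '(dict (assoc-pair "title" "Example")
         (assoc-pair "count" (integer 3))
         (assoc-pair "tags" (array "a" "b"))))

;; Write into a string port so the raw XML text can be examined.
(define out (open-output-string))
(write-plist prefs out)
(define text (get-output-string out))

;; Read the same text back and compare with the original dictionary.
(define restored (read-plist (open-input-string text)))

(displayln text)
(printf "round trip preserved the dictionary: ~s\n" (equal? prefs restored))
```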