Version: 4.2.1
HTML: Parsing Library
The html library provides
functions to read HTML documents and structures to represent them.
Reads (X)HTML from a port, producing an html instance.
(read-html-as-xml port) → (listof content/c)
  port : input-port?
Reads HTML from a port, producing an X-expression compatible with the
xml library (which defines content/c).
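As a minimal sketch of how the result might be used, the following reads a small document from a string port and converts each returned item with xml->xexpr from the xml library. It assumes the content/c values are the xml library's structure representation; the sample markup and the names in, parsed, and xexprs are illustrative only. The h: and x: prefixes guard against name clashes between the two libraries.

```racket
#lang scheme
;; Prefixes keep html and xml bindings from clashing with each other.
(require (prefix-in h: html)
         (prefix-in x: xml))

;; Illustrative input; any input port would do.
(define in
  (open-input-string
   (string-append
    "<html><head><title>T</title></head>"
    "<body><p>Hello <b>world</b></p></body></html>")))

;; parsed : (listof content/c)
(define parsed (h:read-html-as-xml in))

;; xml->xexpr converts each item into a plain X-expression
;; (nested lists, symbols, and strings).
(define xexprs (map x:xml->xexpr parsed))

(for-each (lambda (x) (printf "~s\n" x)) xexprs)
```

Working with X-expressions this way can be convenient when the rest of a program already uses the xml library's list-based representation.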
(read-html-comments) → boolean?
(read-html-comments v) → void?
  v : any/c
If v is not #f, then comments are read and returned. Defaults to #f.
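Because read-html-comments is a parameter, it can be set for a single read with parameterize. The sketch below is illustrative: the markup and the helpers count-comments and parse-and-count are hypothetical names, and it assumes comments come back as the xml library's comment structures when read-html-as-xml is used.

```racket
#lang scheme
(require (prefix-in h: html)
         (prefix-in x: xml))

;; Illustrative markup containing one comment.
(define src "<html><body><!-- a note --><p>Hi</p></body></html>")

;; count-comments : content -> natural
;; Counts xml comment structures in a parsed item and its descendants.
(define (count-comments c)
  (cond [(x:comment? c) 1]
        [(x:element? c)
         (apply + (map count-comments (x:element-content c)))]
        [else 0]))

;; parse-and-count : boolean -> natural
;; Parses src with read-html-comments set to keep? and counts comments.
(define (parse-and-count keep?)
  (parameterize ([h:read-html-comments keep?])
    (apply + (map count-comments
                  (h:read-html-as-xml (open-input-string src))))))

(printf "comments off: ~a\n" (parse-and-count #f))
(printf "comments on:  ~a\n" (parse-and-count #t))
```

Using parameterize keeps the setting local to the one call, so other reads in the same program keep the default.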
(use-html-spec) → boolean?
(use-html-spec v) → void?
  v : any/c
If v is not #f, then the HTML must respect the HTML specification
with regard to which elements are allowed to be the children of
other elements. For example, the top-level "<html>"
element may contain only a "<head>" and a "<body>"
element. Defaults to #f.
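One way to see what this parameter does for a particular input is to parse the same markup under both settings and compare the printed results. This is only a sketch: the markup and the helper parse are hypothetical, and whether the two results differ depends on the input.

```racket
#lang scheme
(require (prefix-in h: html)
         (prefix-in x: xml))

;; Illustrative markup with unclosed <li> tags, where nesting rules matter.
(define src "<html><body><ul><li>one<li>two</ul></body></html>")

;; parse : boolean -> (listof xexpr)
;; Parses src with use-html-spec set to follow-spec?.
(define (parse follow-spec?)
  (parameterize ([h:use-html-spec follow-spec?])
    (map x:xml->xexpr
         (h:read-html-as-xml (open-input-string src)))))

(printf "with spec:    ~s\n" (parse #t))
(printf "without spec: ~s\n" (parse #f))
```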
1 Example
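The following is a minimal sketch of typical usage, not a definitive program: it parses a small document and collects the text of every pcdata node. It assumes the reader that produces an html instance is exported as read-html; the helper names extract-pcdata and extract-pcdata-from-element are illustrative. It is written in #lang scheme to match this version, which later Racket releases also accept.

```racket
#lang scheme
;; Some names exported by html and xml clash with each other and with
;; the base language, so both libraries are required with a prefix.
(require (prefix-in h: html)
         (prefix-in x: xml))

(define an-html
  (h:read-html
   (open-input-string
    (string-append
     "<html><head><title>My title</title></head><body>"
     "<p>Hello world</p><p><b>Testing</b>!</p>"
     "</body></html>"))))

;; extract-pcdata : html-content -> (listof string)
;; Pulls the pcdata strings out of some-content.
(define (extract-pcdata some-content)
  (cond [(x:pcdata? some-content)
         (list (x:pcdata-string some-content))]
        [(x:entity? some-content) '()]
        [else (extract-pcdata-from-element some-content)]))

;; extract-pcdata-from-element : html-element -> (listof string)
;; Recurs into elements that carry content; others contribute nothing.
(define (extract-pcdata-from-element an-html-element)
  (match an-html-element
    [(struct h:html-full (attributes content))
     (apply append (map extract-pcdata content))]
    [(struct h:html-element (attributes)) '()]
    [_ '()]))

(printf "~s\n" (extract-pcdata an-html))
```

The two match clauses mirror the structure hierarchy: elements that can hold content are html-full instances, and everything else is a plain html-element.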
2 HTML Structures
pcdata, entity, and attribute are defined in the xml documentation.
An html-content is either an html-element, a pcdata, or an entity.
(struct html-element (attributes))
  attributes : (listof attribute)
Any of the structures below inherits from html-element.
(struct (html-full struct:html-element) (content))
  content : (listof html-content)
Any HTML tag that may include content also inherits from
html-full without adding any additional fields.
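Since every tag structure is an html-element, and those with children are also html-full, a tree can be walked with just the two base predicates and their accessors. The sketch below collects href attribute values from a elements. It assumes read-html is the reader that returns an html instance and that the library exports an a structure for anchor tags; doc, href-values, and collect-hrefs are hypothetical names.

```racket
#lang scheme
(require (prefix-in h: html)
         (prefix-in x: xml))

(define doc
  (h:read-html
   (open-input-string
    (string-append
     "<html><head><title>Links</title></head><body>"
     "<p><a href=\"http://example.com/\">one</a></p>"
     "</body></html>"))))

;; href-values : html-element -> (listof string)
;; Returns the values of any href attributes on elem.
(define (href-values elem)
  (for/list ([at (in-list (h:html-element-attributes elem))]
             #:when (eq? (x:attribute-name at) 'href))
    (x:attribute-value at)))

;; collect-hrefs : html-content -> (listof string)
;; Gathers hrefs from c itself (if it is an a element) and from its
;; children (if it is an html-full); pcdata and entities yield nothing.
(define (collect-hrefs c)
  (append
   (if (h:a? c) (href-values c) '())
   (if (h:html-full? c)
       (append-map collect-hrefs (h:html-full-content c))
       '())))

(define hrefs (collect-hrefs doc))
(printf "~s\n" hrefs)
```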
A Contents-of-html is either a body or a head.
(struct (blockquote html-full) ())
(struct (basefont html-element) ())
(struct (br html-element) ())
(struct (area html-element) ())
(struct (alink html-element) ())
(struct (img html-element) ())
(struct (param html-element) ())
(struct (hr html-element) ())
(struct (input html-element) ())
(struct (col html-element) ())
(struct (isindex html-element) ())
(struct (base html-element) ())
(struct (meta html-element) ())
A Contents-of-head is either
A Contents-of-tr is either
A Contents-of-table is either
A Contents-of-fieldset is either
  G2
A Contents-of-select is either
A Contents-of-dl is either
A Contents-of-pre is either
  G9
  G11
A Contents-of-object-applet is either
  G2
A Contents-of-map is either
A Contents-of-a is either
  G7
A Contents-of-address is either
  G5
A Contents-of-body is either
A G12 is either
A G11 is either
A G10 is either
A G9 is either
A G8 is either
A G7 is either
  G8
  G12
A G6 is either
  G7
A G5 is either
  G6
A G4 is either
  G8
  G10
A G3 is either
A G2 is either
  G3