8 Parsing and classifying syntax
The syntax/parse library provides a framework for describing and parsing syntax. Using syntax/parse, macro writers can define new syntactic categories, specify their legal syntax, and use them to write clear, concise, and robust macros. The library also provides a pattern-matching form, syntax-parse, which offers many improvements over syntax-case.
(require syntax/parse)
8.1 Parsing syntax
This section describes the syntax-parse pattern matching form, syntax patterns, and attributes.
(syntax-parse stx-expr parse-option ... clause ...+)
If the syntax object fails to match any of the patterns (or all matches fail the corresponding clauses’ side conditions), a syntax error is raised.
The #:literals option specifies identifiers that should match as literals, rather than simply being pattern variables. A literal in the literals list has two components: the identifier used within the pattern to signify the positions to be matched (pattern-id), and the identifier expected to occur in those positions (literal-id). If the single-identifier form is used, the same identifier is used for both purposes.
Note: Unlike syntax-case, syntax-parse requires all literals to have a binding. To match identifiers by their symbolic names, consider using the atom-in-list syntax class instead.
Many literals can be declared at once via one or more literal sets, imported with the #:literal-sets option. The literal-set definition determines the literal identifiers to recognize and the names used in the patterns to recognize those literals.
The #:conventions option imports conventions that give default syntax classes to pattern variables that do not explicitly specify a syntax class.
(syntax-parser maybe-literals clause ...+)
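A minimal sketch of syntax-parser in use, assuming it produces a procedure of one argument that matches its argument against the clauses (the parser1 examples later in this section apply it that way). The name swap-pair is hypothetical.

```racket
#lang racket/base
(require syntax/parse)

;; swap-pair is a hypothetical name. syntax-parser is assumed to return
;; a procedure that matches its argument against the clauses.
(define swap-pair
  (syntax-parser
    [(a b) (syntax->datum #'(b a))]))

(swap-pair #'(1 2)) ; evaluates to '(2 1)
```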
The grammar of syntax patterns accepted by syntax-parse and syntax-parser is given in the following table:
S-pattern  = pvar-id
           | pvar-id:syntax-class-id
           | literal-id
           | atomic-datum
           | (H-pattern . S-pattern)
           | ((~or EH-pattern ...+) ... . S-pattern)
           | (EH-pattern ... . S-pattern)
           | (~and S-pattern ...+)
           | (~or S-pattern ...+)
           | #(pattern-part ...)
           | #s(prefab-struct-key pattern-part ...)
           | (~rest S-pattern)
           | (~describe expr S-pattern)
           | (~! . S-pattern)
           | (~bind [attr-id expr] ...)
           | (~fail maybe-fail-condition message-expr)

L-pattern  = ()
           | (H-pattern . L-pattern)
           | ((~or EH-pattern ...+) ... . L-pattern)
           | (EH-pattern ... . L-pattern)
           | (~rest L-pattern)
           | (~! . L-pattern)

H-pattern  = (~or H-pattern ...+)
           | (~seq . L-pattern)
           | (~describe expr H-pattern)
           | S-pattern

EH-pattern = (~once H-pattern once-option ...)
           | (~optional H-pattern optional-option ...)
           | H-pattern
There are three main kinds of syntax pattern: S-patterns (for “single patterns”), H-patterns (for “head patterns”), and EH-patterns (for “ellipsis head patterns”). A fourth kind, L-patterns (for “list patterns”), is a restricted subset of S-patterns. When a special form in this manual refers to syntax-pattern (e.g., the description of the syntax-parse special form), it means specifically S-pattern.
8.1.1 S-pattern variants
An S-pattern (for “single pattern”) is a pattern that describes a single term. The pattern may, of course, consist of other parts. For example, (17 ...) is an S-pattern that matches any term that is a proper list of repeated 17 numerals. The L-patterns (for “list patterns”) are S-patterns with a restricted structure that constrains them to match only terms that are proper lists.
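The (17 ...) pattern mentioned above can be tried directly. The following is a small sketch, with all-seventeens? as a hypothetical helper name and a wildcard clause added so that non-matching terms return #f instead of raising an error.

```racket
#lang racket/base
(require syntax/parse)

;; all-seventeens? is a hypothetical helper built around the (17 ...) pattern.
(define (all-seventeens? stx)
  (syntax-parse stx
    [(17 ...) #t]   ; a proper list of zero or more 17s
    [_ #f]))        ; anything else

(all-seventeens? #'(17 17 17)) ; #t
(all-seventeens? #'(17 18))    ; #f
(all-seventeens? #'17)         ; #f, not a list
```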
Here are the variants of S-pattern:
pvar-id If pvar-id has no syntax class (by #:declare or #:conventions), the pattern matches anything. The pattern variable is bound to the matched subterm, unless the pattern variable is the wildcard (_), in which case no binding occurs.
If pvar-id does have an associated syntax class, it behaves like the following form.
pvar-id:syntax-class-id Matches only subterms specified by the syntax-class-id. The syntax class’s attributes are computed for the subterm and bound to the pattern variables formed by prefixing pvar-id. to the name of the attribute. pvar-id is bound to the matched subterm.
If pvar-id is _, no attributes are bound.
If pvar-id is empty (that is, if the pattern is of the form :syntax-class-id), then the syntax class’s attributes are bound, but their names are not prefixed first.
Examples:
> (syntax-parse #'x [var:id (syntax-e #'var)])
x
> (syntax-parse #'12 [var:id (syntax-e #'var)])
eval:2:0: ?: expected identifier at: 12 in: 12
> (syntax-parse #'(x y z) [var:id (syntax-e #'var)])
eval:3:0: x: expected identifier at: (x y z) in: (x y z)
literal-id An identifier that appears in the literals list is not a pattern variable; instead, it is a literal that matches any identifier free-identifier=? to it.
Specifically, if literal-id is the “pattern” name of an entry in the literals list, then it represents a pattern that matches only identifiers free-identifier=? to the “literal” name. These identifiers are often the same.
Examples:
> (syntax-parse #'(define x 12)
    #:literals (define)
    [(define var:id body:expr) 'ok])
ok
> (syntax-parse #'(lambda x 12)
    #:literals (define)
    [(define var:id body:expr) 'ok])
eval:5:0: lambda: expected the literal identifier define at: lambda in: (lambda x 12)
> (syntax-parse #'(define x 12)
    #:literals ([def define])
    [(def var:id body:expr) 'ok])
ok
> (syntax-parse #'(lambda x 12)
    #:literals ([def define])
    [(def var:id body:expr) 'ok])
eval:7:0: lambda: expected the literal identifier define at: lambda in: (lambda x 12)
atomic-datum Numbers, strings, booleans, keywords, and the empty list match as literals.
Examples:
> (syntax-parse #'(a #:foo bar) [(x #:foo y) (syntax->datum #'y)])
bar
> (syntax-parse #'(a foo bar) [(x #:foo y) (syntax->datum #'y)])
eval:9:0: a: expected the literal #:foo at: foo in: (a foo bar)
(H-pattern . S-pattern) Matches any term that can be decomposed into a list prefix matching the H-pattern and a suffix matching the S-pattern.
Note that the pattern may match terms that are not even improper lists; if the head pattern can match a zero-length head, then the whole pattern matches whatever the tail pattern accepts.
The first pattern can be an S-pattern, in which case the whole pattern matches any pair whose first element matches the first pattern and whose rest matches the second.
See H-patterns for more information.
(EH-pattern ... . S-pattern) Matches any term that can be decomposed into a list head matching some number of repetitions of the EH-pattern alternatives (subject to its repetition constraints) followed by a list tail matching the S-pattern.
In other words, the whole pattern matches either the second pattern (which need not be a list) or a term whose head matches one of the alternatives of the first pattern and whose tail recursively matches the whole sequence pattern.
The ~or-free variant is shorthand for the ~or variant with just one alternative.
See EH-patterns for more information.
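As a sketch of the ellipsis-with-tail form described above, the following matches a run of leading identifiers and binds whatever follows to the tail pattern. The names split-leading-ids and tail are hypothetical.

```racket
#lang racket/base
(require syntax/parse)

;; split-leading-ids is a hypothetical helper. The head x:id repeats as
;; long as identifiers are found; the tail pattern takes whatever is left.
(define (split-leading-ids stx)
  (syntax-parse stx
    [(x:id ... . tail)
     (list (syntax->datum #'(x ...)) (syntax->datum #'tail))]))

(split-leading-ids #'(a b 1 2)) ; '((a b) (1 2))
(split-leading-ids #'(1 2))     ; '(() (1 2))
```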
(~and S-pattern ...+) Matches any syntax that matches all of the included patterns.
Attributes bound in subpatterns are available to subsequent subpatterns. The whole pattern binds all of the subpatterns’ attributes.
One use for ~and-patterns is preserving a whole term (including its lexical context, source location, etc.) while also examining its structure. Syntax classes are useful for the same purpose, but ~and can be lighter weight.
Example:
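> (syntax-parse #'(m (import one two))
    #:literals (import)
    [(_ (~and import-clause (import i ...)))
     (let ([bad (check-imports (syntax->list #'(i ...)))])
       (when bad
         (raise-syntax-error #f "bad import" #'import-clause bad))
       'ok)])
?: unbound literal not allowed
The error shown arises because import has no binding in the context where this example was evaluated; as noted earlier, syntax-parse requires every literal to have a binding.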
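A runnable variation on the same idea, sketched with the bound identifier list standing in for import so that the literal has a binding. The helper name clause-summary and the pattern variables whole-clause and item are hypothetical.

```racket
#lang racket/base
(require syntax/parse)

;; clause-summary is a hypothetical helper. ~and keeps the whole clause
;; (whole-clause) while also destructuring it into its items.
(define (clause-summary stx)
  (syntax-parse stx
    #:literals (list)
    [(_ (~and whole-clause (list item:id ...)))
     (list (syntax->datum #'whole-clause)
           (length (syntax->list #'(item ...))))]))

(clause-summary #'(m (list one two))) ; '((list one two) 2)
```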
(~or S-pattern ...+) Matches any term that matches one of the included patterns.
The whole pattern binds all of the subpatterns’ attributes. An attribute that is not bound by the “chosen” subpattern has a value of #f. The same attribute may be bound by multiple subpatterns, and if it is bound by all of the subpatterns, it is sure to have a value if the whole pattern matches.
Examples:
> (syntax-parse #'a [(~or x:id (~and x #f)) (syntax->datum #'x)])
a
> (syntax-parse #'#f [(~or x:id (~and x #f)) (syntax->datum #'x)])
#f
#(pattern-part ...) Matches a term that is a vector whose elements, when considered as a list, match the S-pattern corresponding to (pattern-part ...).
Examples:
> (syntax-parse #'#(1 2 3) [#(x y z) (syntax->datum #'z)])
3
> (syntax-parse #'#(1 2 3) [#(x y ...) (syntax->datum #'(y ...))])
(2 3)
> (syntax-parse #'#(1 2 3) [#(x ~rest y) (syntax->datum #'y)])
(2 3)
#s(prefab-struct-key pattern-part ...) Matches a term that is a prefab struct whose key is exactly the given key and whose sequence of fields, when considered as a list, matches the S-pattern corresponding to (pattern-part ...).
Examples:
> (syntax-parse #'#s(point 1 2 3) [#s(point x y z) 'ok])
ok
> (syntax-parse #'#s(point 1 2 3) [#s(point x y ...) (syntax->datum #'(y ...))])
(2 3)
> (syntax-parse #'#s(point 1 2 3) [#s(point x ~rest y) (syntax->datum #'y)])
(2 3)
(~rest S-pattern) Matches just like the inner S-pattern. The ~rest pattern form is useful in positions where improper lists (“dots”) are not allowed by the reader, such as vector and structure patterns (see above).
Examples:
> (syntax-parse #'(1 2 3) [(x ~rest y) (syntax->datum #'y)])
(2 3)
> (syntax-parse #'#(1 2 3) [#(x ~rest y) (syntax->datum #'y)])
(2 3)
(~describe expr S-pattern) The ~describe pattern form annotates a pattern with a description, a string expression that is evaluated in the scope of all prior attribute bindings. If parsing the inner pattern fails, then the description is used to synthesize the error message.
A describe-pattern also affects backtracking in two ways:
A cut-pattern (~!) within a describe-pattern only eliminates choice-points created within the describe-pattern.
If a describe-pattern succeeds, then all choice points created within the describe-pattern are discarded, and a failure after the describe-pattern backtracks to a choice point before the describe-pattern, never one within it.
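A minimal sketch of ~describe, assuming the description string is placed directly before the inner pattern as in the grammar above. The helper name binding-name is hypothetical, and the exact wording of the resulting error message is not shown because it may differ between releases.

```racket
#lang racket/base
(require syntax/parse)

;; binding-name is a hypothetical helper. The description "binding pair"
;; is available to the error message if the inner pattern fails.
(define (binding-name stx)
  (syntax-parse stx
    [(~describe "binding pair" (name:id rhs:expr))
     (syntax-e #'name)]))

(binding-name #'(x 1)) ; 'x
;; (binding-name #'5) raises a syntax error instead of returning.
```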
(~! . S-pattern) The ~! operator, pronounced “cut”, eliminates backtracking choice points and commits parsing to the current branch of the pattern it is exploring.
Common opportunities for cut-patterns come from recognizing special forms based on keywords. Consider the following expression:
> (syntax-parse #'(define-values a 123)
    #:literals (define-values define-syntaxes)
    [(define-values (x:id ...) e) 'define-values]
    [(define-syntaxes (x:id ...) e) 'define-syntaxes]
    [e 'expression])
expression
Given the ill-formed term (define-values a 123), the expression tries the first clause, fails to match a against the pattern (x:id ...), and then backtracks to the second clause and ultimately the third clause, producing the value 'expression. But the term is not an expression; it is an ill-formed use of define-values! The proper way to write the syntax-parse expression follows:
> (syntax-parse #'(define-values a 123)
    #:literals (define-values define-syntaxes)
    [(define-values ~! (x:id ...) e) 'define-values]
    [(define-syntaxes ~! (x:id ...) e) 'define-syntaxes]
    [e 'expression])
eval:23:0: define-values: expected sequence of terms or expected the literal () at: a in: (define-values a 123)
Now, given the same term, syntax-parse tries the first clause, and since the keyword define-values matches, the cut-pattern commits to the current pattern, eliminating the choice points for the second and third clauses. So when the clause fails to match, the syntax-parse expression raises an error.
The effect of a ~! pattern is delimited by the nearest enclosing ~describe pattern. If there is no enclosing ~describe pattern but the cut occurs within a syntax class definition, then only choice points within the syntax class definition are discarded.
(~bind [attr-id expr] ...) This pattern matches any term. Its effect is to evaluate the exprs and bind them to the given attr-ids as attributes.
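One way ~bind might be used is inside ~and, so that a computed value travels with the match as an attribute. This is a sketch; count-ids and id-count are hypothetical names.

```racket
#lang racket/base
(require syntax/parse)

;; count-ids is a hypothetical helper. ~bind attaches a computed value
;; (here a plain number, not syntax) to the attribute id-count.
(define (count-ids stx)
  (syntax-parse stx
    [(~and (x:id ...)
           (~bind [id-count (length (syntax->list #'(x ...)))]))
     (attribute id-count)]))

(count-ids #'(a b c)) ; 3
```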
(~fail maybe-fail-condition message-expr)
maybe-fail-condition =
                     | #:when condition-expr
                     | #:unless condition-expr
This pattern succeeds or fails independent of the term being matched against. If the condition is absent, or if the #:when condition evaluates to a true value, or if the #:unless condition evaluates to #f, then the pattern fails with the given message. Otherwise the pattern succeeds.
Fail patterns can be used together with cut patterns to recognize specific ill-formed terms and address them with specially-created failure messages.
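A sketch of a fail pattern with a #:when condition, combined with ~and so that the condition can refer to earlier bindings. The helper name check-range is hypothetical, and the integer syntax class is assumed to be among the library syntax classes.

```racket
#lang racket/base
(require syntax/parse)

;; check-range is a hypothetical helper. The ~fail pattern rejects the
;; term with a custom message when the bounds are out of order.
(define (check-range stx)
  (syntax-parse stx
    [(~and (lo:integer hi:integer)
           (~fail #:when (> (syntax-e #'lo) (syntax-e #'hi))
                  "lower bound exceeds upper bound"))
     'ok]))

(check-range #'(1 5)) ; 'ok
;; (check-range #'(5 1)) raises a syntax error.
```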
8.1.2 H-pattern variants
An H-pattern (for “head pattern”) is a pattern that describes some number of terms that occur at the head of some list (possibly an improper list). An H-pattern’s usefulness comes from being able to match heads of different lengths. H-patterns are useful for specifying optional forms such as keyword arguments.
Here are the variants of H-pattern:
(~seq . L-pattern) Matches a head whose elements, if put in a list, would match the given L-pattern.
Example:
> (syntax-parse #'(1 2 3 4) [((~seq 1 2 3) 4) 'ok])
ok
(~or H-pattern ...+) Like the S-pattern version of ~or, but matches a term head instead.
Example:
> (syntax-parse #'(#:foo 2 a b c)
    [((~or (~seq #:foo x) (~seq)) y:id ...) (attribute x)])
#<syntax:25:0>
(~describe expr H-pattern) Like the S-pattern version of ~describe, but matches a head pattern instead.
S-pattern Matches a head of one element, which must be a term matching the given S-pattern.
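The optional-keyword use of H-patterns can be sketched as follows: a head that is either a keyword followed by an identifier or nothing at all. The names parse-opts and #:mode are hypothetical; when the empty alternative is chosen, the attribute m is expected to be #f.

```racket
#lang racket/base
(require syntax/parse)

;; parse-opts is a hypothetical helper. The head matches either
;; "#:mode m" (two terms) or the empty sequence (zero terms).
(define (parse-opts stx)
  (syntax-parse stx
    [((~or (~seq #:mode m:id) (~seq)) arg:expr ...)
     (let ([mode (attribute m)]) ; #f when the keyword is absent
       (list (and mode (syntax-e mode))
             (length (syntax->list #'(arg ...)))))]))

(parse-opts #'(#:mode fast 1 2)) ; '(fast 2)
(parse-opts #'(1 2 3))           ; '(#f 3)
```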
8.1.3 EH-pattern forms
An EH-pattern (for “ellipsis-head pattern”) is a pattern that describes some number of terms, like an H-pattern, but may also place constraints on the number of times it occurs in a repetition. EH-patterns (and ellipses) are useful for matching keyword arguments where the keywords may come in any order.
Examples:
> (parser1 #'(#:a 1))
ok
> (parser1 #'(#:b 2 #:c 3 #:c 25 #:a 'hi))
ok
> (parser1 #'(#:a 1 #:a 2))
eval:29:0: ?: too many occurrences of #:a keyword after 4 terms at: (#:a 1 #:a 2) in: (#:a 1 #:a 2)
The pattern requires exactly one occurrence of the #:a keyword and argument, at most one occurrence of the #:b keyword and argument, and any number of #:c keywords and arguments. The “pieces” can occur in any order.
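The definition of parser1 does not appear above. One definition consistent with the description and with the "#:a keyword" wording in the error message is sketched here; it is a reconstruction under those assumptions, and the pattern variable names x, y, and z and the "#:b keyword" name are guesses.

```racket
#lang racket/base
(require syntax/parse)

;; A reconstruction of parser1 (an assumption, not the original text):
;; exactly one #:a, at most one #:b, any number of #:c, in any order.
(define parser1
  (syntax-parser
    [((~or (~once (~seq #:a x) #:name "#:a keyword")
           (~optional (~seq #:b y) #:name "#:b keyword")
           (~seq #:c z))
      ...)
     'ok]))

(parser1 #'(#:a 1))                        ; 'ok
(parser1 #'(#:b 2 #:c 3 #:c 25 #:a 'hi))   ; 'ok
;; (parser1 #'(#:a 1 #:a 2)) raises a syntax error.
```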
Here are the variants of EH-pattern:
(~once H-pattern once-option ...)
once-option = #:name name-expr
            | #:too-few too-few-message-expr
            | #:too-many too-many-message-expr
Matches if the inner H-pattern matches. This pattern must be selected exactly once in the match of the entire repetition sequence.
If the pattern is not chosen in the repetition sequence, then an error is raised with a message, either too-few-message-expr or "missing required occurrence of name-expr".
If the pattern is chosen more than once in the repetition sequence, then an error is raised with a message, either too-many-message-expr or "too many occurrences of name-expr".
(~optional H-pattern optional-option ...)
optional-option = #:name name-expr
                | #:too-many too-many-message-expr
Matches if the inner H-pattern matches. This pattern may be used at most once in the match of the entire repetition.
If the pattern is chosen more than once in the repetition sequence, then an error is raised with a message, either too-many-message-expr or "too many occurrences of name-expr".
8.1.4 Pattern directives
Both syntax-parse and syntax-parser support directives for annotating the pattern and specifying side conditions. The grammar for pattern directives follows:
pattern-directive = #:declare pattern-id syntax-class-id
                  | #:declare pattern-id (syntax-class-id expr ...)
                  | #:with syntax-pattern expr
                  | #:fail-when condition-expr message-expr
                  | #:fail-unless condition-expr message-expr
#:declare pvar-id syntax-class-id
#:declare pvar-id (syntax-class-id expr ...) The first form is equivalent to using the pvar-id:syntax-class-id form in the pattern (but it is illegal to use both for a single pattern variable). The #:declare form may be preferred when writing macro-defining macros or to avoid dealing with structured identifiers.
The second form allows the use of parameterized syntax classes, which cannot be expressed using the “colon” notation. The exprs are evaluated outside the scope of any of the attribute bindings from the pattern that the #:declare directive applies to.
#:with syntax-pattern expr Evaluates the expr in the context of all previous attribute bindings and matches it against the pattern. If the match succeeds, the pattern’s attributes are added to the environment for the evaluation of subsequent side conditions. If the #:with match fails, the matching process backtracks. Since a syntax object may match a pattern in several ways, backtracking may cause the same clause to be tried multiple times before the next clause is reached.
#:fail-when condition-expr message-expr
#:fail-unless condition-expr message-expr Evaluates the condition-expr in the context of all previous attribute bindings. If the value is any non-false value for #:fail-when or if the value is #f for #:fail-unless, the matching process backtracks (with the given message); otherwise, it continues.
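The directives above can be combined in a single clause. The following sketch declares syntax classes with #:declare, rejects out-of-order bounds with #:fail-when, and binds a derived term with #:with. The names parse-range, lo, hi, and width are hypothetical, and the integer syntax class is assumed to be among the library syntax classes.

```racket
#lang racket/base
(require syntax/parse)

;; parse-range is a hypothetical helper combining three directives.
(define (parse-range stx)
  (syntax-parse stx
    [(lo hi)
     #:declare lo integer   ; same effect as writing lo:integer
     #:declare hi integer
     #:fail-when (> (syntax-e #'lo) (syntax-e #'hi))
                 "lower bound exceeds upper bound"
     ;; bind a new pattern variable to a computed syntax object
     #:with width (datum->syntax #f (- (syntax-e #'hi) (syntax-e #'lo)))
     (syntax-e #'width)]))

(parse-range #'(2 7)) ; 5
;; (parse-range #'(7 2)) raises a syntax error.
```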
(attribute attr-id)
The values returned by attribute never undergo additional wrapping as syntax objects, unlike values produced by some uses of syntax, quasisyntax, etc. Consequently, the attribute form is preferred when the attribute value is used as data, not placed in a syntax object.
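A small sketch of attribute used to obtain data rather than syntax: for a pattern variable under one ellipsis, the value is expected to be a list of syntax objects that ordinary list functions can process. The helper name id-symbols is hypothetical.

```racket
#lang racket/base
(require syntax/parse)

;; id-symbols is a hypothetical helper. x has ellipsis depth 1, so
;; (attribute x) is a list with one syntax object per matched identifier.
(define (id-symbols stx)
  (syntax-parse stx
    [(x:id ...)
     (map syntax-e (attribute x))]))

(id-symbols #'(a b c)) ; '(a b c)
```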
8.2 Syntax Classes
Syntax classes provide an abstraction mechanism for the specification of syntax. Built-in syntax classes are supplied that recognize basic classes such as identifiers and keywords. Programmers can compose basic syntax classes to build specifications of more complex syntax, such as lists of distinct identifiers and formal arguments with keywords. Macros that manipulate the same syntactic structures can share syntax class definitions. The structure of syntax classes and patterns also allows syntax-parse to automatically generate error messages for syntax errors.
When a syntax class accepts (matches) a syntax object, it computes and provides attributes based on the contents of the matched syntax. While the values of the attributes depend on the matched syntax, the set of attributes and each attribute’s ellipsis nesting depth is fixed for each syntax class.
#:attributes (attr-arity-decl ...) Declares the attributes of the syntax class. An attribute arity declaration consists of the attribute name and optionally its ellipsis depth (zero if not explicitly specified).
If the attributes are not explicitly listed, they are inferred as the set of all pattern variables occurring in every variant of the syntax class. Pattern variables that occur at different ellipsis depths are not included, nor are nested attributes.
#:description description The description argument is an expression (evaluated in a scope containing the syntax class’s parameters) that should evaluate to a string. It is used in error messages involving the syntax class. For example, if a term is rejected by the syntax class, an error of the form "expected description" may be synthesized.
If absent, the name of the syntax class is used instead.
#:transparent Indicates that errors may be reported with respect to the internal structure of the syntax class.
#:literals (literal-entry ...)
#:literal-sets (literal-set ...)
#:conventions (convention-id ...) Declares the literals and conventions that apply to the syntax class’s variant patterns and their immediate #:with clauses. Patterns occurring within subexpressions of the syntax class (for example, on the right-hand side of a #:fail-when clause) are not affected.
These options have the same meaning as under syntax-parse.
(pattern syntax-pattern stxclass-pattern-directive ...)
stxclass-pattern-directive = pattern-directive
                           | #:rename internal-id external-id
Accepts syntax matching the given syntax pattern with the accompanying pattern directives as in syntax-parse.
The attributes of the variant are the attributes of the pattern together with all attributes bound by #:with clauses, including nested attributes produced by syntax classes associated with the pattern variables.
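As a sketch of a syntax class with two variants, the following accepts either a bare identifier or an identifier paired with a type, and uses #:with in the first variant so that both variants supply the same attributes. The names maybe-typed-id, describe-param, and the default type any are hypothetical.

```racket
#lang racket/base
(require syntax/parse)

;; maybe-typed-id is a hypothetical syntax class with two variants.
(define-syntax-class maybe-typed-id
  #:description "identifier or (identifier type)"
  #:attributes (name type)
  (pattern name:id
           #:with type #'any)      ; default when no type is written
  (pattern (name:id type:id)))

;; describe-param is a hypothetical helper that reads the attributes.
(define (describe-param stx)
  (syntax-parse stx
    [p:maybe-typed-id
     (list (syntax-e #'p.name) (syntax-e #'p.type))]))

(describe-param #'x)       ; '(x any)
(describe-param #'(x int)) ; '(x int)
```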
8.2.1 Attributes
A syntax class has a set of attributes. Each attribute has a name, an ellipsis depth, and a set of nested attributes. When an instance of the syntax class is parsed and bound to a pattern variable, additional pattern variables are bound for each of the syntax class’s attributes. The name of each additional pattern variable is the dotted concatenation of the primary pattern variable with the name of the attribute.
For example, if pattern variable p is bound to an instance of a syntax class with attribute a, then the pattern variable p.a is bound to the value of that attribute. The ellipsis depth of p.a is the sum of the depths of p and attribute a.
The attributes of a syntax class are either given explicitly with an #:attributes option or inferred from the pattern variables of the syntax class’s variants.
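The dotted naming and the depth rule can be seen in a short sketch. Here the attributes are inferred from the single variant, and the pattern variable b sits under one ellipsis, so b.name is used under one ellipsis as well. The names binding and binding-names are hypothetical.

```racket
#lang racket/base
(require syntax/parse)

;; binding is a hypothetical syntax class; its attributes name and rhs
;; are inferred from the pattern variables of its only variant.
(define-syntax-class binding
  #:description "binding pair"
  (pattern (name:id rhs:expr)))

;; b has depth 1 and name has depth 0 in the class, so b.name has depth 1.
(define (binding-names stx)
  (syntax-parse stx
    [(_ (b:binding ...) body:expr)
     (syntax->datum #'(b.name ...))]))

(binding-names #'(my-let ([x 1] [y 2]) (+ x y))) ; '(x y)
```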
8.2.2 Inspection tools
The following special forms are for debugging syntax classes.
(syntax-class-attributes syntax-class-id)
(syntax-class-parse syntax-class-id stx-expr arg-expr ...)
8.3 Literal sets and Conventions
Sometimes the same literals are recognized in a number of different places. The most common example is the literals for fully expanded programs, which are used in many analysis and transformation tools. Specifying literals individually is burdensome and error-prone. As a remedy, syntax/parse offers literal sets. A literal set is defined via define-literal-set and used via the #:literal-sets option of syntax-parse.
(define-literal-set name-id (literal ...))
Examples: (input expressions lost in extraction; the surviving result is s)
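A sketch of defining and using a literal set, written to be consistent with a result of s. The names def-litset and classify-definition are hypothetical; define-values and define-syntaxes are chosen because both are bound in racket/base.

```racket
#lang racket/base
(require syntax/parse)

;; def-litset is a hypothetical literal set of two bound identifiers.
(define-literal-set def-litset (define-values define-syntaxes))

;; classify-definition is a hypothetical helper: both identifiers are
;; matched as literals because the literal set is imported.
(define (classify-definition stx)
  (syntax-parse stx
    #:literal-sets (def-litset)
    [(define-values (x:id ...) e:expr) 'v]
    [(define-syntaxes (x:id ...) e:expr) 's]))

(classify-definition #'(define-syntaxes (x) 12)) ; 's
```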
(define-conventions name-id (id-pattern syntax-class) ...)
Examples: (input expressions lost in extraction; the surviving results are (a b c) and (a (b c) 1 (2 3)))
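A sketch of two conventions, written to be consistent with results of (a b c) and (a (b c) 1 (2 3)). It assumes that an id-pattern may be either an exact identifier or a regular expression matched against pattern variable names, and that nat is among the library syntax classes. All defined names are hypothetical.

```racket
#lang racket/base
(require syntax/parse)

;; xyz-as-ids: pattern variables named x, y, or z default to class id.
(define-conventions xyz-as-ids
  [x id] [y id] [z id])

(define (leading-ids stx)
  (syntax-parse stx
    #:conventions (xyz-as-ids)
    [(x ... n ...) (syntax->datum #'(x ...))])) ; n has no class

;; xn-prefixes: names starting with x default to id, with n to nat.
(define-conventions xn-prefixes
  [#rx"^x" id]
  [#rx"^n" nat])

(define (split-ids-and-nats stx)
  (syntax-parse stx
    #:conventions (xn-prefixes)
    [(x0 x ... n0 n ...)
     (syntax->datum #'(x0 (x ...) n0 (n ...)))]))

(leading-ids #'(a b c 1 2 3))         ; '(a b c)
(split-ids-and-nats #'(a b c 1 2 3))  ; '(a (b c) 1 (2 3))
```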
8.4 Library syntax classes and literal sets
8.4.1 Syntax classes
(static predicate description)
When used outside of the dynamic extent of a macro transformer (see syntax-transforming?), matching fails.
The attribute value contains the value the name is bound to.
(atom-in-list atoms description)
Use atom-in-list instead of a literals list when recognizing identifiers based on their symbolic names rather than their bindings.
8.4.2 Literal sets
Note that the literal-set uses the names #%plain-lambda and #%plain-app, not lambda and #%app.