Binding as Sets of Scopes
Notes on a new model of macro expansion for Racket
Hygienic macro expansion is desirable for the same reason as lexical scope: both enable local reasoning about binding so that program fragments compose reliably. The analogy suggests specifying hygienic macro expansion as a kind of translation into lexical-scope machinery. In particular, variables must be “renamed” to match the mechanisms of lexical scope as the variables interact with macros.
A specification of hygiene in terms of renaming accommodates simple binding forms well enough, but it becomes unwieldy for recursive definition contexts (Flatt et al. 2012, section 3.8), especially for contexts that allow a mixture of macro and non-macro definitions. The renaming approach is also difficult to implement compactly and efficiently in a macro system that supports “hygiene bending” operations, such as datum->syntax, because a history of renamings must be recorded for replay on an arbitrary symbol.
In a new macro expander for Racket, we discard the renaming approach and start with a generalized idea of macro scope, where lexical scope is just a special case of macro scope when applied to a language without macros. The resulting implementation is substantially simpler than one based on renaming, and it avoids bugs that have proven too difficult to repair in the current macro expander. (The bugs manifest most commonly as failures of submodules within typed/racket modules.)
The change to the expander’s underlying model of binding potentially affects existing Racket macros, since the expander exposes many hygiene-bending operations. All purely pattern-based macros work as before, however (or so I think; I’m hopeful that the claim can be formulated and proved as a theorem), and experiments indicate that the vast majority of other macros continue to work, too.
1 Macro Scope via Scope Sets
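For example, consider a program in which the macro m expands to a reference to the outer x:

(let ([x 1])
  (let-syntax ([m (lambda (stx) #'x)])
    (lambda (x)
      (m))))

If let, let-syntax, and lambda introduce the scopes a, b, and c, respectively, the identifiers acquire these scope sets:

(let ([x{a} 1])
  (let-syntax ([m{a, b} (lambda (stx) #'x{a})])
    (lambda (x{a, b, c})
      (m{a, b, c}))))

Expanding the use of m replaces it with the x from the macro's template, and that x also receives a fresh scope d for the macro expansion:

(let ([x{a} 1])
  (let-syntax ([m{a, b} (lambda (stx) #'x{a})])
    (lambda (x{a, b, c})
      x{a, d})))

Because the lambda's x{a, b, c} has scopes that the reference x{a, d} lacks, the lambda does not capture the reference, which refers to the let-bound x{a} instead.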
Lexical scope corresponds to scope sets that are constrained to a particular shape: For any given set, there’s a single scope s that implies all the others (i.e., the ones around s in the program). As a result, s by itself (along with a symbolic name) is enough information to identify a binding. We normally describe lexical scope in terms of the closest such s for some notion of “closest.” Given scope sets instead of individual scopes, we can define “closest” as the largest relevant set.
More generally, we can define binding based on subsets: a reference’s binding is one whose set of scopes is a subset of the reference’s own scopes (in addition to having the same symbolic name). The advantage of using sets of scopes is that macro expansion creates scope sets that overlap in more general ways; there’s not always an s that implies all the others. Absent a determining s, we can’t identify a binding by a single scope, but we can identify it by a set of scopes.
If arbitrary sets of scopes are possible, then two different bindings might have overlapping scope sets, with neither set a subset of the other, while both are subsets of a particular reference’s scope set. In that case, the reference is ambiguous.
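A minimal Racket sketch can make the subset rule and the ambiguity case concrete. It assumes a toy representation that is not the expander's own: scopes are plain symbols, an identifier is a hypothetical ident struct holding a symbol and an immutable set of scopes, and the binding table is a hash table keyed by 〈symbol, scope set〉 pairs. The x and m bindings are taken from the annotated example in section 1; the two y bindings are made up to show ambiguity.

```racket
#lang racket
;; Toy model: scopes are symbols; a scope set is an immutable set.
;; ident, binding-table, add-binding!, and resolve are hypothetical names.
(struct ident (sym scopes) #:transparent)

;; Maps (cons symbol scope-set) to a binding.
(define binding-table (make-hash))

(define (add-binding! id binding)
  (hash-set! binding-table (cons (ident-sym id) (ident-scopes id)) binding))

;; Candidates: same symbol, and a scope set that is a subset of the
;; reference's. The largest candidate wins only if it contains every other
;; candidate; otherwise the reference is ambiguous.
(define (resolve id)
  (define candidates
    (for/list ([(key binding) (in-hash binding-table)]
               #:when (and (eq? (car key) (ident-sym id))
                           (subset? (cdr key) (ident-scopes id))))
      (cons (cdr key) binding)))
  (if (null? candidates)
      'unbound
      (let ([best (argmax (lambda (c) (set-count (car c))) candidates)])
        (if (for/and ([c (in-list candidates)])
              (subset? (car c) (car best)))
            (cdr best)
            'ambiguous))))

;; Bindings from the annotated example in section 1.
(add-binding! (ident 'x (set 'a)) 'let-x)
(add-binding! (ident 'm (set 'a 'b)) 'macro-m)
(add-binding! (ident 'x (set 'a 'b 'c)) 'lambda-x)
;; Two overlapping bindings, neither a subset of the other.
(add-binding! (ident 'y (set 'a 'b)) 'y1)
(add-binding! (ident 'y (set 'a 'c)) 'y2)

(resolve (ident 'x (set 'a 'd)))     ; 'let-x
(resolve (ident 'x (set 'a 'b 'c)))  ; 'lambda-x
(resolve (ident 'y (set 'a 'b 'c)))  ; 'ambiguous
```

Picking the candidate with the largest set and then checking that it contains every other candidate covers both the “closest binding” rule and the ambiguity check.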
Ambiguous bindings can happen only if the macro system provides unusual operations. They will not show up with syntax-rules, and they will not show up with just syntax-case and syntax unless state or compile-time binding is used to communicate between macro invocations.
You may discern a correspondence between “scope” here and “mark” in other descriptions of Scheme macros. Internally, the implementation still uses the term “mark.”
2 Bindings
During expansion, each binding form (such as let, let-syntax, or lambda)
creates a new scope;
adds the scope to every identifier in binding position, as well as to the region where the bindings apply; and
extends a global table that maps a 〈symbol, scope set〉 pair to a representation of a binding.
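For the example program from section 1,

(let ([x 1])
  (let-syntax ([m (lambda (stx) #'x)])
    (lambda (x)
      (m))))

expansion with the scopes a, b, and c for let, let-syntax, and lambda adds three entries to the binding table:

〈x, {a}〉 → the let-bound x
〈m, {a, b}〉 → the macro m
〈x, {a, b, c}〉 → the lambda-bound x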
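The three steps of a binding form can be sketched for a form with one bound identifier. This is an illustration under assumed representations, not the expander's code: identifiers are a hypothetical ident struct, compound syntax is a plain list, new scopes are numbered symbols, and fresh-scope, add-scope, and expand-binder are made-up names. The usage nests a lambda inside a let, so the table ends up with one entry per binding.

```racket
#lang racket
;; Toy model: identifiers are ident structs, compound syntax is a list.
;; fresh-scope, add-scope, and expand-binder are hypothetical names.
(struct ident (sym scopes) #:transparent)

(define binding-table (make-hash))   ; (cons symbol scope-set) -> binding
(define scope-counter 0)

;; Step 1 helper: a new scope, represented as a numbered symbol.
(define (fresh-scope)
  (set! scope-counter (add1 scope-counter))
  (string->symbol (format "s~a" scope-counter)))

;; Step 2 helper: add a scope to every identifier within some syntax.
(define (add-scope stx sc)
  (cond
    [(ident? stx) (ident (ident-sym stx) (set-add (ident-scopes stx) sc))]
    [(list? stx) (map (lambda (s) (add-scope s sc)) stx)]
    [else stx]))

;; Expand a binding form with one bound identifier and one body.
(define (expand-binder id body binding)
  (define sc (fresh-scope))               ; 1. create a new scope
  (define new-id (add-scope id sc))       ; 2. add it to the binding position
  (define new-body (add-scope body sc))   ;    and to the region it covers
  (hash-set! binding-table                ; 3. extend the binding table
             (cons (ident-sym new-id) (ident-scopes new-id))
             binding)
  (values new-id new-body))

;; (let ([x ...]) (lambda (x) x)): expand the let, then the lambda in its body.
(define-values (x-let body-let)
  (expand-binder (ident 'x (set))
                 (list (ident 'lambda (set)) (list (ident 'x (set))) (ident 'x (set)))
                 'let-x))
;; body-let is now (lambda (x) x) with scope s1 on every identifier
(define-values (x-lam body-lam)
  (expand-binder (car (cadr body-let)) (caddr body-let) 'lambda-x))
;; x-lam and body-lam are both x{s1, s2}; the table maps x{s1} and x{s1, s2}
```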
A free-identifier=? comparison on identifiers checks whether the two identifiers have the same binding. A bound-identifier=? comparison checks that two identifiers have the same symbolic name and exactly the same scope sets, independent of the binding table.
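The difference between the two comparisons can be sketched over a toy model. In this sketch, bound=? and free=? are hypothetical stand-ins for bound-identifier=? and free-identifier=?, resolution skips the ambiguity check, and an unbound identifier resolves to its symbol (an assumption chosen so that two unbound identifiers with the same name compare as free=?).

```racket
#lang racket
;; Toy model: ident, resolve, bound=?, and free=? are hypothetical names.
(struct ident (sym scopes) #:transparent)

(define binding-table (make-hash))   ; (cons symbol scope-set) -> binding

;; Simplified resolution (no ambiguity check): the largest subset wins.
;; Unbound identifiers resolve to (list 'unbound symbol).
(define (resolve id)
  (define matches
    (for/list ([(key b) (in-hash binding-table)]
               #:when (and (eq? (car key) (ident-sym id))
                           (subset? (cdr key) (ident-scopes id))))
      (cons (cdr key) b)))
  (if (null? matches)
      (list 'unbound (ident-sym id))
      (cdr (argmax (lambda (m) (set-count (car m))) matches))))

;; bound=? ignores the table: same symbol and identical scope sets.
(define (bound=? a b)
  (and (eq? (ident-sym a) (ident-sym b))
       (equal? (ident-scopes a) (ident-scopes b))))

;; free=? compares what the identifiers resolve to.
(define (free=? a b)
  (equal? (resolve a) (resolve b)))

(hash-set! binding-table (cons 'x (set 'a)) 'let-x)
(define p (ident 'x (set 'a 'b)))
(define q (ident 'x (set 'a 'c)))
(bound=? p q)  ; #f: the scope sets differ
(free=? p q)   ; #t: both resolve to the binding of x{a}
```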
Note that (bound-identifier=? x y) does not completely answer the question “would x bind y?” A #t result answers that question in the affirmative, but x might bind y even if the result is #f. The same is currently true in definition contexts for Racket and Chez Scheme, which (like the set-of-scopes macro system) print #f but produce 1 for this example:
(let ()
  (define-syntax (m stx)
    (syntax-case stx ()
      [(_ a b)
       (begin
         (printf "~s\n" (bound-identifier=? #'a #'b))
         #'(begin
             (define a 1)
             b))]))
  (define-syntax n
    (syntax-rules ()
      [(_ id) (m id x)]))
  (n x))

In practice, bound-identifier=? is used to check whether two identifiers would conflict as bindings in the same context. It continues to work for that purpose with set-of-scopes binding.
The binding table can grow forever, but when a particular scope becomes unreachable (i.e., when no reachable syntax object includes a reference to the scope), then any mapping that includes the scope becomes unreachable. This weak mapping is easily implemented by attaching the mapping to the scope, instead of using an actual global table. Any scope in a scope set can house the binding, since the binding can only be referenced using all of the scopes in the set. Attaching to the most recently allocated scope is a good heuristic, because the most recent scope is likely to be maximally distinguishing and have the shortest lifetime.
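One way to realize the weak mapping is sketched below, under assumed representations: a scope is a hypothetical struct that carries its own table of bindings, and a larger id stands for a more recently allocated scope. Since a binding applies only when every scope in its set is present on the reference, searching the tables of the reference's own scopes is enough to find every candidate. Ambiguity checking is omitted.

```racket
#lang racket
;; Toy model: scope, new-scope, add-binding!, and resolve are hypothetical.
;; bindings: symbol -> list of (scope-set . binding)
(struct scope (id [bindings #:mutable]))

(define next-id 0)
(define (new-scope)
  (set! next-id (add1 next-id))
  (scope next-id (hasheq)))

;; Store the binding on the newest scope in its set (largest id).
(define (add-binding! sym scopes binding)
  (define owner (argmax scope-id (set->list scopes)))
  (set-scope-bindings! owner
                       (hash-update (scope-bindings owner) sym
                                    (lambda (entries) (cons (cons scopes binding) entries))
                                    '())))

;; Any applicable binding lives on one of the reference's own scopes.
(define (resolve sym scopes)
  (define candidates
    (for*/list ([sc (in-set scopes)]
                [entry (in-list (hash-ref (scope-bindings sc) sym '()))]
                #:when (subset? (car entry) scopes))
      entry))
  (and (pair? candidates)
       (cdr (argmax (lambda (e) (set-count (car e))) candidates))))

(define a (new-scope))
(define b (new-scope))
(add-binding! 'x (set a) 'outer-x)      ; stored on a
(add-binding! 'x (set a b) 'inner-x)    ; stored on b, the newer scope
(resolve 'x (set a b))                  ; 'inner-x
(resolve 'x (set a))                    ; 'outer-x
```

With no global table, an entry stays reachable only as long as the scope that houses it does.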
When a syntax object is serialized, the serialization must pull along the fragment of the binding table that is relevant for the syntax object’s scopes. Again, that extraction happens more or less automatically when the binding-table entries are implemented by attaching them to scopes. Deserializing a syntax object creates fresh representations for every serialized scope, but preserving sharing ensures that binding relationships are preserved among identifiers in the syntax object.
3 Macro Definitions in a Recursive Scope
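Consider a macro whose template binds the identifier supplied as misc-id between a lambda and the lambda's body:

(letrec-syntax ([identity (syntax-rules ()
                            [(_ misc-id)
                             (lambda (x)
                               (let ([misc-id 'other])
                                 x))])])
  (identity x))

With a single scope a from letrec-syntax on both the right-hand side and the body, the program is annotated as follows:

(letrec-syntax ([identity (syntax-rules ()
                            [(_ misc-id)
                             (lambda (x{a})
                               (let ([misc-id 'other])
                                 x{a}))])])
  (identity x{a}))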
This example went wrong because it didn’t introduce the right scopes for letrec-syntax. We got part of it right: the letrec-syntax form should create a scope that covers both the right-hand side and body, and it should bind with that scope, reflecting the rec part of the name. In addition, however, letrec-syntax should introduce a second scope just for the body. The scope for just the body ensures that use-site identifiers have a scope that is absent from introduced identifiers (just as introduced identifiers already get a scope that is absent from use-site identifiers).
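With a second scope e for just the body, the use site carries a scope that the macro's template lacks:

(identity x{a, e})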
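A definition context poses the same problem, except that its forms are not separated in advance into definitions and a body, so the expander cannot know up front which forms should receive a scope for just the body:

(define-syntax identity
  (syntax-rules ()
    [(_ misc-id)
     (lambda (x)
       (let ([misc-id 'other])
         x))]))
(identity x)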
This problem is similar to automatically detecting which identifiers in a macro expansion were introduced by the macro. The usual technique is to mark the input to the macro, and then anti-mark the result of the macro; original pieces are left unmarked, and introduced pieces are (anti-)marked. In the case of a definition context, all forms start out with an expression scope, but the scope is removed from a form that turns out to be a macro definition. The expression scope is also removed from the binding part of a form that turns out to be a non-macro definition, since all definitions and macro definitions are in a single mutually recursive scope.
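The mark-and-anti-mark technique carries over to scopes by flipping: a fresh scope is added to the macro's input where absent (and removed where present), and the same flip is applied to the macro's result. The Racket sketch below assumes a toy ident struct and a hypothetical transformer m-transformer whose template contributes an x with scope set {a}. After the second flip, the argument from the use site has lost the scope again, while the template's x keeps it.

```racket
#lang racket
;; Toy model: ident, flip-scope, apply-macro, and m-transformer are hypothetical.
(struct ident (sym scopes) #:transparent)

;; Flip: add the scope if absent, remove it if present.
(define (flip-scope stx sc)
  (cond
    [(ident? stx)
     (let ([scs (ident-scopes stx)])
       (ident (ident-sym stx)
              (if (set-member? scs sc) (set-remove scs sc) (set-add scs sc))))]
    [(list? stx) (map (lambda (s) (flip-scope s sc)) stx)]
    [else stx]))

;; Flip the input, run the transformer, and flip the result.
(define (apply-macro transformer stx sc)
  (flip-scope (transformer (flip-scope stx sc)) sc))

;; For (m id): the result pairs the argument with an x from the template.
(define (m-transformer stx)
  (list (cadr stx) (ident 'x (set 'a))))

(apply-macro m-transformer (list (ident 'm (set 'a)) (ident 'x (set 'a))) 'd)
;; => a list of x{a} (from the use site) and x{a, d} (from the template)
```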
4 Local Bindings and Syntax Quoting
It’s tempting to think that the compile-time let should introduce a phase-specific scope that applies only for compile-time references, in which case it won’t affect x as a run-time reference. That adjustment doesn’t solve the problem in general, since a macro can generate compile-time bindings and references just as well as run-time bindings and references.
A solution is for the expansion of quote-syntax to discard certain scopes on its content. The discarded scopes are those from binding forms that enclosed the quote-syntax form up to a phase crossing or module top-level. In the case of a quote-syntax form within a macro binding’s right-hand side, those scopes are all of the scopes introduced on the right-hand side of the macro binding.
(free-identifier=? (let ([x 1]) #'x) #'x)
Note: Racket’s macro system matches Dybvig et al. (1993), where both free-identifier=? and bound-identifier=? produce #f for the above arguments, and bound-identifier=? always implies free-identifier=?. The current psyntax implementation, as used by Chez Scheme and other implementations and as consistent with Adams (2015), produces #t and #f for free-identifier=? and bound-identifier=?, respectively; as the example illustrates, bound-identifier=? does not imply free-identifier=?. The set-of-scopes system produces #t and #t for free-identifier=? and bound-identifier=?, respectively, and bound-identifier=? always implies free-identifier=?.
If quote-syntax did not prune scopes, then not only would the result above be #f, it would also be #f with (let ([y 1]) #'x) instead of (let ([x 1]) #'x). That similarity reflects the switch to attaching identifier-independent scopes to identifiers instead of attaching identifier-specific renamings.
Arguably, the issue here is the way that pieces of syntax from different local scopes are placed into the same result syntax object, with the expectation that all the pieces are treated the same way. In other words, Racket programmers have gotten used to an unusual variant of quote-syntax, and most macros could be written just as well with a non-pruning variant. Then again, the pruning variant of quote-syntax tends to discard information about local bindings that is usually unwanted but preserved by the current quote-syntax. Meanwhile, supplying an additional, non-pruning variant of quote-syntax poses no problems.
There’s precedent for a variant of syntax-case that does not support assembling pieces as in the example. An early version of van Tonder’s macro expander (van Tonder 2007) had that property as a result of making the evaluation of syntax generate a fresh context.
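The effect of pruning on the free-identifier=? example earlier in this section can be sketched with scope sets directly. The sketch assumes a module scope M and a let scope a, and the hypothetical prune function is handed the scopes to discard, rather than computing them from the enclosing binding forms as the expander would. Identifier equality here compares the symbol and scope set, as bound-identifier=? does.

```racket
#lang racket
;; Toy model: ident, prune, and bound=? are hypothetical; the caller supplies
;; the scopes to discard instead of computing them from enclosing forms.
(struct ident (sym scopes) #:transparent)

(define (prune stx local-scopes)
  (cond
    [(ident? stx)
     (ident (ident-sym stx) (set-subtract (ident-scopes stx) local-scopes))]
    [(list? stx) (map (lambda (s) (prune s local-scopes)) stx)]
    [else stx]))

;; Same symbol and same scope set, as with bound-identifier=?.
(define (bound=? p q) (equal? p q))

;; In (free-identifier=? (let ([x 1]) #'x) #'x), assume module scope M and
;; let scope a: the first quoted x is x{M, a}, the second is x{M}.
(define first-x (ident 'x (set 'M 'a)))
(define second-x (ident 'x (set 'M)))

(bound=? (prune first-x (set 'a)) (prune second-x (set)))  ; #t: both are x{M}
(bound=? first-x second-x)                                 ; #f without pruning
```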
5 Modules and Phases
The module form creates a new scope for its bindings, and it also creates an expression scope to separate macro definitions from potential macro uses (like any definition context). A submodule that uses module* ... #f creates nested scopes in the obvious way. In other module* and module submodule forms, the macro expander prevents access to the enclosing module’s bindings by initially replacing the scope that corresponds to the enclosing module with a fresh scope (so that none of the enclosing bindings are available, but distinctions among identifier scopes are preserved).
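Replacing the enclosing module's scope can be sketched as a substitution over a toy representation. The ident struct, the replace-scope function, and the scope names M and M2 are hypothetical. After the swap, a binding with scope set {M} from the enclosing module is no longer a subset of any identifier's scopes, yet identifiers that differed before the swap still differ.

```racket
#lang racket
;; Toy model: ident and replace-scope are hypothetical names.
(struct ident (sym scopes) #:transparent)

;; Swap scope old for scope new on every identifier that has old.
(define (replace-scope stx old new)
  (cond
    [(ident? stx)
     (let ([scs (ident-scopes stx)])
       (ident (ident-sym stx)
              (if (set-member? scs old) (set-add (set-remove scs old) new) scs)))]
    [(list? stx) (map (lambda (s) (replace-scope s old new)) stx)]
    [else stx]))

;; The enclosing module has scope M and binds x as x{M}. In the submodule
;; body, two identifiers that differed only by scope b still differ.
(define sub-body
  (replace-scope (list (ident 'x (set 'M)) (ident 'x (set 'M 'b))) 'M 'M2))
sub-body   ; x{M2} and x{M2, b}
```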
Mappings in the binding table might be phase-specific. That is, while we previously said that the binding table maps a 〈symbol, scope set〉 pair to a binding, the table domain might actually be a 3-tuple: 〈symbol, scope set, phase〉.
Scopes might be phase-specific, where a module form introduces a scope for every binding phase to its body, and only the scopes at a given phase are used to locate a reference’s binding.
(define x 1)
(define-for-syntax x 2)
(define id #'x)
(define-for-syntax id #'x)
(provide id (for-syntax id))
If the binding table holds phase-specific mappings, then each instantiation of a module can create a fresh mark and duplicate all the original mappings to ones that add the new mark, but phase-shifted appropriately. Suitable representation choices can make the duplication and shifting inexpensive.
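Both the 3-tuple table and the duplication step can be sketched in a few lines. The table layout, the helper names add-binding!, resolve, and instantiate!, and the rule that only mappings mentioning the module's scope are duplicated are assumptions for illustration. The two phase-specific bindings mirror (define x 1) and (define-for-syntax x 2) from the example above.

```racket
#lang racket
;; Toy model: the key is (list symbol scope-set phase). add-binding!,
;; resolve, and instantiate! are hypothetical names.
(define binding-table (make-hash))

(define (add-binding! sym scopes phase binding)
  (hash-set! binding-table (list sym scopes phase) binding))

;; Largest matching scope set at the given phase; ambiguity check omitted.
(define (resolve sym scopes phase)
  (define matches
    (for/list ([(key b) (in-hash binding-table)]
               #:when (and (eq? (first key) sym)
                           (= (third key) phase)
                           (subset? (second key) scopes)))
      (cons (second key) b)))
  (and (pair? matches)
       (cdr (argmax (lambda (m) (set-count (car m))) matches))))

;; Instantiate at a phase shift: copy each mapping that mentions the
;; module's scope, adding a fresh scope and shifting the phase.
(define (instantiate! module-scope fresh shift)
  (for ([(key b) (in-hash (hash-copy binding-table))]
        #:when (set-member? (second key) module-scope))
    (add-binding! (first key) (set-add (second key) fresh) (+ (third key) shift) b)))

(add-binding! 'x (set 'M) 0 'x-phase-0)   ; (define x 1)
(add-binding! 'x (set 'M) 1 'x-phase-1)   ; (define-for-syntax x 2)
(resolve 'x (set 'M) 0)                   ; 'x-phase-0
(resolve 'x (set 'M) 1)                   ; 'x-phase-1

;; An instance with fresh scope I, shifted up by one phase.
(instantiate! 'M 'I 1)
(resolve 'x (set 'M 'I) 1)                ; 'x-phase-0, now one phase up
(resolve 'x (set 'M 'I) 2)                ; 'x-phase-1
```

Iterating over a copy of the table avoids mutating the hash table while traversing it.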
(define-for-syntax x 10)
(let ([x 1])
  (let-syntax ([y x])
    ....))
6 The Syntax-Function Zoo
Compared to Dybvig et al. (1993), Racket adds many functions for manipulating syntax objects during macro expansion in ways that are sensitive to the expansion context. All of those functions make as much sense with a set-of-scopes notion of binding, and mostly they are easier to specify.
In particular, Racket’s notion of an internal-definition context, which is a first-class construct during expansion, is difficult to specify in terms of renamings. An internal-definition context is backed by a renaming on syntax objects, where the renaming can refer to itself or other renamings, and so the binding-resolution process must handle cycles. With set-of-scopes binding, an internal-definition context is backed by a pair of scopes: one for the whole context, and one for expressions within the context; an internal-definition context doesn’t create cyclic syntax-object structures, and it needs no special rules for resolving references to bindings.
Unfortunately, none of the members of Racket’s syntax-function zoo seem to be made obsolete by set-of-scopes binding. Functions such as syntax-local-get-shadower have the same uses as before.
7 Compatibility with the Current Racket Expander
The documentation for Racket’s current macro expander attempts to avoid references to the underlying mark-and-rename model. As a result, it is often too imprecise to expose differences created by a change to set-of-scopes binding. One goal of the new model is to allow the specification and documentation of Racket’s macro expander to be tightened. Meanwhile, it should not be surprising that most macros confined themselves to behaviors that stay the same in both systems.
While most Racket programs expand and run the same with a set-of-scopes expander as before, macros that are intended to manipulate scope in complex ways can expose the difference. The Racket implementations of the splicing forms in racket/splicing must be adjusted, as well as some details of the unit, class, and define-generics forms.
Macros that use explicit internal-definition contexts are among the most likely to need adaptation, since internal-definition contexts formerly made distinctions among specific identifiers.
(define-syntax-rule (define1 id)
  (begin
    (define x 1)
    ; stash a reference to the introduced identifier:
    (define-syntax id #'x)))

(define-syntax (use stx)
  (syntax-case stx ()
    [(_ id)
     (with-syntax ([old-id (syntax-local-value #'id)])
       #'(begin
           (define x 2)
           ; reference to old-id ends up ambiguous:
           old-id))]))

(define1 foo)
(use foo)
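Expanding (define1 foo) produces the equivalent of the following, where both x identifiers from define1's template carry the scope of that macro expansion:

(begin
  (define x 1)
  (define-syntax foo #'x))
(define-syntax (use stx)
  (syntax-case stx ()
    [(_ id)
     (with-syntax ([old-id (syntax-local-value #'id)])
       #'(begin
           (define x 2)
           old-id))]))
(use foo)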
In addition to affecting macros, the change to the binding model can affect functions that inspect fully expanded programs. In a fully expanded program with Racket’s current expander, a local binding and a reference to that binding use identifiers that are bound-identifier=?; otherwise, re-expansion would go wrong. With a set-of-scopes notion of binding, the expander is no longer obliged to ensure that the reference is bound-identifier=? to the identifier for its binding; free-identifier=? is sufficient.
As on many other points, the change to fully expanded programs is one where local bindings become more like module-level bindings. Also, the current macro expander will sometimes report that an identifier’s binding is “local” when it actually refers to a module-level binding; the new implementation doesn’t have that bug.
8 Model
Below is a model in the style of Flatt et al. (2012). It’s comparable to a model that includes support for internal-definition contexts, but the resolve metafunction is dramatically simplified, even compared to a model without internal-definition contexts.
Although this model includes pruning within syntax, that pruning is not so interesting, since compile-time code is parsed directly instead of macro-expanded.
Bibliography
Michael D. Adams. Towards the Essence of Hygiene. In Proc. Principles of Programming Languages, 2015.
R. Kent Dybvig, Robert Hieb, and Carl Bruggeman. Syntactic Abstraction in Scheme. Lisp and Symbolic Computation 5(4), 1993.
Matthew Flatt, Ryan Culpepper, Robert Bruce Findler, and David Darais. Macros that Work Together: Compile-Time Bindings, Partial Expansion, and Definition Contexts. Journal of Functional Programming 22(2), 2012.
Andre van Tonder. R6RS Libraries and Macros. 2007. http://www.het.brown.edu/people/andre/macros/index.html