Binding as Sets of Scopes Notes on a new model of macro expansion for Racket

This document is superceded by a later revision.

Binding as Sets of Scopes
Notes on a new model of macro expansion for Racket

Matthew Flatt

Hygienic macro expansion is desirable for the same reason as lexical scope: both enable local reasoning about binding so that program fragments compose reliably. The analogy suggests specifying hygienic macro expansion as a kind of translation into lexical-scope machinery. In particular, variables must be “renamed” to match the mechanisms of lexical scope as the variables interact with macros.

A specification of hygiene in terms of renaming accommodates simple binding forms well enough, but it becomes unwieldy for recursive definition contexts (Flatt et al. 2012, section 3.8), especially for contexts that allow a mixture of macro and non-macro definitions. The renaming approach is also difficult to implement compactly and efficiently in a macro system that supports “hygiene bending” operations, such as datum->syntax, because a history of renamings must be recorded for replay on an arbitrary symbol.

In a new macro expander for Racket, we discard the renaming approach and start with a generalized idea of macro scope, where lexical scope is just a special case of macro scope when applied to a language without macros. The resulting implementation is substantially simpler than one based on renaming, and it avoids bugs that have proven too difficult to repair in the current macro expander.1The bugs manifest most commonly as failures of submodules within typed/racket modules.

The change to the expander’s underlying model of binding potentially affects existing Racket macros, since the expander exposes many hygiene-bending operations. All purely pattern-based macros work as before,2Or so I think. I’m hopeful that the claim can be formulated and proved as a theorem. however, and experiments indicate that the vast majority of other macros continue to work, too.

1 Macro Scope via Scope Sets

A scope corresponds to a binding context, and every identifier in a program (source, partially expanded, or expanded) has a set of scopes. For example, in

(let ([x 1])
(lambda (y)
z))

the let form corresponds to a scope a, and the lambda form corresponds to b. That is, everything in the let’s body is in a, and everything in the inner lambda’s body is in b; the set of scopes associated with z is {a, b}.3We might say that “the environment of z is {a, b},” but for macro expansion, we traditionally use the word “environment” to refer to a mapping from bindings to variables and compile-time values. To avoid confusion, we’ll refrain from using “environment” to mean a set of scopes.

In a macro-extensible language, expanding a use of a macro creates a new scope in the same way that a binding form creates a new scope. Starting with

(let ([x 1])
  (let-syntax ([m (lambda (stx) #'x)])
    (lambda (x)
       (m))))

the right-hand side of the m binding is has the scope set {a}, while the final m has scope set {a, b, c} corresponding to the let, let-syntax, and let forms. We can write the scope sets next to each x and m at the point where macro expansion reaches the (m) form:

(let ([x{a} 1])
  (let-syntax ([m{a, b} (lambda (stx) #'x{a})])
    (lambda (x{a, b, c})
      (m{a, b, c}))))

The expansion of (m) produces x with the scope set {a, d}, where d is a new scope for the macro’s expansion:

(let ([x{a} 1])
  (let-syntax ([m{a, b} (lambda (stx) #'x{a})])
    (lambda (x{a, b, c})
      x{a, d})))

The absence of c on the final x explains why it doesn’t refer to the inner binding of x. At the same time, if a different m places a macro-introduced x in a binding position around an x from a macro use (m x), the x from the use is not macro-introduced and doesn’t have the scope d, so it wouldn’t refer to the macro-introduced binding.

Lexical scope corresponds to scope sets that are constrained to a particular shape: For any given set, there’s a single scope s that implies all the others (i.e., the ones around s in the program). As a result, s by itself (along with a symbolic name) is enough information to identify a binding. We normally describe lexical scope in terms of the closest such s for some notion of “closest.” Given scope sets instead of individual scopes, we can define “closest” as the largest relevant set.

More generally, we can define binding based on subsets: A reference’s binding is found as one whose set of scopes is a subset of the reference’s own scopes (in addition to having the same symbolic name). The advantage of using sets of scopes is that macro expansion creates scope sets that overlap in more general ways; there’s not always a s that implies all the others. Absent a determining s, we can’t identify a binding by a single scope, but we can identify it by a set of scopes.

If arbitrary sets of scopes are possible, then two different bindings might have overlapping scopes, neither might be a subset of the other, and both might be subsets of a particular reference’s scope set. In that case, the reference is ambiguous.

Ambiguous bindings can happen only if the macro system provides unusual operations. They will not show up with syntax-rules, and they will not show up with just syntax-case and syntax unless state or compile-time binding is used to communicate between macro invocations.
You may discern a correspondence between “scope” here and “mark” in other descriptions of Scheme macros. Internally, the implementation still uses the term “mark.”

2 Bindings

When macro expansion encounters a binding form, it

creates a new scope;
adds the scope to every identifier in binding position, as well as to the region where the bindings apply; and
extends a global table that maps a 〈symbol, scope set〉 pair to a representation of a binding.

Each local binding is represented by a unique, opaque value (e.g., a gensym). A binding to a module export is represented by the module name paired with a serializable identifier for the definition within the module.

For example,

(let ([x 1])
  (let-syntax ([m (lambda (stx) #'x)])
    (lambda (x)
      (m))))

more precisely expands as

(let ([x{a} 1])
  (let-syntax ([m{a, b} (lambda (stx{e}) #'x{a})])
    (lambda (x{a, b, c})
       x{a, d})))

x{a}	→	x4
m{a, b}	→	m8
stx{e}	→	stx15
x{a, b, c}	→	x16

where the compile-time environment along the way maps x4 to a variable, stx15 to a variable, m8 to a macro, and x16 to a variable. The reference x{a, d} has the binding x4, because x{a} is the mapping for x in the binding table that has the largest subset of {a, d}.

A free-identifier=? comparison on identifiers checks whether the two identifiers have the same binding. A bound-identifier=? comparison checks that two identifiers have exactly the same scope sets, independent of the binding table.

Note that (bound-identifier=? x y) does not completely answer the question “would x bind y?”A #t result answers that question in the affirmative, but x might bind y even if the result is #f. The same is currently true in definition contexts for Racket and Chez Scheme, which (like the set-of-scopes macro system) print #f but produce 1 for this example:
(let ()
  (define-syntax (m stx)
    (syntax-case stx ()
      [(_ a b)
       (begin
         (printf "~s\n" (bound-identifier=? #'a #'b))
         #'(begin
             (define a 1)
             b))]))
  (define-syntax n
    (syntax-rules ()
      [(_ id) (m id x)]))
  (n x))
In practice, bound-identifier=? is used to check whether two identifiers would conflict as bindings in the same context. It continues to work for that purpose with set-of-scopes binding.

The binding table can grow forever, but when a particular scope becomes unreachable (i.e., when no reachable syntax object includes a reference to the scope), then any mapping that includes the scope becomes unreachable. This weak mapping is easily implemented by attaching the mapping to the scope, instead of using an actual global table. Any scope in a scope set can house the binding, since the binding can only be referenced using all of the scopes in the set. Attaching to the most recently allocated scope is a good heuristic, because the most recent scope is likely to be maximally distinguishing and have the shortest lifetime.

When a syntax object is serialized, the serialization must pull along the fragment of the binding table that is relevant for the syntax object’s scopes. Again, that extraction happens more or less automatically when the binding-table entries are implemented by attaching them to scopes. Deserializing a syntax object creates fresh representations for every serialized scope, but preserving sharing ensures that binding relationships are preserved among identifiers in the syntax object.

3 Macro Definitions in a Recursive Scope

If we try out the the set-of-scopes idea on letrec-syntax, and if we try an example that checks whether a use-site identifier captures a macro-introduced identifier, then we get a bad result:

(letrec-syntax ([identity (syntax-rules ()
                            [(_ misc-id)
                             (lambda (x)
                               (let ([misc-id 'other])
                                 x))])])
   (identity x))

Assuming that the letrec-syntax form creates a scope a, the scope must be added to both the right-hand side and body of the letrec-syntax form (to create recursive bindings):

(letrec-syntax ([identity (syntax-rules ()
                            [(_ misc-id)
                             (lambda (x{a})
                               (let ([misc-id 'other])
                                 x{a}))])])
   (identity x{a}))

Expanding (identity x{a}) creates the scope set b and produces

(lambda (x{a, b})
(let ([x{a} 'other])
x{a, b}))

where b is added to each of the two introduced xs. The lambda introduces a new scope c, and let introduces d, producing

(lambda (x{a, b, c})
(let ([x{a, c, d} 'other])
x{a, b, c, d}))

At this point, unfortunately, the binding of the innermost x is ambiguous: {a, b, c, d} is a subset of both {a, b, c} and {a, c, d}, neither of which is a subset of the other. Instead, we want that x to refer to the lambda binding.

This example went wrong because it didn’t introduce the right scopes for letrec-syntax. We got part of it right: the letrec-syntax form should create a scope that covers both the right-hand side and body, and it should bind with that scope, reflecting the rec part of the name. In addition, however, letrec-syntax should introduce a second scope just for the body. The scope for just the body ensures that use-site identifiers have a scope that is absent from introduced identifiers (just as introduced identifiers already get a scope that is absent from use-site identifiers).

Following that rule, if we call the body scope e, then we start with

(identity x{a, e})

which expands to

(lambda (x{a, b})
(let ([x{a, e} 'other])
x{a, b}))

which ends up as

(lambda (x{a, b, c})
(let ([x{a, c, d, e} 'other])
x{a, b, c, d}))

There’s no ambiguity, and the final x refers to the lambda binding as intended.

While treating the body of a letrec-syntax as a separate scope may seem unintuitive, besides fixing the above problem, it generalizes well to a definition context that allows both macro and non-macro definitions (i.e., a generalization of letrec-syntax). In such a context, we might write identity and its use as

(define-syntax identity
  (syntax-rules ()
    [(_ misc-id)
     (lambda (x)
       (let ([misc-id 'other])
         x))]))

(identity x)

We still need a scope that divides the definition of identity from its use. Then again, we can’t draw the line before expansion like we do with letrec-syntax, because distinguishing expressions from definitions may require expansion (e.g., to ensure that (identity x) doesn’t expand to another macro definition).

This problem is similar to automatically detecting which identifiers in a macro expansion were introduced by the macro. The usual technique is to mark the input to the macro, and then anti-mark the result of the macro; original pieces are left unmarked, and introduced pieces are (anti-)marked. In the case of a definition context, all forms start out with an expression scope, but the scope is removed from a form that turns out to be a macro definition. The expression scope is also removed on the binding part of form that turns out to be a non-macro definition, since all definitions and macro definitions are in a single mutually recursive scope.

4 Local Bindings and Syntax Quoting

The set-of-scopes approach to binding “just works” for pattern-based macros, but it makes finer distinctions among identifiers than expected by existing Racket macros that use #` or quote-syntax. To be consistent with the way that Racket macros are currently written, quote-syntax must discard some scopes. For example, in the macro

(lambda (stx)
  (let ([id #'x])
     #`(let ([#,id 1])
         x)))

the x that takes the place of #,id should bind the x that is in the resulting let’s body. The x that is bound to id, however, is not in the scope that is introduced by the compile-time let:

(lambda (stx{a})
  (let ([id{a, b} #'x{a}])
     #`(let ([#,id{a, b} 1])
          x{a, b})))

If quote-syntax (implicit in #`) preserves all scopes on an identifier, then with set-of-scopes binding, the x that replaces #,id will not capture the x in the generated let body.

It’s tempting to think that the compile-time let should introduce a phase-specific scope that applies only for compile-time references, in which case it won’t affect x as a run-time reference. That adjustment doesn’t solve the problem in general, since a macro can generate compile-time bindings and references just as well as run-time bindings and references.

A solution is for the expansion of quote-syntax to discard certain scopes on its content. The discarded scopes are those from binding forms that enclosed the quote-syntax form up to a phase crossing or module top-level. In the case of a quote-syntax form within a macro binding’s right-hand side, those scopes are all of the scopes introduced on the right-hand side of the macro binding.

The resulting macro system is different than the current Racket macro system. Experiments suggest that the vast majority of macro implementations work either way, but it’s easy to construct an example that behaves differently:

(free-identifier=? (let ([x 1]) #'x)
#'x)

In Racket’s current macro system, the result is #f. The set-of-scopes system with a scope-pruning quote-syntax produces #t, instead, because the let-generated scope is stripped away from #'x.

Note: Racket’s macro system matches Dybvig et al. (1993), where both free-identifier=? and bound-identifier=? produce #f for the above arguments, and bound-identifier=? always implies free-identifier=?. The current psyntax implementation, as used by Chez Scheme and other implementations and as consistent with Adams (2015), produces #t and #f for free-identifier=? and bound-identifier=?, respectively; as the example illustrates, bound-identifier=? does not imply free-identifier=?. The set-of-scopes system produces #t and #t for free-identifier=? and bound-identifier=?, respectively, and bound-identifier=? always implies free-identifier=?.

If quote-syntax did not prune scopes, then not only would the result above be #f, it would also be #f with (let ([y 1]) #'x) instead of (let ([x 1]) #'x). That similarity reflects the switch to attaching identifier-independent scopes to identifiers instead of attaching identifier-specific renamings.

Arguably, the issue here is the way that pieces of syntax from different local scopes are placed into the same result syntax object, with the expectation that all the pieces are treated the same way. In other words, Racket programmers have gotten used to an unusual variant of quote-syntax, and most macros could be written just as well with a non-pruning variant. Then again, the pruning variant of quote-syntax tends to discard information about local bindings that is usually unwanted but preserved by the current quote-syntax. Meanwhile, supplying an additional, non-pruning variant of quote-syntax poses no problems.

There’s precedent for a variant of syntax-case that does not support assembling pieces as in the example. An early version of van Tonder’s macro expander (van Tonder 2007) had that property as a result of making the evaluation of syntax generate a fresh context.

5 Modules and Phases

The module form creates a new scope for its bindings, and it also creates an expression scope to separate macro definitions from potential macro uses (like any definition context). A submodule uses module* ... #f creates a nested scopes in the obvious way. In other module* and module submodule forms, the macro expander prevents access to the enclosing module’s bindings by initially replacing the scope that corresponds to the enclosing module with a fresh scope (so that none of the enclosing bindings are available, but distinctions among identifer scopes are preserved).

A module must distinguish bindings that have the same name and scopes but different phases, and this distinction might be implemented in a couple of ways:

Mappings in the binding table might be phase-specific. That is, while we previously said that the binding table maps a 〈symbol, scope set〉 pair to a binding, the table domain might actually be a 3-tuple: 〈symbol, scope set, phase〉.
Scope might be phase-specific, where a module form introduces a scope for every binding phase to its body, and only the scopes at a given phase are used to locate a reference’s binding.

Absent specific operations, these possibilities are indistinguishable. Phase-specific scopes imply that any entry in the binding table applies to a particular phase—as long as there’s no way to shift the phase of a scope relative to other scopes. Alternatively, we can model a phase in a 3-tuple key as an extra scope in the set, and we can treat all other scopes as an set of level-specific scopes—as long as there is no way to extract individual scopes from a set. Making the binding table phase-specific seems simpler.

Instantiating a module at a particular phase implies a phase shift in its syntax literals. Consider the module

(define x 1)
(define-for-syntax x 2)

(define id #'x)
(define-for-syntax id #'x)

(provide id (for-syntax id))

and suppose that the module is imported both normally and for-syntax, the later with a s: prefix. At phase level 1 in the importing module, both id and s:id will be bound to an identifier x that had the same scopes originally, but they should refer to different x bindings (in different module instances with different values).

If the binding table holds phase-specific mappings, then each instantiation of a module can create a fresh mark and duplicate all the original mappings to ones that add the new mark, but phase-shifted appropriately. Suitable representation choices can make the duplication and shifting inexpensive.

A unified treatment of module-level and local binding makes local binding phase-specific. That is, in

(define-for-syntax x 10)
(let ([x 1])
(let-syntax ([y x])
....))

the x on the right-hand side of let-syntax see the top-level phase-1 x binding, not the phase-0 local binding. This is a change from Racket’s current approach to binding and phases, but the only programs that are affected are ones that would trigger an out-of-phase error in the current system.

6 The Syntax-Function Zoo

Compared to Dybvig et al. (1993), Racket adds many functions for manipulating syntax objects during macro expansion in ways that are sensitive to the expansion context. All of those functions make as much sense with a set-of-scopes notion of binding, and mostly they are easier to specify.

In particular, Racket’s notion of an internal-definition context, which is a first-class construct during expansion, is difficult to specify in terms of renamings. An internal-definition context is backed by a renaming on syntax objects, where the renaming can refer to itself or other renamings, and so the binding-resolution process must handle cycles. With set-of-scopes binding, an internal-definition context is backed by a pair of scopes: one for the whole context, and one for expressions within the context; an internal-definition context doesn’t create cyclic syntax-object structures, and it needs no special rules for resolving references to bindings.

Unfortunately, none of the members of Racket’s syntax-function zoo seem to be made obsolete by set-of-scopes binding. Functions such as syntax-local-get-shadower have the same uses as before.

7 Compatibility with the Current Racket Expander

The documentation for Racket’s current macro expansion attempts to avoid references to the underlying mark-and-rename model. As a result, it is often too imprecise to expose differences created by a change to set-of-scope binding. One goal of the new model is to allow the specification and documentation of Racket’s macro expander to be tightened. Meanwhile, it should not be surprising that most macros confined themselves to behaviors that stay the same in both systems.

While most Racket programs expand and run the same with a set-of-scope expander as before, macros that are intended to manipulate scope in complex ways can expose the difference. The Racket implementations of the splicing forms in racket/splicing must be adjusted, as well as some details of unit, class, and define-generics forms.

Macros that use explicit internal-definition contexts are among the most likely to need adaptation, since internal-definition contexts formerly made distinctions among specific identifiers—the ones explicitly registered to create renamings—while the distinction now is more uniform. Some macros that use internal-definition contexts can switch to a simpler creation of a fresh scope (formerly “mark”), while others require a completely different strategy.

Here’s an example that produces 1 with Racket’s current expander, but it provokes an ambiguous-binding error with the set-of-scopes expander:

(define-syntax-rule (define1 id)
(begin
   (define x 1)
   ; stash a reference to the introduced identifier:
   (define-syntax id #'x)))

(define-syntax (use stx)
  (syntax-case stx ()
    [(_ id)
     (with-syntax ([old-id (syntax-local-value #'id)])
       #'(begin
           (define x 2)
           ; reference to old-id ends up ambiguous:
           old-id))]))

(define1 foo)
(use foo)

In the set-of-scopes model, define1 and use introduce bindings from two separate macro expansions, and they also arrange for an reference to be introduced by both of those macros, hence the ambiguity. Arguably, in this case, the use macro is broken, as illustrated in a variant of the program without define1 that produces 2 with both expanders:

(begin
  (define x 1)
  (define-syntax foo #'x))

(define-syntax (use stx)
  (syntax-case stx ()
    [(_ id)
     (with-syntax ([old-id (syntax-local-value #'id)])
       #'(begin
           (define x 2)
           old-id))]))

(use foo)

The use macro can be fixed for both expanders and both contexts by applying syntax-local-introduce to the result of (syntax-local-value #'id), since the identifier conceptually exists outside of this macro’s expansion. Such an application of syntax-local-introduce is typically needed and typically present in existing Racket macros that bring stashed identifiers into a new context.

In addition to affecting macros, the change to the binding model can affect functions that inspect fully-expanded programs. In a fully expanded program with Racket’s current expander, a local binding and a reference to that binding use identifiers that are bound-identifier=?; otherwise, re-expansion would go wrong. With a set-of-scopes notion of binding, the expander is no longer obliged to ensure that the reference is bound-identifier=? to the identifier for it’s reference; free-identifier=? is sufficient.

As on many other points, the change to fully expanded programs is one where local bindings become more like module-level bindings. Also, the current macro expander will sometimes report that an identifier’s binding is “local” when it actually refers to a module-level binding; the new implementation doesn’t have that bug.

8 Model

Below is a model in the style of Flatt et al. (2012). It’s comparable to a model that includes support for internal-definition contexts, but the resolve metafunction is dramatically simplified, even compared to a model without internal-definition contexts.

Although this model includes pruning within syntax, that pruning is not so interesting, since compile-time code is parsed directly instead of macro-expanded.

Bibliography

Michael D. Adams. Towards the Essence of Hygiene. In Proc. Pronciples of Programming Languages, 2015.

R. Kent Dybvig, Robert Hieb, and Carl Bruggeman. Syntactic Abstraction in Scheme. Lisp and Symbolic Computation 5(4), 1993.

Matthew Flatt, Ryan Culpepper, Robert Bruce Findler, and David Darais. Macros that Work Together: Compile-Time Bindings, Partial Expansion, and Definition Contexts. Journal of Functional Programming 22(2), 2012.

Andre van Tonder. R6RS Libraries and Macros. 2007. http://www.het.brown.edu/people/andre/macros/index.html

1The bugs manifest most commonly as failures of submodules within typed/racket modules.

2Or so I think. I’m hopeful that the claim can be formulated and proved as a theorem.

3We might say that “the environment of z is {a, b},” but for macro expansion, we traditionally use the word “environment” to refer to a mapping from bindings to variables and compile-time values. To avoid confusion, we’ll refrain from using “environment” to mean a set of scopes.

1	Macro Scope via Scope Sets
2	Bindings
3	Macro Definitions in a Recursive Scope
4	Local Bindings and Syntax Quoting
5	Modules and Phases
6	The Syntax-Function Zoo
7	Compatibility with the Current Racket Expander
8	Model
	Bibliography