Binding as Sets of Scopes
Notes on a new model of macro expansion for Racket
1 Macro Scope via Scope Sets
2 Bindings
3 Macros, Scopes, and Definitions
4 Local Bindings and Syntax Quoting
5 First-Class Definition Contexts
6 Rename Transformers
7 Modules and Phases
8 The Top Level
9 The Syntax-Function Zoo
10 Compatibility with the Current Racket Expander
11 Debugging Support
12 Model
Acknowledgments
Bibliography
This document is superceded by a later revision.

Binding as Sets of Scopes
Notes on a new model of macro expansion for Racket

Matthew Flatt

Hygienic macro expansion is desirable for the same reason as lexical scope: both enable local reasoning about binding so that program fragments compose reliably. The analogy suggests specifying hygienic macro expansion as a kind of translation into lexical-scope machinery. In particular, variables must be “renamed” to match the mechanisms of lexical scope as the variables interact with macros.

A specification of hygiene in terms of renaming accommodates simple binding forms well enough, but it becomes unwieldy for recursive definition contexts (Flatt et al. 2012, section 3.8), especially for contexts that allow a mixture of macro and non-macro definitions. The renaming approach is also difficult to implement compactly and efficiently in a macro system that supports “hygiene bending” operations, such as datum->syntax, because a history of renamings must be recorded for replay on an arbitrary symbol.

In a new macro expander for Racket, we discard the renaming approach and start with a generalized idea of macro scope, where lexical scope is just a special case of macro scope when applied to a language without macros. The resulting implementation is simpler than one based on renaming, and it avoids bugs that have proven too difficult to repair in the current macro expander.The bugs manifest most commonly as failures of submodules within typed/racket modules.

The change to the expander’s underlying model of binding potentially affects existing Racket macros, since the expander exposes many hygiene-bending operations. All purely pattern-based macros work as before,Or so I think. I’m hopeful that the claim can be formulated and proved as a theorem. however, and experiments indicate that the vast majority of other macros continue to work, too.

1 Macro Scope via Scope Sets

A scope corresponds to a binding context, and every identifier in a program (source, partially expanded, or expanded) has a set of scopes. For example, in
(let ([x 1])
  (lambda (y)
    z))
the let form corresponds to a scope a, and the lambda form corresponds to b. That is, everything in the let’s body is in a, and everything in the inner lambda’s body is in b; the set of scopes associated with z is {a, b}.We might say that “the environment of z is {a, b},” but for macro expansion, we also use the word “environment” to refer to a mapping from bindings to variables and compile-time values. To avoid confusion, I’ll refrain from using “environment” to mean a set of scopes.

In a macro-extensible language, expanding a use of a macro creates a new scope in the same way that a binding form creates a new scope. Starting with
(let ([x 1])
  (let-syntax ([m (lambda (stx) #'x)])
    (lambda (x)
       (m))))
the right-hand side of the m binding has the scope set {a}, while the final m has scope set {a, b, c} corresponding to the let, let-syntax, and lambda forms. We can write the scope sets next to each x and m at the point where macro expansion reaches the (m) form:
(let ([x{a} 1])
  (let-syntax ([m{a, b} (lambda (stx) #'x{a})])
    (lambda (x{a, b, c})
      (m{a, b, c}))))
The expansion of (m) produces x with the scope set {a, d}, where d is a new scope for identifiers that are introduced by the macro’s expansion:
(let ([x{a} 1])
  (let-syntax ([m{a, b} (lambda (stx) #'x{a})])
    (lambda (x{a, b, c})
      x{a, d})))
The absence of c on the final x explains why it doesn’t refer to the inner binding of x. At the same time, if a different m places a macro-introduced x in a binding position around an x from a macro use (m x), the x from the use is not macro-introduced and doesn’t have the scope d, so it wouldn’t refer to the macro-introduced binding.

Lexical scope corresponds to scope sets that are constrained to a particular shape: For any given set, there’s a single scope s that implies all the others (i.e., the ones around s in the program). As a result, s by itself (along with a symbolic name) is enough information to identify a binding. We normally describe lexical scope in terms of the closest such s for some notion of “closest.” Given scope sets instead of individual scopes, we can define “closest” as the largest relevant set.

More generally, we can define binding based on subsets: A reference’s binding is found as one whose set of scopes is a subset of the reference’s own scopes (in addition to having the same symbolic name). The advantage of using sets of scopes is that macro expansion creates scope sets that overlap in more general ways; there’s not always a s that implies all the others. Absent a determining s, we can’t identify a binding by a single scope, but we can identify it by a set of scopes.

If arbitrary sets of scopes are possible, then two different bindings might have overlapping scopes, neither might be a subset of the other, and both might be subsets of a particular reference’s scope set. In that case, the reference is ambiguous.

Ambiguous bindings can happen only if the macro system provides unusual operations. They will not show up with syntax-rules, and they will not show up with just syntax-case and syntax unless state or compile-time binding is used to communicate between macro invocations.

Our notion of “scope” is related to and insprired by the concept of “mark” in other descriptions of Scheme macros (Dybvig et al. 1993; Flatt et al. 2012).

2 Bindings

When macro expansion encounters a binding form, it
  • creates a new scope;

  • adds the scope to every identifier in binding position, as well as to the region where the bindings apply; and

  • extends a global table that maps a ⟨symbol, scope set⟩ pair to a representation of a binding.

Each local binding is represented by a unique, opaque value (e.g., a gensym). A binding to a module export is represented by the module name paired with a serializable identifier for the definition within the module.

For example,
(let ([x 1])
  (let-syntax ([m (lambda (stx) #'x)])
    (lambda (x)
      (m))))
more precisely expands as
(let ([x{a} 1])
  (let-syntax ([m{a, b} (lambda (stx{e}) #'x{a})])
    (lambda (x{a, b, c})
       x{a, d})))

 

x{a}

x4

m{a, b}

m8

stx{e}

stx15

x{a, b, c}

x16

where the compile-time environment along the way maps x4 to a variable, stx15 to a variable, m8 to a macro, and x16 to a variable. The reference x{a, d} has the binding x4, because x{a} is the mapping for x in the binding table that has the largest subset of {a, d}.

A free-identifier=? comparison on identifiers checks whether the two identifiers have the same binding. A bound-identifier=? comparison checks that two identifiers have exactly the same scope sets, independent of the binding table.

Note that (bound-identifier=? x y) does not completely answer the question “would x bind y?”A #t result answers that question in the affirmative, but x might bind y even if the result is #f. The same is currently true in definition contexts for Racket and Chez Scheme, which (like the set-of-scopes macro system) print #f but produce 1 for this example:
(let ()
  (define-syntax (m stx)
    (syntax-case stx ()
      [(_ a b)
       (begin
         (printf "~s\n" (bound-identifier=? #'a #'b))
         #'(begin
             (define a 1)
             b))]))
  (define-syntax n
    (syntax-rules ()
      [(_ id) (m id x)]))
  (n x))

In practice, bound-identifier=? is used to check whether two identifiers would conflict as bindings in the same context. It continues to work for that purpose with set-of-scopes binding.

The binding table can grow forever, but when a particular scope becomes unreachable (i.e., when no reachable syntax object includes a reference to the scope), then any mapping that includes the scope becomes unreachable. This weak mapping can be approximated by attaching the mapping to the scope, instead of using an actual global table. Any scope in a scope set can house the binding, since the binding can only be referenced using all of the scopes in the set. Attaching to the most recently allocated scope is a good heuristic, because the most recent scope is likely to be maximally distinguishing and have the shortest lifetime.

When a syntax object is serialized, the serialization must pull along the fragment of the binding table that is relevant for the syntax object’s scopes. Again, that extraction happens more or less automatically when the binding-table entries are implemented by attaching them to scopes, but an explicit “garbage collection” of the scopes is useful in practice to make serialized syntax more compact. Deserializing a syntax object creates fresh representations for every serialized scope, but preserving sharing ensures that binding relationships are preserved among identifiers in the syntax object—or more generally, among syntax objects within a module.

3 Macros, Scopes, and Definitions

The expansion of a macro use creates two new scopes: one scope for forms that are introduced by the macro, and one scope for forms that are present at the use site. The need for a use-site scope may not be apparent, particularly for readers who are familiar with existing, mark-based approaches to macros.

Consider the following letrec-syntax form, which effectively checks whether a use-site identifier captures a macro-introduced identifier:
(letrec-syntax ([identity (syntax-rules ()
                            [(_ misc-id)
                             (lambda (x)
                               (let ([misc-id 'other])
                                 x))])])
   (identity x))
Assuming that the letrec-syntax form creates a scope a, the scope must be added to both the right-hand side and body of the letrec-syntax form (to create recursive bindings):
(letrec-syntax ([identity (syntax-rules ()
                            [(_ misc-id)
                             (lambda (x{a})
                               (let ([misc-id 'other])
                                 x{a}))])])
   (identity x{a}))
If we create a scope only for introduced forms in a macro expansion, then expanding (identity x{a}) creates the scope set b and produces
(lambda (x{a, b})
  (let ([x{a} 'other])
    x{a, b}))
where b is added to each of the two introduced xs. The lambda introduces a new scope c, and let introduces d, producing
(lambda (x{a, b, c})
  (let ([x{a, c, d} 'other])
    x{a, b, c, d}))
At this point the binding of the innermost x is ambiguous: {a, b, c, d} is a superset of both {a, b, c} and {a, c, d}, neither of which is a subset of the other. Instead, we want that x to refer to the lambda binding.

Adding a scope for the macro-use site corrects this problem. If we call the use-site scope e, then we start with

(identity x{a, e})

which expands to
(lambda (x{a, b})
  (let ([x{a, e} 'other])
    x{a, b}))
which ends up as
(lambda (x{a, b, c})
  (let ([x{a, c, d, e} 'other])
    x{a, b, c, d}))
There’s no ambiguity, and the final x refers to the lambda binding as intended.

Unfortunately, adding a use-site scope creates a problem for macros that expand to definitions, as opposed to expressions. For example, consider a define-identity macro that is intended to expand to a definition of a given identifier as the identity function:
(define-syntax-rule (define-identity id)
  (define id (lambda (x) x)))
 
(define-identity f)
(f 5)
If the expansion of (define-identity f) adds a scope to the use-site f, the resulting definition will not bind the f in (f 5).

The underlying issue is that a definition context must treat use-site and introduced identifiers asymmetically as binding identifiers. In
(define-syntax-rule (define-five misc-id)
  (begin
   (define misc-id 5)
   x))
 
(define-five x)
the introduced x should refer to an x that is defined in the enclosing scope, which turns out to be the same x that appears at the use site of define-five. But in
(define-syntax-rule (define-other-five misc-id)
  (begin
   (define x 5)
   misc-id))
 
(define-other-five x)
the x from the use site should not refer to the macro-introduced binding x.

To support macros that expand to definitions of given identifiers, a definition context must keep track of scopes created for macro uses, and it must remove those scopes from identifiers that end up in binding positions. In the define-identity and define-five examples, the use-site scope is removed from the binding identifiers x and f, so they are treated the same as if their definitions appeared directly in the source.

As an optimization, a use-site scope is needed only when a macro is expanded in a definition context where the macro’s definition is in the same definition context. (Otherwise, the use is in a more nested scope, and thus already distinguished from introduced identifiers.) That combination happens regularly in a module body, but it is much less common than macro uses generally (since almost every form in a Racket source program is a macro use). This “optimization” is visible if the macro system provides certain operations, such as forcing the expansion of a sub-form, but it is invisible to pattern-based macros and to typical syntax-case macros.

To adapt to the use-site scopes added by a set-of-scopes expander, Racket’s syntax-local-introduce must flip both the macro-introduction scope and the use-site scope (if any) of the current macro expansion.

4 Local Bindings and Syntax Quoting

The set-of-scopes approach to binding works the same as previous models for pattern-based macros, but it makes finer distinctions among identifiers than expected by existing Racket macros that use #` or quote-syntax. To be consistent with the way that Racket macros are currently written, quote-syntax must discard some scopes. For example, in the macro
(lambda (stx)
  (let ([id #'x])
     #`(let ([#,id 1])
         x)))
the x that takes the place of #,id should bind the x that is in the resulting let’s body. The x that is bound to id, however, is not in the scope that is introduced by the compile-time let:
(lambda (stx{a})
  (let ([id{a, b} #'x{a}])
     #`(let ([#,id{a, b} 1])
          x{a, b})))
If quote-syntax (implicit in #`) preserves all scopes on an identifier, then with set-of-scopes binding, the x that replaces #,id will not capture the x in the generated let body.

It’s tempting to think that the compile-time let should introduce a phase-specific scope that applies only for compile-time references, in which case it won’t affect x as a run-time reference. That adjustment doesn’t solve the problem in general, since a macro can generate compile-time bindings and references just as well as run-time bindings and references.

A solution is for the expansion of quote-syntax to discard certain scopes on its content. The discarded scopes are those from binding forms that enclosed the quote-syntax form up to a phase crossing or module top-level, as well as any use-site scopes recorded for macro invocations within those binding forms. In the case of a quote-syntax form within a macro binding’s right-hand side, those scopes cover all of the scopes introduced on the right-hand side of the macro binding.

The resulting macro system is different than the current Racket macro system. Experiments suggest that the vast majority of macro implementations work either way, but it’s easy to construct an example that behaves differently:
(free-identifier=? (let ([x 1]) #'x)
                   #'x)
In Racket’s current macro system, the result is #f. The set-of-scopes system with a scope-pruning quote-syntax produces #t, instead, because the let-generated scope is stripped away from #'x.

Note: Racket’s macro system matches Dybvig et al. (1993), where both free-identifier=? and bound-identifier=? produce #f for the above arguments, and bound-identifier=? always implies free-identifier=?. The current psyntax implementation, as used by Chez Scheme and other implementations and as consistent with Adams (2015), produces #t and #f for free-identifier=? and bound-identifier=?, respectively; as the example illustrates, bound-identifier=? does not imply free-identifier=?. The set-of-scopes system produces #t and #t for free-identifier=? and bound-identifier=?, respectively, and bound-identifier=? always implies free-identifier=?.

If quote-syntax did not prune scopes, then not only would the result above be #f, it would also be #f with (let ([y 1]) #'x) instead of (let ([x 1]) #'x). That similarity reflects the switch to attaching identifier-independent scopes to identifiers instead of attaching identifier-specific renamings.

Arguably, the issue here is the way that pieces of syntax from different local scopes are placed into the same result syntax object, with the expectation that all the pieces are treated the same way. In other words, Racket programmers have gotten used to an unusual variant of quote-syntax, and most macros could be written just as well with a non-pruning variant. Then again, the pruning variant of quote-syntax tends to discard information about local bindings that is usually unwanted but preserved by the current quote-syntax.

There’s precedent for a variant of syntax-case that does not support assembling pieces as in the example. An early version of van Tonder’s macro expander (van Tonder 2007) had that property as a result of making the evaluation of syntax generate a fresh context.

Supplying an additional, non-pruning variant of quote-syntax poses no problems. Racket’s new macro expander provides the non-pruning variant when a #:local keyword is added to a quote-syntax form. For example,
(free-identifier=? (let ([x 1]) (quote-syntax x #:local))
                   (quote-syntax x #:local))
produces #f instead of #t, because the scope introduced by let is preserved in the body’s syntax object. The non-pruning variant of quote-syntax is useful for embedding references in a program’s full expansion that are meant to be inspected by tools other than the Racket compiler; Typed Racket’s implementation uses the #:local variant of quote-syntax to embed type declarations (including declarations for local bindings) in a program’s expansion for use by its type checker.

5 First-Class Definition Contexts

A Racket macro can implement a new kind of definition context, including one that mixes variable and macro definitions, by allocating a first-class (at compilation time) definition context. The syntax-local-make-definition-context function creates such contexts. A macro can force expansion of forms in the definition context, it can add variable bindings to the definition context, and it can add compile-time bindings and values that are referenced by further macro expansion within the definition context.

To a first approximation, a first-class definition context corresponds to a scope that is added to any form expanded within the definition context and that houses the definition context’s bindings. A definition context also has a compile-time environment frame (extending the context of the macro use) to house the mapping of bindings to variables and compile-time values.

As for other definition contexts (see Macros, Scopes, and Definitions), the compile-time environment must track use-site scopes that are generated for macro expansions within a first-class definition context. If the macro moves any identifier into a binding position in the overall expansion, then the macro normally must remove accumulated use-site scopes (for the current definition context only) by applying syntax-local-identifier-as-binding to the identifier. For example, the unit form implements a definition context that is similar to the body of a lambda, but variables are internally transformed to support mutually recursive references across unit boundaries.
(unit (import)
      (export)
 (define x 1)
 x)
In this example, (define x 1) is expanded to (define-values (x) 1) with a use-site scope on x, but the intent is for this definition of x to capture the reference at the end of the unit form. If the unit macro simply moved the binding x into a letrec right-hand side, the x would not capture the final x as moved into the letrec body; the use-site scope on the definition’s x would prevent it from capturing the use. The solution is for the unit macro to apply syntax-local-identifier-as-binding to the definition’s x before using it as a letrec binding. Macros that use a definition context and bound-identifier=? must similarly apply syntax-local-identifier-as-binding to identifiers before comparing them with bound-identifier=?.

Even if a macro does not create a first-class definition context, some care is needed if a macro forces the expansion of subforms and moves pieces of the result into binding positions. Such a macro probably should not use syntax-local-identifier-as-binding, but it should first ensure that the macro use is in an expression context before forcing any subform expansions. Otherwise, the subform expansions could interact in unexpected ways with the use-site scopes of an enclosing definition context.

Use-site scopes associated with a first-class definition context are not stored directly in the compile-time environment frame for the definition context. Instead, they are stored in the closest frame that is not for a first-class definition context, so that the scopes are still tracked when the definition context is discarded (when the macro returns, typically). The scope for the definition context itself is similarly recorded in the closest such frame, so that quote-syntax can remove it, just like other binding scopes.

6 Rename Transformers

Racket’s macro API includes support for binding aliases through rename transformers. A compile-time binding to the result of make-rename-transformer is similar to a binding to a macro transformer that replaces the binding’s identifier with the aliased identifier. In addition, however, binding to a rename transformer causes free-identifier=? to report #t for the original identifier and its alias.

With set-of-scopes binding, a binding alias is supported through an extension of the binding table. The mapping from a ⟨symbol, scope set⟩ pair is to a ⟨binding, maybe-aliased⟩ pair, where an maybe-aliased is either empty or another identifier (i.e., a symbol and scope set) to which the mapped identifier should be considered free-identifier=?. When a transformer-binding form such as define-syntax or letrec-syntax detects that the value to be installed for a binding as a rename transformer, it updates the binding table to register the identifier within the transformer as an optional-alias.

The implementation of free-identifier=? must follow alias chains. Cycles are possible, and they cause the aliased identifier to be treated as unbound.

7 Modules and Phases

The module form creates a new scope for its body. More precisely, a module form creates two new scopes: one that roughly reflects “outside edge” of the module, covering everything that is originally in the module body, and one for the “inside edge” of the module, covering everything that appears in the module through macro expansion for forms in the module’s top level. The “inside edge” scope is the one that’s like any definition context, while the “outside edge” scope distinguishes identifiers that had no scopes before being introduced through macro expansion.

A (module* name #f ....) submodule form, where #f indicates that the enclosing module’s bindings should be visible, creates an additional scope in the obvious way. For other module* and module submodule forms, the macro expander prevents access to the enclosing module’s bindings by removing the two scopes of the enclosing module.

A module must distinguish bindings that have the same name and scopes but different phases, and this distinction might be implemented in a couple of ways:
  • Mappings in the binding table might be phase-specific. That is, while we previously said that the binding table maps a ⟨symbol, scope set⟩ pair to a binding, the table domain might actually be a 3-tuple: ⟨symbol, scope set, phase⟩.

  • Scopes might be phase-specific, where a module form introduces a distinct scope for every binding phase to its body, and only the scopes at a given phase are used to locate a reference’s binding.

    Within this possibility, we might consider individual scopes to reside always at a particular phase, or we might consider a syntax object to have a per-phase set of scopes. Those sub-possibilities can be distinguished if the macro system offers phase-shifting and scope-differencing operations (as Racket does).

No matter how we choose to distinguish phases, instantiating a module at a particular phase implies a phase shift in its syntax literals. Consider the module
(define x 1)
(define-for-syntax x 2)
 
(define id #'x)
(define-for-syntax id #'x)
 
(provide id (for-syntax id))
and suppose that the module is imported both normally and for-syntax, the later with a s: prefix. At phase level 1 in the importing module, both id and s:id will be bound to an identifier x that had the same scopes originally, but they should refer to different x bindings (in different module instances with different values).

Among the possibilities for distinguishing phases, having per-phase sets of scopes on an identifier makes the phase-shifting operation most natural. A local binding or macro expansion can add scopes at all phases, while module adds a distinct inside-edge scope to every phase (and the same outside-edge scope to all phases); since every binding within a module is forced to have that module’s phase-specific inside-edge scopes, bindings at different scopes will be appropriately distinguished.

Racket constrains operations that inspect and adjust scopes on syntax objects to those that add, remove, or flip sets of scopes relative to some other syntax object. As a result, all of the phase-specific scopes for a modules inside edge are added to or removed from a syntax object together.

Having a distinct “root” scope for each phase makes most local bindings phase-specific. That is, in
(define-for-syntax x 10)
(let ([x 1])
  (let-syntax ([y x])
    ....))
the x on the right-hand side of let-syntax see the top-level phase-1 x binding, not the phase-0 local binding. This is a change from Racket’s current approach to binding and phases, but the only programs that are affected are ones that would trigger an out-of-context error in the current system. Meanwhile, macros can construct identifiers that have no module scope, so out-of-context errors are still possible.

8 The Top Level

A namespace in Racket is a top-level evaluation context. Each call to eval uses a particular namespace (either the current namespace or one supplied to eval), and each readevalprint loop works in a particular namespace. Namespaces are first-class values in Racket. A namespace can be created as fresh, or it can be extracted from a module instantiation to simulate further evaluation in the module’s body.

As the connection to modules may suggest, a top-level namespace corresponds to a pair of scopes in the same way that a module has a scope. Each top-level namespace has the same outside-edge scope, but a distinct inside-edge scope where bindings reside.

The interactive and incremental nature of a top-level context poses certain semantic challenges when macro and variable definitions and re-definitions are allowed. For example, a reference to an unbound identifier within a function cannot be rejected out-of-hand, because it might be defined later within the namespace before the function is called. Similarly, a reference might be resolved as a variable when a function is created, but a later definition could change the identifier’s binding to a macro, so the function must either continue to refer to a variable or be somehow reinterpreted to have a macro use. These challenges are compounded when macros expand to a mixture of variable and macro definitions. Overall, the top level is hopeless: it cannot provide a treatment of binding that is as consistent as module while also performing its job as an interactive, exploratory evaluation context. In Racket, we accept top-level compromises and put all “real” code in modules.

Fortunately, top-level compromises pose little for set-of-scopes binding. Supporting an incremental and redefinition-capable top-level context requires only that the binding table allow updates of existing bindings, which is straightforward.

An more troublesome aspect of top-level namespaces in Racket is that a form might be captured (via quote-syntax), expanded, or compiled in one namespace, and then evaluated in another namespace. Historically, top-level bindings have been equated with “unbound,” so that expanded and compiled forms originating in a top-level context could move freely among namespaces. This treatment as “unbound” has been fuzzy, however, and forms that originate from module namespaces have been treated differently from forms that originate in a non-module namespace.

To accommodate top-level namespaces with as much consistency (of binding treatment) and convenience (of moving forms among top-level namespaces) as possible, we introduce one more dimension to syntax objects. Instead of having a single set of scopes per phase, each syntax object has a sequence of scope sets per phase. When a syntax object is introduced to a top-level context that is not already included in its scope set (at a gven phase), the current scope set is cloned as a new first item of the list of sets; all further scope-set manipulations affect that first item. When looking up an identifier’s binding, however, the sequence is traversed until a binding is found. In other words, all but the first item in the list act as fallbacks for locating a binding. In practice, this fallback mechanisms is consistent with most existing code without otherwise interfering with scope management (since the fallbacks apply only when an identifier is otherwise unbound).

9 The Syntax-Function Zoo

Compared to Dybvig et al. (1993) or even Flatt et al. (2012), Racket adds many functions for manipulating syntax objects during macro expansion in ways that are sensitive to the expansion context. We have mentioned first-class definition context and rename transformers, but Racket provides many more tools:

As mentioned in First-Class Definition Contexts, a first-class definition context is difficult to specify in terms of renamings. In that case, an internal-definition context is backed by a renaming on syntax objects, but the renaming can refer to itself or other renamings, and so the binding-resolution process must handle a complex form of cycles. With set-of-scopes binding, an internal-definition context is backed by a scope for the context; an internal-definition context doesn’t create cyclic syntax-object structures, and it needs no special rules for resolving references to bindings.

10 Compatibility with the Current Racket Expander

The documentation for Racket’s current macro expansion attempts to avoid references to the underlying mark-and-rename model. As a result, it is often too imprecise to expose differences created by a change to set-of-scope binding. One goal of the new model is to allow the specification and documentation of Racket’s macro expander to be tightened; scope sets are precise enough for specification, but abstract enough to allow high-level reasoning.

Most Racket programs expand and run the same with a set-of-scope expander as before. Hygienic, pattern-based macros are always unaffected (but in the current macro system, a pattern-based macro that expands to a submodule can be effectively non-hygienic; see below). Meanwhile, macros that are intended to manipulate scope in complex ways can expose the difference between the macro systems. For example, the Racket implementations of the splicing forms in racket/splicing must change, and some internal details of the unit, class, and define-generics forms must also change.

Here’s an example that produces 1 with Racket’s current expander, but it provokes an ambiguous-binding error with the set-of-scopes expander:
(define-syntax-rule (define1 id)
 (begin
   (define x 1)
   ; stash a reference to the introduced identifier:
   (define-syntax id #'x)))
 
(define-syntax (use stx)
  (syntax-case stx ()
    [(_ id)
     (with-syntax ([old-id (syntax-local-value #'id)])
       #'(begin
           (define x 2)
           ; reference to old-id ends up ambiguous:
           old-id))]))
 
(define1 foo)
(use foo)
In the set-of-scopes model, define1 and use introduce bindings from two separate macro expansions, and they also arrange for an reference to be introduced by both of those macros, hence the ambiguity. Arguably, in this case, the use macro is broken, as illustrated in a variant of the program without define1 that produces 2 with both expanders:
(begin
  (define x 1)
  (define-syntax foo #'x))
 
(define-syntax (use stx)
  (syntax-case stx ()
    [(_ id)
     (with-syntax ([old-id (syntax-local-value #'id)])
       #'(begin
           (define x 2)
           old-id))]))
 
(use foo)
The use macro can be fixed for both expanders and both contexts by applying syntax-local-introduce to the result of (syntax-local-value #'id), since the identifier conceptually exists outside of this macro’s expansion. Such an application of syntax-local-introduce is typically needed and typically present in existing Racket macros that bring stashed identifiers into a new context.

Here are some specific other ways in with existing Racket code may fail with a set-of-scopes expander:
  • Macros that use explicit internal-definition contexts are among the most likely to need adaptation. As described in First-Class Definition Contexts, such macros typically need to use syntax-local-identifier-as-binding on identifiers that are inspected and manipulated as bindings. Macros that use internal-definition contexts to create unusual binding patterns (e.g., splicing-let-syntax) may need more radical changes, since internal-definition contexts formerly made distinctions among specific identifiers—the ones explicitly registered to create renamings—while the distinction now is more uniform. Some such macros can switch to a simpler creation of a fresh scope (formerly “mark”), while others require a completely different strategy.

  • In the current macro system, if unbound identifiers with the same symbolic name are pulled from different modules into a new one, and if the introducing macros arrange for the identifiers to have no distinct macro-introduction marks (e.g., by using syntax-local-introduce), then either of those identifiers can bind the other (since neither had a binding). With the set-of-scopes system, the two identifiers do no bind each other, since they have different scopes from their original modules.

  • In the current macro system, a module form for a submodule is expanded by first discarding all lexical context. The set-of-scopes expander instead removes only the scope of the enclosing module. As a result, some macros that expand to submodules must more precisely manage their contexts.

    In the current expander, removing all lexical context ensures that no binding outside the module can be referenced directly, but to support re-expansion of the submodule, a property is added on a module to disable context stripping on future expansions and to skip over the module when adding context for an enclosing module. No special treatment is needed for re-expansion in the set-of-scopes expander, but the more limited context stripping means that certain (non-hygienic) submodule-producing macros no longer work.

    For example, the macro
    (define-syntax-rule (gen e)
      (module generated racket/base e))
    currently expands so that racket/base is available for reference by e, but with the set-of-scopes expander, racket/base retains its macro-introduced scope and with not bind the use-site replacement for e.

    At the same time, with the set-of-scopes expander, a macro from one module that expands to a submodule in another module runs the risk of provoking an out-of-context error, since the macro’s module context is not removed form the generated submodule.

  • With the current macro expander, the #%top form is implicitly wrapped around any use of an identifier outside a module when the identifier does not refer to a macro. The new expander uses #%top only for identifiers that have no binding (which makes top-level expansion slightly more consistent with module expansion).

11 Debugging Support

Binding resolution in Racket’s current macro expander is completely opaque to macro implementers. When something goes wrong, the expander can report little more than “unbound identifier” or “out of context”, because the process of replaying renamings and the encodings used for the renamings are difficult to unpack and relate to the programmer.

A set-of-scopes expander is more frequently in a position to report “unbound identifier, but here are the identifier’s scopes, and here are some bindings that are connected to those scopes.” In the case of ambiguous bindings, the expander can report the referencing identifier’s scopes and the scopes of the competing bindings. These details are reported in a way similar to stack traces: subject to optimization and representation choices, and underspecified as a result, but invaluable for debugging purposes.

For example, when placed in a module name example, the ambigious-reference error from Compatibility with the Current Racket Expander produces an error like this one:

x: identifier's binding is ambiguous

  context...:

   #(1772 module) #(1773 module example 0) #(2344 macro) #(2358 macro)

  matching binding...:

   #<module-path-index:()>

   #(1772 module) #(1773 module example 0) #(2344 macro)

  matching binding...:

   #<module-path-index:()>

   #(1772 module) #(1773 module example 0) #(2358 macro)

  in: x

Each scope is printed as a Racket vector, where the vector starts with a number that is distinct for every scope. A symbol afterward provides a hint at the scope’s origin: 'module for a module scope, 'macro for a macro-introduction scope, 'use-site for a macro use-site scope, or 'local for a local binding form. In the case of a 'module scope that corresponds to the inside edge, the module’s name and a phase (since an inside-edge scope is generated for each phase) are shown.

The #<module-path-index:()>s in the error correspond to the binding, and they mean “in this module.” Overall, the message shows that x has scopes corresponding to two different macro expansions, and it’s bound by definitions that were produced by the expansions separately.

12 Model

Below is a model in the style of Flatt et al. (2012). The model is comparable to a model that includes support for internal-definition contexts, but the resolve metafunction is dramatically simplified, even compared to a model without internal-definition contexts.

Although this model includes pruning within syntax, that pruning is not so interesting, since compile-time code is parsed directly instead of macro-expanded.

image

image

image

image

image

Acknowledgments

Thanks to Matthias Felleisen, Robby Findler, Sam Tobin-Hochstadt, Jay McCarthy, Ryan Culpepper, Michael Adams, Michael Ballantyne, Tim Disney, Simon Peyton Jones, Shriram Krishnamurthi, and Eelco Visser for discussions about set-of-scopes binding.

Bibliography

Michael D. Adams. Towards the Essence of Hygiene. In Proc. Pronciples of Programming Languages, 2015.

Eli Bazilay, Ryan Culpepper, and Matthew Flatt. Keeping it Clean with Syntax Parameters. In Proc. Workshop on Scheme and Functional Programming, 2011.

R. Kent Dybvig, Robert Hieb, and Carl Bruggeman. Syntactic Abstraction in Scheme. Lisp and Symbolic Computation 5(4), 1993.

Matthew Flatt, Ryan Culpepper, Robert Bruce Findler, and David Darais. Macros that Work Together: Compile-Time Bindings, Partial Expansion, and Definition Contexts. Journal of Functional Programming 22(2), 2012.

Andre van Tonder. R6RS Libraries and Macros. 2007. http://www.het.brown.edu/people/andre/macros/index.html