Modules that Split Expansion-Time and Run-Time Code
[informal proposal]
--------------------------------------------------------------
Goal
----
The goal of this proposal is to define a language for declaring code
modules used to construct programs. The code is to be written in
languages with the same lexical and data conventions as R5RS Scheme,
including --- but not limited to --- R5RS Scheme itself.
Each module is split into two parts:
* its interface, which can affect the denotation of other module and
interface bodies; and
* its implementation, which cannot affect the denotation of other
module or interface bodies.
This split enables mutual import dependencies among module
implementations (but not interfaces).
Language
--------
A program consists of a collection of module interface and
implementation declarations, plus a designation of a "main"
module. All Scheme code exists within a module. The code to be
executed at run-time is the body of the main module, the body of all
modules that it uses, and so on (i.e., the transitive closure of the
module-use relation).
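As a rough illustration of the "transitive closure" rule, here is a minimal sketch in Python (chosen only because the proposed language is not yet executable). The `uses` table and the function name `modules_to_run` are hypothetical; the table mirrors the implementation-level `uses' clauses of the counter example, plus an invented module `Unused' that the main module never reaches.

```python
from collections import deque

def modules_to_run(uses, main):
    """Return the set of modules reachable from `main` via the uses relation."""
    seen = {main}
    queue = deque([main])
    while queue:
        current = queue.popleft()
        for used in uses.get(current, ()):
            if used not in seen:
                seen.add(used)
                queue.append(used)
    return seen

# Hypothetical table of implementation-level `uses' clauses.
USES = {
    "TheProgram": ["R5RS", "counter", "pseudo-random"],
    "counter": ["R5RS"],
    "pseudo-random": ["R5RS"],
    "Unused": ["R5RS"],  # never reached from TheProgram, so never executed
}

print(sorted(modules_to_run(USES, "TheProgram")))
```

Only the modules in the returned set would have their bodies executed at run-time under this reading of the rule.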
Every module consists of two parts: an interface declaration and an
implementation declaration. The two parts are connected by a shared
module name.
  (interface M ...)  ;; describes syntactic extensions and exported
                     ;; variables associated with M
  (module M ...)     ;; describes the implementation associated with M,
                     ;; including definitions for the exported variables
To clarify, the `interface' and `module' forms are *not* extensions to
Scheme. This proposal describes an entirely new language, whose only
top-level forms are `interface', `module', and `program'.
Will's `counter' example, from the Waddell-Dybvig-based proposal,
would read as follows in the proposed language:
; >> All examples are untested <<
  (interface counter
    (uses export syntax-rules)
    ;
    (extends R5RS)
    (define-syntax count
      (syntax-rules ()
        ((count)
         (begin (set! c (+ c 1))
                c))))
    (export-private c))

  (module counter
    (uses R5RS)
    ;
    (define c 0))

  (interface pseudo-random
    (uses export)
    ;
    (export rand))

  (module pseudo-random
    (uses R5RS)
    ;
    (define (rand n)
      (set! x (modulo (+ (* a x) c) m))
      (modulo (quotient x 8) n))
    (define a 701)
    (define x 1)
    (define c 743483)
    (define m 524288))

  (interface TheProgram
    (uses export)
    ;
    (export myrandom))

  (module TheProgram
    (uses R5RS counter pseudo-random)
    ;
    (define (myrandom)
      (define m 1000)
      (modulo (+ (rand 1000) (count)) m))
    ;; ... uses of myrandom ...
    )

  (program TheProgram)
A typical R5RS Scheme program can be converted to a program in this
language as follows:
  (interface TheProgram
    (uses)
    ;; No syntax, no exports
    )
  (module TheProgram
    (uses R5RS)
    ;; the original program, unchanged
    )
  (program TheProgram)
but only if the original program does not define any identifier
multiple times, does not define any identifier pre-defined by R5RS,
and does not use either of the `interaction-environment' or `load'
procedures (which are optional in R5RS). [The appropriate handling of
these procedures is not clear to me. As syntactic forms, they could be
made sensible: `load' might act as #include, and
`interaction-environment' might grab the bindings in the enclosing
module.]
The language is designed to be especially friendly to compilers and
project management tools, supporting both separate compilation and
mutual dependencies in a straightforward way. It is also designed to
support languages more restrictive than R5RS (e.g., pedagogical
subsets of Scheme) and "towers of languages".
It does not seem particularly friendly to REPLs, though Will's notes
in the Waddell-Dybvig-based proposal should apply here, too. An
implementation might choose to enrich the set of top-level forms with
the usual Scheme forms, plus an `open' form to make a module's
variables and syntax visible at the top level. However, the scope of
such top-level variables and syntax should never include the body of
`interface' or `module' declarations; doing so would violate the
spirit of this proposal, and encourage unportable code.
Syntax
------
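  <program>     = <declaration>* (program <module-name>)
  <declaration> = (interface <module-name>
                    (uses <module-ref> ...)
                    <interface-body>)
                | (module <module-name>
                    (uses <module-ref> ...)
                    <module-body> ...)
  <module-ref>  = <module-name>
                | (<prefix> : <module-name>)  ; adds <prefix>:
                                              ; to the beginning
                                              ; of all names
  <interface-body> = ...  ; depends on context
  <module-body>    = ...  ; depends on context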
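The prefixed form of a `uses' entry, as in `(r5 : R5RS)', can be pictured as a simple renaming of the imported names. The following Python sketch is only a model under that assumption; the function `import_names` and the sample export list are hypothetical, not part of the proposal.

```python
def import_names(exports, prefix=None):
    """Map each name visible in the importing body to the exported name it denotes.

    With no prefix the names are imported unchanged; with a prefix, the
    prefix and a colon are added to the beginning of every name.
    """
    if prefix is None:
        return {name: name for name in exports}
    return {f"{prefix}:{name}": name for name in exports}

# Hypothetical sample of names exported by a module.
SAMPLE_EXPORTS = ["lambda", "+", "car"]

print(import_names(SAMPLE_EXPORTS, "r5"))
```

Under this model, a body that imports with the prefix `r5' sees `r5:lambda' but not plain `lambda', which is what makes it possible to define a new `lambda' without a name clash.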
Semantics
---------
Module Implementations
- - - - - - - - - - -
When a module `uses' another, it gets the (public) exported variable
bindings of the other module, as well as the (public) syntax bindings
declared in the other module's interface. No other bindings are
available, except ones that are defined directly within the module
implementation. Thus, while the lexical structure of a `module'
declaration is fixed to be that of Scheme (i.e., S-expressions), the
set of syntax and procedures that apply in the body of a module is
determined completely by the `uses' clause of the module (and the
relevant interface declarations). [Note that a module does not get the
syntax bindings from its own interface, and it cannot `use' itself.]
Imported variable bindings refer to locations, not values, so that
mutations to a variable are visible to all importers of the variable.
A variable may be mutated only in the module that defines the
variable. [Debatable, as Will notes in the other proposal.] All
variables in a module must be bound; an unbound variable is a syntax
error. A module cannot declare a variable twice, or declare a variable
with the same name as an imported variable or syntactic form.
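The binding rules in this paragraph amount to a handful of static checks. Here is a minimal Python sketch of them, assuming a module has already been reduced to lists of defined, referenced, and mutated names; that representation and the name `check_module` are hypothetical simplifications (a real expander would work on syntax, not name lists).

```python
def check_module(defined, imported, referenced, mutated=()):
    """Return a list of error messages for one module implementation.

    defined    -- names defined in the module body, in order
    imported   -- set of variable and syntax names obtained through `uses'
    referenced -- names referenced in the body
    mutated    -- names that are targets of set! in the body
    """
    errors = []
    seen = set()
    for name in defined:
        if name in seen:
            errors.append(f"duplicate definition: {name}")
        elif name in imported:
            errors.append(f"definition shadows import: {name}")
        seen.add(name)
    bound = seen | set(imported)
    for name in referenced:
        if name not in bound:
            errors.append(f"unbound variable: {name}")
    for name in mutated:
        # Only the defining module may mutate a variable.
        if name in imported:
            errors.append(f"mutation of imported variable: {name}")
    return errors

print(check_module(["a", "x", "a"], {"+", "modulo"}, ["x", "+", "y"], ["x", "modulo"]))
```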
Module implementations may be mutually dependent; i.e., two modules
may `use' each other. The use relationship implies a partial ordering
of evaluation at run-time:
* If module A uses [transitively] module B, and module B does not use
[transitively] module A, then module B's definitions and commands
  will be executed before A's definitions and commands.
* The definitions and commands of mutually dependent modules are
executed in an unspecified order. [If it matters, we could split
`uses' into two different forms, a la Bigloo: one that requires
prior initialization of the used module, and one that does not.]
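One way an implementation might derive an execution order that respects the two rules above is to group mutually dependent modules into strongly connected components and run the components from the most-used end first. The Python sketch below uses Tarjan's algorithm for this; it is only one possible strategy, and the function name and the sample `uses` table (modules `Main', `A', `B', `Lib') are hypothetical.

```python
def execution_order(uses, main):
    """Return groups of modules in a valid execution order, starting from `main`.

    Each group is a set of mutually dependent modules (order within a group
    is unspecified; it is sorted here only for reproducible output). A group
    appears before every group that uses it.
    """
    index = {}
    low = {}
    stack = []
    on_stack = set()
    order = []

    def visit(node):
        index[node] = low[node] = len(index)
        stack.append(node)
        on_stack.add(node)
        for used in uses.get(node, ()):
            if used not in index:
                visit(used)
                low[node] = min(low[node], low[used])
            elif used in on_stack:
                low[node] = min(low[node], index[used])
        if low[node] == index[node]:
            # `node` is the root of a component: pop its members off the stack.
            group = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                group.append(member)
                if member == node:
                    break
            order.append(sorted(group))

    visit(main)
    return order

# Hypothetical program: A and B use each other; both Main and A use Lib.
SAMPLE_USES = {"Main": ["A", "Lib"], "A": ["B", "Lib"], "B": ["A"], "Lib": []}

print(execution_order(SAMPLE_USES, "Main"))
```

Because the traversal starts at the main module, modules outside the transitive closure of its uses never appear in the result.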
The bodies of modules that are never used (in the transitive closure
of uses, starting from the designated main module) are not executed at
run-time.
Module Interfaces
- - - - - - - - -
Just as for `module', the `uses' clause of an `interface' declaration
determines the syntax available within the interface. At least two
modules must be built into an implementation: `syntax-rules' and
`export'. The `syntax-rules' module provides syntax for `extends' and
`define-syntax', as illustrated below. The `export' language provides
syntax for declaring module exports. Both of these built-in languages
must somehow collude with the language implementation to work (or they
must elaborate to some lower-level module that colludes with the
language implementation).
With `syntax-rules', when an interface publicly `extends' another,
variable and syntax bindings get added to any module that `uses' the
extending interface. As a consequence, macro templates defined in the
extending interface can refer to variables and syntax in the extended
interface. When an interface `extends-private' another, variable and
syntax bindings from the extended interface may appear *only* in macro
templates within the extending interface. Private extension is useful
for defining restrictive languages:
  (interface LambdaCalculus
    (uses syntax-rules)
    ;
    (extends-private (r5 : R5RS))
    ;
    (define-syntax lambda
      (syntax-rules ()
        [(lambda (x) E) (r5:lambda (x) E)]))
    ;
    (define-syntax #application  ;; Weird hack: we assume
                                 ;; that the expander converts
                                 ;;   (<expr> ...)
                                 ;; to
                                 ;;   (#application <expr> ...)
                                 ;; so we can define its meaning.
                                 ;; Any better solution?
      (syntax-rules ()
        [(#application E1 E2) (r5:#application E1 E2)])))

  (module X
    (uses LambdaCalculus)
    ;
    ;; Only single-argument lambda
    ;; and single-argument application
    ;; can appear in this module. Anything
    ;; else is a syntax error.
    ((lambda (x) (x x)) (lambda (x) (x x))))
Macro definitions in an `interface' expression using `syntax-rules'
are referentially transparent: `extends' and `export' declarations
introduce variables into the lexical environment of the macro
definition, so that variable and syntax references in a macro template
refer to the corresponding exported and extended variables and syntax.
In the same way that module implementations are required to be closed,
module interfaces are also required to be closed, including variables
that appear in macro templates. Note that the "counter" interface at
the beginning of this proposal was forced to extend the R5RS module
(either publicly or privately). If it did not, then `begin', `set!',
`+', and function application would be undefined in the macro
template.
When an interface `extends' another, it *does not* get the syntax
bindings for direct use in expanding the `interface' expression
itself. An interface `uses' another interface to get such bindings.
When an interface M1 `uses' M2, the variable and syntax bindings of M2
are *not* visible to users of M1. Thus, `extends' does not imply
`uses', and `uses' does not imply `extends'.
The following example illustrates the difference between `uses' and
`extends':
  (interface MacroDefiningMacro
    (uses syntax-rules)
    ;
    (extends-private R5RS)
    ;
    ;; A macro `M' that expands to the definition of a macro `MM'.
    ;; Thus, the `M' macro might be suitable for use within an
    ;; interface or within a module implementation.
    (define-syntax M
      (syntax-rules ()
        [(M) (define-syntax MM
               (syntax-rules ()
                 [(MM) 5]))])))

  ; ----------------------------------------

  (interface MExtender
    (uses syntax-rules)
    ;
    (extends-private R5RS)
    (extends MacroDefiningMacro) ; <<<<<<<
    ;
    ; (M) ;; <- would be a syntax error.
    ;
    ;; (N) expands to (M) in a user of MExtender
    (define-syntax N
      (syntax-rules ()
        [(N) (M)])))

  (module X
    (uses R5RS MExtender)
    ;
    (M)               ;; expands to def of MM macro;
    ; (N)             ;; <- would do the same thing
    (display (MM)))   ;; displays 5

  ; ----------------------------------------

  (interface MMDefiner
    (uses syntax-rules MacroDefiningMacro) ; <<<<<<
    ;
    (M)) ;; Expands to the definition of MM

  (module X
    (uses R5RS MMDefiner)
    ;
    (display (MM)) ;; displays 5
    ; (M)          ;; <- would be a syntax error, unbound M
    )
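The visibility rules behind this example can be modeled in a few lines. The Python sketch below computes which syntax names a module obtains by using an interface: the interface's own public definitions plus those of every interface it publicly extends, but nothing from the interfaces it merely `uses'. Treating public extension as transitive is my assumption; the table layout and the name `bindings_for_user` are hypothetical, and private extensions are omitted because they never reach a user.

```python
# Model of the three interfaces from the example. "defines" lists the public
# syntax each interface ends up declaring (MMDefiner declares MM because its
# body (M) expands to that definition).
INTERFACES = {
    "MacroDefiningMacro": {"defines": {"M"}, "extends": set(),
                           "uses": {"syntax-rules"}},
    "MExtender": {"defines": {"N"}, "extends": {"MacroDefiningMacro"},
                  "uses": {"syntax-rules"}},
    "MMDefiner": {"defines": {"MM"}, "extends": set(),
                  "uses": {"syntax-rules", "MacroDefiningMacro"}},
}

def bindings_for_user(interfaces, name):
    """Return the syntax names visible to a module that `uses' interface `name`."""
    visible = set()
    visited = set()
    pending = [name]
    while pending:
        current = pending.pop()
        if current in visited:
            continue  # the extends relation may be cyclic
        visited.add(current)
        entry = interfaces[current]
        visible |= entry["defines"]
        pending.extend(entry["extends"])  # `uses' is deliberately not followed
    return visible

print(sorted(bindings_for_user(INTERFACES, "MExtender")))
print(sorted(bindings_for_user(INTERFACES, "MMDefiner")))
```

In this model a user of `MExtender' sees both `M' and `N', while a user of `MMDefiner' sees only `MM', matching the comments in the example.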
[Tower of languages:] A `uses' clause in an interface implies the
execution of one or more module bodies at *expansion* time. In an
implementation that supplies, say, `syntax-case', the exports of the
used module are then available in the using interface for syntax
transformer definitions.
  (interface SyntaxHelper
    (uses syntax-rules)
    ;
    (export helper))

  (module SyntaxHelper
    (uses syntax-case)
    ;
    (define helper
      (lambda (x) ...)))

  (interface MyComplexSyntax
    (uses syntax-case SyntaxHelper)
    ;
    (define-syntax my-complex-form
      (lambda (x)
        (helper x))))

  (module X
    (uses MyComplexSyntax)
    ;
    (my-complex-form ...))
In such cases, the module(s) executed at expansion time are
(conceptually) executed anew every time a module is expanded. Thus,
the `uses' mechanism does not provide a way to communicate arbitrary
values across applications of the expander to different
modules. Furthermore, if a module is used by both module interfaces
and implementations, the run-time instance of the module is distinct
from all expansion-time instances.
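To make the "executed anew" rule concrete, here is a small Python model. Everything in it is hypothetical: `Instance` stands for one instantiation of a module body, its `counter` field stands in for whatever mutable state the body holds, and `expand` stands for one application of the expander to a single module.

```python
class Instance:
    """One instantiation of a module body, with its own private state."""

    def __init__(self, module):
        self.module = module
        self.counter = 0  # stand-in for any mutable state in the body

def expand(target, expansion_time_uses):
    """Expand one module, instantiating every expansion-time module afresh."""
    instances = {name: Instance(name) for name in expansion_time_uses}
    for instance in instances.values():
        instance.counter += 1  # simulate a side effect during expansion
    return instances

first = expand("X", ["SyntaxHelper"])
second = expand("Y", ["SyntaxHelper"])
run_time = Instance("SyntaxHelper")  # the run-time instance is separate too

print(first["SyntaxHelper"] is second["SyntaxHelper"])
print(second["SyntaxHelper"].counter, run_time.counter)
```

The side effect performed while expanding `X' is invisible while expanding `Y', and neither is visible to the run-time instance.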
In contrast to `uses' in module implementations, the `uses' relation
among interfaces must be acyclic. The `extends' relation may be
cyclic, however.
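An implementation would presumably reject a cyclic `uses' relation among interfaces before expanding anything. A minimal Python sketch of such a check follows, using a depth-first search; the function name `find_uses_cycle` and the sample tables are hypothetical.

```python
def find_uses_cycle(uses):
    """Return one cycle in the interface-level uses relation, or None.

    A cycle is reported as a list of names whose first and last entries
    are the same interface.
    """
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = {}
    path = []

    def visit(node):
        state[node] = IN_PROGRESS
        path.append(node)
        for used in uses.get(node, ()):
            status = state.get(used, UNVISITED)
            if status == IN_PROGRESS:
                return path[path.index(used):] + [used]
            if status == UNVISITED:
                cycle = visit(used)
                if cycle:
                    return cycle
        path.pop()
        state[node] = DONE
        return None

    for node in list(uses):
        if state.get(node, UNVISITED) == UNVISITED:
            cycle = visit(node)
            if cycle:
                return cycle
    return None

# Interface-level uses from the tower-of-languages example: acyclic.
TOWER = {"MyComplexSyntax": ["syntax-case", "SyntaxHelper"],
         "SyntaxHelper": ["syntax-rules"]}

print(find_uses_cycle(TOWER))
print(find_uses_cycle({"A": ["B"], "B": ["C"], "C": ["A"]}))
```

No such check is needed for `extends', since that relation is allowed to be cyclic.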
Minimal Bindings and Built-in Modules
- - - - - - - - - - - - - - - - - - -
All forms allowable within an `interface' or `module' body must be
defined by a `used' module. Thus, in
  (module X
    (uses)
    <body>)
the <body> must be empty; there is no legal form. Similarly, in
  (interface X
    (uses)
    <body>)
the <body> must also be empty.
Certain standard module names must be built-in to all
implementations. Most of the above `interface' examples assume a
`syntax-rules' module which supplies `define-syntax' and `export'. It
could also provide `define-private-syntax' and `export-private',
declaring macros and variables to be used only by macros within the
interface.
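For interfaces using `syntax-rules':
  <interface-body> = (define-syntax <keyword>
                       <transformer spec>)
                   | (define-private-syntax <keyword>
                       <transformer spec>)
                   | (export <variable> ...)
                   | (export-private <variable> ...)
                   | (<keyword> <datum>*)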
[Although I used the name `syntax-rules' because I thought it would
make the examples easier to read, it is probably not the best name for
the module that defines syntax for declaring exports, etc.]
Similarly, the name `R5RS' would probably be built-in, especially for
use by module implementations. In that case, a module body would
consist of the definitions and commands of R5RS.
If the `R5RS' module is built in, then we may ask what the following
`interface' expression means:
  (interface X
    (uses R5RS)
    ...)
The rules of the language say that expressions appearing in place of
the "..." get evaluated at compile time. This is conceivably useful
for displaying expansion-time messages to the user.
A consequence of this design is that a compiler for this module
language would have to have an interpreter for the module language
built into it! For most Scheme systems, this is probably no big deal,
but it's worth noting.
If an implementation provides `syntax-case' as its only built-in
syntax-binding mechanism, then it would be reasonable to ignore
non-macro definitions in `interface' expressions. The compiling
programmer would notice, but there's no way a user of the resulting
executable could tell the difference.
Issues
------
* Is it too much trouble to require an `interface' and `module'
declaration for every module? [I don't think so, but users may
disagree.]
* Is all this power useful? [I like very much the idea of supplying a
portable implementation of DrScheme's teaching languages. Might
require `syntax-case', though.]
* The proposed syntax isn't especially nice. It would be better to
have something less verbose for the common case. (Is the common
case "uses R5RS and self" for implementations, and "uses
  syntax-rules, extends publicly R5RS" for interfaces?)