Modules that Split Expansion-Time and Run-Time Code
                [informal proposal]
--------------------------------------------------------------

Goal
----

The goal of this proposal is to define a language for declaring code
modules used to construct programs. The code is to be written in
languages with the same lexical and data conventions as R5RS Scheme,
including --- but not limited to --- R5RS Scheme itself.

Each module is split into two parts:

 * its interface, which can affect the denotation of other module and
   interface bodies; and

 * its implementation, which cannot affect the denotation of other
   module or interface bodies.

This split enables mutual import dependencies among module
implementations (but not interfaces).

Language
--------

A program consists of a collection of module interface and
implementation declarations, plus a designation of a "main" module.
All Scheme code exists within a module. The code to be executed at
run-time is the body of the main module, the bodies of all modules that
it uses, and so on (i.e., the transitive closure of the module-use
relation).

Every module consists of two parts: an interface declaration and an
implementation declaration. The two parts are connected by a shared
module name.

  (interface M ...) ;; describes syntactic extensions and exported
                    ;; variables associated with M

  (module M ...)    ;; describes the implementation associated with M,
                    ;; including definitions for the exported variables

To clarify, the `interface' and `module' forms are *not* extensions to
Scheme. This proposal describes an entirely new language, whose only
top-level forms are `interface', `module', and `program'.

Will's `counter' example, from the Waddell-Dybvig-based proposal, would
read as follows in the proposed language:

  ; >> All examples are untested <<

  (interface counter
    (uses export syntax-rules)
    ;
    (extends R5RS)
    (define-syntax count
      (syntax-rules ()
        ((count) (begin (set! c (+ c 1)) c))))
    (export-private c))

  (module counter
    (uses R5RS)
    ;
    (define c 0))

  (interface pseudo-random
    (uses export)
    ;
    (export rand))

  (module pseudo-random
    (uses R5RS)
    ;
    (define (rand n)
      (set! x (modulo (+ (* a x) c) m))
      (modulo (quotient x 8) n))
    (define a 701)
    (define x 1)
    (define c 743483)
    (define m 524288))

  (interface TheProgram
    (uses export)
    ;
    (export myrandom))

  (module TheProgram
    (uses R5RS counter pseudo-random)
    ;
    (define (myrandom)
      (define m 1000)
      (modulo (+ (rand 1000) (count)) m))
    ;; ... uses of myrandom ...
    )

  (program TheProgram)

A typical R5RS Scheme program can be converted to a program in this
language as follows:

  (interface TheProgram
    (uses)
    ;; No syntax, no exports
    )

  (module TheProgram
    (uses R5RS)
    ;; the original program, unchanged
    )

  (program TheProgram)

but only if the original program
does not define any identifier multiple times, does not define any
identifier pre-defined by R5RS, and does not use either of the
`interaction-environment' or `load' procedures (which are optional in
R5RS). [The appropriate handling of these procedures is not clear to
me. As syntactic forms, they could be made sensible: `load' might act
as #include, and `interaction-environment' might grab the bindings in
the enclosing module.]

The language is designed to be especially friendly to compilers and
project management tools, supporting both separate compilation and
mutual dependencies in a straightforward way. It is also designed to
support languages more restrictive than R5RS (e.g., pedagogical subsets
of Scheme) and "towers of languages". It does not seem particularly
friendly to REPLs, though Will's notes in the Waddell-Dybvig-based
proposal should apply here, too.

An implementation might choose to enrich the set of top-level forms
with the usual Scheme forms, plus an `open' form to make a module's
variables and syntax visible at the top level. However, the scope of
such top-level variables and syntax should never include the body of
`interface' or `module' declarations; doing so would violate the spirit
of this proposal, and encourage unportable code.

Syntax
------

  <program> = <declaration>* (program <module-name>)

  <declaration> = (interface <module-name> (uses <import> ...)
                    <interface-body>)
                | (module <module-name> (uses <import> ...)
                    <module-body> ...)

  <import> = <module-name>
           | (<prefix> : <module-name>) ; adds <prefix>:
                                        ; to the beginning
                                        ; of all names

  <interface-body> = ... ; depends on context

  <module-body> = ... ; depends on context

Semantics
---------

Module Implementations
- - - - - - - - - - -

When a module `uses' another, it gets the (public) exported variable
bindings of the other module, as well as the (public) syntax bindings
declared in the other module's interface. No other bindings are
available, except ones that are defined directly within the module
implementation.
Thus, while the lexical structure of a `module' declaration is fixed to
be that of Scheme (i.e., S-expressions), the set of syntax and
procedures that apply in the body of a module is determined completely
by the `uses' clause of the module (and the relevant interface
declarations). [Note that a module does not get the syntax bindings
from its own interface, and it cannot `use' itself.]

Imported variable bindings refer to locations, not values, so that
mutations to a variable are visible to all importers of the variable.
A variable may be mutated only in the module that defines the variable.
[Debatable, as Will notes in the other proposal.]

All variables in a module must be bound; an unbound variable is a
syntax error. A module cannot declare a variable twice, or declare a
variable with the same name as an imported variable or syntactic form.

Module implementations may be mutually dependent; i.e., two modules may
`use' each other. The use relationship implies a partial ordering of
evaluation at run-time:

 * If module A uses [transitively] module B, and module B does not use
   [transitively] module A, then module B's definitions and commands
   will be executed before A's definitions and commands.

 * The definitions and commands of mutually dependent modules are
   executed in an unspecified order.

[If it matters, we could split `uses' into two different forms, a la
Bigloo: one that requires prior initialization of the used module, and
one that does not.]

The bodies of modules that are never used (in the transitive closure of
uses, starting from the designated main module) are not executed at
run-time.

Module Interfaces
- - - - - - - - -

Just as for `module', the `uses' clause of an `interface' declaration
determines the syntax available within the interface. At least two
modules must be built into an implementation: `syntax-rules' and
`export'. The `syntax-rules' module provides syntax for `extends' and
`define-syntax', as illustrated below.
The `export' language provides syntax for declaring module exports.
Both of these built-in languages must somehow collude with the language
implementation to work (or they must elaborate to some lower-level
module that colludes with the language implementation).

With `syntax-rules', when an interface publicly `extends' another,
variable and syntax bindings get added to any module that `uses' the
extending interface. As a consequence, macro templates defined in the
extending interface can refer to variables and syntax in the extended
interface.

When an interface `extends-private' another, variable and syntax
bindings from the extended interface may appear *only* in macro
templates within the extending interface. Private extension is useful
for defining restrictive languages:

  (interface LambdaCalculus
    (uses syntax-rules)
    ;
    (extends-private (r5 : R5RS))
    ;
    (define-syntax lambda
      (syntax-rules ()
        [(lambda (x) E) (r5:lambda (x) E)]))
    ;
    (define-syntax #application  ;; Weird hack: we assume
                                 ;; that the expander converts
                                 ;;   (<expr> ...)
                                 ;; to
                                 ;;   (#application <expr> ...)
                                 ;; so we can define its meaning.
                                 ;; Any better solution?
      (syntax-rules ()
        [(#application E1 E2) (r5:#application E1 E2)])))

  (module X
    (uses LambdaCalculus)
    ;
    ;; Only single-argument lambda
    ;; and single-argument application
    ;; can appear in this module. Anything
    ;; else is a syntax error.
    ((lambda (x) (x x)) (lambda (x) (x x))))

Macro definitions in an `interface' expression using `syntax-rules' are
referentially transparent: `extends' and `export' declarations
introduce variables into the lexical environment of the macro
definition, so that variable and syntax references in a macro template
refer to the corresponding exported and extended variables and syntax.

In the same way that module implementations are required to be closed,
module interfaces are also required to be closed, including variables
that appear in macro templates.
Note that the "counter" interface at the beginning of this proposal was
forced to extend the R5RS module (either publicly or privately). If it
did not, then `begin', `set!', `+', and function application would be
undefined in the macro template.

When an interface `extends' another, it *does not* get the syntax
bindings for direct use in expanding the `interface' expression itself.
An interface `uses' another interface to get such bindings. When an
interface M1 `uses' M2, the variable and syntax bindings of M2 are
*not* visible to users of M1. Thus, `extends' does not imply `uses',
and `uses' does not imply `extends'.

The following example illustrates the difference between `uses' and
`extends':

  (interface MacroDefiningMacro
    (uses syntax-rules)
    ;
    (extends-private R5RS)
    ;
    ;; A macro `M' that expands to the definition of a macro `MM'.
    ;; Thus, the `M' macro might be suitable for use within an
    ;; interface or within a module implementation.
    (define-syntax M
      (syntax-rules ()
        [(M) (define-syntax MM
               (syntax-rules ()
                 [(MM) 5]))])))

  ; ----------------------------------------

  (interface MExtender
    (uses syntax-rules)
    ;
    (extends-private R5RS)
    (extends MacroDefiningMacro) ; <<<<<<<
    ;
    ; (M) ;; <- would be a syntax error.
    ;
    ;; (N) expands to (M) in a user of MExtender
    (define-syntax N
      (syntax-rules ()
        [(N) (M)])))

  (module X
    (uses R5RS MExtender)
    ;
    (M) ;; expands to def of MM macro;
    ; (N) ;; <- would do the same thing
    (display (MM))) ;; displays 5

  ; ----------------------------------------

  (interface MMDefiner
    (uses syntax-rules MacroDefiningMacro) ; <<<<<<
    ;
    (M)) ;; Expands to the definition of MM

  (module X
    (uses R5RS MMDefiner)
    ;
    (display (MM)) ;; displays 5
    ; (M) ;; <- would be a syntax error, unbound M
    )

[Tower of languages:] A `uses' clause in an interface implies the
execution of one or more module bodies at *expansion* time.
In an implementation that supplies, say, `syntax-case', the exports of
the used module are then available in the using interface for syntax
transformer definitions.

  (interface SyntaxHelper
    (uses syntax-rules)
    ;
    (export helper))

  (module SyntaxHelper
    (uses syntax-case)
    ;
    (define helper (lambda (x) ...)))

  (interface MyComplexSyntax
    (uses syntax-case SyntaxHelper)
    ;
    (define-syntax my-complex-form
      (lambda (x) (helper x))))

  (module X
    (uses MyComplexSyntax)
    ;
    (my-complex-form ...))

In such cases, the module(s) executed at expansion time are
(conceptually) executed anew every time a module is expanded. Thus, the
`uses' mechanism does not provide a way to communicate arbitrary values
across applications of the expander to different modules. Furthermore,
if a module is used by both module interfaces and implementations, the
run-time instance of the module is distinct from all expansion-time
instances.

In contrast to `uses' in module implementations, the `uses' relation
among interfaces must be acyclic. The `extends' relation may be cyclic,
however.

Minimal Bindings and Built-in Modules
- - - - - - - - - - - - - - - - - - -

All forms allowable within an `interface' or `module' body must be
defined by a `used' module. Thus, in

  (module X (uses) <module-body>)

the <module-body> must be empty; there is no legal <module-body> form.
Similarly, in

  (interface X (uses) <interface-body>)

the <interface-body> must also be empty.

Certain standard module names must be built-in to all implementations.
Most of the above `interface' examples assume a `syntax-rules' module
which supplies `define-syntax' and `export'. It could also provide
`define-private-syntax' and `export-private', declaring macros and
variables to be used by macros within the interface, only.

For interfaces using `syntax-rules':

  <interface-body> = (define-syntax <keyword> <transformer spec>)
                   | (define-private-syntax <keyword> <transformer spec>)
                   | (export <variable> ...)
                   | (export-private <variable> ...)
                   | (<keyword> <datum>*)

[Although I used the name `syntax-rules' because I thought it would
make the examples easier to read, it is probably not the best name for
the module that defines syntax for declaring exports, etc.]

Similarly, the name `R5RS' would probably be built-in, especially for
use by module implementations. In that case, <module-body> would be the
<command or definition> of R5RS.

If the `R5RS' module is built in, then we may ask what the following
`interface' expression means:

  (interface X (uses R5RS) ...)

The rules of the language say that expressions appearing in place of
the "..." get evaluated at compile time. This is conceivably useful for
displaying expansion-time messages to the user.

A consequence of this design is that a compiler for this module
language would have to have an interpreter for the module language
built into it! For most Scheme systems, this is probably no big deal,
but it's worth noting.

If an implementation provides `syntax-case' as its only built-in
syntax-binding mechanism, then it would be reasonable to ignore
non-macro definitions in `interface' expressions. The compiling
programmer would notice, but there's no way a user of the resulting
executable could tell the difference.

Issues
------

 * Is it too much trouble to require an `interface' and `module'
   declaration for every module? [I don't think so, but users may
   disagree.]

 * Is all this power useful? [I like very much the idea of supplying a
   portable implementation of DrScheme's teaching languages. Might
   require `syntax-case', though.]

 * The proposed syntax isn't especially nice. It would be better to
   have something less verbose for the common case. (Is the common case
   "uses R5RS and self" for implementations, and "uses syntax-rules,
   extends publicly R5RS" for interfaces?)
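
As a rough illustration of the run-time rules given under "Module
Implementations" (only modules in the transitive closure of `uses' from
the main module are executed; a used module runs before its user unless
the two are mutually dependent), here is a minimal sketch in Python. It
is not part of the proposal: the function name `execution_order', the
dictionary encoding of the `uses' relation, and the `never-used' module
are all assumptions made for the illustration. A depth-first post-order
walk is just one admissible choice; the proposal leaves the order among
mutually dependent modules unspecified.

```python
def execution_order(uses, main):
    """Return one admissible run-time order for module bodies.

    `uses` maps a module name to the names it uses (hypothetical
    encoding). Only modules reachable from `main` appear in the result.
    A depth-first post-order puts B before A whenever A uses B
    (transitively) and B does not use A back; modules on a cycle come
    out in whatever order the walk happens to reach them.
    """
    order = []
    visited = set()

    def visit(module):
        if module in visited:
            return              # already scheduled, or on a cycle
        visited.add(module)
        for used in uses.get(module, ()):
            visit(used)
        order.append(module)    # all used modules are scheduled first

    visit(main)
    return order


# The `uses' clauses of the module implementations in the counter
# example, plus a hypothetical module that nothing uses.
example = {
    "TheProgram": ["R5RS", "counter", "pseudo-random"],
    "counter": ["R5RS"],
    "pseudo-random": ["R5RS"],
    "R5RS": [],
    "never-used": ["R5RS"],
}
order = execution_order(example, "TheProgram")

# A mutually dependent pair A <-> B that both depend on C.
cyclic = {"A": ["B"], "B": ["A", "C"], "C": []}
cyclic_order = execution_order(cyclic, "A")
```

In this sketch `never-used' is absent from `order' because it is not
reachable from TheProgram, and C precedes both A and B in
`cyclic_order', while the relative order of A and B is an accident of
the traversal.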