
8 Languages

The #lang that starts a Racket-program file determines what the rest of the file means. Specifically, the identifier immediately after #lang selects a meaning for the rest of the file, and it gets control at the character level. The only constraint on a #lang’s meaning is that it denotes a Racket module that can be referenced using the file’s path. That module is obligated to define certain things to start the language’s implementation.

8.1 From Modules to Languages

It turns out that you can write the primitive module form directly in DrRacket. If you leave out any #lang line and write

(module example racket/base
  (#%module-begin
    (+ 1 2)))

then it’s the same as

#lang racket/base
(+ 1 2)

and if you write the latter form, then it essentially turns into the former form. Both forms have the same (+ 1 2) because #lang racket/base uses the native syntax for the module body.

Technically, there’s a difference in intent in the above two chunks of text showing programs. In the second case with #lang, the parentheses are meant as actual parenthesis characters that reside in a file. In the first case with module, the parentheses are just a way to write a text representation of the actual value, which is a syntax object that contains a list of syntax objects that contain symbols, and so on. A language implementation has to actually parse the parentheses in the second block of code to produce the first.
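The difference between characters and syntax objects can be seen directly with the built-in read-syntax function. The following is a minimal sketch, assuming the characters come from a string port instead of a file; the 'example label is an arbitrary source name chosen for illustration.

```racket
#lang racket/base

;; Characters as they would sit in a file after the #lang line.
(define in (open-input-string "(+ 1 2)"))

;; read-syntax parses one term; its first argument is only a source label.
(define stx (read-syntax 'example in))

(syntax? stx)                     ; the whole form is a syntax object
(map syntax? (syntax->list stx))  ; and so is each element inside it
(syntax->datum stx)               ; strip the wrappers to see the plain list
```

The module forms shown in this section are written the way syntax->datum would print them, even though the expander works on the syntax objects themselves.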

Let’s define a language for running external programs that supports the run form and nothing else. We’ll define pfsh so that

#lang pfsh
(whoami)
(ls -l)
(echo Hello)

corresponds to

(module example pfsh
  (#%module-begin
   (whoami)
   (ls -l)
   (echo Hello)))

For now, we don’t want to bother parsing at the level of parentheses, so we’ll actually write

"example.rkt"

#lang s-exp "pfsh.rkt"
(whoami)
(ls -l)
(echo Hello)

The s-exp language doesn’t do anything but parse parentheses into syntax objects. For this example, it directly generates the syntax object

(module example "pfsh.rkt"
  (#%module-begin
   (whoami)
   (ls -l)
   (echo Hello)))

which is half-way to where we want to be: the whoami and echo syntax objects are still here to be expanded by macros, but we no longer have to worry about parsing characters. (The change from pfsh to "pfsh.rkt" just lets us work with relative paths, for now, instead of installing a pfsh collection.)

Without creating a "pfsh.rkt" file, copy the #lang s-exp "pfsh.rkt" example into DrRacket and click the Macro Stepper button. The stepper will immediately error, since there’s no "pfsh.rkt" module, but it will show you the parsed form.

8.2 The Core module Form

The core module grammar is

  Module = (module name initial-import-module
             (#%module-begin
               form ...))
         | (module name initial-import-module
             form ...)

The second variant is a shorthand for the first, and it is automatically converted to the first variant by adding #%module-begin.

For a module that comes from a file, the name turns out to be ignored, because the file path acts as the actual module name. The key part is initial-import-module. The module named by initial-import-module gives meaning to some set of identifiers that can be used in the module body. There are absolutely no pre-defined identifiers for the body of a module. Even things like lambda or #%module-begin must be exported by initial-import-module if they are going to be used in the module body’s forms.

If require is provided by initial-import-module, then it can be used to pull in additional names for use by forms. If there’s no way to get at require, define, or other binding forms from the exports of initial-import-module, then nothing but the exports of initial-import-module will ever be available to the forms.

Since every module form has an explicit or implicit #%module-begin, initial-import-module had better provide #%module-begin. If a language should allow the same sort of definition-or-expression sequence as racket, then it can just re-export #%module-begin from racket. As we will see, there are some other implicit forms, all of which start with #%, and initial-import-module must provide those forms if they’re going to be triggered.

Here is the simplest possible Racket language module:
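A minimal sketch of that module, assuming it is saved as "simple.rkt" (the name used in the text that follows) and that it does nothing beyond re-exporting the #%module-begin of racket/base:

```racket
#lang racket/base
;; Re-export racket/base's #%module-begin and nothing else, so that a
;; module using this file as its initial import has exactly one binding.
(provide #%module-begin)
```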
Since "simple.rkt" provides #%module-begin, it’s a valid initial import. You can use it in the empty program

"use-simple.rkt"

#lang s-exp "simple.rkt"

as long as "use-simple.rkt" is saved in the same directory as "simple.rkt" (so that the relative path works). You can add comments after the #lang line, since comments are stripped away by the parser. Nothing else in the body is going to work, though. Actually, (#%module-begin) will work, since #%module-begin is bound and since s-exp relies on the implicit introduction of #%module-begin instead of adding it explicitly. That’s a flaw in s-exp.

8.3 Modules and Renaming

Let’s create a variant "pfsh0.rkt" that has a run form to run an external program:

#lang s-exp "pfsh0.rkt"
(run ls -l)

Since that’s equivalent to

(module example "pfsh0.rkt"
  (#%module-begin
   (run ls -l)))

then we need to create a "pfsh0.rkt" module that provides #%module-begin and run. The run macro’s job is to treat its identifiers as strings and deliver them to the run function that we defined in "run.rkt":

"pfsh0.rkt"

#lang racket/base
(require "run.rkt"
         (for-syntax racket/base
                     syntax/parse))
 
(provide #%module-begin
         (rename-out [pfsh:run run]))
 
(define-syntax (pfsh:run stx)
  (syntax-parse stx
    [(_ prog:id arg:id ...)
     #'(void (run (symbol->string 'prog) (symbol->string 'arg) ...))]))

We’ve wrapped void around the call to run to suppress the success or failure boolean that would otherwise print after the run program’s output.
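The "run.rkt" module itself is not reproduced in this section. For readers following along without it, here is a hypothetical stand-in, assuming only what the text states: run takes a program name and argument strings and returns a success-or-failure boolean. The actual "run.rkt" may differ in its details.

```racket
#lang racket/base
;; Hypothetical stand-in for "run.rkt"; the real file may differ.
(require racket/system)

(provide run)

;; run : string string ... -> boolean
;; Looks up prog via the PATH environment variable and runs it with args,
;; sharing the current output ports. Returns #t if the program exits
;; with status 0 and #f otherwise.
(define (run prog . args)
  (define exe (find-executable-path prog))
  (unless exe
    (error 'run "program not found: ~a" prog))
  (apply system* exe args))
```

With a definition along these lines, (run "ls" "-l") lists files and produces a boolean, which is the value that the void wrapper in pfsh:run discards.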

8.4 Adjusting #%module-begin

The biggest difference between the pfsh that we’ve implemented and the pfsh that we want is that we have to put run before every program name. Instead of (run ls), we want to write (ls).

Since macros can do any kind of work at compile time, you might imagine changing pfsh so that it scans the filesystem and builds up a set of definitions based on the programs that are currently available via the PATH environment variable. That’s not how scripting languages are meant to work, though. Also, it’s likely to cause trouble to use the filesystem and environment-variable state at such a fine granularity to determine bindings of a module.

Another possibility is to change #%module-begin so that it takes every form in the module and adds run to the front:

"pfsh1.rkt"

#lang racket/base
(require "run.rkt"
         (for-syntax racket/base
                     syntax/parse))
 
(provide (rename-out [pfsh:module-begin #%module-begin]
                     [pfsh:run run]))
 
(define-syntax (pfsh:module-begin stx)
  (syntax-parse stx
    [(_ (prog:id arg:id ...) ...)
     #'(#%module-begin
        (pfsh:run prog arg ...)
        ...)]))
 
(define-syntax (pfsh:run stx)
  (syntax-parse stx
    [(_ prog:id arg:id ...)
     #'(void (run (symbol->string 'prog) (symbol->string 'arg) ...))]))

Notice that pfsh:module-begin adds pfsh:run to the start of each body form, not just run. That’s because it wants to insert a reference to run as provided by "pfsh1.rkt", and that form is called pfsh:run in the environment of the pfsh:module-begin implementation.

Meanwhile, there’s probably no point in exporting run, since it can never be referenced directly in a module that is implemented with #lang s-exp "pfsh1.rkt".

8.5 The Application Form

While adjusting #%module-begin works for "pfsh1.rkt", it’s not a very composable approach. If we later want to support more kinds of forms in the module body, we have to change pfsh:module-begin to recognize each of them.

Instead, we would like to change the default meaning of parentheses. In Racket, a pair of parentheses means a function call by default. In pfsh, a pair of parentheses should mean running an external program by default. The “by default” part concedes that an identifier after an open parenthesis can change the meaning of the parenthesis, such as when define appears after an open parenthesis. Otherwise, though, it’s as if a function-call identifier appears after the open parenthesis to specify a function-call form... and function-call exists, except that it’s spelled #%app.

In other words, in the racket language, when you write

(+ 1 2)

since + is not bound as a macro or core syntactic form, that expands to

(#%app + 1 2)

The #%app provided by racket is defined as a macro that expands to the core syntactic form for function calls. That core form is also called #%app internally, but in the rare case that we have to refer to the core form, we use the alias #%plain-app.
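One way to watch the expander insert the application form is to fully expand (+ 1 2) and inspect the result. This is a small sketch, assuming a fresh namespace made with make-base-namespace so that the bindings of racket/base apply; the name expanded is chosen only for illustration.

```racket
#lang racket/base

;; Expand (+ 1 2) in a fresh namespace where racket/base's bindings apply.
(define expanded
  (parameterize ([current-namespace (make-base-namespace)])
    (syntax->datum (expand '(+ 1 2)))))

;; The head of the fully expanded form is the application form that the
;; expander inserted; it prints under the core form's internal name.
(car expanded)
expanded
```

Because expand goes all the way to core forms, the head shown here is the core application form rather than the macro that racket/base exports under the same name.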

To change the default meaning of parentheses for pfsh, then, we can rename pfsh:run to #%app on export:

"pfsh2.rkt"

#lang racket/base
(require "run.rkt"
         (for-syntax racket/base
                     syntax/parse))
 
(provide #%module-begin
         (rename-out [pfsh:run #%app]))
 
(define-syntax (pfsh:run stx)
  (syntax-parse stx
    [(_ prog:id arg:id ...)
     #'(void (run (symbol->string 'prog) (symbol->string 'arg) ...))]))

After that small adjustment, we conceptually change each run in a pfsh module to #%app, but we don’t actually have to write the #%app, since it’s added automatically by the expander:

"use-pfsh2.rkt"

#lang s-exp "pfsh2.rkt"
(whoami)
(ls -l)
(echo Hello)

8.6 Installing a Language

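Let’s take the last step in defining a language, which will let us switch from #lang s-exp "pfsh2.rkt" to #lang pfsh. To enable writing #lang pfsh, we must do two things: add a reader that turns the characters of a source file into a module form, and install the implementation so that pfsh resolves as a module path instead of a relative file path.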

The part of a language that specifies its parsing from characters to syntax objects is called a reader. A language’s reader is implemented by a reader submodule (i.e., a nested module) inside the language’s module. That submodule must export a read-syntax function that takes an input port, reads characters from it, and constructs a module form as a syntax object. For historical reasons, the submodule should also provide a read function that does the same thing but returns a plain S-expression instead of a syntax object.

Here’s one way to implement the reader submodule:
(module reader racket
  (provide (rename-out [pfsh:read-syntax read-syntax]
                       [pfsh:read read]))
 
  (define (pfsh:read-syntax name in)
    (datum->syntax #f `(module anything pfsh
                         (#%module-begin
                          ,@(read-body name in)))))
 
  (define (read-body name in)
    (define e (read-syntax name in))
    (if (eof-object? e)
        '()
        (cons e (read-body name in))))
 
  (define (pfsh:read in)
    (syntax->datum (pfsh:read-syntax 'src in))))

Notice that pfsh:read-syntax constructs a module that uses pfsh as the initial import. Otherwise, it doesn’t really do anything specific to pfsh, and most of the work is performed by the built-in read-syntax function that reads a single term (such as an identifier or parenthesized form) as a syntax object. In fact, since this pattern is so common, Racket provides a syntax/module-reader language that expects just the pfsh part and builds the rest of the submodule around that:

(module reader syntax/module-reader
  pfsh)

In short, we just need to add those two lines to our current pfsh implementation, and then save it as "main.rkt" in a "pfsh" directory. Here’s the complete implementation:

"pfsh/main.rkt"

#lang racket/base
(require "run.rkt"
         (for-syntax racket/base
                     syntax/parse))
 
(provide #%module-begin
         (rename-out [pfsh:run #%app]))
 
(module reader syntax/module-reader
  pfsh)
 
(define-syntax (pfsh:run stx)
  (syntax-parse stx
    [(_ prog:id arg:id ...)
     #'(void (run (symbol->string 'prog) (symbol->string 'arg) ...))]))

You’ll also need "run.rkt" in the same "pfsh" directory.

To install this as a package, select Install Package... from the DrRacket File menu, click the Browse button to select a Directory, and select the "pfsh" directory. Alternatively, run
  raco pkg install pfsh/
on the command line—and beware that the trailing slash is necessary (otherwise, raco pkg will consult a remote server to look for a registered pfsh package).

After either of those steps, you can run

#lang pfsh
(echo Hello!)

8.7 Exercises

Start with either "pfsh2.rkt" or "pfsh/main.rkt", depending on whether you want to install "pfsh" as a package.

  1. Our pfsh language so far only allows run as an application form. Change pfsh to supply all of racket/base, but with #%app as pfsh:run.

    A module can export everything that it imported from another module using (all-from-out module) in provide. Also, for any provide-spec within provide, (except-out provide-spec id ...) is the same as provide-spec, but omitting the ids.

    After updating pfsh, this program should work and print 'done after listing files:

    #lang pfsh
     
    (define done-sym 'done)
    (ls -l)
    done-sym

  2. After the previous exercise, nearly all of racket/base is available, but it is difficult to use, because function application is always turned into a shell command.

    A more useful language would treat a parenthesized form as a shell command only if the function position is an unbound identifier. Simple pattern matching can distinguish identifiers from non-identifiers, but determining binding requires more help from the macro expander. The identifier-binding function provides that cooperation; it takes an identifier and returns #f if it is not bound, and it returns a non-#f value if the identifier is bound.

    Change pfsh to treat a parenthesized form as a shell command only if the function position is an unbound identifier, so the following program works to list all files twice:

    #lang pfsh
     
    (define (list-all)
      (ls -l))
     
    (list-all)
    (list-all)

  3. To make pfsh more like other shell languages, it would be nice if

    ls -l

    with no parentheses would list all files. One way to make that work is to say that parentheses are implicit when a source line has multiple forms all on the same line, without any parentheses around the source line. With that rule, then

    whoami
    ls -l
    (whoami)
    (ls -l)

    would work to run whoami twice and list files twice (interleaved).

    We could implement this rule by writing a character-level parser, but it turns out that syntax objects from the default S-expression reader have enough information to implement implicit parentheses.

    Change pfsh to allow implicit parentheses by changing #%module-begin to add them. You can determine when syntax objects are on the same line by using the syntax-line function. You can infer that parentheses are present when syntax-e produces a value for which pair? produces a true value.

    Changing #%module-begin isn’t really the right idea, as we explore in the next exercise, but try this bad idea, first.

  4. The problem with adding implicit parentheses in #%module-begin is that it confuses two layers: The existence of tokens on the same line is properly a reader-level decision, since it’s about sequences of characters.

    As an illustration of the problem, consider this program:

    #lang racket/base
    (require (for-syntax syntax/parse
                         syntax/strip-context))
     
    (define-syntax (main-submodule stx)
      (syntax-parse stx
        [(_ word)
         #:with word (strip-context #'word)
         #'(module main pfsh
             echo word)]))
     
    (main-submodule hello)

    There’s a little subtlety here in making hello have the right binding by using strip-context, but making it have the right source location to line up with echo would be much more trouble.

    The racket/base language is not set up for implicit parentheses, so a better and more consistent strategy for pfsh is to make implicit parentheses part of the reader. Move your strategy in the previous exercise from #%module-begin to the reader. You can use #:wrapper1 in syntax/module-reader to adjust the result that the reader would otherwise return.
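The functions named in the exercise hints can be tried in isolation before they are built into pfsh. The sketch below is not a solution to any exercise; it only shows how identifier-binding, syntax-line, and syntax-e behave. The names bound? and read-all are hypothetical helpers introduced for illustration, and the input is assumed to come from a string port.

```racket
#lang racket/base
(require (for-syntax racket/base))

;; identifier-binding runs at expansion time, so it is called from inside
;; a macro. bound? is a hypothetical helper, not part of pfsh.
(define-syntax (bound? stx)
  (syntax-case stx ()
    [(_ id)
     (identifier? #'id)
     (if (identifier-binding #'id) #'#t #'#f)]))

(bound? car)            ; car is bound by racket/base
(bound? no-such-thing)  ; nothing binds this name

;; Source locations: three lines of text, the last one parenthesized.
(define in (open-input-string "whoami\nls -l\n(ls -l)"))
(port-count-lines! in)  ; without this, syntax-line returns #f

;; read-all is a hypothetical helper that reads terms until end of input.
(define (read-all)
  (define stx (read-syntax 'demo in))
  (if (eof-object? stx)
      '()
      (cons stx (read-all))))

(for ([stx (in-list (read-all))])
  (printf "~s is on line ~a; parenthesized? ~a\n"
          (syntax->datum stx)
          (syntax-line stx)
          (pair? (syntax-e stx))))
```

Line counting must be enabled on the port for syntax-line to report a number. That detail matters in a reader as well as in an experiment like this one.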