8 Languages
The #lang that starts a Racket-program file determines what the rest of the file means. Specifically, the identifier immediately after #lang selects a meaning for the rest of the file, and it gets control at the character level. The only constraint on a #lang’s meaning is that it denotes a Racket module that can be referenced using the file’s path. That module is obligated to define certain things to start the language’s implementation.
8.1 From Modules to Languages
It turns out that you can write the primitive module form directly in DrRacket. If you leave out any #lang line and write
(module example racket/base (#%module-begin (+ 1 2)))
then it’s the same as
#lang racket/base
(+ 1 2)
and if you write the latter form, then it essentially turns into the former form. Both forms have the same (+ 1 2) because #lang racket/base uses the native syntax for the module body.
Technically, there’s a difference in intent in the above two chunks of text showing programs. In the second case with #lang, the parentheses are meant as actual parenthesis characters that reside in a file. In the first case with module, the parentheses are just a way to write a text representation of the actual value, which is a syntax object that contains a list of syntax objects that contain symbols, and so on. A language implementation has to actually parse the parentheses in the second block of code to produce the first.
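The step from characters to a syntax object can be tried directly by calling the reader on a string. This is only a small sketch; the string port and the source name 'example are arbitrary choices made for illustration.

```racket
#lang racket/base

;; The characters "(+ 1 2)", as they would sit in a file.
(define in (open-input-string "(+ 1 2)"))

;; read-syntax parses the characters into a syntax object.
(define stx (read-syntax 'example in))

(syntax? stx)                      ; #t
(map syntax-e (syntax->list stx))  ; '(+ 1 2): one symbol and two numbers
(syntax->datum stx)                ; '(+ 1 2): all syntax wrappers removed
```

The syntax object carries the same nested structure that the parentheses describe, plus source locations and binding information that syntax->datum discards.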
Let’s define a language for running external programs that supports the run form and nothing else. We’ll define pfsh so that
#lang pfsh
(whoami)
(ls -l)
(echo Hello)
corresponds to
(module example pfsh (#%module-begin (whoami) (ls -l) (echo Hello)))
Until pfsh is installed, we can use #lang s-exp with a path to the language’s module instead, so that
#lang s-exp "pfsh.rkt"
(whoami)
(ls -l)
(echo Hello)
corresponds to
(module example "pfsh.rkt" (#%module-begin (whoami) (ls -l) (echo Hello)))
Without creating a "pfsh.rkt" file, copy the #lang s-exp "pfsh.rkt" example into DrRacket and click the Macro Stepper button. The stepper will immediately error, since there’s no "pfsh.rkt" module, but it will show you the parsed form.
8.2 The Core module Form
Module = (module name initial-import-module (#%module-begin form ...))
For a module that comes from a file, the name turns out to be ignored, because the file path acts as the actual module name. The key part is initial-import-module. The module named by initial-import-module gives meaning to some set of identifiers that can be used in the module body. There are absolutely no pre-defined identifiers for the body of a module. Even things like lambda or #%module-begin must be exported by initial-import-module if they are going to be used in the module body’s forms.
If require is provided by initial-import-module, then it can be used to pull in additional names for use by forms. If there’s no way to get at require, define, or other binding forms from the exports of initial-import-module, then nothing but the exports of initial-import-module will ever be available to the forms.
Since every module form has an explicit or implicit #%module-begin, initial-import-module had better provide #%module-begin. If a language should allow the same sort of definition-or-expression sequence as racket, then it can just re-export #%module-begin from racket. As we will see, there are some other implicit forms, all of which start with #%, and initial-import-module must provide those forms if they’re going to be triggered.
"simple.rkt"
#lang racket/base
(provide #%module-begin)
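As a rough sketch of how little a language has to provide, the following single file uses submodules in place of separate files. The names tiny and client are hypothetical, and #%datum is exported in addition to #%module-begin (an assumption that goes beyond "simple.rkt") so that literal constants are allowed in the body.

```racket
#lang racket/base

;; A language module in the style of "simple.rkt", written as a submodule.
;; It re-exports only two implicit forms from racket/base.
(module tiny racket/base
  (provide #%module-begin   ; wraps the whole module body
           #%datum))        ; gives meaning to literals such as 42

;; A module whose initial import is `tiny`. Only literals can be written
;; here: (+ 1 2) would be rejected, because neither + nor #%app is bound.
(module client (submod ".." tiny)
  "hello"
  42)

;; Instantiating `client` prints the two literal values.
(require 'client)
```

Removing #%datum from the provide list should make even the two literals an error, which shows that the body has no meaning beyond what the initial import supplies.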
8.3 Modules and Renaming
Let’s create a variant "pfsh0.rkt" that has a run form to run an external program:
#lang s-exp "pfsh0.rkt"
(run ls -l)
Since that’s equivalent to
(module example "pfsh0.rkt" (#%module-begin (run ls -l)))
then we need to create a "pfsh0.rkt" module that provides #%module-begin and run. The run macro’s job is to treat its identifiers as strings and deliver them to the run function that we defined in "run.rkt":
"pfsh0.rkt"
#lang racket/base
(require "run.rkt"
         (for-syntax racket/base
                     syntax/parse))
(provide #%module-begin
         (rename-out [pfsh:run run]))
(define-syntax (pfsh:run stx)
  (syntax-parse stx
    [(_ prog:id arg:id ...)
     #'(void (run (symbol->string 'prog) (symbol->string 'arg) ...))]))
We’ve wrapped void around the call to run to suppress the success or failure boolean that would otherwise print after the run program’s output.
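To see what the macro passes along without starting any processes, here is a minimal sketch in a single module. The run function below is a hypothetical stand-in for the one in "run.rkt": it only records the strings it receives, and the calls variable is likewise an illustration-only name.

```racket
#lang racket/base
(require (for-syntax racket/base syntax/parse))

;; Hypothetical stand-in for `run` from "run.rkt": instead of starting a
;; process, it remembers the strings that it was given.
(define calls '())
(define (run prog . args)
  (set! calls (append calls (list (cons prog args))))
  #t)

;; The same macro as in "pfsh0.rkt".
(define-syntax (pfsh:run stx)
  (syntax-parse stx
    [(_ prog:id arg:id ...)
     #'(void (run (symbol->string 'prog) (symbol->string 'arg) ...))]))

(pfsh:run ls -l)
(pfsh:run whoami)

calls ; '(("ls" "-l") ("whoami"))
```

The identifiers ls, -l, and whoami are never evaluated as variables; the macro quotes each one and converts it to a string before run is called.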
8.4 Adjusting #%module-begin
The biggest difference between the pfsh that we’ve implemented and the pfsh that we want is that we have to put run before every program name. Instead of (run ls), we want to write (ls).
Since macros can do any kind of work at compile time, you might imagine changing pfsh so that it scans the filesystem and builds up a set of definitions based on the programs that are currently available via the PATH environment variable. That’s not how scripting languages are meant to work, though. Also, it’s likely to cause trouble to use the filesystem and environment-variable state at such a fine granularity to determine bindings of a module.
Another possibility is to change #%module-begin so that it takes every form in the module and adds run to the front:
"pfsh1.rkt"
#lang racket/base
(require "run.rkt"
         (for-syntax racket/base
                     syntax/parse))
(provide (rename-out [pfsh:module-begin #%module-begin]
                     [pfsh:run run]))
(define-syntax (pfsh:module-begin stx)
  (syntax-parse stx
    [(_ (prog:id arg:id ...) ...)
     #'(#%module-begin
        (pfsh:run prog arg ...) ...)]))
(define-syntax (pfsh:run stx)
  (syntax-parse stx
    [(_ prog:id arg:id ...)
     #'(void (run (symbol->string 'prog) (symbol->string 'arg) ...))]))
Notice that pfsh:module-begin adds pfsh:run to the start of each body form, not just run. That’s because it wants to insert a reference to run as provided by "pfsh1.rkt", and that form is called pfsh:run in the environment of the pfsh:module-begin implementation.
Meanwhile, there’s probably no point to exporting run, since it can never be referenced directly in a module that is implemented with #lang s-exp "pfsh1.rkt".
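The effect of the adjusted #%module-begin can be tried in one file by using submodules in place of "pfsh1.rkt" and its client. This is a sketch under assumptions: run is a hypothetical recording stand-in for the function in "run.rkt", recorded-commands is an extra export added only so the result can be inspected, and run itself is not exported.

```racket
#lang racket/base

(module pfsh1 racket/base
  (require (for-syntax racket/base syntax/parse))
  (provide (rename-out [pfsh:module-begin #%module-begin])
           recorded-commands)

  ;; Hypothetical stand-in for `run` from "run.rkt": record, don't execute.
  (define commands '())
  (define (run prog . args)
    (set! commands (append commands (list (cons prog args))))
    #t)
  (define (recorded-commands) commands)

  ;; Adds pfsh:run to the front of every body form.
  (define-syntax (pfsh:module-begin stx)
    (syntax-parse stx
      [(_ (prog:id arg:id ...) ...)
       #'(#%module-begin
          (pfsh:run prog arg ...) ...)]))

  (define-syntax (pfsh:run stx)
    (syntax-parse stx
      [(_ prog:id arg:id ...)
       #'(void (run (symbol->string 'prog) (symbol->string 'arg) ...))])))

;; Plays the role of a file that starts with #lang s-exp "pfsh1.rkt".
(module client (submod ".." pfsh1)
  (whoami)
  (ls -l))

(require 'client
         (only-in 'pfsh1 recorded-commands))

(recorded-commands) ; '(("whoami") ("ls" "-l"))
```

The client never mentions run or pfsh:run, yet both body forms reach the run function, because the reference inserted by pfsh:module-begin keeps the binding it has inside the language module.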
8.5 The Application Form
While adjusting #%module-begin works for "pfsh1.rkt", it’s not a very composable approach. If we later want to support more kinds of forms in the module body, we have to change pfsh:module-begin to recognize each of them.
Instead, we would like to change the default meaning of parentheses. In Racket, a pair of parentheses mean a function call by default. In pfsh, a pair of parentheses should mean running an external program by default. The “by default” part concedes that an identifier after an open parenthesis can change the meaning of the parenthesis, such as when define appears after an open parenthesis. Otherwise, though, it’s as if a function-call identifier appears after the open parenthesis to specify a function-call form... and function-call exists, except that it’s spelled #%app.
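That is, (+ 1 2) is shorthand for (#%app + 1 2). A language can therefore change the default meaning of parentheses by exporting its own #%app: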
"pfsh2.rkt"
#lang racket/base
(require "run.rkt"
         (for-syntax racket/base
                     syntax/parse))
(provide #%module-begin
         (rename-out [pfsh:run #%app]))
(define-syntax (pfsh:run stx)
  (syntax-parse stx
    [(_ prog:id arg:id ...)
     #'(void (run (symbol->string 'prog) (symbol->string 'arg) ...))]))
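A single-file sketch of the same idea, again with submodules standing in for separate files. As before, run is a hypothetical recording stand-in for the function in "run.rkt", and recorded-commands is an illustration-only export.

```racket
#lang racket/base

(module pfsh2 racket/base
  (require (for-syntax racket/base syntax/parse))
  (provide #%module-begin
           (rename-out [pfsh:run #%app])   ; the implicit application form
           recorded-commands)

  ;; Hypothetical stand-in for `run` from "run.rkt": record, don't execute.
  (define commands '())
  (define (run prog . args)
    (set! commands (append commands (list (cons prog args))))
    #t)
  (define (recorded-commands) commands)

  (define-syntax (pfsh:run stx)
    (syntax-parse stx
      [(_ prog:id arg:id ...)
       #'(void (run (symbol->string 'prog) (symbol->string 'arg) ...))])))

;; Plays the role of a file that starts with #lang s-exp "pfsh2.rkt".
;; Each parenthesized form here is implicitly an #%app form.
(module client (submod ".." pfsh2)
  (echo Hello)
  (ls -l))

(require 'client
         (only-in 'pfsh2 recorded-commands))

(recorded-commands) ; '(("echo" "Hello") ("ls" "-l"))
```

Inside the language module, function calls such as (void ...) still use the #%app of racket/base; only modules whose initial import is the language see pfsh:run under the name #%app.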
8.6 Installing a Language
Let’s take the last step in defining a language, which will let us switch from #lang s-exp "pfsh2.rkt" to #lang pfsh. To enable writing #lang pfsh, we must do two things:
Adjust our language implementation so that it explicitly specifies S-expression parsing, instead of having S-expression parsing imposed externally.
Install our language as a package so that #lang pfsh will work from anywhere.
The part of a language that specifies its parsing from characters to syntax objects is called a reader. A language’s reader is implemented by a reader submodule (i.e., a nested module) inside the language’s module. That submodule must export a read-syntax function that takes an input port, reads characters from it, and constructs a module form as a syntax object. For historical reasons, the submodule should also provide a read function that does the same thing but returns a plain S-expression instead of a syntax object.
(module reader racket
  (provide (rename-out [pfsh:read-syntax read-syntax]
                       [pfsh:read read]))
  (define (pfsh:read-syntax name in)
    (datum->syntax
     #f
     `(module anything pfsh
        (#%module-begin
         ,@(read-body name in)))))
  (define (read-body name in)
    (define e (read-syntax name in))
    (if (eof-object? e)
        '()
        (cons e (read-body name in))))
  (define (pfsh:read in)
    (syntax->datum (pfsh:read-syntax 'src in))))
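To get a feel for what such a read-syntax function produces, the two helper functions can be copied into an ordinary module and applied to a string port. This is only a sketch: the source name 'demo and the sample input are arbitrary, and nothing here installs or runs a pfsh language.

```racket
#lang racket/base

;; Reads S-expressions from `in` until end-of-file.
(define (read-body name in)
  (define e (read-syntax name in))
  (if (eof-object? e)
      '()
      (cons e (read-body name in))))

;; Wraps everything that was read in a `module` form.
(define (pfsh:read-syntax name in)
  (datum->syntax
   #f
   `(module anything pfsh
      (#%module-begin
       ,@(read-body name in)))))

;; What a file's body would look like to the reader, minus the #lang line.
(define in (open-input-string "(whoami)\n(ls -l)"))

(syntax->datum (pfsh:read-syntax 'demo in))
;; '(module anything pfsh (#%module-begin (whoami) (ls -l)))
```

The result is only data at this point; pfsh does not need to exist until the module form is actually expanded.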
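Since S-expression parsing is such a common choice, the syntax/module-reader language provides a shorthand for a reader submodule like the one above:
(module reader syntax/module-reader
  pfsh)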
In short, we just need to add those two lines to our current pfsh implementation, and then save it as "main.rkt" in a "pfsh" directory. Here’s the complete implementation:
#lang racket/base
(require "run.rkt"
         (for-syntax racket/base
                     syntax/parse))
(provide #%module-begin
         (rename-out [pfsh:run #%app]))
(module reader syntax/module-reader
  pfsh)
(define-syntax (pfsh:run stx)
  (syntax-parse stx
    [(_ prog:id arg:id ...)
     #'(void (run (symbol->string 'prog) (symbol->string 'arg) ...))]))
You’ll also need "run.rkt" in the same "pfsh" directory.
After either of those steps, you can run
#lang pfsh
(echo Hello!)
8.7 Exercises
Start with either "pfsh2.rkt" or "pfsh/main.rkt", depending on whether you want to install "pfsh" as a package.
Our pfsh language so far only allows run as an application form. Change pfsh to supply all of racket/base, but with #%app as pfsh:run.
A module can export everything that it imported from another module using (all-from-out module) in provide. Also, for any provide-spec within provide, (except-out provide-spec id ...) is the same as provide-spec, but omitting the ids.
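As a small illustration of these two provide forms on an unrelated example, the sketch below defines a hypothetical language no-cdr that re-exports all of racket/base except cdr. The module names and first-item are made up for this example.

```racket
#lang racket/base

;; Hypothetical language: everything from racket/base, minus `cdr`.
(module no-cdr racket/base
  (provide (except-out (all-from-out racket/base) cdr)))

(module client (submod ".." no-cdr)
  (provide first-item)
  ;; `car`, `define`, and `provide` all come from no-cdr's exports.
  ;; Using `cdr` here would be rejected as an unbound identifier.
  (define first-item (car '(1 2 3))))

(require 'client)

first-item ; 1
```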
After updating pfsh, this program should work and print 'done after listing files:
#lang pfsh
(define done-sym 'done)
(ls -l)
done-sym
After the previous exercise, nearly all of racket/base is available, but it is difficult to use, because function application is always turned into a shell command. A more useful language would treat a parenthesized form as a shell command only if the function position is an unbound identifier. Simple pattern matching can distinguish identifiers from non-identifiers, but determining binding requires more help from the macro expander. The identifier-binding function provides that cooperation; it takes an identifier and returns #f if it is not bound, and it returns a non-#f value if the identifier is bound.
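The behavior of identifier-binding can be explored with a tiny macro before using it in pfsh. The bound? macro below is a hypothetical helper written only for this illustration; it is not part of Racket or of pfsh.

```racket
#lang racket/base
(require (for-syntax racket/base))

;; Hypothetical helper: expands to #t when `id` has a binding at the place
;; where the macro is used, and to #f otherwise.
(define-syntax (bound? stx)
  (syntax-case stx ()
    [(_ id)
     (identifier? #'id)
     (if (identifier-binding #'id) #'#t #'#f)]))

(define (list-all) 'listing)

(bound? car)       ; #t: imported from racket/base
(bound? list-all)  ; #t: defined earlier in this module
(bound? ls)        ; #f: no binding
```

The question is answered at compile time, while the macro is being expanded, which is also when an #%app macro would have to decide between a function call and a shell command.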
Change pfsh to treat a parenthesized form as a shell command only if the function position is an unbound identifier, so the following program works to list all files twice:
#lang pfsh
(define (list-all) (ls -l))
(list-all)
(list-all)
To make pfsh more like other shell languages, it would be nice if
ls -l
with no parentheses would list all files. One way to make that work is to say that parentheses are implicit when a source line has multiple forms all on the same line, without any parentheses around the source line. With that rule, then
whoami
ls -l
(whoami)
(ls -l)
would work to run whoami twice and list files twice (interleaved).
We could implement this rule by writing a character-level parser, but it turns out that syntax objects from the default S-expression reader have enough information to implement implicit parentheses.
Change pfsh to allow implicit parentheses by changing #%module-begin to add them. You can determine when syntax objects are on the same line by using the syntax-line function. You can infer that parentheses are present when syntax-e produces a value for which pair? produces a true value.
Changing #%module-begin isn’t really the right idea, as we explore in the next exercise, but try this bad idea, first.
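The two functions mentioned in this exercise can be tried on a string port first. This is a small sketch with made-up input; note that a string port needs port-count-lines! before the reader records line numbers, which is an extra step needed only in this standalone setting.

```racket
#lang racket/base

(define in (open-input-string "whoami\nls -l\n(ls -l)"))
(port-count-lines! in) ; enable line counting on this port

;; Reads syntax objects until end-of-file.
(define (read-all)
  (define e (read-syntax 'demo in))
  (if (eof-object? e)
      '()
      (cons e (read-all))))

(define forms (read-all))

;; Four forms were read: whoami, ls, -l, and (ls -l).
(map syntax-line forms)                            ; '(1 2 2 3)
(map (lambda (stx) (pair? (syntax-e stx))) forms)  ; '(#f #f #f #t)
```

Only the last form was written with parentheses, and the two forms that share line 2 are the candidates for grouping.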
The problem with adding implicit parentheses in #%module-begin is that it confuses two layers: The existence of tokens on the same line is properly a reader-level decision, since it’s about sequences of characters.
As an illustration of the problem, consider this program:
#lang racket/base
(require (for-syntax racket/base
                     syntax/parse
                     syntax/strip-context))
(define-syntax (main-submodule stx)
  (syntax-parse stx
    [(_ word)
     #:with word (strip-context #'word)
     #'(module main pfsh
         echo word)]))
(main-submodule hello)
There’s a little subtlety here in making hello have the right binding by using strip-context, but making it have the right source location to line up with echo would be much more trouble.
The racket/base language is not set up for implicit parentheses, so a better and more consistent strategy for pfsh is to make implicit parentheses part of the reader. Move your strategy in the previous exercise from #%module-begin to the reader. You can use #:wrapper1 in syntax/module-reader to adjust the result that the reader would otherwise return.