Racket-on-Chez Status: January 2018 [extended version]

Matthew Flatt
with thanks to collaborators: Kent Dybvig, Andy Keep, Gustavo Massaccesi, Sarah Spall,
  Sam Tobin-Hochstadt, and Jon Zeppieri

Unless you’re especially interested in implementation details, see the shorter version of the report on the Racket blog. Additions here are marked with “[extended].”

It’s been almost a year since we decided to try running Racket on Chez Scheme, and it has been almost six months since my last status report. As of RacketCon 2017, DrRacket could start up on Chez Scheme (but just barely, and just in time for the demo).

The current implementation is now considerably more complete. DrRacket still doesn’t run well, but Racket-on-Chez can build a full Racket distribution in reasonable time and space.

With the current Racket-on-Chez compilation strategy, runtime performance is plausible on traditional benchmarks, but cross-module optimization is needed to bring the results fully in line with our expectations. Startup and load times are much slower than we would like. Compile/build times are a factor of 4 slower than the current version of Racket.

Glossary of implementations:
  • Racket: the current Racket release, version 6.x

  • racket7: mostly the same as Racket, but using a new, Racket-implemented macro expander; the source is in the https://github.com/racket/racket7 repo

  • Chez Scheme: a completely different implementation of Scheme that we’re trying to use as a replacement for part of Racket

  • Racket-on-Chez: Chez Scheme plus additional layers to make it implement the same language as Racket; the source is also in the https://github.com/racket/racket7 repo

Approach

Our overall goal is to improve Racket’s implementation and make it more portable to different runtime systems. To a first approximation, improving Racket and making it more portable means “less C code.” Building on Chez Scheme is a promising means toward that end.

The picture on the right illustrates the current Racket distribution. The parts in blue are implemented in Racket, while the part in red is implemented in C. The green block is Scribble documentation source.

 

The numbers next to each block are rough lines-of-code counts, and the blocks are scaled to those counts. The collects block represents the content of the main "collects" directory; along with the core block, it’s roughly the code in a Minimal Racket distribution. The pkgs block is the part of the main distribution that is available in separate packages but bundled with the main distribution. The docs block is spread across the same packages as the pkgs code.

 

Porting Racket to Chez Scheme is all about the core part. The goal is to make no changes to existing Racket and Scribble code, although we’ll inevitably have to make small changes to support multiple Racket implementations.

 


Let’s zoom into the core part. (From now on, all the pictures on the right are at this scale.) The picture breaks the C code in Racket’s current implementation into several major subsystems. In reality, the implementation is not organized into such tidy layers, but the picture is conceptually close to the truth.

 

The large builtins block implements primitive data structures and their operations, such as numbers (fixnums, bignums, flonums, exact rationals, complex numbers), strings, characters, byte strings, lists, vectors, and hash tables. Most other blocks are self-explanatory.

 

The expander block implements the macro expander, reader, and module system. More than any other part of the current stack, the expander is better implemented in Racket than in C.

 


That was exactly the work of 2016: building a new expander to replace the C-implemented expander. You can use that racket7 variant of Racket now by building from the https://github.com/racket/racket7 repo.

 

This new expander hasn’t made it into the release, because a few interactions haven’t been sorted out: the macro stepper, the demodularizer, and to some degree the decompiler. The expander also runs at about half the speed of the C-implemented expander, and that performance difference was part of the motivation to investigate a new runtime system.

 

Although the compiler+JIT block is drawn the same as before in this picture, it’s a slightly different compiler, because it doesn’t know about modules or syntax objects. Instead, the compiler works in terms of linklets, which are lambda-like blocks of code that import and export variables. That separation of expansion/modules from compilation helps enable the transition to a different compiler and runtime system.
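For a concrete sense of what a linklet looks like, here is a rough sketch of the S-expression form (hedged: the exact grammar in the racket7 sources has more options, such as renaming on import and export):

    ;; imports are grouped, one group per imported instance;
    ;; here `cons` and `car` come from one group, and `f` is exported
    (linklet ((cons car))  ; imports
             (f)           ; exports
      (define (f p) (cons (car p) (car p))))

Instantiating a linklet supplies an instance for each import group and produces an instance that maps the exported names to their values.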

 


And that brings us to Chez Scheme. The most obvious feature in the picture on the right is that Chez Scheme’s C-implemented kernel is only a small fraction of the implementation, while most of Chez Scheme is more sensibly written in Scheme.

 

Chez Scheme’s expander is considerably simpler than Racket’s. Although Chez Scheme’s expander is hygienic and hugely influential on Racket’s expander, Racket’s macro and module system does a lot more.

 

Chez Scheme’s compiler, meanwhile, does a lot more than Racket’s compiler+JIT. The builtins functionality in Chez Scheme is similar to Racket’s builtins.

 

Chez Scheme’s GC (at the bottom) is tiny and elegant. It lacks some functionality of Racket’s GC, but mostly it lacks Racket’s historical complexity for trying to cooperate directly with C.

 


That brings us, finally, to the Racket-on-Chez picture. It starts with the Chez Scheme stack and adds a rumble block, which extends Chez Scheme to match the functionality of the current Racket stack from its GC block up through its control+structs blocks.

 

Mostly, rumble is about immutable hash tables, delimited continuations, applicable structures, structure type properties, and impersonators/chaperones. We’ve given this layer of functionality the name Rumble, because it’s useful to have a name for the language that hosts other layers that are built using Racket.
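As an illustration of the kind of Racket behavior that Rumble must provide, here is an applicable structure in plain Racket (this is ordinary Racket code, not Rumble code; it just shows what compiled application sites have to cope with):

    #lang racket/base

    ;; a structure type whose instances can be applied like procedures
    (struct adder (n)
      #:property prop:procedure
      (lambda (self x) (+ (adder-n self) x)))

    (define add3 (adder 3))
    (add3 4)  ; => 7

Because a value like add3 is not a Chez Scheme procedure, application sites in compiled code must be prepared to dispatch through the structure’s prop:procedure value.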

 

The threads, io, and regexp blocks are entirely new implementations of the corresponding blocks in the current Racket implementation. The rktio block is used by io, and it’s the same as in current Racket: a portability layer over OS facilities that is similar to (but more extensive than) code that is in Chez Scheme’s kernel.

 


The expander block (the same as for racket7) sits on a small schemify adaptor. The schemify layer converts Racket linklets into plain Scheme functions, adjusting expressions as needed to support applicable structures, to enforce left-to-right evaluation, and to implement Racket’s allocation model for local functions (i.e., lift to avoid closure allocation whenever possible).
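As a small, hedged example of the kind of rewrite that schemify performs (the names here are made up, and the real pass works on a lower-level representation): Chez Scheme leaves the evaluation order of application arguments unspecified, while Racket guarantees left-to-right order, so an application whose arguments can have observable effects gets explicit bindings.

    ;; linklet body expression as written
    (f (read-one) (read-two))

    ;; after schemify, roughly: force left-to-right evaluation
    (let ([a (read-one)])
      (let ([b (read-two)])
        (f a b)))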

Naturally, the Racket-on-Chez implementation is bootstrapped by applying the expander and schemify layers to themselves and the other Racket-implemented parts of the stack, and then compiling the schemified linklets using the Chez Scheme compiler. A thin main driver on top handles command-line arguments to load modules and start the read-eval-print loop.

Current Functionality

DrRacket starts up in Chez Scheme, and just starting DrRacket exercises many Racket language constructs and libraries. The largest Racket-on-Chez demonstration to date is building all of the packages and documentation of the main Racket distribution. Building packages and documentation, again, covers more Racket functionality than you may expect, because our macros do interesting things, such as typechecking and generating bitmap images. Also, the build process is too long and complex to tolerate leaks or major inefficiencies.

Still, many pieces are buggy, and closing the gap is mostly a matter of plowing through the Racket test suites.

A few large pieces are still missing: [extended]

Besides unsupported functionality, there are currently a few incompatibilities with Racket. Most incompatibilities are in the foreign-function interface. The following is an incomplete list, but it illustrates the kinds of differences that remain.

Performance: Benchmark Sanity Check

The overall compilation story is not yet right in Racket-on-Chez, partly because there’s no cross-module optimization. Functions like map, memq, and assv need cross-module optimization to be compiled well in Racket-on-Chez. Benchmarks sometimes do not rely on cross-module optimization, however, and preliminary benchmark measurements can be useful as a sanity check on the ways that schemify mangles individual modules.
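To see why cross-module optimization matters, consider a use of map. Within a single schemified module, Chez Scheme’s optimizer can see a locally defined helper and inline or specialize it, but map imported from another linklet is just a variable whose value is unknown at compile time, so a rewrite like the following (a simplified illustration, not actual optimizer output) is out of reach:

    ;; a call with a known, inlinable map, such as
    (map add1 lst)
    ;; could be turned into a direct loop, roughly
    (let loop ([l lst])
      (if (null? l)
          '()
          (cons (add1 (car l)) (loop (cdr l)))))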

Benchmarks are in the racket-benchmark package whose source is in the racket7 repo. The benchmarks run in Chez Scheme as R6RS libraries. The difference between Racket and racket7 is in the noise.

The table below shows a handful of rough benchmark results. You should not take the results seriously, but they line up with my overall impression of where we are.

 

  

Rough and unscientific benchmark numbers

  msecs              deriv   dynamic   sboyer   maze    nfa
  Chez Scheme          770       360     1270    930   2100
  Racket-on-Chez      1560      1030     1470   1520   2200
  Racket              2030       560     2230   1330   4100

  

 

All benchmarks were run in safe mode and without Gustavo’s work-in-progress addition to the Chez Scheme compiler that can eliminate some runtime checks and related dead code.

Performance: Startup

Startup time for Racket-on-Chez is determined by Chez Scheme’s base startup time (to load boot files) plus time to load the expander and other parts of the Racket-on-Chez stack. The load times shown below for racket/base and racket include the startup time and add the additional time needed to load the many compiled files that are parts of those libraries.

For Racket and racket7, bytecode parsing can be delayed at the granularity of a procedure, so that a procedure’s bytecode isn’t parsed until the procedure is called. Delayed parsing is enabled by default, and the -d flag disables it.

 

  

Startup/load time

  msecs                startup   -l racket/base   -l racket
  Racket                    16               50         220
  Racket -d                 16               80         500
  racket7                   50              110         340
  racket7 -d                50              190         650
  Chez Scheme                -              170           -
  Racket-on-Chez           440              550        1200
  Racket-on-Chez/JIT       440              550        1600

  

 

Compared to the current Racket version, starting up and loading takes longer in racket7, because racket7 has to load embedded bytecode for the expander’s implementation, and because the bytecode encoding of linklets has not been refined as much.

The Chez Scheme row in the table doesn’t actually load racket/base, but that’s the closest reasonable column. That row corresponds to just starting Chez Scheme with its compiler—so it’s a significantly smaller language than racket/base, but its load time is larger than the Racket startup times. It’s also a lower bound on the Racket-on-Chez startup time.

Loading Racket-on-Chez takes significantly longer, still, due to larger and slower-to-load machine code for both the Racket-on-Chez stack and libraries compiled with it. Although individual linklets within a module (roughly, individual phases) are parsed on demand, there is no delayed parsing at the granularity of procedures, so the work performed by loading is closer to the Racket -d and racket7 -d rows.

The last row above, Racket-on-Chez/JIT, refers to a variant of Racket-on-Chez that does not compile whole linklets. Instead, it keeps individual lambdas in source form until called, and it compiles each called lambda on demand. Load times go up, mostly because code is compiled as needed, but also because the source-fragment representation is even larger than machine code right now. The Racket-on-Chez/JIT approach does not look the most promising, so far, but it’s also not crazy. With more effort and tuning (especially in the representation of source fragments), it may be a viable approach and more in line with the way the current Racket implementation works.
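The compile-on-demand idea is easy to sketch in Scheme terms (a minimal illustration, not the actual Racket-on-Chez representation): each lambda is kept as a source fragment paired with a cache, and the fragment is handed to the compiler the first time the resulting procedure is called.

    ;; wrap a source-level lambda so that it is compiled on first call
    (define (lazy-lambda source-expr)
      (let ([compiled #f])
        (lambda args
          (unless compiled
            (set! compiled (eval source-expr))) ; Chez Scheme's eval compiles
          (apply compiled args))))

    (define f (lazy-lambda '(lambda (x) (+ x 1))))
    (f 41)  ; compiles the fragment on this first call, then returns 42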

Clearly, load time is a direction for further effort.

Performance: Expander and Compiler

The expander runs at about the same speed in Racket-on-Chez and racket7, but the linklet-compilation step takes much longer in Racket-on-Chez. As a result, compilation time for Racket-on-Chez dominates expansion proper for a task like racket -cl racket/base.

Reported times in this section use a Racket-on-Chez stack that is compiled in unsafe mode and run on a 2013 MacBook Pro 2.6GHz Core i5. Unsafe mode improves expand and read performance by about 10%.

 

 

Steps within racket -cl racket/base

  msecs                expand   schemify   compile   eval   read
  racket7                2511          -       662    235    275
  Racket-on-Chez         2398        917      2852    390    227
  Racket-on-Chez/JIT     2441        974      1044    515    229

 

 

For Racket-on-Chez/JIT, compile times go down, showing that about 1/3 of the code that is compiled during expansion is also run during expansion. The fraction would be lower for most programs; the racket/base library is unusual in building up layers of macros to implement itself. The schemify step is certainly a target for improvement, since its job is much simpler than the compile step’s job.

[extended] Racket-on-Chez compile times are longer than racket7 compile times because the Chez Scheme compiler does more. About half of the Racket compiler’s work corresponds to Chez Scheme’s cp0 pass, and Racket’s compiler/JIT lacks a general-purpose register allocator. Here’s a table of cumulative times spent in Chez Scheme compiler passes while loading racket/base from source (summing roughly to the 2852 millisecond total in the previous table, although there’s noise so that the numbers won’t match exactly):

 

  

Chez Scheme compiler passes during racket -cl racket/base

  pass                              msecs
  cpletrec                             35
  np-expand-primitives                 37
  np-place-overflow-and-trap           40
  np-optimize-pred-in-value            46
  np-expose-allocation-pointer         48
  np-impose-calling-conventions        51
  expose-overflow-check-blocks!        56
  np-expand-hand-coded                 60
  do-live-analysis! *                  87
  assign-new-frame! *                  87
  np-expose-basic-blocks              101
  assign-registers! *                 105
  do-unspillable-conflict! *          116
  finalize-frame-locations! *         116
  do-spillable-conflict! *            128
  finalize-register-locations! *      130
  expand                              149
  cp0                                 200
  select-instructions!                304
  np-generate-code                    351
  other short passes total            504

  

 

The steps marked with * are primarily about register allocation, and those total to 770 milliseconds.
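(As a check against the table: 87 + 87 + 105 + 116 + 116 + 128 + 130 = 769 milliseconds, which is consistent with the roughly 770 figure, given the noise in these measurements.)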

That factor-of-2 slowdown relative to racket7 is compounded by the Racket-implemented expander being twice as slow as the current Racket’s C-implemented expander:

 

  

Loading from source with racket -cl

  msecs                racket/base   racket
  Racket                      1830    21700
  racket7                     3490    38140
  Racket-on-Chez              7060    70093
  Racket-on-Chez/JIT          5400    55050

  

 

The gap between the Racket-implemented expander and the C-implemented expander is the one that we most hoped to close by moving to Chez Scheme. As part of the Racket-on-Chez stack, the expander is already compiled in a reasonable way (i.e., no cross-module optimization issues), so it’s not clear how to improve. Still, I think the expander can be made to run faster in Chez Scheme, and we’ll keep trying.

Performance: Distribution Build

The current version of Racket-on-Chez builds on a 64-bit machine with a peak memory use around 1.25 GB, which is on a similar scale to the current Racket release’s peak memory use around 1 GB.

The following plots show, on the same scale, memory use over time when building a distribution. Each plot has two black lines (easiest to distinguish in the last plot): the top black line describes peak memory use, since it’s before a major GC, while the bottom black line is closer to live-memory use, since it’s after a major GC. The orange region is for compiling libraries (each vertical line starts a collection), the blue region is for running documentation, the green region is for rendering documentation, and the final pink region is for re-rendering documentation as needed to reach a fixed point for cross-references. Memory use is expected to grow modestly during the blue region, since the build is collecting cross-reference information for use in the green region. Memory use should grow minimally in the orange and green regions (as a side effect of caches).

These graphs report builds on a Linux 3.4GHz Core i7-2600 machine with the Racket parts of the Racket-on-Chez stack compiled in safe mode.

[memory-use plots (images omitted), one per implementation:
   Racket (C-implemented expander),
   racket7 (Racket-implemented expander),
   Racket-on-Chez (same expander as racket7)]

 

The plots illustrate that, while memory use is similar, build times are much longer. Expander and compiler performance largely explains the difference.

The build time certainly can be improved some. Maybe build times can be improved significantly, or maybe slower build times will seem worthwhile for a more maintainable implementation of Racket and/or for faster generated code.

Changes to Chez Scheme

[extended] There are many good virtual machines in the world, but none starts remotely as close to Racket as Chez Scheme—because they’re both Schemes, and because Racket historically has taken so many cues from Chez Scheme (if not executed them as well). Still, we expected to make modest changes to Chez Scheme to fully support Racket.

Currently, running Racket-on-Chez requires the fork at http://github.com/mflatt/ChezScheme. With one exception that will be remedied eventually, all of the changes in the fork have been submitted as pull requests at http://github.com/cisco/ChezScheme. Nine or so pull requests have been merged, and eight or so are currently open. In addition, Kent and Andy invested significant effort to improve the register allocator to help support Racket.

For the record, here is a summary of the main changes:

Outlook and Plans

It would be nice if porting to Chez Scheme made every aspect of Racket magically faster. It hasn’t done that, but we have plenty of room for improvement; the performance results to date are a lower bound on performance, not an upper bound.

Keep in mind that the original goal is not to have a faster Racket, but a better-implemented Racket with acceptable performance. The Racket-on-Chez implementation is far more maintainable and flexible than the current Racket implementation, so maybe we’re half-way there after one year of work.

Some immediate next steps are clear: [extended]

That’s a couple of months’ more work, at least.

Suppose that the result has good runtime performance compared to the current Racket implementation, but it’s still slower to compile and load by about a factor of 4. (I think that’s a fairly likely result, given the current trajectory.) Then what? I’m not sure, but I’ll keep trying and give it the rest of the second year.

You can try Racket-on-Chez from https://github.com/racket/racket7/. See racket/src/cs/README.txt in that repo for more information.