
Re: MzScheme Special Form -> Native Numerical Code?



Matthias-

Sounds neat! Love to see it. This is also one way of addressing the
issue Greg Pettyjohn raised in his previous message about a "safe" way
to crash and burn.
Sounds like your students must be having fun.

However, I might point out that there is still no substitute for being
able to really "crash and burn" at machine speeds to keep you on your
toes. The nice thing about a system like Forth (when used as an OS) is
that it reboots so quickly. Perhaps Scheme->MachineCode might be most
interesting for students on the presumably quick rebooting
MzScheme-on-OSKit version?

Also, since I don't know the virtual instruction set you use (assuming
it is not an actual one), perhaps student projects could be to write
simulators for simplified instruction subsets of the most common
processors MzScheme runs on, with special emphasis on numerical
handling? :-) Then they are just a step away from really generating
native code.

If you are not aware of this already, you might want to check out:
  http://www.eecs.harvard.edu/~nr/toolkit/
"The New Jersey Machine-Code Toolkit helps programmers write
applications that process machine code---assemblers, disassemblers, code
generators, tracers, profilers, and debuggers. The toolkit lets
programmers encode and decode machine instructions symbolically.
Encoding and decoding are automated based on compact specifications. The
toolkit is a joint project of Mary Fernández and Norman Ramsey."

This is a very large effort and consists of code, data, and several
hundred pages of documentation. It includes specifications for MIPS,
SPARC, Alpha, PPC, and Pentium processors. There are extensive tools,
with source, that take those specifications and generate code (right now
in C and Modula-3) for machine language interpreters, disassemblers, and
more.
The toolkit is written in a combination of Icon and Standard ML.

I've been through the NJML toolkit source and tools some, and there is a
lot there to work through. I do think there is a lot there worth
learning -- it is a very elegant approach to representing the
functionality of CPUs. I never actually tried to use the tools to
generate code though.  

Note that much of the kit doesn't involve generating binary code so much
as generating code in other high-level languages that can disassemble
and assemble instructions. However, if you are working directly in a
language like Scheme, many of the ideas of abstractly representing
machine architectures might still be applicable, along with the work to
actually encode specific machine architectures in that representation.
The New Jersey Machine-Code Toolkit is in ML & Icon, but why not use
some of the same ideas in MzScheme?
(As an alternative, one could add a library for the NJML toolkit to
output MzScheme to automatically generate tools, which is probably what
the NJML toolkit authors might prefer, but somehow I'd rather have
MzScheme interpreting the processor architecture specifications
directly.)
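
To make the "interpret the specification directly" idea concrete, here is
a minimal sketch of a table-driven encoder/decoder for two MIPS R-type
instructions. It is written in Python purely for illustration; the same
shape should carry over to MzScheme. The names R_TYPE_FIELDS, R_TYPE_OPS,
encode, and decode are my own and are not taken from the toolkit, whose
specification language is far richer than this.

```python
# MIPS R-type field layout: (name, width in bits), most significant first.
# The widths sum to 32.
R_TYPE_FIELDS = [("op", 6), ("rs", 5), ("rt", 5),
                 ("rd", 5), ("shamt", 5), ("funct", 6)]

# Hypothetical mini-specification: mnemonic -> fixed field values.
R_TYPE_OPS = {
    "add": {"op": 0, "funct": 0x20},
    "sub": {"op": 0, "funct": 0x22},
}


def encode(mnemonic, rd, rs, rt):
    """Pack one R-type instruction into a 32-bit word using the field table."""
    values = {"rs": rs, "rt": rt, "rd": rd, "shamt": 0}
    values.update(R_TYPE_OPS[mnemonic])
    word = 0
    for name, width in R_TYPE_FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def decode(word):
    """Unpack a 32-bit word and look its fixed fields up in the same table."""
    values = {}
    shift = 32
    for name, width in R_TYPE_FIELDS:
        shift -= width
        values[name] = (word >> shift) & ((1 << width) - 1)
    for mnemonic, fixed in R_TYPE_OPS.items():
        if all(values[key] == val for key, val in fixed.items()):
            return (mnemonic, values["rd"], values["rs"], values["rt"])
    raise ValueError(f"unknown instruction word {word:#010x}")


# add $t0, $t1, $t2  (registers 8, 9, 10)
print(hex(encode("add", 8, 9, 10)))
print(decode(0x012A4020))
```

The point of the sketch is that both directions are driven by the same
data, so supporting another instruction means adding a table entry
rather than writing new encoding code.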

The educational analogy I would use for giving MzScheme, for example, a
native-code disassembler is that it is like giving MzScheme users a
"telescope" for looking at the computing world around them (as well as a
"microscope" for examining MzScheme's own VM/EXE code). From there it
might be a short(?) step to having MzScheme assemble (and run) its own
machine code for those platforms.

I corresponded with one of the authors a year or so ago about using NJML
for dynamic compilation in Squeak, and here is part of what Norman wrote
(in reply to another person's comments on the Squeak list):
>  > Now I understand better.  I think that what Alan, Paul, and the other
>  > Squeak folk would like to see is probably the ability to generate the VM or
>  > Plugins through the toolkit, or even to do Just-In-Time native code
>  > compilation through the toolkit. 
> 
> This would be great, but be warned that you need lots more than the
> Toolkit to get as far as dynamic compilation.  The Toolkit really only
> helps you get the last mile: when you already know which machine
> instructions you want, it will spit out the right binaries for you.
> 
> We are doing some work in compilation that might lead to something a
> little more interesting.  I imagine something like the following:
> 
>   1) Begin with the Squeak subset that is currently translated to C.
> 
>   2) Translate it to register transfer lists.
> 
>   3) Massage the register transfer lists such that each one can
>      be represented as a single instruction on the target machine.
> 
>   4) Map the RTLs to machine instructions
> 
>   5) Emit the binary
> 
> I've got a student who worked on item 4 last semester who is in the
> middle of his PhD qualifying exam.  Once he passes (optimism!) I'm
> thinking of encouraging him to look into item 3 as a problem that
> might lead to a thesis (it also has applications in binary translation
> and compiler generation).  I'd be interested to know what you all
> think of this idea.
> 
> It might make an interesting proof of concept to go right from item 1
> to machine-dependent RTLs (or even machine instructions).

I haven't kept up with the latest work on it though.
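
For what it is worth, steps 3 and 4 of Norman's outline can be sketched
in a few lines. This is only my own toy illustration in Python, not
anything from the toolkit or from his group's work: the RTL
representation (tuples of destination, operator, and two sources), the
temporary names t0, t1, ..., and the three-operand target machine with
add/sub/mov are all assumptions. Step 5 (emitting the binary) is left
out.

```python
import itertools

# Hypothetical target: three-operand register machine with these mnemonics.
MNEMONICS = {"+": "add", "-": "sub"}


def flatten(dst, expr, temps):
    """Step 3: split a nested expression into RTLs with one operator each.

    expr is a register name (str) or a tuple (op, left, right).
    Returns a list of RTLs of the form (dst, op, src1, src2).
    """
    if isinstance(expr, str):
        return [(dst, "move", expr, None)]
    op, left, right = expr
    rtls = []
    operands = []
    for sub in (left, right):
        if isinstance(sub, str):
            operands.append(sub)
        else:
            tmp = next(temps)          # fresh temporary for the subresult
            rtls.extend(flatten(tmp, sub, temps))
            operands.append(tmp)
    rtls.append((dst, op, operands[0], operands[1]))
    return rtls


def select(rtl):
    """Step 4: map one single-operator RTL to one machine instruction."""
    dst, op, src1, src2 = rtl
    if op == "move":
        return f"mov {dst}, {src1}"
    return f"{MNEMONICS[op]} {dst}, {src1}, {src2}"


def compile_assignment(dst, expr):
    """Run steps 3 and 4 for the assignment dst := expr."""
    temps = (f"t{i}" for i in itertools.count())
    return [select(rtl) for rtl in flatten(dst, expr, temps)]


# r1 := r2 + (r3 - r4)
for line in compile_assignment("r1", ("+", "r2", ("-", "r3", "r4"))):
    print(line)
```

The hard part Norman describes is doing step 3 for a real machine, where
"fits in a single instruction" depends on addressing modes and
immediates; the toy above sidesteps that by assuming every operator has
a matching three-register instruction.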

-Paul Fernhout
Kurtz-Fernhout Software 
=========================================================
Developers of custom software and educational simulations
Creators of the Garden with Insight(TM) garden simulator
http://www.kurtz-fernhout.com

Matthias Felleisen wrote:
> 
>   By the way, the educational motivation for generating and calling native
>   code from MzScheme is that then TeachScheme! can then be easily extended
>   into classes for advanced students on low level machine architecture.
>   Students could learn how a real processor works by poking at it through
>   MzScheme interface functions. Need to add a new language level then --
>   "crash and burn horribly". :-) But as I've heard it said, "If you can't
>   crash it, you're not the one doing the driving..."
> 
> I have developed a module on studying machines within the TeachScheme!
> framework. At Rice students write an assembler for a machine simulator
> and extend the machine simulator by the end of the first semester. It's all
> a safe way to crash and burn -- w/i DrScheme.
> 
> The chapter will appear as additional material on our HtDP web site
> when things calm down.
> 
> -- Matthias