
RE: bytecode unification for scripting (and other!) languages



Shriram,

I had previously seen your Seasonal Lisp Machine page, but I had forgotten
that it was yours when I mentioned Lisp chips in my message.  Perhaps I
remembered subconsciously...

I was speaking loosely when I associated compiling to machine code with
interoperability problems.  In theory, there may not be a fundamental
connection there, but practical implementation issues do exist, and these
are reflected in all current languages and tools.

> Machine code vs bytecode is entirely orthogonal to this capability.
> That is simply a question of how code is distributed and run.  A VM
> could just as well dynamically load DLLs or .o files or whatever --
> binaries, in short, that can reside in the same address space.

In fact, this is essentially what Microsoft's COM does.  It doesn't achieve
the desired level of interoperability, though, largely because COM support
is typically "tacked on" to languages, rather than integrated at a low
enough level to make interoperability completely transparent.

One can think of bytecode, in this interoperability context, as a way of
forcing compiler writers to speak a common language.  Compiling to a common
bytecode removes the possibility that a compiler writer might add some
little twist to the binary output of his compiler that would cause
interoperability problems with other compilers.

So, as with my alleged connection between compiling to machine code and
interoperability problems, there is a practical requirement that a common
runtime use a common bytecode, over and above the ease-of-implementation
reasons you mention.

Anyway, these are minor points.

> CORBA does something much more complex.  It lets programs interoperate
> *across* address spaces, even permitting them to interoperate across
> physical machines and networks.  I don't think its complexity is
> comparable to what we're talking about.

An important feature of CORBA is interoperability between languages.  Using
a common bytecode and runtime addresses this in a slightly different way,
and it so happens that it also simplifies the issues involved in crossing
address spaces.  Java's RMI is an example of this, and .NET provides a
similar capability.  As such, the JVM and .NET both address the same
problem that CORBA was designed to solve, but they do so in a way that
provides better transparency and greater simplicity.
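
To make the RMI point concrete, here is a rough sketch (the Counter
interface and its registry name are made up for illustration).  Because
both sides of the call share the JVM's object model and garbage collector,
the remote call looks like an ordinary local call, with no separate IDL to
keep in sync:

    import java.rmi.Remote;
    import java.rmi.RemoteException;
    import java.rmi.Naming;

    // The remote interface is an ordinary Java interface; no IDL is
    // needed, because caller and callee share the JVM's value
    // representations.
    public interface Counter extends Remote {
        int increment(int by) throws RemoteException;
    }

    // Client side: look up a stub by name and use it as if it were a
    // local object, even though it lives in another address space.
    class CounterClient {
        public static void main(String[] args) throws Exception {
            Counter c = (Counter) Naming.lookup("rmi://localhost/Counter");
            System.out.println(c.increment(1));
        }
    }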

> (2) The real problem here is coming up with a unified bytecode that
> includes enough to please everyone.

Say "unified bytecode and runtime", and I'll agree :)

> What is interoperability?  It's the ability to share DATA between
> programs in different languages.  Data are values in a run-time system
> (at least in value-oriented languages, such as scripting languages).
> So what does it take to share these?  The languages (really, their
> implementations) need to agree on a common space in which to create
> values, a common technique for reclaiming values, and a common manner
> for representing values that can be shared between these languages.

Which brings me right back to my point.  The data that can be shared
between .NET languages is limited to basic data types and objects which
conform to .NET's object semantics.  This excludes a few data types which I
think are important (and I hope I'm not the only one): closures would be
the simplest and perhaps least controversial example; continuations are
another, more controversial one (because of implementation overhead).  Of
course, these types could be implemented by individual languages which
compile to .NET, but that would limit interoperability, even between the
languages that support these types.
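
To illustrate the closure point, here is a rough Java sketch (the Fn and
Adder names are made up).  A compiler targeting a common runtime can encode
a closure as an object whose fields hold the captured variables, but
nothing in the runtime forces two compilers to agree on one such encoding,
so one language's closures end up as opaque objects to another:

    // One possible encoding: the captured variable n becomes a field of
    // an anonymous class, and applying the closure becomes a method call.
    interface Fn {
        int apply(int x);
    }

    class Adder {
        static Fn makeAdder(final int n) {
            return new Fn() {
                public int apply(int x) { return x + n; }  // closes over n
            };
        }

        public static void main(String[] args) {
            Fn add5 = makeAdder(5);
            System.out.println(add5.apply(3));  // prints 8
        }
    }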

> But please, tell me where the innovation is here.

Perhaps the best summary of the sort of applications I'm interested in is
on the Kali Scheme page (http://www.neci.nec.com/PLS/Kali.html) - borrowing
their bullet list just to provide some representative examples:

* User-level load balancing and migration.
* Incremental distributed linking of code objects.
* Parameterized client-server applications.
* Long-lived parallel computations.
* Distributed data mining.
* Executable content in messages over wide-area networks (e.g. the
World-Wide Web)

A related example can be found in the use of continuations to provide
transparent web server session support by PS3I
(http://youpou.lip6.fr/queinnec/VideoC/ps3i.html).  Another example along
these lines that interests me is the use of continuations to simplify the
coding of systems which use asynchronous I/O (e.g. Unix AIO implementations
or Win32's completion ports).
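
As a rough illustration of the asynchronous I/O point, here is a sketch in
Java (the ReadCallback and AsyncFile interfaces are hypothetical, just to
show the shape of the code).  Without first-class continuations, the rest
of the computation after each asynchronous read has to be packaged up by
hand as a callback object - which is exactly the job a captured
continuation would do automatically, letting the same code be written in
ordinary direct style:

    // Hypothetical asynchronous read API: read() returns immediately
    // and invokes the callback when the data arrives.
    interface ReadCallback {
        void onComplete(byte[] data);
    }

    interface AsyncFile {
        void read(int nBytes, ReadCallback k);
    }

    class Example {
        // Each step's "rest of the program" is hand-coded as a nested
        // callback; a runtime with first-class continuations could
        // capture and resume that continuation for us instead.
        static void readTwice(final AsyncFile f) {
            f.read(100, new ReadCallback() {
                public void onComplete(byte[] first) {
                    f.read(100, new ReadCallback() {
                        public void onComplete(byte[] second) {
                            // ... carry on with first and second ...
                        }
                    });
                }
            });
        }
    }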

Although you might argue that these examples have been done already, and
therefore are not innovations, it seems to me that a project which extended
these capabilities to multiple languages, and allowed them to use those
features interoperably, would certainly advance the state of the art, in
ways that may not all be obvious now.

You might think that this would be a project more suited to the commercial
world than academia; I'm not so sure.  The commercial world will be happy
with .NET and its ilk for some time to come.  That doesn't mean that
something better and more useful isn't possible, or shouldn't be developed.

> Designing a VM is a surprisingly painful and difficult
> experience.  Even all the expertise of Sun and MS haven't dealt with
> the many complex issues that arise (see Bill Pugh's amazing work on
> memory models, for instance) -- and that's even when dealing
> explicitly with only one language (as in the JVM case).  It seems a
> lot more productive to copy the hard work than try to create a new
> context in which *the same work* needs to be done all over again.

I don't dispute the difficulty of the task, and that was part of the point I
was originally trying to make: I agree that using .NET is a pragmatic
choice, but that doesn't mean there aren't benefits to be had by taking
things further.

(I'm sorely tempted to start quoting Kennedy's moon speech, which was
(coincidentally?) delivered at Rice University: "But why, some say, the
moon? Why choose this as our goal? And they may well ask why climb the
highest mountain? Why, 35 years ago, fly the Atlantic? Why does Rice play
Texas?"  Do I even need to mention the bit about "not because they are easy,
but because they are hard"?  ;o)

I wouldn't argue against "embracing and extending" Microsoft's work by
extending the CIL and CLR to support the features I'm talking about.  That
presupposes an open implementation that can be extended in the first place,
though (and raises compatibility issues).  Failing that, perhaps something
along the lines described by Greg Pettyjohn could work - a layer above
CIL/CLR which supports more advanced features.

> I'll tell you where I think the win is going to be: in designing
> domain-specific VMs.  Look at what the Curl folks have done; it's very
> neat work.  Or pick another programming language -- Macromedia Flash.
> (Indeed, Flash is reputed to now be in 97% of the world's browsers.)
> I just don't see "scripting" (which I think is a bogus notion anyway)
> as a driving domain.  You're much more likely to get far by picking
> something interesting and creating a VM for it.  Use a *use*, rather
> than some (particularly in this case, ill-defined) *technology*, as
> the driving force.

The use of the term "scripting" came from Michael Vanier's original message.
I contend that the issues here go far beyond scripting.  Real software
projects are increasingly implemented in multiple languages, some of which
may be domain-specific, and these languages have to communicate.  Systems
are also increasingly distributed, raising the cross-address-space and
cross-network requirements.

These requirements are only weakly addressed by current technology.
The sorts of features that are found in special-purpose language
implementations like Kali and PS3I could really simplify the development of
such systems, and provide greatly improved capabilities to those systems.

Sadly, these features are doomed to remain obscure, as long as they are
confined to use within individual, special-purpose implementations of
particular academic languages.  This isn't just about making these features
available to the commercial world, either - you're concerned about
reinventing wheels, but how much of that goes on in the name of achieving
things that have already been done in the context of some other language?  A
virtual machine/runtime/bytecode/whatever which gave language implementors
access to such features "for free" would be enormously beneficial.

Anton