
Re: Question about CMUCL type inference



On the cmucl-help@cons.org mailing list, on Thu, Aug 16, 2001 at
07:17:10PM +0200, Nicolas Neuss wrote:

> In a discussion on the PLT Scheme mailing list about type inference I
> asked Shriram Krishnamurthi about the relation of his ideas to CMUCL
> type inference:
> 
> http://www.cs.utah.edu/plt/mailarch/plt-scheme-2001/msg00799.html
> 
> He answers (not yet in the archive, so I repeat it here):
> 
> ----------------------------------------------------------------
> Nicolas Neuss wrote:
> 
> > You certainly know the type inference which CMUCL is using.
> 
> No, I don't.  That is, I know it's there, and I've seen examples of
> it, but I don't understand how it works.  (I've read the manual.  This
> gives me several examples, but no spec.  Maybe the manual has changed
> since I last read it?)
> 
> I don't know what mechanism they use to implement type inference, and
> how it stays tractable in the presence of union types and range types
> and so forth.  I don't fully understand "dynamic type inference"; it
> looks more like "static flow analysis" to me.  Certainly what they
> accomplish looks impressive -- eg, if-splitting followed by
> optimizations -- but how?
> 
> Surely someone on this list can point me to a technical paper that
> explains exactly what they do?  I'd be much obliged.
> 
> Shriram
> 
> ----------------------------------------------------------------
> 
> I guess people on this list are best suited to answer his question.
> Could someone do so, ideally also cc-ing
> plt-scheme@fast.cs.utah.edu?

I'm not aware of anything published about the CMU CL type inference
system.

I'm not sure what "dynamic type inference" would mean, and I agree
that what's going on looks like (my compiler non-guru's idea of)
static flow analysis. Most of the type information seems to flow from
  * type declarations in the user source code (of course)
  * type declarations built into the CMU CL system (mostly with the
    DEFKNOWN macro)
  * type inference based on argument types, mostly through things
    defined with the DERIVE-TYPE variant of the DEFOPTIMIZER macro
Then the type information is used by various compiler transforms
and primitive operations which have type constraints on their
operands.
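
To make that flow concrete, here's a purely illustrative toy sketch in
Python (CMUCL itself is written in Common Lisp, and Interval, derive_plus,
etc. are invented names): each operation registers a derive-type function
that maps argument types to a result type, roughly in the spirit of the
DERIVE-TYPE optimizers.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    """Stands in for a bounded numeric type like (INTEGER low high)."""
    low: int
    high: int

def derive_plus(x, y):
    # The sum of two intervals is the interval of the summed bounds.
    return Interval(x.low + y.low, x.high + y.high)

def derive_abs(x):
    # ABS folds the interval onto the nonnegative axis.
    lo = 0 if x.low <= 0 <= x.high else min(abs(x.low), abs(x.high))
    return Interval(lo, max(abs(x.low), abs(x.high)))

DERIVE_TYPE = {"+": derive_plus, "abs": derive_abs}

def derive(op, *arg_types):
    """Derive the result type of OP from its argument types."""
    return DERIVE_TYPE[op](*arg_types)
```

With this, deriving "+" over Interval(0, 10) and Interval(-5, 5) gives
Interval(-5, 15), which a later transform could then use to justify, say,
an unboxed fixnum addition.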

I don't think there's a single simple answer to the question about how
the type inference stays tractable in the presence of union types and
so forth. Two parts of the answer are:
  1. The algebra of types is implemented in a hand-rolled object
     system. The methods for some operations (e.g. type intersections,
     and to some extent type unions) punt, representing the result of 
     the operation as a type which is sort of opaque, so that all
     further operations on it yield other opaque types, or 
     "don't know" answers to boolean queries. This punting tends to
     limit the complexity of nonopaque types. If you want to see how
     this works in a particular case, you can search the sources
     for uses of the macro DEFINE-TYPE-METHOD, which is used to
     define the methods. Unfortunately the hand-rolled object system
     is (in my opinion) a mite confusing. (I wish it were CLOS, but
     since I'm pretty sure it predates CLOS by many years, that's
     of course unreasonable.) Still, you can see a lot of what's going
     on by looking at the methods without understanding the method
     combination system in any detail.
  2. I think the optimizer uses some sort of cutoff to stop refining
     type information after a certain number of passes. I can't give
     you a citation for this (to code or documentation), but I
     think I've seen it somewhere, and it seems to explain why
     the system doesn't want to optimize some complicated code for
     me, and finding and changing this limiting behavior is (somewhere
     fairly far down) on my to-do list. I suspect that this may be
     important in order to keep the optimizer from getting stuck
     in problems where it refines e.g. (RATIONAL 0 1/2) to 
     (RATIONAL 0 1/4) on the first pass, then to (RATIONAL 0 1/8)
     on the next pass, then to (RATIONAL 0 1/16) on the next pass,
     and so on ad infinitum without being brilliant enough to jump
     to the limit (RATIONAL 0 0).
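
A toy model of the punting in point 1 (purely illustrative Python; the
set representation and the member limit are invented, not CMUCL's actual
representation): union types are tracked exactly up to a size limit,
beyond which the result becomes opaque; opacity is contagious, and
boolean queries on opaque types answer "don't know" in the two-value
style of CL's SUBTYPEP.

```python
MAX_UNION_MEMBERS = 4  # assumed complexity limit, purely illustrative

def union(a, b):
    # Types are sets of atomic member types, or the string "opaque"
    # for a punted result.
    if a == "opaque" or b == "opaque":
        return "opaque"            # opacity is contagious
    result = a | b
    if len(result) > MAX_UNION_MEMBERS:
        return "opaque"            # punt instead of tracking a big union
    return result

def subtypep(a, b):
    """Two-valued answer in the style of CL's SUBTYPEP:
    (subtype?, certain?)."""
    if a == "opaque" or b == "opaque":
        return (False, False)      # "don't know"
    return (a <= b, True)
```

The point of the limit is that every later operation on an opaque result
stays cheap, at the cost of losing precision.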
My answer to #1 is based on a fairly good understanding of the type
system, since I rewrote the implementation of type intersections for
the SBCL fork of CMU CL. My answer to #2 is, obviously, based on much
more superficial knowledge of the optimizer.

The system claims to adhere to the principle that "declarations are
assertions", and in fact mostly does, so it seems as though it would
be a nice match to Scheme R4RS, which is the last version I'm familiar
with. (In Scheme, you could just run the principle backwards,
optimizing code downstream from a type check/assertion with type
knowledge based on the fact that the type check succeeded.)
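
Running the principle backwards might look like this hypothetical Python
sketch (fast_add and checked_add are invented names, standing in for
what a Scheme compiler might do internally): downstream of a successful
type check, the arguments may be assumed to have the checked type, so a
specialized path becomes legal.

```python
def fast_add(x: int, y: int) -> int:
    # Specialized path, safe only once both arguments are known ints.
    return x + y

def checked_add(x, y):
    if not (isinstance(x, int) and isinstance(y, int)):
        raise TypeError("integers required")
    # Downstream of the successful check, x and y are known to be
    # ints, so an optimizer could dispatch to the specialized path.
    return fast_add(x, y)
```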

-- 
William Harold Newman <william.newman@airmail.net>
"Three hours a day will produce as much as a man ought to write."
  -- Anthony Trollope
PGP key fingerprint 85 CE 1C BA 79 8D 51 8C  B9 25 FB EE E0 C3 E5 7C