
Re: using the PLT scheme lexer



doc.txt is fixed.

MzScheme's reader converts symbols to lower case unless it is
explicitly put into case-sensitive mode.  The #\X notation denotes
the character X, and its case is not altered.  Writing the string "X"
would have the same effect.
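
As a minimal sketch of the difference, assuming the reader is in its
default case-insensitive mode (these expressions are illustrative and
do not come from doc.txt):

  (eq? 'Initial 'initial)  ; both read as the symbol initial, so #t
  (char=? #\A #\a)         ; characters keep their case, so #f
  (string=? "X" "x")       ; string contents keep their case, so #f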

-Scott Owens
 
On Sun, 16 Dec 2001, Michael Vanier wrote:

> 
> OK, my bad.  There was a trivial bug in my code.
> 
> BTW there is a trivial bug in the lexer in collects/parser-tools/doc.txt:
> 
>   (define-lex-abbrevs
>    [initial (: (- a z) (- #\A #\Z) ! $ % & * / : < = > ? ^ _ ~)]
>    [subsequent (: (initial) (digit) + - #\. @)]
>    [digit (- #\0 #\9)]
>    [comment (@ #\; (^ #\newline) #\newline)])
> 
> should be
> 
>   (define-lex-abbrevs
>    [initial (: (- a z) (- #\A #\Z) ! $ % & * / : < = > ? ^ _ ~)]
>    [subsequent (: (initial) (digit) + - #\. @)]
>    [digit (- #\0 #\9)]
>    [comment (@ #\; (* (^ #\newline)) #\newline)])
> 
> Also, I'm curious as to why some characters have to be written with escape
> sequences (e.g. #\A) and some don't (e.g. a, z, !, $ etc.).  Is there an
> automatic conversion to lower case which is overridden by the #\ ?
> 
> Mike
> 
> 
> 
> > Date: Sun, 16 Dec 2001 19:53:52 -0700 (MST)
> > From: Scott Owens <sowens@cs.utah.edu>
> > 
> > The intended behavior is to match the longest possible token, so this
> > would indeed be a bug.  However, I was unable to reproduce it with
> > the following session:
> > 
> > Welcome to MzScheme version 200alpha4, Copyright (c) 1995-2001 PLT
> > Note: readline loaded
> > > (require (lib "lex.ss" "parser-tools"))
> > >
> > > (define l
> >     (lex
> >      ("-" 1)
> >      ("42" 2)
> >      ("-42" 3)))
> > > (define in (make-lex-buf (current-input-port)))
> > > (l in)
> > -42
> > 3
> > 
> > Thus a more detailed description of how to reproduce the bug would be
> > useful.  (Just send me an e-mail, don't submit a bug report)
> > 
> > -Scott Owens
> > 
> > On Sun, 16 Dec 2001, Michael Vanier wrote:
> > 
> > > 
> > > Playing around with the example lexer, it appears as if the lexer will
> > > always return the first matching token, even if that token is the prefix to
> > > a longer token which is also a valid match.  For instance, if you have two
> > > token categories, one of which matches "-" (e.g. a symbol) and one of which
> > > matches "-42" (an integer), the lexer matches "-" instead of "-42".  This
> > > is different from standard lex/flex behavior.  Is this a bug?
> > > 
> > > Mike
> > > 
> > > > Date: Thu, 13 Dec 2001 14:12:01 -0700 (MST)
> > > > From: Scott Owens <sowens@cs.utah.edu>
> > > > 
> > > > In general, the return value of the lexer is not restricted.  The token
> > > > structure is provided for interoperation with a parser, but you are free
> > > > to return whatever values you prefer.
> > > > 
> > > > Since the lexer generator is a new tool to v200, I would appreciate any
> > > > suggestions/feedback on your experience using it.
> > > > 
> > > > -Scott Owens
> > > > 
> > > > On Thu, 13 Dec 2001, Michael Vanier wrote:
> > > > 
> > > > > 
> > > > > OK, now that I've found the lexer, here's a simple question.
> > > > > 
> > > > > I can run the lexer to return a single value of type <struct:token>, but I
> > > > > can't figure out how to extract the fields from the struct.  As far as I
> > > > > can tell (and I'm a total newbie with the module system), the token struct
> > > > > type is defined in collects/parser-tools/private-lex/token.ss.  I can't
> > > > > import it directly because it already gets imported when I import the lex
> > > > > library.  However, the token accessor functions do not appear to be
> > > > > exported.  Does that mean that tokens are an opaque data type?  If I can't
> > > > > get the value of the token, the lex module is not usable by me.  I don't
> > > > > want or need to use the yacc module.  Sorry for my utter cluelessness.
> > > > > 
> > > > > Mike
> > > > > 
> > > > > 
> > > > 
> > > > 
> > > 
> > 
> > 
>