
Re: using the PLT scheme lexer




OK, my bad.  There was a trivial bug in my code.

BTW, there is a trivial bug in the example lexer in collects/parser-tools/doc.txt:

  (define-lex-abbrevs
   [initial (: (- a z) (- #\A #\Z) ! $ % & * / : < = > ? ^ _ ~)]
   [subsequent (: (initial) (digit) + - #\. @)]
   [digit (- #\0 #\9)]
   [comment (@ #\; (^ #\newline) #\newline)])

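should be (the (^ #\newline) needs a (* ...) around it, so that a comment may
contain any number of non-newline characters rather than exactly one):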

  (define-lex-abbrevs
   [initial (: (- a z) (- #\A #\Z) ! $ % & * / : < = > ? ^ _ ~)]
   [subsequent (: (initial) (digit) + - #\. @)]
   [digit (- #\0 #\9)]
   [comment (@ #\; (* (^ #\newline)) #\newline)])

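To show why the (* ...) matters, here is a minimal hand-written sketch in plain Scheme that mimics what each version of the comment abbreviation accepts at a given position in a string. It does not use the parser-tools library at all; match-comment and match-comment/original are hypothetical names for illustration only. It assumes the doc.txt syntax where (^ c) matches any single character other than c, (* re) is zero or more repetitions, and (@ re ...) is concatenation.

```scheme
;; Hypothetical matcher for the corrected pattern
;;   (@ #\; (* (^ #\newline)) #\newline)
;; i.e. a semicolon, then zero or more non-newline characters, then a
;; newline.  Returns the number of characters matched starting at index
;; start of str, or #f if no comment starts there.
(define (match-comment str start)
  (let ((len (string-length str)))
    (if (and (< start len)
             (char=? (string-ref str start) #\;))
        ;; (* (^ #\newline)): skip any run of non-newline characters
        (let loop ((i (+ start 1)))
          (cond ((= i len) #f)                    ; input ended before the newline
                ((char=? (string-ref str i) #\newline)
                 (- (+ i 1) start))               ; count the closing newline too
                (else (loop (+ i 1)))))
        #f)))

;; Hypothetical matcher for the original pattern
;;   (@ #\; (^ #\newline) #\newline)
;; which allows exactly one character between the semicolon and the newline.
(define (match-comment/original str start)
  (let ((len (string-length str)))
    (if (and (<= (+ start 3) len)
             (char=? (string-ref str start) #\;)
             (not (char=? (string-ref str (+ start 1)) #\newline))
             (char=? (string-ref str (+ start 2)) #\newline))
        3
        #f)))
```

For example, with nl bound to (string #\newline), (match-comment (string-append "; a comment" nl) 0) returns 12, while (match-comment/original (string-append "; a comment" nl) 0) returns #f, since the original pattern only matches one-character comments such as ";x" followed by a newline.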
Also, I'm curious as to why some characters have to be written with escape
sequences (e.g. #\A) and some don't (e.g. a, z, !, $ etc.).  Is there an
automatic conversion to lower case which is overridden by the #\ ?

Mike



> Date: Sun, 16 Dec 2001 19:53:52 -0700 (MST)
> From: Scott Owens <sowens@cs.utah.edu>
> 
> The intended behavior is to match the longest possible token, so this
> would indeed be a bug.  However, I have been unable to reproduce it;
> here is what I tried:
> 
> Welcome to MzScheme version 200alpha4, Copyright (c) 1995-2001 PLT
> Note: readline loaded
> > (require (lib "lex.ss" "parser-tools"))
> >
> > (define l
>     (lex
>      ("-" 1)
>      ("42" 2)
>      ("-42" 3)))
> > (define in (make-lex-buf (current-input-port)))
> > (l in)
> -42
> 3
> 
> Thus a more detailed description of how to reproduce the bug would be
> useful.  (Just send me an e-mail, don't submit a bug report)
> 
> -Scott Owens
> 
> On Sun, 16 Dec 2001, Michael Vanier wrote:
> 
> > 
> > Playing around with the example lexer, it appears as if the lexer will
> > always return the first matching token, even if that token is a prefix of
> > a longer token that is also a valid match.  For instance, if you have two
> > token categories, one of which matches "-" (e.g. a symbol) and one of which
> > matches "-42" (an integer), the lexer matches "-" instead of "-42".  This
> > is different from standard lex/flex behavior.  Is this a bug?
> > 
> > Mike
> > 
> > > Date: Thu, 13 Dec 2001 14:12:01 -0700 (MST)
> > > From: Scott Owens <sowens@cs.utah.edu>
> > > 
> > > In general, the return value of the lexer is not restricted.  The token
> > > structure is provided for interoperation with a parser, but you are free
> > > to return whatever values you prefer.
> > > 
> > > Since the lexer generator is new in v200, I would appreciate any
> > > suggestions or feedback on your experience using it.
> > > 
> > > -Scott Owens
> > > 
> > > On Thu, 13 Dec 2001, Michael Vanier wrote:
> > > 
> > > > 
> > > > OK, now that I've found the lexer, here's a simple question.
> > > > 
> > > > I can run the lexer to return a single value of type <struct:token>, but I
> > > > can't figure out how to extract the fields from the struct.  As far as I
> > > > can tell (and I'm a total newbie with the module system), the token struct
> > > > type is defined in collects/parser-tools/private-lex/token.ss.  I can't
> > > > import it directly because it already gets imported when I import the lex
> > > > library.  However, the token accessor functions do not appear to be
> > > > exported.  Does that mean that tokens are an opaque data type?  If I can't
> > > > get the value of the token, the lex module is not usable by me.  I don't
> > > > want or need to use the yacc module.  Sorry for my utter cluelessness.
> > > > 
> > > > Mike
> > > > 