
Re: Regexp arcana



At 12:41 PM -0700 9/28/00, Greg Pettyjohn wrote:
>OK, this may be a little off-topic, but I'll tie it in.
>
>(define (do-s s)
>   (regexp-match "\"([^\"]*)\"(.*)" s))
>

[ ... ]

>So without knowing what the roundy parens do in the regexp, let me attempt
>to figure out what
>our expression above does:

[ ... ]

I think that you got it all just right, except for the parentheses.
Parentheses do not affect the matching semantics, except for
grouping to disambiguate infix operators. So, the regexp matcher will
accept and reject exactly the same strings (returning #f for the
rejects) when used with this regexp:

   "\"([^\"]*)\"(.*)"

and this one:

   "\"[^\"]*\"(.*)"

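As you may notice if you try it out, however, the cadrs of the lists
returned will be different. That is where the parentheses come in.
The parens are used to indicate "interesting" portions of the regular
expression: the car of the returned list is the whole match, and each
later element is the substring matched by one parenthesized group. In
this case, we care what is between the quotes. So, with the first
regexp, the cadr of the list will be the portion of the string that
was between the quotes; with the second, it will be whatever (.*)
matched after the closing quote.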
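
Here is a small sketch of that difference, assuming a current Racket
(the #lang line is only needed when running it as a file). The name
sample-input and its contents are made up for illustration and are not
from the original question.

```scheme
#lang racket/base
;; Hypothetical test string: a quoted word followed by more text.
(define sample-input "\"hello\" world")

;; Quoted part wrapped in parens: it becomes the first group.
(define with-group
  (regexp-match "\"([^\"]*)\"(.*)" sample-input))
;; with-group => ("\"hello\" world" "hello" " world")

;; Same pattern without those parens: (.*) is now the only group.
(define without-group
  (regexp-match "\"[^\"]*\"(.*)" sample-input))
;; without-group => ("\"hello\" world" " world")

;; The cadrs differ even though both patterns matched the same text.
(cadr with-group)     ; => "hello"
(cadr without-group)  ; => " world"

;; Both patterns reject a string that contains no quotes.
(define no-match-1 (regexp-match "\"([^\"]*)\"(.*)" "no quotes here"))
(define no-match-2 (regexp-match "\"[^\"]*\"(.*)" "no quotes here"))
;; no-match-1 => #f, no-match-2 => #f
```

The car is identical in both results; only the number and contents of
the group elements change.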

Now that you know that, you might think: "but what if the parens are
inside a * or a + or even other parens?" There is more arcana in the
docs for those cases, but once you understand the gist of the parens,
it should be pretty clear with a few experiments.
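
A few such experiments might look like the following sketch. It
assumes the behavior of a current Racket, which may differ in detail
from other regexp implementations; the patterns and input strings are
invented for illustration.

```scheme
#lang racket/base
;; A group under *: only the last repetition is reported.
(define star-result
  (regexp-match "([a-z]+;)*" "lather;rinse;repeat;"))
;; star-result => ("lather;rinse;repeat;" "repeat;")

;; Nested groups: numbered by the position of their opening paren,
;; left to right, so the outer group comes before the inner one.
(define nested-result
  (regexp-match "(a(b))c" "abc"))
;; nested-result => ("abc" "ab" "b")

;; A group in an alternative that was not taken shows up as #f.
(define alt-result
  (regexp-match "(a)|(b)" "b"))
;; alt-result => ("b" #f "b")
```

In each case the list still has one element for the whole match plus
one per pair of parens, whether or not that group took part in the
match.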

Robby