Re: [Scheme-reports] r7rs-draft-6: identifiers looking as numbers

Ray Dillinger scripsit:

> I have not tested recently, but as of a few years ago, there were
> several major implementations that considered any sequence of
> printing characters which didn't parse according to the rules for
> numbers, characters, strings, etc, to be an identifier.

Yes, that's a pretty common extension.  I argued for it in WG1, but it
was shot down because it requires arbitrary lookahead to determine if
you are parsing a number or a symbol, which is bad for systems that want
to parse numbers on the fly.
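
To make the lookahead point concrete, here is a minimal Python sketch. The number grammar is a deliberately simplified stand-in (signed decimal literals only), and the names NUMBER and classify are hypothetical, not taken from any implementation:

```python
import re

# Simplified stand-in for the number grammar (an assumption for
# illustration): optionally signed decimal integers or decimals only.
NUMBER = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)")

def classify(token: str) -> str:
    """Classify a complete token under the 'anything that is not a
    number is a symbol' extension."""
    return "number" if NUMBER.fullmatch(token) else "symbol"

# The two tokens share an arbitrarily long prefix; only the final
# character settles which kind of token was being read.
kinds = (classify("1234567890"), classify("1234567890x"))
```

Under this extension a reader cannot commit to building a number until it has seen the whole token, however long it is.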

> My own parser uses a rule that identifiers cannot *both* begin
> with a sign, digit or decimal point, *and* end with a digit or
> decimal point.  This is on the assumption anything which does
> both begin and end with such a character is syntax that I may
> eventually want to use for some kind of number.

That doesn't work in the general case because of the Scheme syntax for
imaginary numbers: 2+3i begins with a digit but does not end with one.
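
A rough Python sketch of the quoted rule shows the gap. The function name and character sets are hypothetical, written only from the description above and not from the actual parser:

```python
# Characters from the quoted rule: a token is held back for numbers only
# if it BOTH begins with a sign, digit or decimal point AND ends with a
# digit or decimal point.
STARTERS = set("+-.0123456789")
ENDERS = set(".0123456789")

def reserved_for_numbers(token: str) -> bool:
    """True if the quoted rule would refuse the token as an identifier."""
    return bool(token) and token[0] in STARTERS and token[-1] in ENDERS

# Each token is paired with whether the rule reserves it.
results = {tok: reserved_for_numbers(tok) for tok in ["123", "-1.5", "1+", "2+3i"]}
```

In this sketch 123 and -1.5 are reserved, 1+ is left available as an identifier, and 2+3i is also left available even though it is a valid complex literal.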

> Both of those approaches treat all of the above as legal
> identifiers.
> Also, despite what the standard says, I see specific procedures
> named 1+ and 1- used in a lot of code apparently intended to be
> portable. 

The names 1+ and 1- are carried over from Common Lisp, whose definition
of potential numbers (that is, tokens which may or may not be numbers,
but are in any case not identifiers) looks like this:

1a) must not contain characters other than 0-9 A-Z + - / . ^ _

1b) must not contain two consecutive letters

2) must contain at least one digit; letters may be digits depending on
the numeric base, but only in tokens without . in them

3) must begin with 0-9 + - . ^ _

4) must not end with + -

It is the fourth rule that allows 1+ and 1-.
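
The rules above can be sketched as a Python predicate. This is an approximation of the summary given here, not of the full Common Lisp reader: it assumes base 10, so letters never count as digits, and the name is_potential_number is hypothetical:

```python
import string

DIGITS = set(string.digits)
LETTERS = set(string.ascii_letters)
ALLOWED = DIGITS | LETTERS | set("+-/.^_")
FIRST = DIGITS | set("+-.^_")

def is_potential_number(token: str) -> bool:
    """Approximate the potential-number rules as summarized above (base 10)."""
    if not token:
        return False
    # 1a) only digits, letters, and + - / . ^ _
    if any(c not in ALLOWED for c in token):
        return False
    # 1b) no two consecutive letters
    if any(a in LETTERS and b in LETTERS for a, b in zip(token, token[1:])):
        return False
    # 2) at least one digit (assumption: base 10, letters are not digits)
    if not any(c in DIGITS for c in token):
        return False
    # 3) must begin with a digit, sign, decimal point, ^ or _
    if token[0] not in FIRST:
        return False
    # 4) must not end with a sign
    return token[-1] not in "+-"
```

With this sketch, 1+ and 1- fail only rule 4 and so remain ordinary symbols, while 123, -1 and 1.5e10 are potential numbers.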

> The standard has not historically forbidden such extensions to
> identifier syntax.  I don't see a compelling reason why it should
> do so now.

R5RS defines an identifier thus:

1) The first character must be one of A-Z ! $ % & * / : < = > ? ^ _ ~

2) Remaining characters may be any of those, or 0-9 + - . @

3) In addition, +, -, and ... are allowed as identifiers.

Anything that doesn't fit these simple but restrictive rules or the
precise grammar of numbers is a lexical syntax error.
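
For illustration, the three rules above fit in a single regular expression. The following Python sketch uses hypothetical names and accepts both letter cases, since R5RS identifiers are case-insensitive:

```python
import re

# Rule 1: permitted first characters.
INITIAL = r"[A-Za-z!$%&*/:<=>?^_~]"
# Rule 2: permitted remaining characters (rule 1 plus digits and + - . @).
SUBSEQUENT = r"[A-Za-z0-9!$%&*/:<=>?^_~+\-.@]"
# Rule 3: the peculiar identifiers +, - and ... are allowed as whole tokens.
IDENTIFIER = re.compile(rf"(?:{INITIAL}{SUBSEQUENT}*|\+|-|\.\.\.)")

def is_r5rs_identifier(token: str) -> bool:
    """True if the whole token matches the R5RS identifier rules above."""
    return IDENTIFIER.fullmatch(token) is not None
```

Here list->vector, + and ... are accepted, while 1+, 1- and -foo are rejected, because none of them starts with a permitted first character or is one of the three peculiar identifiers.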

A poetical purist named Cowan           [that's me: cowan@x]
Once put the rest of us dowan.          [on xml-dev]
    "Your verse would be sweeter        http://www.ccil.org/~cowan
    If it only had metre
And rhymes that didn't force me to frowan."     [overpacked line!] --Michael Kay

Scheme-reports mailing list