[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Scheme-reports] Procedural equivalence: the last debate



== Introduction ==

This is the kickoff message for a discussion of a proposal put forward
during the voting period on the ninth draft of R7RS-small.  The proposal
is to reverse one of the changes to R5RS, namely the rules for `eqv?`
and `eq?` when applied to procedures.

In the ninth draft, as in R6RS, `eqv?` is required to return `#f` if
its arguments "are procedures that would behave differently (return
different values or have different side effects) for some arguments"
(p. 30).  However, in all other cases where both arguments are procedures,
`eqv?` may return either `#t` or `#f`.

In R5RS, to which the proposal would return, the rule for returning `#f`
is the same as the one above.  However, there is an additional rule:
if both arguments to `eqv?` are procedures with the same location tag,
then `eqv?` must return `#t`.  

A location tag is defined in section 4.1.4 of R5RS thus: "Each
procedure created as the result of evaluating a lambda expression is
(conceptually) tagged with a storage location."  In particular, 
`(let ((p (lambda (x) x))) (eqv? p p))` must evaluate to `#t` by
R5RS rules, but can evaluate to either `#t` or `#f` by R6RS rules.

In the ninth draft, `eq?` is allowed to differ from `eqv?` when both
arguments are procedures.  The proposal is to withdraw this permission.

== Previous reports ==

Here is an account of the definition of `eqv?` and `eq?` in all previous
Scheme reports and standards:

R0RS and R1RS are silent, as all first-order procedures come from MacLisp.

In R2RS (p. 24), `eqv?` is defined on booleans, symbols, pairs, and
vectors (presumably strings are meant too, but they are not mentioned)
as well as exact numbers.  Under `eq?` it says that characters can be
compared with `eqv?`.  Nothing is said about procedures.

In R3RS (p. 13), `eqv?` is defined in terms of an approximation to a
notion called "operational equivalence", which is in turn defined on
booleans, symbols, numbers, characters, pairs, vectors, strings, the empty
list, and procedures.  For all non-procedure types, `eqv?` implements o.e.
However, `eqv?` is allowed to return #f even on o.e.  procedures that were
created at different times (sc. by different invocations of `lambda`).

In IEEE (p. 23, or p. 31 of the PDF), in R4RS (p. 13), and in R5RS
(p. 17), `eqv?` is defined on the same types, both positively and
negatively for the first time, discarding the explicit notion of o.e.
These versions also talk about locations rather than mutation for pairs,
etc., and this language is extended to procedures: if they have the same
(virtual) locations, they are `eqv?`.  On the other hand, if they behave
differently, they are not `eqv?`.

In R6RS (p. 37), `eqv?` defers to `boolean=?` and to `symbol=?`, and the
rules for pairs etc. are extended to bytevectors, hashtables, records,
and ports.  (Record-type descriptors are also specified, deferring to
library section 6.3.)  Procedures that behave differently must return #f,
but in no case is procedure equivalent required to return #t.

In all reports, `eq?` is the same as `eqv?` except on numbers and
characters.  Although an example showing that `(eq? p p)` returns #t is
present from R4RS onwards, nothing is said about procedures in the prose.

== Advantages of the R6RS semantics ==

In essence, in R6RS procedures do not have identity: certain pairs of
procedures can be distinguished, but it is never the case that two
references to a procedure can be reliably be said to be the same procedure.
This means that an implementation is free to coalesce two evaluations of
lambda if they are in effect the same, as well as to break a single
evaluation into two actions.

For example, it is possible for an implementation to rewrite calls on `car`,
which is stateless, into calls on `%internal-car`, which the compiler knows
how to inline.  All references to `car` in operand position are then
rewritten as `(lambda (pair) (%internal-car pair))`.  Consequently,
`(eqv? car car)` will naturally return `#f`, since two lambdas are now
involved where in the source there was only one.  On the assumption
that calls are far more frequent than operand references, this is a win.
How much of a win it is, is a question.

== Advantages of the R5RS semantics ==

In Scheme, procedures have always had identity, and a large body of code
stretching from the original Lambda: the Ultimate papers up to the present
day depends on it.  Unfortunately, breaking the first-class status of
procedures as objects causes a number of techniques previously possible
to become unreliable.  (The following list was prepared by Alex Shinn.)

  - Using procedures as lightweight objects or actors
  - SRFI-17 (generalized set! needs to lookup procedure setter)
  - TinyCLOS generics (needs to understand method identity)
  - Elisp `add-hook' equivalents (needs to avoid duplicates)
  - Fast paths such as SRFI-1 lset-* optimized for eq?,
    or SRFI-95 sort optimized for <.  See
    https://wiki.call-cc.org/eggref/4/special-case for the
    general case of special cases.
  - Caching with procedures as (part of) the key.  Given
    a large enough cost and high enough cache hit ratio,
    caching is an arbitrarily large speedup.
  - Memoization.  The (chibi parse) parser combinator
    library memoizes combinators to achieve packrat
    parser-like performance.  The test case at
    https://code.google.com/p/chibi-scheme/source/browse/tests/parse-tests.scm#121
    runs in exponential time with memoization disabled,
    and linear time otherwise.

In short, Scheme without procedure identity is a different language altogether.

== The uncertainty of the R6RS semantics ==

While optimizations of the kind that R6RS permits are important to
encourage programmers to use higher-order abstractions and avoid manual
inlining.  However, it is not clear how often these optimizations are
applicable, and how much speedup they actually provide, in typical
Scheme code.  Furthermore, the programmer will have little intuition
about when the optimization is actually a pessimization due to code
duplication and overflowing the instruction cache or for other reasons.

In addition, some of the optimization can be provided using a cache
of lambdas; escape analysis is also useful in improving, if not fully
optimizing, such invocations.

== The detailed proposal ==

The proposed language is *exactly* that of IEEE, R4RS, and R5RS, with
the following exceptions already in place in the ninth draft:

1) Symbol equivalence defers to the new (R6RS) `symbol=?` predicate;
no operational difference.

2) To equivalence of pairs, strings, and vectors is added bytevectors
and records on exactly the same terms.

3) The R6RS-based rumble-bumble about inexact numbers that ensures that
`eqv?` is undefined when both arguments are NaN, and that `(eqv? 0.0 -0.0)`
evaluates to `#f` in systems that distinguish them otherwise.

Comments are solicited.

-- 
Long-short-short, long-short-short / Dactyls in dimeter,     John Cowan
Verse form with choriambs / (Masculine rhyme):           cowan@x
One sentence (two stanzas) / Hexasyllabically
Challenges poets who / Don't have the time.     --robison who's at texas dot net

_______________________________________________
Scheme-reports mailing list
Scheme-reports@x
http://lists.scheme-reports.org/cgi-bin/mailman/listinfo/scheme-reports