[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Scheme-reports] Procedural equivalence: the last debate

Let me start by recommending these comments by Taylan Ulrich B.:


The rest of what I'm about to write responds to that message
and to several others.


Per Bothner wrote:

> The problem is I would like to allow the following to return true
> (as it does in Kawa):
> (define (maybe-negate negp)
>    (if negp (lambda (x) (- x))
>        (lambda (x) x)))
> (eq? (maybe-negate #t) (maybe-negate #t))
> ;; Likewise for eqv?
> I.e. the compiler can realize the two lambda
> expressions don't depend on closed state, and
> so it can move them to top-level and pre-allocate them.

That optimization is legal under the R5RS semantics, the R6RS
semantics, and under the semantics I proposed yesterday.  In
fact, the R5RS rationale at the end of the specification of
eqv? explicitly sanctions the optimization you're describing.


Alaric Snell-Pym wrote:

> What is the "location tag" of a procedure? Perhaps it's the location of
> the closure-thing or, for implementations with garbage collectors that
> like to move things around, it might be an arbitrary unique ID that gets
> stowed in the closure-thing.

The location tags are conceptual, for the purpose of explaining
the semantics, and need not correspond to anything at all in an
actual implementation.  Any implementation strategy that makes
eq? return #t when given two procedures with the same conceptual
location tags is permitted by the R5RS semantics.


Shiro Kawai wrote:

> Oops, I'm probably lost... Does the current discussion suggests
> that fresh location tag must be allocated for every time lambda
> expression is evaluated, even the lambda expression doesn't close
> any mutable state?

Although some people have suggested that implementations must
actually allocate a fresh location every time the lambda is
evaluated, they're wrong.

Furthermore, the R5RS semantics allows both eq? and eqv? to
return #t when given procedures whose (conceptual) location
tags are distinct, provided those procedures would behave the
same under all circumstances.

> As Per Bothner posted in <51AE7631.4050903 at bothner.com>, lifting
> constant lambdas to upper level is the important optimization and I
> heavily rely on it.

That optimization is permitted by the R5RS, by the R6RS, and by
the semantics I proposed yesterday.

> (What I see from Will's suggestion isn't that extreme; the compiler
> can "mark" the procedure in some way so that even the optimizer
> duplicates lambda exprs (hence can't retain eq?-ness) it can still
> eqv? to each other.)

Correct.  Furthermore, the wiggle room that allows eqv? to
return #t even when procedures do not have the same (conceptual)
location tags but would still behave the same enables the
implementation strategies I described in one of my previous
messages, which don't involve any run-time marking at all
beyond normal closure allocation.


In my previous message, I discussed these three optimizations:

    A. implementing eq? by a simple pointer comparison
    B. implementing eqv? by examining the representations
        of procedures
    C. implementing delayed boxing for procedures

While responding to the following messages, I'll mention six
different strategies for implementing eq? and eqv? on procedure
values.  The first four strategies are

    strategy A: implement eq? by a simple pointer comparison,
        with eqv? delegating to eq? when both of its arguments
        are procedures

    strategy B: implement eqv? on procedures (represented as
        flat closures) by comparing the contents of those
        closures, and implement eq? by making it the same as

    strategy AB: implement eq? by a simple pointer comparison,
        but implement eqv? on procedures (represented as flat
        closures) by comparing the contents of those closures

    strategy BC: implement eqv? on procedures (represented as
        flat closures) by comparing the contents of those
        closures, implement eq? by making it the same as eqv?,
        and implement the compiler optimization that delays
        boxing of procedures until the moment of their escape

As explained in my previous message, all four of those
strategies are allowed by the R5RS semantics for eq? and eqv?.
Strategies A, B, and BC are allowed by the R6RS semantics as
well, but the R6RS semantics effectively forbids strategy AB.

While I'm at it, I'll mention these two strategies:

    strategy ABC: same as strategy AB, with the addition of
        delayed boxing for procedures

    strategy R6: eq? is the same as eqv?, and both of those
        predicates always return #f when either argument is
        a procedure

Both of those strategies are inconsistent with the R5RS, but
strategy R6 is allowed by the R6RS.  Strategy ABC would become
a viable strategy under the semantics I proposed for eq? and
eqv?, but strategy R6 would not be allowed by that semantics.


Alex Shinn wrote:

> As I see it, once a procedure escapes, the existence of any
> semantics in the language which can discriminate the procedure
> location requires it to be boxed.  This is true whether the
> discriminator is eq? or eqv?.

The location tags are conceptual.  Any implementation strategy
that satisfies the specific requirements of the relevant
standard is permissible.  Strategies AB and BC satisfy all
requirements imposed by the R5RS, R6RS, and my proposal.

> So it doesn't seem this allows the desired unboxing
> optimization.

Incorrect.  Strategy BC satisfies all requirements imposed
by the R5RS, by the R6RS, and by my proposal.


Noah Lavine wrote:

> It would make sense to me that the rules for procedures and
> records would be the same. So it would be convenient if the result of
> (eq? (lambda (x) x) (lambda (x) x))
> were the same as
> (define-record-type my-record make-my-record my-record?)
> (eq? (make-my-record) (make-my-record))
> It makes the most sense to me if both of these expressions return #f.

That may make the most sense to you, but no Scheme standard has
ever required (eq? (lambda (x) x) (lambda (x) x)) to return #f.


Gerald Jay Sussman wrote:

> And from the point of view of the semantics, consider the expression
> (let ((p <anything>)) (eq? p p)).  We thought it was obvious that 
> (eq? p p) is true independent of the kind of object p refers to.  
> Indeed, if EQ? is intended to be the finest possible equivalence
> relation on Scheme objects, then it better be reflexive, symmetric,
> and transitive, to be an equivalence relation at all.  Any other
> choice yields a much more complex semantic interpretation.

In my opinion, the R5RS (and previous documents) erred when they
referred to eq? as an equivalence predicate.  eq? is not required
to be reflexive when applied to numbers or characters, so it isn't
really an equivalence predicate.  Its restriction to the kinds of
values on which it is required to be reflexive is an equivalence
predicate, but it isn't an equivalence predicate on its entire

That explanation makes perfect sense, and isn't at all complex.

> On the other hand, we also recognized that numbers were special: that
> two floats or two bignums would not be sensible to compare with EQ?.
> So we just didn't think of numbers as objects -- they were something
> else.  Of course, it is usually not appropriate to compare two
> (inexact) floats for equality at all!

In my opinion, treating numbers (and characters) as non-objects
yields a more complex semantics than treating them as first class
objects that the eq? procedure may not be able to compare reliably.

> Because of the assumption that procedures are objects with identity, I
> have lots of code that uses EQ? on procedure objects.  For example, I
> often put "sticky notes" on a procedure that gives additional
> information about the procedure, using an associative mechanism that
> uses EQ? and weak tables.  My programs can examine the sticky notes on
> a procedure to make it "referentially translucent" and thus help do
> some reasoning about it.  So a change to the specification of EQ?
> will make lots of work for me.  I have to chase down exactly where it
> matters and where it doesn't.  (Remember that MEMQ, ASSQ, DELQ, etc.
> all depend on EQ?.)

If that example is typical, then the R7RS library mechanism
will save you.

All you'd have to do is construct your own (sussman base)
version of (scheme base), importing (sussman base) wherever
you need the procedures exported by (scheme base).  That's
only two extra characters of typing per import, so it's not
significantly more work than importing (scheme base).

You would define (sussman base) by importing (scheme base)
with an "except" declaration to exclude eq?, memq, and assq.
Your (sussman base) library would export every variable name
exported by (scheme base), which is easy because you can just
copy the list in R7RS appendix A.  Your (sussman base)
library would also include the following definitions:

    (define eq? eqv?)
    (define memq memv)
    (define assq assv)

In practice, you probably wouldn't even have to do that much.
Whenever a new standard for Scheme has made previously correct
code incorrect or non-portable under the new standard, most
implementors have tried to make that transition easier.

Even if my proposal were to be adopted, most implementations
would continue to use some variation of strategies A, B, AB,
or BC.  Since all of those strategies are allowed by the R5RS,
and your code is presumably correct under the R5RS, your code
would continue to work in most implementations without change.

Only a few implementors would adopt strategy ABC (or some other
strategy that's inconsistent with R5RS semantics), and they'd
be among the more sophisticated implementors.  It would be no
real trouble for them to provide an alternative version of
(scheme base) in which eq? behaves exactly the same as eqv?.

In addition, implementors who use strategy ABC by default would
probably give you a way to turn off optimization C.


John Cowan wrote:

> I have yet to be convinced that `eq?` actually is more efficient
> than `eqv?` on the safe types.

If there were no performance advantage to optimization A,
strategy A would be a lot less common, and strategy BC would
be a lot more common.

On my 4-year-old MacBook Pro:

(define (cowan-test-eq n x y)
  (cond ((eq? x y) 'done)
        ((= n 0)
         (cowan-test-eq n 'a 'a))
         (cowan-test-eq (- n 1) x y))))

(define (cowan-test-eqv n x y)
  (cond ((eqv? x y) 'done)
        ((= n 0)
         (cowan-test-eqv n 'a 'a))
         (cowan-test-eqv (- n 1) x y))))

> (time (cowan-test-eq 100000000 'a 'b))
Elapsed time...: 423 ms (User: 422 ms; System: 0 ms)

> (time (cowan-test-eqv 100000000 'a 'b))
Elapsed time...: 2177 ms (User: 2177 ms; System: 1 ms)



Scheme-reports mailing list