[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Scheme-reports] Cycle detection problems: #442, #338, "equal?"



Ray Dillinger mentions:
> structural isomorphism

The mathematical definition of isomorphism is that there exists a one-to-one mapping between the objects in the first set and the objects in the second set that preserves some property.  (Here the property would be "which pointer-slots in the objects in the structure point to which other objects in the structure, or to which atomic objects".)  I think that's what you mean?  By the way, given that definition, isomorphism is trivial to implement with an eq-hash, using O(n) space and O(n) hash insertions and lookups: you establish the mapping naively as you look at each pair of objects.  See http://paste.lisp.org/display/131368 for a sketch implementation in Racket.

... I must say, I assumed that equal? did in fact mean "isomorphic/one-to-one mapping", not "(possibly infinite) unfoldings".  I was astonished to hear that it was the latter, and that #0=(x . #0#) was equal? to #0=(x x . #0#), but Racket confirms it.  At first, I couldn't imagine why it would be this way, but now I realize: the list ((x) (x)) is not isomorphic to the list (#0=(x) #0#), yet they are considered equal?.

... I am tempted to suggest, as you seem to do, that we take a hybrid approach and have "equal?" degrade to "isomorphic?" when a cycle is detected (the way "write" is probably going to degrade to "write-shared" when a cycle is detected).  However, that is ugly, and has expectably ugly consequences.  In particular, it would imply that, if a = '(6 3), b = (list a a), c = '((6 3) (6 3)), and d = #0=(1 #0#), then we have (equal? b c) and (equal? d d) but not (equal? (cons b d) (cons c d)).

(Likewise, (write b) and (write c) would both look like ((6 3) (6 3)), but (write (cons b d)) and (write (cons c d)) would probably look like (((6 3) (6 3)) . #0=(1 #0#)) and ((#0=(6 3) #0#) . #1=(1 #1#)), respectively.  For this and related reasons, you should probably not use "looks the same under 'write'" to define equal?, unless you intend to either adopt "degrades to isomorphic?" semantics or absolutely specify the output of "write".)

...I still think "equal?, but degrades to isomorphic?" is the most useful function for my purposes, but eh, whatever.  Incidentally, I can imagine a version of equal? that... come to think of it, this is probably in fact what you meant by your description of structural isomorphism.  Basically, you would treat any irreducible group of objects--a group such that you could follow pointers from any object in the group to any other object in the group--as a separate kind of data structure (with an ordered list of external pointers), and conceptually replace that group of objects (with its downward pointers) with an instance of the new struct-type (with the same downward pointers).  For example, #0=(<ptr1> <ptr2> . #0#) might become (cyclic-list-2 <ptr1> <ptr2>), while #0=(<ptr1> <ptr2> #0# <ptr3> . #0#) might become (generated-struct-name <ptr1> <ptr2> <ptr3>).  In this way, you can convert arbitrary structures into trees, and compare them with naive equal?.  Of course, it'd probably be rather complicated and expensive to do this... but it seems to be a complete and internally consistent specification, and it should satisfy any invariant about equal? that you could come up with when working without cycles.  --And it would think that #0=(1 . #0#), (1 . #0=(1 . #0#)), and #0=(1 1 . #0#) were all different, while the unfolding equal? would think the opposite.

Actually, come to think of it, there is one fact about equal? that is true for all non-cyclic structures that isn't true under unfolding: if (equal? a b), then not (equal? (cons <something> a) b).  Therefore, it seems the "equal?, replacing irreducible cycles with structs" approach is semantically superior.

Is it worth arguing for?  I dunno.  It seems difficult to implement efficiently, though suggestions are welcome.  (Can you quickly partition n objects into groups such that each object lies in a group where you can follow pointers from any element of a group to any other element of the group, but that you can't go from one group to another and back?  That's the start.)  Meanwhile, Will Clinger has a reference implementation of "unfolding", Racket and probably others have full implementations, and R6RS specifies it.  Is it worth bothering about?  Meh.

All this is sort of an unnatural problem anyway.  Comparing compound structures based on their structure and leaves, rather than object identity, is something you tend to do in the absence of set-ca/dr! (where you care nothing for object identity, and might even use "hash-consing" to share as much structure as physically possible), while cycles can only arise through the use of set-ca/dr!.  Perhaps it's a bit much to expect a single function to do well in both contexts; indeed, equal? originally didn't handle cycles at all, and the desire for a safe version seems to have been the reason for the "unfolding" extension in the first place.

Perhaps we can be satisfied with making a possibly arbitrary and dumb, but consistent (and reflexive and symmetric and transitive, and not too hard to implement efficiently), choice for how to extend equal?'s semantics to cycles.  But let it be known that there are alternatives.

----

By the way, if the "(possibly infinite) unfoldings" definition is considered confusing, I'll throw out some suggestions for a precise definition:
- "Compound structures x and y are not equal? if there exists a sequence of accesses made by composing [car, cdr, vector-ref, struct slot accessors, and anything else] that, when applied to x and y respectively, yields two objects that either have distinct types or are both atomic and not eqv?.  Otherwise, they are equal?."

- "x and y are equal? if there exists a many-to-one mapping of the compound objects in x and the compound objects in y to elements of a set z, such that map(x) = map(y) and, for any compound same-typed elements A and B from x and y respectively, map(A) = map(B) only if the contents of each slot of A is either eqv? to, or maps to the same element of z as, the corresponding slot of B."  This is essentially an inlined mathematical definition of equivalence classes, as Clinger's code suggests:

- "x and y are equal? if there is a way of partitioning all the objects of x and y collectively into separate groups ["equivalence classes"]--such that we say A = B if and only if A and B belong to the same group or A and B are both atoms and are eqv?--so that x = y and, for any elements A and B from x and y respectively, if A = B, then each slot in A = corresponding slot in B."
--John Boyle
Science is what we understand well enough to explain to a computer. Art is everything else we do. --Knuth



On Thu, Aug 30, 2012 at 12:30 PM, Ray Dillinger <bear@x> wrote:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 08/30/2012 11:57 AM, Ray Dillinger wrote:

> The behavior I would prefer, however, is that outlined by Per
> Bothner.
>
> To state it precisely in terms clear enough for the standard and
> also calculated to tell people exactly how to implement it:
>
> "If one operand has a cycle and the other does not, then the
> predicate returns #f.  If all have cycles, then iff the printed
> representations of the structures are identical up to the point
> when all have entered their cycles at least twice, then the
> predicate returns #t."

Actually, as I consider the above, there is one case where I'm
not comfortable with this definition. Vectors can have elements
whose printed representation is not part of the "infinite
printed representation" if there is a cycle that starts before
those elements.  Consider two arrays, each of which has itself
as its own second element:

V1 ==>   #1=#(1 #1 6)
V2 ==>   #2=#(1 #2 9)

(equal? V1 v2) ==> ?

The infinite printed representation is identical, well past the
point where both have entered their cycle at least twice. Both
would show "#(1 #(1 #(1 #(1 #(1 #(1 ......"  But they can't be
considered equal? because

(equal? (vector-ref V1 2) (vector-ref V2 2)) => #f

It requires cycle-aware print to distinguish the written
representation.

So, given the above, I think it's necessary that equal? should
detect structural isomorphism (written representation under
cycle-aware printing is equal) rather than generation of
identical infinite sequences ("infinite" written representation
up to second entry of cycle is equal).

                                        Bear
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJQP79pAAoJEAOzWkqOibfNvzIH/jZBCxouUxtXCDHPvzGMzYwP
dLj+1cNmUGLLLmCyuFyCYtR9y3H52zfKRoWUI9FXEpkgFsVSd5trWmpvs9Q+1het
yY7NfnVYRCftFxRHoCBNKxax/L4X24AWrFlaLfLuy8T93VgJbMgG5R9fnUokeeqZ
bhh8Yy22mjOHuAWitqsG7DqwsbYUeybReh4X29bQyEZce+z65US4MyHpTHMGGA4D
v6jkHgum2BXNqebHPjTBvJ3YY0EtzKv3/Cu8R5BNHPJlly6yCiUYyyydcfZfVkgZ
KfKLCL/b4fe0uCtcBWio8vCl/7q4OkqM/O+bifx1qhwl6AUanVOhyVSIfHS9/9Y=
=dhoB
-----END PGP SIGNATURE-----

_______________________________________________
Scheme-reports mailing list
Scheme-reports@x
http://lists.scheme-reports.org/cgi-bin/mailman/listinfo/scheme-reports

_______________________________________________
Scheme-reports mailing list
Scheme-reports@x
http://lists.scheme-reports.org/cgi-bin/mailman/listinfo/scheme-reports