[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Scheme-reports] Cycle detection problems: #442, #338, "equal?"



John Cowan writes:
> Informally, we can say that your first two expressions can both be
> characterized as "a list with an infinite number of x's"
-snip-
> That's just a way of encoding the paradoxical fact that one plus infinity
> is still equal to infinity.  If you don't like that, avoid infinite
> structures or define your own equality.

I might define some kind of binary tree structure, where each node has a pointer to its parent--or to the root of the tree.  I would characterize this as a very finite graph, and would not consider a tree with two equal? nodes added to it as the same as a tree with only one node.

Also, I might use a circular list to maintain a free-list of some kind of object, null-ifying all fields of objects as they were "free"d and put back on the list, and--though this is a complaint about write or read, not equal?--I would be rather confounded if I wrote all data structures in the program to a file, restarted the program, read everything back in, and found that there was 1 object on the free-list rather than 10,000, or vice versa.  True, I should have used "write-shared", but I can imagine some users learning that "write"/"read" seemed to handle everything including cycles just fine, and then being totally mystified at this behavior.

"infinite list" is only one valid interpretation; there are others.

> That's a non-conformant implementation of R7RS `write`, which is required
> to use labels only in the presence of cycles, not in the presence of
> structure-sharing.

Hah.  Do you mean to say that an implementation should detect the *exact* amount of datum-labeling that is necessary to represent a list without infinite loops, and do no more than that?  For example:

In this discussion, circular-list is defined thus:
  (define (circular-list . args)
    (let ((u (apply list args)))
      (set-cdr! (last-pair u) u)
      u))

So, according to you, a correct implementation of "write" should be able to do this:
- If a = '(1 2) and b = (circular-list a a), then b writes as #0=((1 2) (1 2) . #0#), not as #0=(#1=(1 2) #1# . #0#).  Because the two (1 2) structures don't need to be shared.
- If a = (circular-list 1 2) and b = (list a a), then b writes as (#0=(1 2 . #0#) #1=(1 2 . #1#)), not as (#0=(1 2 . #0#) #0#).  Because the two pointers to the object a in the list b don't actually need to be shared, although they still require datum labels within themselves.
- If a = (list 1 (cdr b)) and b = (circular-list 3 a) [which could be accomplished through further assignment], then... we might naively write b as #0=(3 . #1=((1 #1#) . #0#)), but it would save a datum label to duplicate the first cons cell, as in (3 . #0=((1 #0#) 3 . #0#)).  So maybe the latter should be preferred (though I don't think you actually said this).

It seems that figuring out a minimal level of sharing requires identifying sub-portions of the graph of objects that don't directly point to each other once we've given names to key nodes--it's close to making "write" perform the whole "replacing irreducible cycles with structs" approach I suggested.  Furthermore, if you want it to figure out the absolute minimum number of datum labels required to print a structure, that's a nontrivial problem, and I think is well beyond the scope of what anyone wants "write" to do.

The obvious thing to do is just to indiscriminately use write-shared for the whole structure once a single cycle has been found; this is what Racket does, and I would be amazed if an implementation did anything else.

It seems our choices are:
- Specify exactly how "write" manages to minimize sharing in the presence of cycles.  This seems to mean implementation complification, bad performance, and little benefit.
- Specify that "write" must act like "write-shared" in the presence of cycles.  This seems to be the easiest to implement and seems already done.
- Specify merely that "write" must produce something that, when read back in, will be equal? to the results of "write-shared".  I imagine "read" will remain simple (will not, for example, read "(1 1 1 . #0=(1 . #0#))" into a single cons cell--basically, it assumes the output was made by "write-shared").  Under the "infinite unfoldings" definition of equal?, it would then be permissible for "write" to produce any amount of "(1 1 1 1 1 ..." before realizing "Oh crap there's a cycle" and printing some datum labels.  ... I can see only one advantage to this--a possible performance improvement from being lazy about checking for cycles, as was mentioned at the beginning of this email thread.  A disadvantage is that, as I mentioned earlier, structures may get a lot bigger after being written and read back in.  (And in general, anyone who uses cyclical structures will probably mutate them further, and if they don't remain "isomorphic" after writing and reading--meaning mutations will have different effects--that will probably mystify and frustrate the user.)

I think the disadvantages of the third option are serious and intolerable.  I do think "write" will have to check for cycles at the beginning--or write to a buffer, and if it grows suspicious of cycles (say, when a 4K buffer is full) and discovers they exist, will have to throw the buffer away and start again [actually you can just overwrite at the start--or write directly into the destination port, since write-shared will not need to backtrack, and leave the write-buffer for future use].  Maybe that would be acceptable?  So "write" remains basically as is, and "equal?" remains as is.  (Yeah, sorry, David Wheeler.  Maybe the write-buffer suggestion will help.)

> Fortunately, `equal?` is not in any way encoded into Scheme, other than
> its use in defining `assoc` and `member` (and these now allow you to
> provide your own equality predicate).

That's good.  By the way, when hash-tables are specified, will it be possible to make a hash-table with an arbitrary equality predicate?
--John Boyle
Science is what we understand well enough to explain to a computer. Art is everything else we do. --Knuth



On Fri, Aug 31, 2012 at 8:21 AM, John Cowan <cowan@x> wrote:
John Boyle scripsit:

> ... I must say, I assumed that equal? did in fact mean
> "isomorphic/one-to-one mapping", not "(possibly infinite) unfoldings".  I
> was astonished to hear that it was the latter, and that #0=(x . #0#) was
> equal? to #0=(x x . #0#), but Racket confirms it.  At first, I couldn't
> imagine why it would be this way, but now I realize: the list ((x) (x)) is
> not isomorphic to the list (#0=(x) #0#), yet they are considered equal?.

Informally, we can say that your first two expressions can both be
characterized as "a list with an infinite number of x's" and are thus
equal; the last two expressions are similarly "a list of two elements,
each of which is a list containing x".

> ... I am tempted to suggest, as you seem to do, that we take a hybrid
> approach and have "equal?" degrade to "isomorphic?" when a cycle is
> detected (the way "write" is probably going to degrade to "write-shared"
> when a cycle is detected).

That's a non-conformant implementation of R7RS `write`, which is required
to use labels only in the presence of cycles, not in the presence of
structure-sharing.

> ...I still think "equal?, but degrades to isomorphic?" is the most useful
> function for my purposes, but eh, whatever.

Fortunately, `equal?` is not in any way encoded into Scheme, other than
its use in defining `assoc` and `member` (and these now allow you to
provide your own equality predicate).  It's a straightforward matter to
roll a predicate that does exactly what your particular application needs.

> Actually, come to think of it, there is one fact about equal? that is true
> for all non-cyclic structures that isn't true under unfolding: if (equal? a
> b), then not (equal? (cons <something> a) b).

That's just a way of encoding the paradoxical fact that one plus infinity
is still equal to infinity.  If you don't like that, avoid infinite
structures or define your own equality.

> Meanwhile, Will Clinger has a reference implementation of "unfolding",
> Racket and probably others have full implementations, and R6RS
> specifies it.

Indeed, Chez, Vicare, Ypsilon, Mosh, and IronScheme all implement it
correctly.  For whatever reason, Larceny returns #f at the REPL.

> By the way, if the "(possibly infinite) unfoldings" definition is
> considered confusing, I'll throw out some suggestions for a precise
> definition:

Ack.  I think your definitions are more confusing than the R6RS version.
What I really don't understand is what "regular trees" means in the
phrase "(possibly infinite) unfoldings are equal as regular trees."
I know what regular tree grammars are, but what are regular trees?


--
Clear?  Huh!  Why a four-year-old child         John Cowan
could understand this report.  Run out          cowan@x
and find me a four-year-old child.  I           http://www.ccil.org/~cowan
can't make head or tail out of it.
        --Rufus T. Firefly on government reports

_______________________________________________
Scheme-reports mailing list
Scheme-reports@x
http://lists.scheme-reports.org/cgi-bin/mailman/listinfo/scheme-reports