[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Scheme-reports] Cycle detection problems: #442, #338, "equal?"



I think the latest draft 6 ballot resolutions (http://trac.sacrideo.us/wg/wiki/WG1Ballot6Results) on circular data structures still have problems; below are two simple alternatives that I think would solve them.


First, trac #338 has the display procedure "generate loops like write", but it appears to me that the later resolution in #442 means that the resolution of #338 started with a false premise.  After all, the write procedure no longer generates "loops like write", as was assumed by #338 vote.  So #338 needs revisiting.

Second, trac #442's resolution imposes a large overhead that is completely unnecessary to meet the stated goal (termination).  The comments in #442 make it clear that for ordinary "write", the primary thing some people cared about was termination in the presence of cycles.

But if that were the goal, I would have expected the specification to permit far more efficient implementations than it appears to permit.  For example, I would have expected the spec to permit an approach I'll call the "suspicious counter" implementation:
* Run equal?/write/display for a while, and recurse *without* tracking cycles. Instead, have a running counter of the number of cons/vectors that have been iterated so far.  The running counter can be handled with tail recursion, so this counting would be very efficient.
* If the counter exceeds some "suspicious" value, THEN start tracking to see if we have a cycle (this tracking might be done with a hashtable, not available in the spec).

This would still impose a significant overhead, because ANY cycle-detection system imposes a massive overhead, but on average its overhead would be FAR smaller than the one currently mandated by #442 and #338 in most cases.

However, the resolution of #442 (write+simple+shared) seems to forbid a "suspicious counter" implementation.  The current resolution of #442 still requires labels when there's a cycle, even though it seems unlikely that the user actually wants (or cares) about labels. If the user wanted labels, the user would have called write-shared.  I recommend that write-shared be used when you actually want labels, and if the purpose is just to ensure termination of write and display with cycles, then just require that they terminate in the presence of cycles and say *nothing* *else* other than what happens when a cycle is detected.  In particular, don't require labelling; that would forbid a "suspicious counter" implementation.  This would make "write" and "display" more consistent with "equal?", since "equal?" already only requires termination in the presence of cycles.  I think it's reasonable to have write and display raise a condition; I/O routines already have the possibility of tr
 ouble .

In addition, there seems to be a major gap in the definition of "equal?" involving cycles.  It currently says, "Even if its arguments are circular data structures, equal? must always terminate."  That additional sentence adds a MAJOR new performance and implementation cost on "equal?" compared to R5RS.  But what, exactly, is that procedure supposed to do if a cycle is detected?  Return #f?  Throw something?  Either?  This additional requirement imposes a major new performance cost, but the spec doesn't even promise to do something if it happens.  That's a costly effort to produce an undefined result.  (My preference would be #f, f "equal?" has cycle detection.  People expect equal? to just return true or false, and few people would put guards around "equal?").

I think redefining these standard functions, which *explicitly* did not guarantee termination in the presence of cycles in R5RS and before, is unfortunate.  I understand the rationale.  But cycles are exceedingly rare, and detecting them has a significant overhead.  The spec is setting up a situation where NON-compliant Scheme will be demonstrably faster than compliant ones in the normal case, because there is no standard way to request the traditional semantics (WITHOUT cycle detection).  In particular, cycle detection would typically be done by a hash table (not in the standard!) or via lists (sloooow), both of which impose creating and growing data structures, etc.  Why impose such a big overhead on "normal" procedures for an extremely unusual case, especially since Scheme for decades has *specifically* made clear that users couldn't count on cycle detection in these procedures?  You don't need to be Neo to use traditional "write" and "equal?".  Scheme already has many way
 s to get into infinite loops; this is an expensive way to prevent an odd case while not really providing additional safety.

Don't get me wrong, I think cycle detection is valuable. But these cycle-detection procedures have fundamentally different same space and time properties from the R5RS procedures.  New procedures, with different semantics and impacts, should have new names!  Those new procedures should have the semantics above, where they ONLY are guaranteed to terminate in the presence of cycles and where the result of "equal?" in cycles is actually defined.

Alternatively, if you really do want to change equal?/write/display to do cycle detection (compared to R5RS), I think it is *critical* that the spec do the following:
* In "equal?", change "Even if its arguments are circular data structures, equal? must always terminate" to "Even if its arguments are circular data structures, equal? must always terminate; equal? must return #f if it terminates due to a cycle."
* "write" should specifically say: "Shared structure is not represented using datum labels (see write-shared)."
* Both "write" and "display" should say: "This procedure must always terminate; if a circular data structure is detected, a condition is raised."

Please do NOT require that write and display print labels; the user didn't request them.  And similarly, what exactly equal? will do if given a cycle needs to be clear, or users can't depend on it.

In general, I like how R7RS is shaping up; this seems to be a sore spot in an otherwise really promising spec.  I especially can't wait for a library and exception system that might actually *PORT* across Schemes.

Please re-examine #338, #442, and "equal?" in this light.

Thanks very much.

--- David A. Wheeler

_______________________________________________
Scheme-reports mailing list
Scheme-reports@x
http://lists.scheme-reports.org/cgi-bin/mailman/listinfo/scheme-reports