[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Scheme-reports] Write procedure is not backwards compatible



> We definitely cannot use the SRFI-38 semantics, which
> provide no equivalent to write-simple (i.e. no procedure
> which guarantees to be fast).

I see a use case, where a user generates code with shared structure and "write"s it to a file, then expects to be able to read that code back in and execute it.

(define (make-plus-_expression_ arglist)
  `(lambda ,arglist (+ ,@arglist)))

Natural implementations of "write" that handle cycles would probably just dumbly handle all shared structure the same way.  It would probably get written like this:

(write (make-plus-_expression_ '(x y)))
=> (lambda #0=(x y) (+ . #0#))

But I have the impression that this would not be legal code even in R7RS, let alone any previous Scheme versions.  R7RS draft 6 says: "It is an error for a ⟨program⟩ or ⟨library⟩ to include circular references except in literals"; this does not specifically allow or disallow shared structure that isn't circular.  Racket, for one, mercilessly rejects the above lambda-_expression_ (or any #n= syntax) in a program, although it can be read in with "read", in which case it prints it back as simply '(lambda (x y) (+ x y)).

Now, either you could specify (and implementers would implement) that shared structure in code must be accepted; or the user would need to "write" code without the shared-structure syntax.  I suspect the first option would be an annoying pain to put into a "small language" implementation, especially because they'd probably want to disallow circular code (Shivers's "circularize" notwithstanding, see http://www.ccs.neu.edu/home/shivers/newstyle.html ), and detecting cycles adds more complexity (but is doable, as I shall describe soon).

Therefore, let us consider the second option, of "write"ing code without the #n= syntax.  Obviously write-simple would accomplish this--but it would loop infinitely if passed a circular structure.  The user might never generate circular objects, but then he might encounter them occasionally (especially with other-user-submitted content).  It seems like a bad kind of bug, where you don't expect anything to go wrong, and it almost never does--but when it does, it manifests as an infinite loop.  It would be nice if the language had rock-solid primitives that just handled this stuff safely for you...

I submit that it would be useful to have an "attempt-to-write-simply" procedure (name is up for grabs), which would work like write-simple except that it would detect cycles and would throw an error if it found any.  (This could even displace "write-simple", which might survive merely as an unsafe operation used for efficiency reasons.)  You give the criterion "guarantees to be fast".  Yes, that is desired... Well, here's what I've come up with:

Given a structure with a total of n objects (anything that contains no pointers can be considered an indivisible object for our purposes here), which have m pointers to each other (note m ≥ n-1), and given some hash table-like thing with O(log n) insertion and lookup, you can determine whether there is a cycle in O(n log n + m) steps and O(n) total size of intermediate objects (which all become garbage).

The algorithm I have in mind is simply to construct a reference-counted copy of the graph of objects, then destroy the root node and perform a reference-count garbage collection.  If any objects (or their representations in the graph) are not destroyed by the end of this process, then there is a cycle and we throw an error; otherwise, there is no cycle and it is safe to use write-simple.

- Trace the graph of interesting (possibly pointer-containing) objects and collect them all in the hash table.  O(n log n + m) time, O(n) space here; this stuff is discussed in SRFI 38.
- Associate every object with a reference count, initially zero.  This could be accomplished by replacing each x in the table with (cons 0 x).  O(n) time and space.
- For each object (iterate over the table), take each pointer in it and increment the reference count of what it points to.  O(m) time.
- "Destroy" the root node: for each pointer in it, decrement the reference count of what it points to, and if the refcount of the child object reaches zero, push it onto a stack of objects to destroy.  (In fact, you shouldn't just destroy the root object; check if its refcount is zero, and if not, then you've found a cycle already.)  Iterate until the stack is empty.  O(m) time, potentially O(n) stack space.
- Look through the table and see if you find an object whose refcount isn't zero.  (Or, in the previous step, maintain a count of how many objects you destroyed, and see if that's less than the number of entries in the table.)  O(n) (or O(1)) time.

This requires something like a hash table, as SRFI 38 describes.  I'm just assuming O(log n) lookup and insertion, and O(n) total space, to store n key-value pairs.  Can you require implementations to make good hash tables?  Or maybe you describe this as an optional feature?

Performance: This process requires tracing all pointers in the objects a total of three times each--3m.  One could combine the first three steps--making reference counts as you found each object--to reduce this to two times, so 2m.  It generates a hash table full of n objects, n cons cells, and up to n stack space.  And probably once or twice it'll spend "n log n" time in table creation, and two or three times it'll do n lookups in the possibly log n table.

I think it's impossible to do much better than this in general.

So, how are we doing?  Is this cost acceptable?  Well, "write-simple" must follow every pointer (hence O(m ≥ n-1) time, versus my O(n log n + m)--seems comparable except maybe for gigantic structures), and in most cases, the structure to be printed already resides in memory and takes up O(n) space (versus my *additional* O(n) space).  The cost of cycle detection would only be unacceptable if, say, you were trying to write a structure that was bigger than, say, a third of all remaining memory.  Programs that specialize enough that a constant factor of 3 or 4 (in space; I'd guess 10-100 in time) matters, or that write structures that are stored in memory-mapped files on the disk (or swap space), can use "write-simple" and worry about cycles themselves.

Any other objections?  A nice thing about "write-simple" is that it can probably be made to generate zero garbage--just grow and shrink the program stack, and write characters into a buffer (possibly specially handled).  This tends to mean fewer GC pauses and more deterministic performance.  But that is something that should be solved through other means: real-time garbage collection methods.  (I will take the opportunity to mention Henry Baker's early work: http://www.pipeline.com/~hbaker1/RealTimeGC.html )  The last thing that occurs to me is that "write-simple" can start printing characters immediately, while this "attempt-to-write-simply" procedure would do a lot of work before it printed anything.  Again, a user who writes big enough objects that the time difference is noticeable, in an environment where the difference is important, will probably work to understand why his "write" is slow and learn about cycles and "write-simple" and handle them himself.

I think difficulty of (internally) implementing eq hash tables or an equivalent is the only real problem here.  Others know better than I about that.  So I'll end here, with a pointer back to my "I submit that" paragraph.
--John Boyle
Science is what we understand well enough to explain to a computer. Art is everything else we do. --Knuth



On Sun, Jul 1, 2012 at 3:04 PM, Alex Shinn <alexshinn@x> wrote:
On Mon, Jul 2, 2012 at 6:26 AM, Jonathan Rees <jar@x> wrote:
> Ouch! If true, I second this comment. Backward compatibility for write is pretty important. Consider the case of running an R5RS (or R7RS) program in an R7RS implementation to generate a file that will be consumed by an R5RS implementation.
>
> There is no mention of this incompatibility in the section "Language changes since R5RS".

Ticket #442 added to vote on this in the next ballot.

Requiring the output of R7RS programs to be readable
by R5RS programs is forwards compatibility, not backwards,
and is unachievable the moment you make any changes
to the lexical syntax (e.g. with bytevector literals).

You can make the case that passing simple lists of
simple data-structures should be compatible between
any Scheme-like system.  The WG will discuss and
vote on this.  The community is encouraged to provide
additional feedback and examples.

We definitely cannot use the SRFI-38 semantics, which
provide no equivalent to write-simple (i.e. no procedure
which guarantees to be fast).

--
Alex

>
> Jonathan
>
> On Jul 1, 2012, at 12:36 AM, Marc Feeley wrote:
>
>> Formal Comment
>>
>> Submitter's name: Marc Feeley
>> Submitter's email: feeley at iro.umontreal.ca
>> Relevant draft: r7rs draft 6
>>
>> Type: defect
>> Priority: major
>> Relevant section of draft: 6.13.3. Output
>>
>> Summary: Write procedure is not backwards compatible
>>
>> R7RS introduces a new output procedure called write-simple, which has
>> the same semantics as the R5RS write procedure.  On the other hand,
>> the R7RS write procedure handles shared structures differently than
>> the R5RS.  For example :
>>
>>   (let ((x (list 1 2))) (write (list x x)))
>>
>>       displays ((1 2) (1 2)) in an R5RS system
>>   and displays (#0=(1 2) #0#) in an R7RS system
>>
>> To preserve backwards compatibility, it is the version of the write
>> procedure which uses datum labels which should have a different name.
>> In fact SRFI-38 has specified the name write-with-shared-structure for
>> this output procedure.  This name should be maintained since it has
>> been implemented with that name in some Scheme systems.
>>
>>
>> _______________________________________________
>> Scheme-reports mailing list
>> Scheme-reports@x
>> http://lists.scheme-reports.org/cgi-bin/mailman/listinfo/scheme-reports
>
>
> _______________________________________________
> Scheme-reports mailing list
> Scheme-reports@x
> http://lists.scheme-reports.org/cgi-bin/mailman/listinfo/scheme-reports

_______________________________________________
Scheme-reports mailing list
Scheme-reports@x
http://lists.scheme-reports.org/cgi-bin/mailman/listinfo/scheme-reports

_______________________________________________
Scheme-reports mailing list
Scheme-reports@x
http://lists.scheme-reports.org/cgi-bin/mailman/listinfo/scheme-reports