[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Scheme-reports] Write procedure is not backwards compatible



Time for some benchmarking.

Using Racket's hash-eq data structure, I implemented basic versions of write/safe and write/fast, using the reference-counting thing I suggested earlier--basic in the sense that they only handle pairs and atoms, not vectors or other compound structures.  I simply used Racket's built-in "write" to write atoms (which were all small integers).

I construct structures that look like this:
(((1 1)) ((2 2)) ... ((n n)))
and print them out 1,000,000/n times, so that a total of 4,000,000 cons cells (multiply-counted) get printed out each time.  (I don't count the time it takes to construct the cons cells.)

write/fast consistently takes 4500-4700 milliseconds to print all of these structures out, which makes sense, because it does the same amount of work on each cons cell, no matter how long the tail of the list is.  Over the range of structures with 40 cons cells to those with 40,000 cons cells, "write/safe" is generally 10% slower than "write/fast".  At 400,000 cons cells, "write/safe" takes 33% longer (6100 msec), and at 4 million, 65% longer (7600 msec).  This seems consistent with my estimate of "O(n) -> O(n log n), or perhaps O(n), depending on exactly how hash-eqs work".  And incidentally, at 4 cons cells, write/fast took 5400 and write/safe took 6400 msec (20% slower).

I also tested Racket's built-in "write".  "write" took about 900 msec at the 40, 400, and 4000-cell structures, 1000 msec at the 40,000-cell structure, 2600 msec at the 400,000-cell structure, and 4600 msec at the 4 million-cell structure; and 1700 msec at the 4-cell structure.  Note that Racket's "write" will detect cycles and, if it finds any, will act like write/shared--so in a sense it already has the performance characteristics of write/safe:

arc> (let xs $.range.10 (= xs.2 $.range.3 xs.4 xs.2) ($.write xs))
(0 1 (0 1 2) 3 (0 1 2) 5 6 7 8 9)
arc> (let xs $.range.10 (= xs.2 $.range.3 xs.4 xs.2 cdr:last-pair.xs xs) ($.write xs))
#1=(0 1 #0=(0 1 2) 3 #0# 5 6 7 8 9 . #1#)

I will reproduce the code of write/fast and write/safe here, and the code and my interactions with the Arc REPL (used to drive the underlying Racket implementation) are all here: http://paste.lisp.org/display/130430

(define-syntax push
  (syntax-rules ()
    ((push x xs)
     (set! xs (cons x xs)))))
(define-syntax pop
  (syntax-rules ()
    ((pop xs) (let ((x (car xs)))
                (set! xs (cdr xs))
                x))))

(define (write/safe x)
  (if (not (pair? x))
      (write/fast x)
      (let ((refcount (make-hasheq)))
        (let ((xs null))
          (hash-set! refcount x 1)
          (push x xs)
          (do ()
            ((null? xs))
            (let ((u (pop xs)))
              (let ((a (car u)) (b (cdr u)))
                (when (pair? a)
                  (if (hash-ref refcount a #f)
                      (hash-set! refcount a (+ 1 (hash-ref refcount a)))
                      (begin (hash-set! refcount a 1)
                             (push a xs))))
                (when (pair? b)
                  (if (hash-ref refcount b #f)
                      (hash-set! refcount b (+ 1 (hash-ref refcount b)))
                      (begin (hash-set! refcount b 1)
                             (push b xs))))))))
        ;now for gc
        (let ((xs (list x))
              (n (hash-count refcount)))
          (do ()
            ((null? xs))
            (let* ((u (pop xs))
                   (h (- (hash-ref refcount u) 1)))
              (hash-set! refcount u h)
              (when (zero? h)
                (when (pair? (car u)) (push (car u) xs))
                (when (pair? (cdr u)) (push (cdr u) xs))
                (set! n (- n 1)))))
          (if (zero? n)
              (write/fast x)
              (error "There must be a cycle somewhere" x))))))

(define (write/fast x)
  (if (pair? x)
      (begin (display #\()
             (write/fast-rest x))
      (write x)))

(define (write/fast-rest x)
  (write/fast (car x))
  (cond ((pair? (cdr x))
         (display #\space)
         (write/fast-rest (cdr x)))
        ((null? (cdr x))
         (display #\)))
        (else (display " . ")
              (write (cdr x))
              (display #\)))))

My conclusion is that it doesn't make much of a difference unless you lack a good hash-eq implementation.

--

John Cowan writes:

> Lastly, write/small-fast is perfectly safe if you know you don't construct
> cycles, and lots of programs do know that (if they don't use set-car! or
> set-cdr!, for example).

Unsafe fixnum operations are perfectly safe if you know you're not using any integers that could be outside of your implementation's fixnum limit, and many programs do know that at least in some places (if they iterate over an integer that starts at zero and is incremented until it reaches an integer that is the length of a data structure small enough to be stored in memory, for example).  It would appear that the cost of the checks done by write/safe is comparable to the cost of checking for non-fixnums--or non-numbers, for that matter--or array bounds-checking.

Hmm... but how are things like bounds-checking handled?  On the documentation of (vector-ref vector k), r7rs-draft-6 says: "It is an error if k is not a valid index of vector."  I believe that language is permissive of, for example, segfaulting and crashing on an invalid index.  The same philosophy applied to "write/safe" would seem to indicate writing "It is an error to pass a circular structure to write/safe", which means a conforming implementation could just loop infinitely, and the name "write/safe" would be misleading--should be "write/simple" or something.  Well, at least that is different from not saying anything at all about the circular case.

Also, the cases are not fully analogous, because there is also write/shared and the default behavior of "write" to think about.  Hmm... But it does seem that the Scheme standard does not and should not talk about unsafe procedures, or about safe versions of procedures.  write/fast should not exist; there might be write/shared, and there might be write/safe (named something like write/simple, perhaps, to which "it is an error" to pass cycles), and there might be Racket's "write" that acts like write/safe but degrades to write/shared on cycles.  Any of these might be declared the default "write" (you'd vote on that, I'm not sure which I'd prefer), and the user should be able to use any of them (with parameters or with differently-named procedures).
--John Boyle
Science is what we understand well enough to explain to a computer. Art is everything else we do. --Knuth



On Sat, Jul 7, 2012 at 11:50 AM, Aaron W. Hsu <arcfide@x> wrote:
John Cowan <cowan@x> wrote:

> Aaron W. Hsu scripsit:
>
> >     write/safe
> >     write/shared
> >     write/dangerous-stupid-do-not-use-unless-you-think-you-are-neo
>
> Less tendentiously, write/small-fast.

Have we done any benchmarks to see how fast this really is in practice?
It seems like we are assuming it is faster, but I do not remember seeing
anything indicating that it was really significantly faster than
WRITE/SAFE. IMO, the lack of safety of WRITE/SMALL-FAST is hard to justify.

> I assume that the first one writes datum labels for cycles but not
> for shared structure?  That would seem to violate the whole notion of
> read-write equivalence: you write out a value with shared structure as
> a datum, but you read it back as a value without shared structure.

It is WRITE/SMALL-FAST when it is safe to do so, but otherwise it is
WRITE/SHARED. Of course if violates read/write equivalence, but so does
normal WRITE with regards to shared structure. As has been pointed out,
read-write equivalence is useful, but it is not the only valid or useful
way of thinking about these things. In particular, backwards compatibility
and pretty printing of code that can be copied back in to a REPL and so
forth. Most of the time you do not need or want to have read-write
equivalence because other things are of more value, such as easier reading,
for humans.

I prefer that we get rid of WRITE/SMALL-FAST and leave only WRITE/SAFE as
WRITE and WRITE/SHARED as either a parameter or an extra procedure. If it
matters that much that they want an unsafe WRITE, then let implementations
provide this separate of the standard, or let them have an implementation
flag that makes WRITE unsafe for cycles. That seems the better approach to
me.

--
Aaron W. Hsu | arcfide@x | http://www.sacrideo.us
Programming is just another word for the lost art of thinking.

_______________________________________________
Scheme-reports mailing list
Scheme-reports@x
http://lists.scheme-reports.org/cgi-bin/mailman/listinfo/scheme-reports

_______________________________________________
Scheme-reports mailing list
Scheme-reports@x
http://lists.scheme-reports.org/cgi-bin/mailman/listinfo/scheme-reports