I agree with the conclusions of Alaric Snell-Pym, Alex Shinn, Sussman, and others who prefer the R5RS closure-identity semantics. I regard this aspect of the R7RS-small draft as an unfortunate accident.
However, I would point out that an R7RS-small-conforming implementation is nevertheless free to *provide* R5RS closure-identity semantics, and to advertise that it does so (perhaps with a feature identifier). Furthermore, WG2 could require the R5RS semantics in R7RS-large--it would be *true* that every program that conformed to the R7RS-small specifications (and relied only on guaranteed features) would also conform to the R7RS-large specifications. And I suspect that there will be a standard "r7rs-large" feature identifier, which would then suffice to establish "yes, closure identity works" even in the absence of a standardized name for the closure-identity feature itself.
Might this be an acceptable hack?
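To make the hack concrete, a portable program could gate its reliance on closure identity on a feature identifier, along these lines (both feature names below are hypothetical--neither is standardized):

(cond-expand
  ((or closure-identity r7rs-large)
   (define closures-have-identity? #t))
  (else
   (define closures-have-identity? #f)))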
Meanwhile, I think I have something to add to Alex Shinn's description of the issues in question. I will go into more detail, draw comparisons, make some arguments, suggest alternate approaches, and conclude that the R6RS semantics are less helpful than one might have thought. As a disclaimer, my knowledge about compilers is that of an amateur and student, and I have not written one myself.
Let's look at optimizing Will Clinger's example. At some point during compilation, there is usually a phase where the magic construct of lambda gets turned into the allocation of a mundane data structure--a closure, which is usually a record containing a machine code pointer and the variables that the closure saves. Thus, the definition of g will look something like this:
(define (g i)
  (let ((h (make-closure ...)))
    (if (= i 0)
        (if (= i m)
            (list h)
            #f)
        (h (- i 1)))))
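(For reference, before closure conversion the source presumably looks roughly like the following; this is my reconstruction, with the body of h inferred from the beta-expansion below.)

(define (g i)
  (let ((h (lambda (j)
             (if (= j 0)
                 #t
                 (g (- j 1))))))
    (if (= i 0)
        (if (= i m)
            (list h)
            #f)
        (h (- i 1)))))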
At this point[1], we may beta-expand the call to h, as many have said. (According to Andrew Appel, it is beta-contraction when you substitute the body of a non-escaping procedure in its only calling place, which is guaranteed to reduce code size since the original copy of the procedure may be deleted. If there are multiple calling places or if the procedure escapes, then code size may end up increasing--hence beta-expansion, which can lead to infinite unrolling of loops if you're not careful--compilers must use conservative heuristics to decide whether to beta-expand.) If we do so, the result will look like this:
(define (g i)
  (let ((h (make-closure ...)))
    (if (= i 0)
        (if (= i m)
            (list h)
            #f)
        (let ((j (- i 1)))
          (if (= j 0)
              #t
              (g (- j 1)))))))
And now we observe that h is only used in the then-branch of the if. We apply this principle: If "constructor" is a nice, safe procedure that doesn't cause side effects (including raising exceptions), and if "var" is not used (or modified) in the <test> or <else> expressions, and if <test> doesn't modify any of "x y ..." or leak continuations[2], then it is permissible to shift the binding around as shown below. Note that this applies to any constructor, be it "cons", "make-closure", "vector", and so on:
(let ((var (constructor x y ...)))
  (if <test>
      <then>
      <else>))
; ==>
(if <test>
    (let ((var (constructor x y ...)))
      <then>)
    <else>)
In this case, a compiler should know that its = function doesn't modify anything, so it can hoist the binding of h into the then-clause. Then an h-closure will only be created once out of the n times the g function is called, which will make Will Clinger happy. :-) Incidentally, we can hoist the h-binding further, into the then-branch of the second "if", so that the closure is created only when it is immediately about to be used:
(if (= i 0)
    (if (= i m)
        (let ((h (make-closure ...)))
          (list h))
        #f)
    (let ((j (- i 1)))
      (if (= j 0)
          #t
          (g (- j 1)))))
It remains to be seen whether existing compilers can do this.[3] I think I have shown it to be relatively straightforward (compared to what else one might expect of a compiler that aspires to give good performance), although this is where my lack of firsthand knowledge may show itself.
Now. As Alex Shinn says, this is not the strongest example in favor of the R6RS semantics. I will now address his example. After partial closure conversion and after beta-expansion of the call to h, we get this:
(define (g i)
  (let ((h (make-closure ...)))
    (when (= i n)
      (debug h))
    (if (= i 0)
        (if (= i m)
            (list h)
            #f)
        (let ((j (- i 1)))
          (if (= j 0)
              #t
              (g (- j 1)))))))
We see that there are two places where h is referenced, each within an unlikely branch of an if. As Alex Shinn says, either of them may cause h to "escape" and get stored somewhere, for arbitrary use by the rest of the program. If we are to maintain object identity, these had better both refer to the same h.
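To make the identity requirement concrete, here is a small driver of my own (hypothetical, with the un-converted source of g reconstructed from the code above): if debug stashes its argument somewhere, the rest of the program can later compare it, with eq?, against the closure returned in the list.

(define saved #f)
(define (debug x) (set! saved x))
(define n 0)
(define m 0)

(define (g i)
  (let ((h (lambda (j) (if (= j 0) #t (g (- j 1))))))
    (when (= i n) (debug h))
    (if (= i 0)
        (if (= i m) (list h) #f)
        (h (- i 1)))))

; With i = n = m = 0, the same call both debugs h and returns it in a list:
(eq? (car (g 0)) saved)   ; must be #t under R5RS closure identity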
Our devil's advocate says that if we can break object identity, then it might be nice to make the "h" in "(debug h)" refer to a different copy of h, achieved by duplicating the constructor expression, like this:
(define (g i)
  (let ((h (make-closure ...)))
    (when (= i n)
      (let ((h2 (make-closure ...)))
        (debug h2)))
    (if (= i 0)
        (if (= i m)
            (list h)
            #f)
        (let ((j (- i 1)))
          (if (= j 0)
              #t
              (g (- j 1)))))))
In that case, h is left being used in only one place, and we will be able to hoist its binding all the way down into the "(list h)" expression. Then both "make-closure" expressions will reside in if-branches that are rarely executed, and the loop will almost never have to construct a closure. Our devil's advocate rests his case.
Well. First, I would argue (as others have) that one can make exactly the same argument about *any* data structure. Replace "make-closure" with "cons", "vector", "string-append", "reverse", or many other things. It is quite true that if you didn't have to maintain object identity for whatever data structure you're considering, then you could make this splitting optimization and your code would run faster. One might argue that closures are special in one way or another, but I would be unimpressed.
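For instance, here is the same splitting rewrite with cons in place of make-closure (a hypothetical function of my own, purely for illustration):

(define (f i limit)
  (let ((p (cons i 'payload)))
    (when (= i limit) (display p))
    (if (= i 0) (list p) #f)))
; ==> if cons-cell identity did not have to be preserved:
(define (f2 i limit)
  (when (= i limit) (display (cons i 'payload)))
  (if (= i 0) (list (cons i 'payload)) #f))

Now each cons is allocated only on a path that actually uses it, at the cost that the displayed pair and the returned pair are no longer eq?.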
Second, there are, in fact, general ways to avoid unnecessary allocation of these data structures [and note that this technique applies to all data structures, not just closures--I'm a big fan of uniformity]. Consider this scheme: we introduce another variable, "h-allocated", and initially bind it and "h" to #f. Then we ensure that all code paths that lead to a use of "h" include this expression:
(unless h-allocated
  (set! h (make-closure ...))
  (set! h-allocated #t))
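Applied to Alex Shinn's example (after the same beta-expansion as before), the result would look roughly like this--a sketch of mine, not a transformation any particular compiler performs:

(define (g i)
  (let ((h #f) (h-allocated #f))
    (when (= i n)
      (unless h-allocated
        (set! h (make-closure ...))
        (set! h-allocated #t))
      (debug h))
    (if (= i 0)
        (if (= i m)
            (begin (unless h-allocated
                     (set! h (make-closure ...))
                     (set! h-allocated #t))
                   (list h))
            #f)
        (let ((j (- i 1)))
          (if (= j 0)
              #t
              (g (- j 1)))))))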
As a matter of fact, if we know that h should only ever be bound to a closure, then we can make the variable h do double duty. (Even in the general case, where h could be bound to any runtime value, we can create a special value like a gensym that will only exist if h has not been initialized yet. Note that the compiler can be quite sure that this special value will not escape.) Thus:
(define (g i)
  (let ((h #<uninitialized>))
    (when (= i n)
      (when (eq? h #<uninitialized>)
        (set! h (make-closure ...)))
      (debug h))
    (if (= i 0)
        (if (= i m)
            (begin (when (eq? h #<uninitialized>)
                     (set! h (make-closure ...)))
                   (list h))
            #f)
        (let ((j (- i 1)))
          (if (= j 0)
              #t
              (g (- j 1)))))))
This slightly increases code size, and does still require installing the #<uninitialized> value in the variable "h" (which might occupy a register or a stack slot) on each iteration of the loop. I suspect this is exactly what Sussman alluded to. This is essentially having "make-closure" or whatever other constructor be lazy. [If you're worried about continuations and threads, use CMPXCHG to install the closure into h, and if someone beats you to the punch, use their value.]
Making all allocations lazy would be a pessimization, of course--laziness has overhead and is only useful if the object is often not needed. A compiler would have to be taught how to recognize cases where it made sense to introduce the laziness--e.g. if the main path of a loop didn't use the structure, and the only paths that did use it already used other slow operations, then one could conclude that adding the laziness would introduce a small slowdown in the worst case and a large speedup in the optimistic case. A runtime that collected information on how often each path was taken would, of course, be well equipped to make good decisions.
But, of course, in the best case this is still a bit slower than if you just broke object identity and had separate allocs. The R6RS semantics give the compiler strictly more power. I argue here that the extra power it gets, compared to the cost imposed on the programmer's brain and our aesthetics, is not significant--for example, contra what I get from some previous rationales, it is *not* the only seemingly-practical way to turn a loop that always allocates memory into a loop that almost never allocates memory. (If someone feels like testing Larceny again, how does my "#<uninitialized>" version compare to the default version and to the R6RS-modified version of Alex Shinn's example? Use #f for #<uninitialized>. The performance will probably be somewhat different than if the tweaking was done as an integral part of the compiler, but oh well.)
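Concretely, the kind of hand-tweaked source I have in mind for such a test is sketched below (the inner lambda is my reconstruction of h, and the call to h on the main path has been inlined by hand, since in the real version that would be the compiler's job):

(define (g i)
  (let ((h #f))
    (when (= i n)
      (unless h
        (set! h (lambda (j) (if (= j 0) #t (g (- j 1))))))
      (debug h))
    (if (= i 0)
        (if (= i m)
            (begin (unless h
                     (set! h (lambda (j) (if (= j 0) #t (g (- j 1))))))
                   (list h))
            #f)
        (let ((j (- i 1)))
          (if (= j 0)
              #t
              (g (- j 1)))))))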
-- John Boyle
----
[1] The erudite reader may point out that (a) after closure conversion, closures are usually passed their environments or even themselves as an extra argument, (b) often variables are renamed to avoid conflicts before any of this inlining is done, (c) often the code is converted to continuation-passing style or some similar form before any of this, and so on. Luckily, these issues need not concern us in this example.
[2] At least, I assume it is a problem if your code binds x to a new cons cell and then saves a continuation, but later calls to the continuation bind x to new cons cells that aren't all identical to the first one. The Racket compiler disagrees with me on this.
See http://pastebin.com/7MmkGA1n -- I suspect this is a bug.
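A guess at the kind of test I mean (this is my own illustration, not the pastebin's code): x is used only after the continuation is captured, so a compiler that delays the allocation of x past the capture point (because x is not needed before it) would give each re-entry a fresh pair.

(define k #f)
(define first-cell #f)

(define (test)
  (let ((x (cons 1 2)))
    (call-with-current-continuation
     (lambda (c) (set! k c)))
    (if first-cell
        (eq? x first-cell)      ; on re-entry: is x still the original pair?
        (begin (set! first-cell x) 'first))))

; (test) returns 'first; invoking (k #f) afterwards re-enters the body, and
; I would expect the resulting eq? to be #t.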
[3] Racket, for one, appears able to allocate 0 bytes per call when you pass an even argument to the following function [it allocates 32 bytes, the size of a cons cell on a 64-bit machine, when passed an odd argument]:
(define (meh x)
  (let ((y (cons x 3)))
    (if (even? x)
        x
        y)))
This also works when y is bound to "(vector x 10)". Curiously, if you instead bind y to "(lambda (u) (+ x u))", it allocates 32 bytes per call for both even and odd arguments. It would appear that the Racket compiler treats closures as a different kind of object from conses and vectors; methinks the Racket devs will have to duplicate some functionality in their compiler to optimize that case. (I don't particularly intend to criticize Racket, by the way; I know more criticizable things about it than about other implementations because I use it the most.)
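For concreteness, that closure variant would be (the name meh2 is mine):

(define (meh2 x)
  (let ((y (lambda (u) (+ x u))))
    (if (even? x)
        x
        y)))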