[Scheme-reports] Errors (?) in 7.2, Formal Semantics

I'm trying to grok the formal semantics section, 7.2, of r7rs small.

It seems to me there are numerous small errors, but perhaps I'm just confused.

I don't see any errata for the formal section on the web page; and I couldn't find a better place to submit bugs....so, here goes:

;;;;
;; car-internal
;;;;

As written, the declared type of car-internal is "_expression_-value to
_expression_-continuation to command-continuation" (E->K->C).

However, the definition starts with "lambda ewk", and results in calling hold on two
args (and that results in a command-continuation). So the type of the
definition is "_expression_-value to dynamic-point to _expression_-continuation
to command-continuation" (E->P->K->C).

Based on how car-internal is called, the fix appears to be: simply delete
the w in the definition (that is, delete the \omega). In other words, the
type statement is correct, but the definition is not.

;;;;
;; valueslist
;;;;

As-is the definition of valueslist resembles:

;; E->K->C
(define valueslist
(lambda (e k)
(if (pair? e)
(cdr-internal e
(lambda (e*) ; (0)
(valueslist e* ; (1)
(lambda (e**)
(car-internal e
(single (lambda (e+) ; (2)
(k (list* e+ e**)))) ; (3) list*=cons
))
))
)
;; not a pair
(if (null? e) (k (list)) (error "non-list argument to
*values-list*"))
)))

At (1), e* will be a (singleton) sequence of values, but values-list does
not accept a sequence of values as its first argument. Since car-internal
and cdr-internal work by sending a singleton sequence to their
_expression_-continuation, it seems that (1) should be:

(valueslist (first e*) ...) ; (A)

Alternatively, (0) could be:

(single (lambda (e++) (valueslist e++ ...)) ; (B)

That is perhaps more pleasing, since it would be symmetric with how
car-internal is called at (2).

On the other-hand, calling internal functions (bypassing type-checks)
suggests complete confidence, and so perhaps bypassing
argument-count checking (from single) is desirable as well. If so,
then (A) is preferable, and for symmetry, likewise (2) should be:

(lambda (e***) (k (list (first e***) e**) )) ; (A')

--

Incidentally, list->values seems more Scheme-y than valueslist.

;;;;
;; list
;;;;

In list, both list and cons are called. Both of those are supposed to take
a dynamic point as input, but neither use within list provides such an
input.

Supposing that the dynamic point doesn't need to change, then an \omega has
to be inserted before (single ...) and before the last occurrence of \kappa.

I don't (yet?) grok dynamic points, so that suggested fix
may very well be wrong; it's just the minimal thing that ensures type
correctness.

;;;;
;; cons
;;;;

The definition of cons flips the order of \omega and \kappa in its use of
twoarg:

\twoarg(\lambda \eps_1 \eps_2 \kappa \omega...

should be

twoarg(\lambda \eps_1 \eps_2 \omega \kappa ...

;;;;
;; lambda (rest-arg)
;;;;

While it's perfectly clear what is meant, the line that declares that the
meaning of "(\lambda I \gamma* E_0)" is the meaning of "(\lambda (. I)
\gamma* E_0)" is more or less breaking the rules. That is, "(. I)" is not
legal syntax. Or, well, the abstract syntax (7.2.1) declares that such is
valid, but the concrete syntax (7.1.3) specifically contradicts.

In an evaluation context, <> \concat <I> or (cons (list) (list I)) is
legitimate, in contrast with "(lambda (() . I) ...)"; that fact can be better
exploited to avoid needlessly cheating with the abstract syntax.

Namely, one can factor out the common meaning of the two cases at the
meta-level rather than at the syntactic level, something like:

\meaning{(\lambda (I* . I) \gamma^* E_0)} = \foo I* I \gamma^* E_0

\meaning{(\lambda I \gamma^* E_0)} = \foo <> I \gamma^* E_0

Obviously \foo needs a better name, perhaps \listify-args.

;;;;
;; Builtins
;;;;

"add" and "less" are orphans. "eqv" may also be. (They are unreferenced
in the rest of the formal semantics.) (When the semantics require such
operations, "+", "<", and so forth of the meta-language are used directly.)

Perhaps just delete them? One could posit that the initial environment
extends "\rho('+) = add" and "\rho('<) = less", which perhaps makes their
definitions more interesting to a reader. Still, there are many further
required primitives; why do these particular primitives get special
treatment?

Worse, the definitions given aren't (entirely) correct.

That is, implementations are specifically licensed (elsewhere in the
report) to implement less than the mathematical ideal; but the definitions
given don't appear to allow such restrictions.

Addition (object-level), in particular, will, in any implementation,
eventually run out of memory (for bignums), or otherwise fail to live up to
the mathematical ideal ([signalling?] errors on fixnum overflow, for
implementations without bignums). However, as written, the formal
semantics imply that object-level addition can only fail due to wrong
number (from twoarg) or wrong type of arguments.

I suppose one could take the stance that the meta-function "in" (injection
into an object-level domain, in this case, numbers) is where
implementations are allowed to hang restrictions. But then it wouldn't be
a meta-function; it would be a meta-procedure. (Since understanding the
meaning of such injection failures, as written, calls for a procedural
reading of the meta-language. Contrast with "new": "in" lacks the "+
{error}" hook in its range, and its uses lack the explicit branch to
implementation-dependent behaviour.)

It seems the formal semantics go to great lengths to provide a purely
functional meta-language; it seems a cop-out to attempt to gloss over
injection failures using a procedural mechanism.

From what I've seen of other treatments of formal semantics, the trick
seems to be to add a type of primitive object-level functions with
unspecified behaviour and constituents. ("delta rules"? Or is it "gamma
rules"?).

That is, one could have semantic types such as:

O_2 = E \times E \to E "primitive binary operations"
O_1 = E \to E "primitive unary operations"

which could be injected to be members of F using an appropriate
meta-function. Something like:

inject = \lambda o . \sequence{L_o}{\lambda e* \omega \kappa . \kappa (o e*)} in F

For example, L_+ is what the symbol '+ would be bound to in the base
environment, and the base store would then map L_+ to <inject +, false>.
(There isn't a need to dance around with "new" to get locations for the
builtins, of course.)

("false" here means that the initial bindings would be immutable, which I
think is correct for R7.)

;;;;
;; Rebinding lambda (Keywords?!)
;;;;

The formal semantics do not capture the meaning of:

(let ((lambda list) (a 1) (b 2)) (lambda (list a b) (list a b)) )

The proper meaning is '((1 2) (1 2)). But as written, (after desugaring
the let) the formal interpretation results in a procedure equivalent to the
one defined by:

(define apply/2 (lambda (x y z) (x y z)))

That is because, according to the _expression_ semantic function, the
interpretation of symbols "lambda", "if", and "set!" are immune to
rebinding. Or rather, the interpretation of lambda-forms, if-forms, and
set!-forms are immune --- the symbols themselves can be rebound. That
results in a very mild case of lisp-2-ism:

(lambda (lambda) (lambda (x) lambda))
;; the K-combinator according to the current formal treatment.

One can try to wriggle out of this by arguing that the formal semantics are
at the level of entirely desugared scheme; that is, one could try to push
upon the macro-expander the responsibility to properly implement lexical
scope for lambda, if, and set!.

Since Schemes' macro-expanders are, nowadays, hygienic by default, they
*can* be called upon to rewrite:

(define evil ((lambda (lambda a b) (lambda (list a b) (list a b))) list 1
2))

into something resembling:

(define evil ((lambda (lambda_ a_ b_) (lambda_ (list a_ b_) (list a_ b_)))
list 1 2))

But that's strange. One of the most important things the formal semantics
exists to explain is the meaning of lexical scope (and how that interacts
with continuations, errors, ...).

As it stands, the definition of lambda does precisely that in a completely
general and correct fashion for all but three identifiers. For that
matter, given the chance, the existing formal implementation of lambda
would work in those cases, too; except that the meaning meta-function jumps
the gun.

If instead those three cases were written along the following lines, then
the formal semantics would completely capture the Scheme spirit:

\meaning{(lambda E*)} = \lambda \rho .
\syntax{lambda} \notin \domain{\rho}
\then \base-lambda E* \rho
\else \lambda \omega \kappa .
\hold (\lookup \rho \syntax{lambda}) ...
where
\base-lambda = \lambda E* . \lambda \rho \omega \kappa ...

(I.e., check for rebinding, then go to either the identifier case, or the
initial meaning.)

I would not be surprised if there was some more elegant way to implement the semantic branch.

;;;;
;; Lambda (clarity)
;;;;

The definition of lambda uses "in E" to project meta-functions into
object-functions. "in F" would make it clearer what is going on, in
particular, what the significance of the pair is. Note that almost every other use of
symbolic meta-cons and friends (\langle\rangle, \down, \dag) has to do with
argument passing rather than procedure construction.

call/cc also builds an object-level closure, and likewise its use of "in E"
would be clearer as "in F".

A meta-function "make-closure: L \to (E* \to P \to K \to C) \to F" would be
clearer and better still, of course. (At least, having the code respect
some sort of encapsulation of a closure type rather than manipulating its
guts directly is often regarded as superior from a software engineering
standpoint.)

The only other uses of meta-cons and friends is in the continuation-walking
code, which is not confusing, because they are being used to manipulate
sequences proper (rather than being used to fake up a record type).

-Will