
[Scheme-reports] I/O redux

This email contains follow-up on some old points on I/O, and a few new ones. 

1. [Old] Re my proposal for STANDARD-{INPUT,OUTPUT,ERROR}-PORT. John Cowan (I think) felt that these were useless. I'm not a big one for rebinding/mutating current I/O streams; I prefer normally to use ports directly, or write small blocks of code that use WITH-{INPUT-FROM,OUTPUT-TO}-FILE. However, in a messy enough program that's constantly switching current ports all over the place, it's convenient to be able to access the standard ports directly. Obviously, three lines of code at the beginning of the program will capture them, but I'd still like to see them brought into the standard. I don't feel strongly on this matter, but thought I'd give it a second kick at the can. 
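For concreteness, the "three lines of code" could look like the sketch below. It assumes CURRENT-INPUT-PORT, CURRENT-OUTPUT-PORT and CURRENT-ERROR-PORT are all available and that the library name (scheme base) applies; the helper REPORT is purely hypothetical and is only there to show a use of a captured port.

```scheme
(import (scheme base))

;; Capture the ports as they are at program start, before anything rebinds them.
(define standard-input-port  (current-input-port))
(define standard-output-port (current-output-port))
(define standard-error-port  (current-error-port))

;; Hypothetical helper: write a message to the original output port,
;; no matter what the current output port has been rebound to since.
(define (report message)
  (write-string message standard-output-port)
  (newline standard-output-port))
```

The capture only works if these definitions run before any rebinding, which is the main argument for having the implementation supply the names itself.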

Also, are these ports always defined? Is it possible that CURRENT-INPUT-PORT might not be set at all, or might have the value #f, in some cases? If I'm not mistaken, a Windows GUI (non-console) executable can start with no usable standard input or output.

I would also put in a weak suggestion for CONSOLE-INPUT-PORT and CONSOLE-OUTPUT-PORT, for situations where the I/O has to be from/to the REAL terminal, with a proviso that these might not be available under some implementations (in which case their value would be #f). I don't feel strongly about this, but thought I'd toss it in for consideration.

2. [Old] The fact that IEEE Scheme is required to be a subset of WG1 is sufficient reason to include CHAR-READY? and U8-READY?. However, given the difficulty of implementing them correctly in many environments, it's also reasonable to discourage programs from using them. A careful reading of the CHAR-READY? entry shows that it's possible that CHAR-READY? returns #f when there actually is a character available [*], which exactly matches the case where you can only find out whether there's any data by attempting to read. This is either accidental or a brilliant example of VERY careful language lawyering! I would suggest clarifying this point by adding a remark that some environments make it extremely difficult to implement CHAR-READY? reliably, so it might return #f when a character is available, and adding a similar remark to the U8-READY? entry.

[*] Technically, CHAR-READY? is to return #f `otherwise', when no character is available. However, nobody can distinguish the case where CHAR-READY? outright lies, claiming there's nothing there when there is, from the case where there really WAS no character available, and then 1 zeptosecond later one appeared. It is not stated that, at the moment a character would be read successfully, CHAR-READY?, if called, would have to return #t. 

[I assume that few if any implementations would use non-blocking I/O just so they can support CHAR-READY? correctly.]
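To illustrate what a program can and cannot conclude from CHAR-READY?, here is a minimal sketch. The procedure name READ-CHAR-IF-READY is hypothetical, and the library name (scheme base) is an assumption.

```scheme
(import (scheme base))

;; Returns the next character if the port reports one ready, an eof
;; object at end of input, or #f when CHAR-READY? says nothing is ready.
;; A #f result means "not known to be ready", not "certainly no data".
(define (read-char-if-ready port)
  (if (char-ready? port)
      (read-char port)   ; the only case in which not blocking is promised
      #f))
```

A caller that gets #f back cannot tell a truly idle port from an implementation that is unable to find out, so the only safe reaction is to try again later or to fall back on a blocking READ-CHAR.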

3. [Old] I had suggested adding a remark that some implementations support other kinds of sources and sinks besides files (and devices). John remarked that this is addressed in the first paragraph of §6.7. That says that other kinds of ports besides binary and character might be provided, which is a different point. My remark was aimed at conveying that an implementation might provide other kinds of binary/character ports that the procedures in §§6.7.2 and 6.7.3 will handle.

4. [Old] I had expressed confusion about the notion that binary ports inherently support character operations. This morning I had an epiphany on this subject. To me, a `binary port' is a port that is used to read or write successive octets, while a `character port' contains additional encoding support, even if it's just end-of-line translation. Thus in C-derived I/O systems one might do a fopen(filnam, "r") for character reading, and fopen(filnam, "rb") for binary reading.

This is NOT how these terms are used in the Report! A binary port is one whose backing store (on disk or elsewhere) contains octets, while a character port (e.g., a string port) has a backing store containing Scheme characters. The term `binary' doesn't refer to reading or writing in binary mode, but to the type of backing store the port uses. This is implied, but not stated, by the current wording, leaving people like me relatively free to misunderstand the point. 

Short of changing the terminology, which may not be practical, perhaps a sentence or two defining these terms more precisely could be added. 
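As a rough illustration of the backing-store reading of these terms, the sketch below contrasts a port over Scheme characters with a port over octets. It assumes the procedure names OPEN-INPUT-STRING and OPEN-INPUT-BYTEVECTOR and the library name (scheme base) as they appear in the later published R7RS; the draft under discussion may differ.

```scheme
(import (scheme base))

;; Backing store of Scheme characters: a string port.
(define char-in (open-input-string "abc"))
(read-char char-in)   ; yields a character, #\a

;; Backing store of octets: a bytevector port.
(define byte-in (open-input-bytevector (bytevector 97 98 99)))
(read-u8 byte-in)     ; yields an octet, 97
```

Neither port involves a file or an open mode at all, which is the sense in which `binary' here describes the store rather than the way it is read.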

5. [New] §6.7.1, bottom of col 2, p. 45. WITH-INPUT-FROM-FILE and WITH-OUTPUT-TO-FILE are defined, but should not WITH-ERROR-TO-FILE also be added? 
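A user-level version is short if CURRENT-ERROR-PORT can be rebound with PARAMETERIZE, which is an assumption about the implementation (it holds in the later published R7RS). The following is only a sketch of what WITH-ERROR-TO-FILE might do, not proposed wording.

```scheme
(import (scheme base) (scheme file))

;; Call THUNK with the current error port redirected to FILENAME.
;; CALL-WITH-OUTPUT-FILE closes the port once the inner procedure returns.
(define (with-error-to-file filename thunk)
  (call-with-output-file filename
    (lambda (port)
      (parameterize ((current-error-port port))
        (thunk)))))
```

That it can be written this way is an argument for either position: it is cheap to add, and equally cheap for users to define themselves.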

6. [New] Most implementations provide a procedure named something like READ-LINE that reads the next line from an input port. Processing a file by lines is an extremely common paradigm, and should therefore be supported. (I can rant at great length about why this should be here, but I'll spare you unless you think it's needed :).
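The paradigm in question is the loop sketched below. It assumes a READ-LINE that takes a port and returns either a string without its line terminator or an eof object, which is how most implementations (and the later published R7RS) define it; FOR-EACH-LINE is a hypothetical name.

```scheme
(import (scheme base))

;; Apply PROC to each line of PORT in order, stopping at end of input.
(define (for-each-line proc port)
  (let loop ((line (read-line port)))
    (unless (eof-object? line)
      (proc line)
      (loop (read-line port)))))
```

Without READ-LINE, every program that works this way has to rebuild it from READ-CHAR, including its own decision about what counts as a line ending.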

7. [New] What happens if both READ-CHAR and READ-U8 are used on the same port? I can envision several possible answers. 

  A. legal 
  B. `it is an error'
  C. `an error is signalled' 
  D. implementation-defined, might be an error in some or all cases

The example I've been thinking of is a UTF-8 encoded file in which one reads the first octet of a character via READ-U8 and then attempts to do a READ-CHAR. 
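To make the example concrete, the sketch below uses an in-memory bytevector in place of a file. STRING->UTF8, OPEN-INPUT-BYTEVECTOR and PEEK-U8 are assumed to be available under those names, as in the later published R7RS.

```scheme
(import (scheme base))

;; U+03BB (Greek small lambda) occupies two octets in UTF-8: #xCE #xBB.
(define bytes (string->utf8 (string #\x3BB #\x)))
(define in (open-input-bytevector bytes))

(read-u8 in)   ; consumes #xCE, the first octet of the lambda
(peek-u8 in)   ; #xBB is next: a continuation octet, not the start of a character
```

A READ-CHAR issued at this point would find the stream positioned in the middle of a character, which is the situation the Report needs to say something about.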

If those were the options, I'd vote for D, which allows the implementation to provide additional ways of resynchronizing (e.g., by rewinding the file) that are outside the scope of WG1. B is also fine. C is implementable (one just needs a tri-state variable, neutral/char/u8, in each port), but I'd question the point of doing this. I'm not sure that A makes any sense.
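For option C, the tri-state bookkeeping might look like the following sketch, written as a user-level wrapper rather than inside the port itself. All of the TRACKED-* names and CLAIM-MODE! are hypothetical, and whether one underlying port accepts both READ-U8 and READ-CHAR at all is implementation-dependent.

```scheme
(import (scheme base))

;; A port paired with its mode: neutral until the first read, then char or u8.
(define-record-type tracked-port
  (make-tracked-port port mode)
  tracked-port?
  (port tracked-port-port)
  (mode tracked-port-mode set-tracked-port-mode!))

;; Commit a neutral port to WANTED, or signal an error on a mismatch.
(define (claim-mode! tp wanted)
  (let ((mode (tracked-port-mode tp)))
    (cond ((eq? mode 'neutral) (set-tracked-port-mode! tp wanted))
          ((not (eq? mode wanted))
           (error "port already used in the other mode" mode wanted)))))

(define (tracked-read-u8 tp)
  (claim-mode! tp 'u8)
  (read-u8 (tracked-port-port tp)))

(define (tracked-read-char tp)
  (claim-mode! tp 'char)
  (read-char (tracked-port-port tp)))
```

The check costs a comparison on every read, and all it buys is an earlier and clearer failure.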

I don't much care which option (or some other one) is selected, but it's important to say what happens.

Writing doesn't suffer from this problem; I'm not sure whether symmetry is important.

8. [New] §6.7.4: LOAD/INCLUDE. Some implementations use LOAD's argument to name a file, others do some kind of path search, or do some other transformation on the name. Gambit, for example, uses a prefix of ~~/ to signify looking in the Gambit directory. I suggest replacing `_filename_ should be a string naming an existing file containing Scheme source code' with `An implementation-dependent operation is used to transform _filename_ into the name of an existing file containing Scheme source code'. Whether the parameter name should still be _filename_ is not for me to say. 
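One possible shape for such an implementation-dependent transformation is a directory search, sketched below. SEARCH-PATH, its contents, and RESOLVE-LOAD-NAME are all hypothetical, and FILE-EXISTS? with the library name (scheme file) is an assumption; this is not how Gambit or any other particular implementation does it.

```scheme
(import (scheme base) (scheme file))

;; Hypothetical list of directories to try, in order.
(define search-path '("." "lib"))

;; Return the first existing DIR/FILENAME on SEARCH-PATH, or #f if none exists.
(define (resolve-load-name filename)
  (let loop ((dirs search-path))
    (if (null? dirs)
        #f
        (let ((candidate (string-append (car dirs) "/" filename)))
          (if (file-exists? candidate)
              candidate
              (loop (cdr dirs)))))))
```

Under the suggested wording, LOAD would behave as if it applied something like this to its argument before opening the file, with the details left to the implementation.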

-- vincent