Re: [Scheme-reports] (read|write)-char [was Opinion about R7RS]



    Thanks for your message. I'm answering late because I was far from
any computer for 3 days.

John Cowan <cowan@x> wrote:

> Jean-Michel HUFFLEN scripsit:
>
>> The description of the "read-char" function does not mention any
>> encoding.
> (...)
>> Is there a default encoding?
>
> The default encoding is implementation-dependent.
> (...)
> The difficulty is that the implementation language being used may not
> expose such a list (neither Java nor C# does so, for example).

    I understand what you mean. However, the fact that neither Java
nor C# does so is a *defect*, from my point of view. I think that
Scheme should not inherit such defects. Besides, if we consider the
R7RS draft, an implementation of Scheme may support operations on
the full range of Unicode characters - recognising accented letters by
means of the "char-alphabetic?" function, for example - but be limited
to ASCII characters when reading or writing files. From my point of
view, that is nonsense.

    In addition, a tremendous improvement of R7RS is that some
situations can be handled more easily than in R(5|6)RS. For example,
if "open-input-file" cannot open a port, an error is signalled and can  
be handled as such, whereas pathological cases for "open-input-file"  
were unspecified in R(5|6)RS.

    Let us come back to "write-char". Previous implementations of
this function just had to check that the character was valid w.r.t.
the ASCII encoding. Now, if we open an output port with the Latin-1
encoding, some valid Unicode characters cannot be written: what
happens in such a case? If we know the encoding used, we can deal with
such a situation, but if we don't... If you look at other languages,
you will see that CPython can deal with such a situation, whereas
Jython cannot. Anyway, I would hope that most Scheme implementations
would be able to do so, too...
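
    To make the Latin-1 case concrete, here is a minimal sketch of what
CPython does, assuming Python 3. The names "write_char" and "try_write"
are hypothetical helpers invented for this illustration; they are not
part of any Scheme or Python API. The in-memory port stands in for a
file opened with a given encoding.

```python
import io

def write_char(port, ch):
    """Hypothetical analogue of Scheme's write-char for a Python text port."""
    port.write(ch)

def try_write(ch, encoding, errors="strict"):
    """Write one character through a text port with the given encoding.

    Returns the bytes produced, or the name of the error that was raised.
    """
    raw = io.BytesIO()
    port = io.TextIOWrapper(raw, encoding=encoding, errors=errors)
    try:
        write_char(port, ch)
        port.flush()
    except UnicodeEncodeError as exc:
        return type(exc).__name__
    finally:
        port.detach()  # keep the underlying byte buffer open
    return raw.getvalue()

print(try_write("\u00e9", "latin-1"))             # e with acute accent
print(try_write("\u20ac", "latin-1"))             # euro sign, not in Latin-1
print(try_write("\u20ac", "latin-1", "replace"))  # substitute instead of failing
print(try_write("\u20ac", "utf-8"))
```

    Because the port knows its encoding, the failure is an ordinary
error that the program can catch, or avoid by choosing a different error
policy. That is the kind of behaviour I would hope for from "write-char".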

    Personally, I think that the "minimal" requirements should be:
    - if a Scheme interpreter deals with the full range of Unicode, it
should be able to read and write files encoded in UTF-8;
    - otherwise, it should read and write ASCII files.
    Of course, Scheme implementations are free to deal with other
encodings such as Latin-1, Latin-2, etc.

    Since UTF-8 is becoming a standard - more than other encodings
such as UTF-16, for example - this convention does not seem
heretical to me.

> (...)
>> Let us consider the "char-alphabetic?" function implemented by
>> a Scheme interpreter that only implements the Latin 1 encoding.
>> In particular, it implements ASCII, so that is permitted, but it does
>> not implement the full range of Unicode.  What would be the answer
>> if this "char-alphabetic?"  function is applied to the letter "e with
>> acute accent"?  Should this interpreter deal with Unicode properties
>> as far as possible, or can it answer #f since it is not Unicode-compliant?
>
> It must return #t, because all the properties of whatever characters
> the implementation provides must be correct according to Unicode.

    OK, but perhaps the draft could state explicitly that the
Unicode-based rules apply even when only a proper subset of Unicode is
supported.
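
    As an illustration of that rule, here is a small sketch in Python 3
of a predicate restricted to Latin-1 input that still answers according
to Unicode. The name "char_alphabetic" is hypothetical, and Python's
"str.isalpha" is only an approximation of the Unicode Alphabetic
property (it tests the general category "Letter"), which is an
assumption of this sketch.

```python
import unicodedata

def char_alphabetic(ch):
    """Rough stand-in for Scheme's char-alphabetic?, using Python's Unicode tables."""
    return ch.isalpha()

# Every Latin-1 byte maps to the Unicode code point with the same number,
# so a Latin-1-only implementation can still consult Unicode properties.
e_acute = bytes([0xE9]).decode("latin-1")  # e with acute accent
times = bytes([0xD7]).decode("latin-1")    # multiplication sign

print(char_alphabetic(e_acute), unicodedata.category(e_acute))
print(char_alphabetic(times), unicodedata.category(times))
```

    The first line reports a lowercase letter and the second a
mathematical symbol, although both characters lie outside ASCII.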

    Cheers,

J.-M.

P.S.: I wrote a short comparison of the methods used for reading and
writing files in different encodings in Emacs-Lisp,
(CP|J)ython, Ruby, etc. It is in French, but if people are interested,
I can translate it into English and put it on my Web page.
