Re: [Scheme-reports] (read|write)-char [was Opinion about R7RS]
Thanks for your message. Sorry for answering late: I was away from
any computer for three days.
John Cowan <cowan@x> wrote:
> Jean-Michel HUFFLEN scripsit:
>
>> The description of the "read-char" function does not mention any
>> encoding.
> (...)
>> Is there a default encoding?
>
> The default encoding is implementation-dependent.
> (...)
> The difficulty is that the implementation language being used may not
> expose such a list (neither Java nor C# does so, for example).
I understand what you mean. However, the fact that neither Java
nor C# does so is a *defect*, from my point of view. I think that
Scheme should not inherit such defects. Besides, if we consider the
R7RS draft, an implementation of Scheme may support computations on
the full range of Unicode characters - recognising accented letters by
means of the "char-alphabetic?" function, for example - and yet be
limited to ASCII characters when reading or writing files. From my
point of view, that is nonsense.
In addition, a tremendous improvement in R7RS is that some
situations can be handled more easily than in R(5|6)RS. For example,
if "open-input-file" cannot open a port, an error is signalled and can
be handled as such, whereas pathological cases for "open-input-file"
were unspecified in R(5|6)RS.
Let us come back to "write-char". Previous implementations of
this function just had to check that the character was valid w.r.t.
the ASCII encoding. Now, if we open an output port with the Latin-1
encoding, some valid Unicode characters cannot be written: what
happens in such a case? If we know the encoding used, we can deal with
such a situation, but if we don't... If you look at other languages,
you will see that CPython can deal with such a situation, whereas
Jython cannot. Anyway, I would hope that most Scheme implementations
would be able to do so, too.
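
To make the CPython case concrete, here is a minimal sketch in Python
(not Scheme) of what may happen when a character outside Latin-1 is
written to a port opened with that encoding. The helper name
"write_chars" and the in-memory byte buffer are assumptions made for
illustration only. The point is that, once the encoding is known, the
failure becomes a well-defined error that a program can handle.

```python
import io


def write_chars(text, encoding, errors="strict"):
    """Write text to an in-memory output port and return the bytes.

    Hypothetical helper for illustration; it is not part of any
    Scheme or Python standard API.
    """
    raw = io.BytesIO()
    with io.TextIOWrapper(raw, encoding=encoding, errors=errors) as port:
        port.write(text)
        port.flush()
        # Read the buffer before the wrapper closes it.
        return raw.getvalue()


if __name__ == "__main__":
    # "e with acute accent" belongs to Latin-1, so it can be written.
    print(write_chars("\u00e9", "latin-1"))

    # The euro sign (U+20AC) is valid Unicode but absent from Latin-1.
    try:
        write_chars("\u20ac", "latin-1")
    except UnicodeEncodeError as exc:
        print("cannot write:", exc.reason)

    # An explicit policy: substitute a replacement for such characters.
    print(write_chars("\u20ac", "latin-1", errors="replace"))
```

Whether a Scheme implementation should signal an error or substitute
a replacement character is a separate design choice; the sketch only
shows that both are possible when the port's encoding is known.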
Personally, I think that the "minimal" requirements should be:
- if a Scheme interpreter deals with the full range of Unicode, it
should be able to read and write files encoded w.r.t. UTF-8;
- otherwise, it should read and write ASCII files.
Of course, Scheme implementations are free to deal with other
encodings such as Latin-1, Latin-2, etc.
Since UTF-8 is becoming a standard - more so than other encodings
such as UTF-16, for example - this convention does not seem
heretical to me.
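
The two-tier rule proposed above can be sketched as follows, in
Python rather than Scheme. The names "required_encoding",
"scalar_values" and "round_trips" are hypothetical, and describing an
implementation by its largest supported code point is a simplifying
assumption. The sketch also checks the two properties that make the
convention workable: UTF-8 can represent every Unicode scalar value,
and an ASCII file is already a valid UTF-8 file.

```python
FULL_UNICODE_MAX = 0x10FFFF  # largest Unicode code point


def required_encoding(max_code_point):
    """Encoding the proposed minimal rule would require.

    max_code_point is the largest code point the (hypothetical)
    implementation supports.
    """
    if max_code_point >= FULL_UNICODE_MAX:
        return "utf-8"
    return "ascii"


def scalar_values():
    """Yield every Unicode scalar value, i.e. all code points except
    the surrogate range U+D800..U+DFFF."""
    for cp in range(FULL_UNICODE_MAX + 1):
        if not 0xD800 <= cp <= 0xDFFF:
            yield chr(cp)


def round_trips(encoding, chars):
    """True if each character survives encoding and then decoding."""
    return all(c.encode(encoding).decode(encoding) == c for c in chars)


if __name__ == "__main__":
    print(required_encoding(0x10FFFF))  # full-Unicode implementation
    print(required_encoding(0xFF))      # Latin-1-only implementation
    print(round_trips("utf-8", scalar_values()))
    # ASCII text has the same byte representation in UTF-8.
    print("abc".encode("ascii") == "abc".encode("utf-8"))
```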
> (...)
>> Let us consider the "char-alphabetic?" function implemented by
>> a Scheme interpreter that only implements the Latin 1 encoding.
>> In particular, it implements ASCII, so that is permitted, but it does
>> not implement the full range of Unicode. What would be the answer
>> if this "char-alphabetic?" function is applied to the letter "e with
>> acute accent"? Should this interpreter deal with Unicode properties
>> as far as possible, or can it answer #f since it not Unicode-compliant.
>
> It must return #t, because all the properties of whatever characters
> the implementation provides must be correct according to Unicode.
OK, but perhaps the draft could state explicitly that the
Unicode-based rules apply even when only a proper subset of Unicode
is processed.
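
As an illustration of that rule, here is a minimal Python sketch of a
"char-alphabetic?" analogue for an implementation restricted to
Latin-1. The function name "char_alphabetic" and the choice to raise
an error outside the repertoire are assumptions. Note also that
Python's "str.isalpha" tests the general category "Letter", which is
used here only as an approximation of the Unicode Alphabetic
property.

```python
LATIN1_MAX = 0xFF  # assumed repertoire of the hypothetical implementation


def char_alphabetic(ch):
    """Analogue of char-alphabetic? for a Latin-1-only implementation.

    Characters inside the repertoire get their Unicode-based answer;
    characters outside it are rejected rather than answered wrongly.
    """
    if ord(ch) > LATIN1_MAX:
        raise ValueError("character outside the supported repertoire")
    return ch.isalpha()


if __name__ == "__main__":
    print(char_alphabetic("e"))       # ASCII letter
    print(char_alphabetic("\u00e9"))  # e with acute accent
    print(char_alphabetic("\u00d7"))  # multiplication sign, not a letter
```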
Cheers,
J.-M.
P.S.: I wrote a short comparison of the methods used for reading and
writing files in different encodings in Emacs-Lisp,
(CP|J)ython, Ruby, etc. It is in French, but if people are interested,
I can translate it into English and put it on my Web page.