Re: [Scheme-reports] Sequence to sequence conversion

To: Alex Shinn <alexshinn@x>
Subject: Re: [Scheme-reports] Sequence to sequence conversion
From: Marc Feeley <feeley@x>
Date: Mon, 2 Jul 2012 08:33:50 -0400
Cc: scheme-reports <scheme-reports@x>
In-reply-to: <CAMMPzYNYdtD=peiF3xR2YHXABpTwiY5tq+03em1eUOqD=9tSeg@mail.gmail.com>
References: <1CDD13A2-4DDF-4126-A15B-8B270772ED5C@iro.umontreal.ca> <CAMMPzYNYdtD=peiF3xR2YHXABpTwiY5tq+03em1eUOqD=9tSeg@mail.gmail.com>

On 2012-07-01, at 4:39 PM, Alex Shinn wrote:

> On Sun, Jul 1, 2012 at 10:19 PM, Marc Feeley <feeley@x> wrote:
>> The R5RS has the following sequence to sequence conversion procedures:
>> 
>>    list->string, and string->list
>>    list->vector, and vector->list
>> 
>> The R7RS is adding bytevector sequences, but it does not add the conversion procedures:
>> 
>>    list->bytevector, and bytevector->list
>> 
>> What is the rationale for this inconsistency?
>> 
>> Moreover, the R7RS is adding only the first set of these conversion procedures:
>> 
>>    vector->string, and string->vector
>>    bytevector->string, and string->bytevector  (not in R7RS)
>>    vector->bytevector, and bytevector->vector  (not in R7RS)
> 
> Actually, we have the second, it's just named
> utf8->string and string->utf8 to emphasize the
> encoding used to convert to and from a bytevector.

Not really.  I expected bytevector->string to be equal to

       (lambda (bv) (list->string (map integer->char (bytevector->list bv))))

which would correspond, I guess, to a latin1->string procedure under your naming scheme.
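
As a minimal sketch of that expectation, the conversion can be written with R7RS (scheme base) procedures only. The names bytevector->list, list->bytevector and latin1->string are assumptions for illustration; none of them is defined by R7RS, which is the gap being discussed.

```scheme
(import (scheme base))

;; Hypothetical helper, not part of R7RS: collect the bytes of BV into a list,
;; walking from the last index down so the result comes out in order.
(define (bytevector->list bv)
  (let loop ((i (- (bytevector-length bv) 1)) (acc '()))
    (if (< i 0)
        acc
        (loop (- i 1) (cons (bytevector-u8-ref bv i) acc)))))

;; Hypothetical helper, not part of R7RS: build a bytevector from a list of
;; exact integers in the range 0..255.
(define (list->bytevector lst)
  (apply bytevector lst))

;; Each byte becomes the character with the same code point, which is the
;; ISO-8859-1 (Latin-1) interpretation of the bytes.
(define (latin1->string bv)
  (list->string (map integer->char (bytevector->list bv))))
```

With these definitions, (latin1->string (bytevector 72 105)) yields "Hi". The two readings differ as soon as a byte exceeds 127: the single byte 233 maps to the character U+00E9 here, whereas string->utf8 encodes that same character as the two bytes 195 and 169.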

Concerning utf8->string and string->utf8, I dislike these procedures for many reasons:

1) Very minor point: the official name for this encoding is UTF-8, so it should be UTF-8->string and string->UTF-8.

2) The procedures specify in their names the character encoding to use.  But there are oodles of character encodings, so for easy extensibility to other encodings, it would be better to use a parameter as in (decode-string bytevector 'UTF-8) and (encode-string string 'UTF-8) instead of oodles of different procedures.

3) The main reason for character encodings is to perform I/O on byte-oriented streams.  Yet the only procedures having to do with character encodings in R7RS are utf8->string and string->utf8.  This seems wrong.  If textual output could be performed on binary ports and the character encoding could be specified when the port is opened (as was proposed in SRFI-91, http://srfi.schemers.org/srfi-91/srfi-91.html, and implemented in Gambit), then the procedures utf8->string and string->utf8 would be superfluous since they could be defined easily like this:

    (define (string->utf8 s)
      (let ((port (open-output-bytevector 'UTF-8)))
        (display s port)
        (get-output-bytevector port)))
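
To illustrate point 2, here is a rough sketch of what a parameterized interface could look like when layered on R7RS (scheme base). The names encode-string and decode-string come from the example calls above; the ISO-8859-1 symbol, the set of supported encodings and the error messages are assumptions made only for this sketch, not a proposed specification.

```scheme
(import (scheme base))

;; Encode STR as a bytevector using the encoding named by the symbol ENCODING.
;; Only two encodings are handled in this sketch.
(define (encode-string str encoding)
  (case encoding
    ((UTF-8) (string->utf8 str))
    ;; One byte per character; bytevector signals an error if a character
    ;; has a code point above 255.
    ((ISO-8859-1)
     (apply bytevector (map char->integer (string->list str))))
    (else (error "encode-string: unsupported encoding" encoding))))

;; Decode the bytevector BV into a string using the encoding named by ENCODING.
(define (decode-string bv encoding)
  (case encoding
    ((UTF-8) (utf8->string bv))
    ;; Each byte maps to the character with the same code point.
    ((ISO-8859-1)
     (let loop ((i (- (bytevector-length bv) 1)) (chars '()))
       (if (< i 0)
           (list->string chars)
           (loop (- i 1)
                 (cons (integer->char (bytevector-u8-ref bv i)) chars)))))
    (else (error "decode-string: unsupported encoding" encoding))))
```

Supporting another encoding then means adding a clause to each case expression rather than introducing a new pair of procedure names.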

Marc

