Re: [Scheme-reports] fresh empty strings



On 01/24/2012 10:58 AM, John Cowan wrote:
> Per Bothner scripsit:
>
>> (Kawa does use separate Java classes for mutable and immutable strings,
>> though it didn't used to - and I'm thinking about adding another class to
>> support O(1) indexing of strings containing non-basic-plane characters.)
>
> There's a case to be made for using three classes for the Latin-1, BMP, and
> full Unicode repertoires.  I know of one (non-Scheme) package that does this,
> plus using java.lang.String for immutable strings.
>
> On the other hand, in fairly capacious environments like the desktop,
> it may be the Right Thing to use only 32-bit mutable strings, especially
> considering how much more common string literals are in typical Scheme code.

First, you will of course have non-BMP string literals, for which you'd
also want O(1) indexing.  Secondly, having the data buffer be 16-bit
Unicode (either a java.lang.String or an array of 16-bit chars) may be
desirable for interoperability - it makes the toString operation cheap.
If so, a separate indexing table or cache may make sense.  Assuming most
string indexing advances by one each time, a single-element cache of the
most recent (code-point index, buffer index) pair may work well, though I
don't like a mutable cache when the string is immutable.  For immutable
strings, having an array that maps (say) every 16th code point to its
buffer position may be a good compromise between space and time.  One can
do the same for mutable strings, but of course updates are more
complicated to make efficient.
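
To make the "every 16th code point" idea concrete, here is a rough Java
sketch.  It is only an illustration, not Kawa code: the class name
IndexedString, the STEP constant, and the checkpoints array are all made
up for this example.  The buffer stays a java.lang.String, so toString
is free, and a lookup walks at most 15 code points from the nearest
checkpoint.

```java
/** Hypothetical immutable string with a sampled code-point index. */
final class IndexedString {
    private static final int STEP = 16;  // assumed sampling interval
    private final String data;           // UTF-16 buffer, shared as-is
    private final int[] checkpoints;     // checkpoints[k] = char offset of code point k*STEP
    private final int length;            // length in code points

    IndexedString(String s) {
        data = s;
        length = s.codePointCount(0, s.length());
        checkpoints = new int[(length + STEP - 1) / STEP];
        int cp = 0;
        // One pass over the buffer, recording the offset of every STEP-th code point.
        for (int off = 0; off < s.length(); cp++) {
            if (cp % STEP == 0) {
                checkpoints[cp / STEP] = off;
            }
            off += Character.charCount(s.codePointAt(off));
        }
    }

    /** Number of code points (not UTF-16 chars). */
    int length() {
        return length;
    }

    /** Code point at the given code-point index; bounded work per call. */
    int codePointAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        // Jump to the nearest checkpoint, then step over at most STEP-1 code points.
        int off = checkpoints[index / STEP];
        off = data.offsetByCodePoints(off, index % STEP);
        return data.codePointAt(off);
    }

    @Override
    public String toString() {
        return data;  // no conversion needed
    }
}
```

The table costs one int per 16 code points.  A smaller STEP trades more
space for a shorter walk; a mutable variant would have to fix up every
checkpoint after the edit position, which is where it gets complicated.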
-- 
	--Per Bothner
per@x   http://per.bothner.com/