[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Scheme-reports] Seeking review of sets and hash tables proposals



On Sun, May 26, 2013 at 12:04 PM, John Cowan <cowan@x> wrote:
> Alexey Radul scripsit:
>> I am fond of the following universal lookup procedure [1]:
>>   (hash-table-search table key
>>     (lambda (value update remove)
>>       ...)
>>     (lambda (insert)
>>       ...))
>
> Nice one.  Added.

:)

>> If a set is to be implemented as a hash table, how is an appropriate
>> hash function to be derived from the equivalence predicate?
>
> This is precisely why we need a way to bind them together.  Currently,
> however, going beyond the Five Equivalences for sets/bags is an option
> implementations need not support.

Why not just make it a data structure?  Call them ontologies, or
signatures (after the concept from universal algebra), and say that a
constructor for a hash table (or a set) accepts an ontology over the
objects that it is to contain.

What's an ontology?  Well, for a hash table or a set, an ontology must
have a member called 'equivalence-predicate', which must implement an
equivalence predicate over all objects that might be used as keys in
the hash table (or elements of the set, respectively).  For a hash
table, the ontology must also have a member called 'hash-function',
which must implement a hash function compatible with the equivalence
predicate of that ontology.  The ontology could also contain other
information, such as: a member called 'total-order', which implements
comparison with some particular interface, that must be compatible
with the equivalence predicate; a member called 'monoid-product',
which implements an associative binary operation (also compatible with
the equivalence); a member called 'monoid-unit' which is an identity
element for the monoid product; etc.  A clever implementation of sets
could then examine the ontology and use a backing hash table if the
ontology contains a hash function, or a backing binary tree if the
ontology contains a total order, or fall back to lists if the ontology
contains neither of those things.

An implementation of hash tables or sets could provide several
predefined ontologies, appropriate for the equivalence predicates eq?,
eqv?, equal?, string=?, and string-ci=?.  The predefined ontologies
need not be transparent, so the hash function corresponding to eq?
can remain hidden.  The proposal need not require an implementation to
provide mechanisms for the construction of user-defined ontologies,
though it could perhaps specify what those mechanisms should look like
if present (I'm thinking alist->ontology).  If disjointess of types
and uniformity with user-defined ontologies (if any) are not required,
the standard ontologies could even be represented by their equivalence
predicates, reducing to the minimal implementation of the current
proposal but with clearer nomenclature.

Credit where credit is due: I acquired this idea from Taylor Campbell.

>> If the programmer's intent is that the hash table's values can be
>> accessed by traversing the hash table, then the entry should be kept.
>> If the programmer's intent is that the only way to use the hash table
>> to get one of the values is by looking up the key, then the entry
>> should be removed and the key and value garbage collected.
>
> I don't understand the first kind of intention: can you give a concrete
> example?  The way it looks to me, non-ephemeral hash tables are just
> (possibly tolerably) buggy versions of ephemeral ones.

I assume you mean _weak_ but non-ephemeral hash tables.  Strong hash
tables are certainly useful.  I don't actually have a concrete example
in mind of a situation where the right data structure was a hash table
whose keys would be held by weak pointers but whose entries would not
be removed if the key were accessible only through the value.

Bruno Haible published a nice article about this:
http://www.haible.de/bruno/papers/cs/weak/WeakDatastructures-writeup.html
I have two takeaways from that:

1) the word "weak" by itself is not specific enough, because people can
   interpret it as either "ephemeral" or "weak non-emphemeal"/

2) "weak non-ephemeral" may not ever be the right semantics in
   practice, but if the programmer knows that the values will never
   contain the keys, it may be acceptable; and it is easier and (at
   least somewhat) more efficient to implement.  Whether that
   constitutes an argument for standardizing it is a separate
   question.

Consequently I suggest clarifying what collection of weakness
semantics the proposal specifies mechanisms to ask for.

>> It seems as though the in-place mutation of hash-table-map! would be
>> considerably more useful (for efficiency) if the proc were not allowed
>> to change the keys to inequivalent ones.
>
> The original idea, which is not well explained, is that the association
> being processed by map! is conceptually removed and the new association
> inserted in its place.  But you're right that the user really shouldn't
> be allowed to mutate the keys a hash table contains while scanning it.
>
> So I've removed `hash-table-map` in favor of `hash-table-update-each!`
> which is like `hash-table-for-each` except that the value returned is
> used to update the association.  For the same reason I have also removed
> `hash-table-remove!` from the draft; what it wants to do should be done
> with `hash-table-delete-keys!`, which is allowed to be efficient.

OK, that makes sense.

>> It also seems that leaving the key alone would be a common use case
>> for hash-table-map; is it worth specifying a variant that accepts
>> this?  Or specifying that the procedure given to hash-table-map may
>> return either one value or two, with the second being assumed to be
>> the key if absent?
>
> I don't see that knowing this allows you to optimize anything.

Neither do I, but I can just see myself defining the helper function

(define (hash-table-map-value eq hash proc merger hash-table)
  (hash-table-map
   eq hash
   (lambda (key value)
     (values key (proc key value)))
   (lambda (key1 val1 key2 val2)
     (values key1 (merger key1 val1 key2 val2)))
   hash-table))

every time I want to use the hash tables from this proposal.

>> I am surprised by the proposal to allow the merger argument to the
>> various hash table manipulation procedures to return an arbitrary
>> new key as a result of a collision.  What purpose does this serve?
>> Its cost strikes me as high: what if the merger returns a key that is
>> not equivalent to either of its input keys?  What if that key conflicts
>> with a different key already present in the result being built?  This is
>> also inconsistent with the mergers specified for hash-table-union!
>> and hash-table-intersection!.
>
> Those are pure value mergers, but they now take the keys as arguments
> as well, since they may provide useful context.

This response clarifies things a little, but it doesn't actually
address my main question.  What purpose is served by allowing the
merger in hash-table-map to return a new, potentially inequivalent
key?  What if that new key has already been returned by a previous
call to proc (or a previous call to merger)?  It seems like this will
require chaining calls to merger to resolve additional collisions.
What new liberty does that hair give?

>> Why is the failure argument to hash-table-replace! mandatory?  If not
>> a typo, that seems like a useless irregularity.
>
> Because the default failure continuation is to signal an error, so
> `(hash-table-replace hash-table key)` would be the same as
> `(hash-table-ref hash-table key)`, which doesn't mutate anything.

However, a uniform interface for similar functions is very valuable in
a higher-order language.  I may not always know at a call site whether
I have ref, replace, or one of the others, and it would be very
convenient for them to accept exactly the same parameters with the
same semantics.

~Alexey

_______________________________________________
Scheme-reports mailing list
Scheme-reports@x
http://lists.scheme-reports.org/cgi-bin/mailman/listinfo/scheme-reports