[lug] Hash vs list

Sun Apr 23 00:51:55 MDT 2006

On Sat, Apr 22, 2006 at 06:44:15PM -0600, George Sexton wrote:
>A binary search within an array could give a pretty good result. If the
>output of the hash function is not really good then a binary search would
>find a record in approximately 20 comparisons.

Note, the original author didn't say "sorted list", just list.  The
in_array function I believe just iterates, so on average it requires a half
million comparisons.

Hashing, of course, requires that you create the hash, but that's probably
cheaper than sorting the list unless you need the list sorted for some
other reason.

But, to the original poster, here's the thing: benchmark it for your
specific needs and see what it looks like.

It can also vary quite a lot.  In some conditions, iterating can be quicker
than using an index, this is why PostgreSQL will often avoid using indexes
when a table is small.  It also depends on the implementation, the "Judy"
algorithm can apparently be good for some things, but when I mentioned it
to some of the Python know-it-alls they speculated that Python's hashes
would be faster because they were optimized quite heavily for small
dictionaries.  Internally, Python does that all the time, so it's heavily
optimized for that.

For your use, an associative array is probably what you want for doing the
lookups.

Thanks,
Sean
-- 
 Whenever possible, steal code.
                 -- Tom Duff
Sean Reifschneider, Member of Technical Staff <jafo at tummy.com>
tummy.com, ltd. - Linux Consulting since 1995: Ask me about High Availability