86
Chapter 5
I
Hash tables
Recap
To recap, hashes are good for
• Modeling relationships from
one thing to another thing
• Filtering out duplicates
• Caching/memorizing data instead of making your server do work
Collisions
Like I
said earlier, most languages have hash tables. You don’t need to
know how to write your own. So, I won’t talk about the internals of hash
tables too much. But you still care about performance!
To understand
the performance of hash tables, you first need to understand what
collisions are. The next two sections cover collisions and performance.
First, I’ve been telling you a white lie. I told
you that a hash function
always maps different keys to different slots in the array.
In reality, it’s almost impossible to write a hash function that does this.
Let’s take a simple example. Suppose your array contains 26 slots.
And your hash function is really simple: it assigns a spot in the array
alphabetically.
87
Collisions
Maybe you can already see the problem. You
want to put the price of apples in your hash.
You get assigned the first slot.
Then you want to put the price of bananas in the hash. You get assigned
the second slot.
Everything is going so well! But now you
want to put the price of
avocados in your hash. You get assigned the first slot again.
Oh no! Apples have that slot already! What to do? This is called a
collision
: two keys have been assigned the same slot. This is a problem.
If you store the price of avocados at that slot, you’ll
overwrite the price
of apples. Then the next time someone asks for the price of apples,
they will get the price of avocados instead! Collisions are bad, and you
need to work around them. There are many
different ways to deal with
collisions. The simplest one is this: if multiple keys map to the same
slot, start a linked list at that slot.