Chapter 5
I
Hash tables
In the worst case, a hash table takes O(
n
)—linear time—for everything,
which is really slow. Let’s compare hash tables to arrays and lists.
Look at the average case for hash tables. Hash tables are as fast as arrays
at searching (getting a value at an index). And they’re as fast as linked
lists at inserts and deletes. It’s the best of both worlds! But in the worst
case, hash tables are slow at all of those. So it’s important that you don’t
hit worst-case performance with hash tables. And to do that, you need
to avoid collisions. To avoid collisions, you need
• A low load factor
• A good hash function
Load factor
The load factor of a hash table
is easy to calculate.
Hash tables use an array for storage, so you count the number of
occupied slots in an array. For example, this hash table has a load factor
of
2
/
5
, or 0.4.
Note
Before you start this next section, know that this isn’t required reading. I’m
going to talk about how to implement a hash table, but you’ll never have
to do that yourself. Whatever programming language you use will have an
implementation of hash tables built in. You can use the built-in hash table
and assume it will have good performance. The next section gives you a
peek under the hood.
91
Performance
What’s the load factor of this hash table?
If you said
1
/
3
, you’re right. Load factor measures how many empty slots
remain in your hash table.
Suppose you need to store the price of 100 produce items in your hash
table, and your hash table has 100 slots. In the best case, each item will
get its own slot.
This hash table has a load factor of 1. What if your hash table has only
50 slots? Then it has a load factor of 2. There’s no way each item will
get its own slot, because there aren’t enough slots! Having a load factor
greater than 1 means you have more items than slots in your array.
Once the load factor starts to grow, you need to add more slots to your
hash table. This is called
resizing
. For example, suppose you have this
hash table that is getting pretty full.
You need to resize this hash table. First you create a new array that’s
bigger. The rule of thumb is to make an array that is twice the size.
|