CQH102: Hash Tables – Fast Lookups with Smart Tradeoffs

Imagine being able to find a piece of data instantly—no matter how big your collection is. That’s the power of hash tables. They’re the secret behind dictionaries, caches, databases, and even some video game mechanics.

But with great speed comes some complexity. In this post, we’ll explore what hash tables are, how they work, what can go wrong, and when simpler alternatives might actually be better.

What is a Hash Table?

A hash table stores data in key-value pairs, where a special function called a hash function turns a key (like a name or ID) into an index where the value is stored.

Think of it like a magic filing cabinet: you give it a label, and it instantly tells you where to find the folder.

Python Example: Using a Dictionary

# Python dictionaries are built on hash tables
student_scores = {
    "Alice": 95,
    "Bob": 87,
    "Charlie": 92
}

print(student_scores["Bob"])  # Output: 87

How Hash Tables Work

The key is passed through a hash function to get an index.
The value is stored at that index in an array.
When looking up the key, the same hash function gives you the index instantly.

But what if two keys hash to the same index? That’s called a collision.

Collision Resolution Strategies

Collisions are inevitable, so hash tables need ways to handle them:

1. Chaining

Each index holds a list of items, and collisions are stored in the same bucket.

# Visually: index 3 → ["Alice", "Zack"] if both hash to 3

Simple and easy to implement
Can degrade to O(n) if many keys land in the same bucket

2. Open Addressing (Linear Probing)

If a spot is taken, search for the next open slot.

Index 3 taken? Try 4. Still taken? Try 5...

Keeps everything in one array
Can lead to clustering, slowing down performance

3. Double Hashing

Use a second hash function to decide how far to jump when there’s a collision.

Reduces clustering but more complex

Performance Characteristics

Operation	Average Case	Worst Case
Insert	O(1)	O(n)
Search	O(1)	O(n)
Delete	O(1)	O(n)

When the hash function is good and the table isn’t too full, performance is near-instant.
When collisions pile up or the table becomes overloaded, performance drops.

Why a Good Hash Function Matters

A hash function should:

Spread keys evenly across the table.
Be fast to compute.
Avoid patterns or clustering.

Poor hash functions can turn a hash table into a very slow list.

Tradeoffs – When Simpler is Better

While hash tables offer incredible speed, they also come with:

Overhead from hash computations
Extra memory usage
Complexity from collision handling

For small datasets, something like binary search on a sorted array might actually be faster and simpler to implement. Hash tables shine when:

You need constant-time lookups
You’re managing lots of data
You’re okay with a bit of memory overhead

Real-World Examples

Dictionaries in Python – Instant access to words and meanings
Caching systems – Storing and retrieving computed results
Game development – Mapping game object states and positions
Databases – Indexing records for fast querying

Challenge Time! ⚔️

Build a simple hash table class in Python using chaining to resolve collisions. Try inserting 10 items and then retrieving them. What happens when two keys collide?

Bonus: Try writing your own hash function—what happens when you tweak it?

Coming up next: Trees – where we’ll explore how to organize data in a branching, hierarchical way!