This project is a C++ implementation of hash tables, including both open addressing and closed addressing (chaining) strategies. It is built with a focus on correctness, clarity, and testability. This was made for service learning project to educate lower divison students.
Included are:
- A fully templated, reusable HashTable base class.
- Specialized implementations for OpenAddressingHash and ClosedAddressingHash.
- Comprehensive unit tests written using Google Test (GTest) framework.
The unit tests verify:
- Correctness of insert, search, and removal operations
- Handling of collisions
- Proper resizing behavior
- Prevention of duplicate insertions
- Rehashing functionality
Hash tables are data structures used to perform insert, search, and delete operations in constant average time — O(1). They are also known as:
- Associative arrays
- Hash maps
- Dictionaries
- Unordered maps
But before we implement them, we need to understand a few core concepts.
Imagine you're trying to determine if something is unique. A computer can't "look" at something visually like we can — instead, it uses hashing, which maps input values to a unique numerical representation.
There are many ways to hash data. In this guide, we focus on modulo hashing, and later in C++ we'll use std::hash
combined with modulo.
The modulo operator (%
) gives us the remainder after division.
If we have:
X = 4172025
(some student ID)Table size = 25
Then:
Hash(X) = X % 25 = 4172025 % 25 = 1
- Table size = 26, value = 1023 → Index?
- Table size = 67, value = 643 → Index?
- Table size = 21, value = 276 → Index?
Hash tables can outperform:
- Linked lists: O(N) search
- Balanced BSTs: O(log N) search, insert, delete
With hashing, we can reach O(1) performance in average cases for insert, search, and delete. However, we must handle collisions carefully.
A collision happens when two different inputs hash to the same index.
If our table size is 25:
Hash(4172025) = 1
Hash(1234526) = 1
This is a collision.
There are two primary ways to handle collisions:
- Open Addressing
- Closed Addressing (Chaining)
In open addressing, all elements are stored directly in the array. On collision, we probe to find the next available slot.
Index = (Hash(key) + i) % Table Size
Where:
i
starts at 0 and increments with each collision
We use tombstones to handle deletions, marking removed slots as REMOVED
.
We use an abstract base class HashTable
, and then implement OpenAddressingHash
that handles:
OPEN
,FILLED
,REMOVED
statesstd::hash
as our hash function- Collision resolution with linear probing
Function Search(data):
Compute hash index = Hash(data) % table size
For i from 0 to table size:
index = (hash + i) % table size
If slot is OPEN:
Break (data not in table)
If slot is FILLED and value == data:
Return value
Return null
Function Removal(data):
Compute hash index = Hash(data) % table size
For i from 0 to table size:
index = (hash + i) % table size
If slot is OPEN:
Break (data not in table)
If slot is FILLED and value == data:
Mark as REMOVED
Decrement count
Return value
Return null
Function resize():
new_capacity = table.size * 2
Create new_table of size new_capacity, filled with (default, OPEN)
For each (value, status) in old table:
If status is FILLED:
Compute new index and insert into new_table with linear probing
Replace table with new_table
Function Insert(data):
If load factor > LOAD_SIZE:
resize()
Compute hash index = Hash(data) % table size
For i from 0 to table size:
index = (hash + i) % table size
If FILLED and value == data:
Return false (duplicate)
If not FILLED:
Insert (data, FILLED)
Increment count
Return true
Return false
This is what most people think of when they hear "hash table with buckets."
- Each slot (bucket) in the table holds a data structure (commonly a linked list).
- This structure can be anything:
list
,set
, even aBST
.
Structure in Bucket | Worst Case | Notes |
---|---|---|
Linked List | O(K) | Simple, may degrade to linear |
Balanced Set (e.g. std::set) | O(log K) | Better search but slightly slower inserts |
Where K = number of items in a single bucket
We again extend HashTable<T>
, using:
std::vector<std::list<T>>
as the tablestd::find
to simplify operations
Function Search(data):
hash index = Hash(data) % table size
bucket = table[hash index]
If data in bucket:
Return data
Else:
Return null
Function Removal(data):
hash index = Hash(data) % table size
bucket = table[hash index]
If data in bucket:
Store value
Remove from bucket
Decrement count
Return value
Else:
Return null
Function resize():
newCap = table size * 2
Create new table of size newCap (vector of lists)
For each bucket in old table:
For each item in bucket:
Compute new index
Insert into new table
Replace old table
Function Insert(data):
If (count / table size) > LOAD_SIZE:
resize()
hash index = Hash(data) % table size
bucket = table[hash index]
If data not in bucket:
Add to bucket
Increment count
Return true
Else:
Return false
- Open Addressing gives tighter memory usage but can suffer from clustering and performance degradation in edge cases.
- Closed Addressing offers consistent performance, with more flexibility in how buckets store data.
- Both methods target O(1) average-case time complexity, but worst cases differ depending on the implementation.
- This is one of the first composite data structures you’ll encounter — combining hashing with internal storage like lists or trees.