Skip to content

XOR-SABER/HashTableStudyGuide

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Hash Table Study Guide


About This Project

This project is a C++ implementation of hash tables, including both open addressing and closed addressing (chaining) strategies. It is built with a focus on correctness, clarity, and testability. This was made for service learning project to educate lower divison students.

Included are:

  • A fully templated, reusable HashTable base class.
  • Specialized implementations for OpenAddressingHash and ClosedAddressingHash.
  • Comprehensive unit tests written using Google Test (GTest) framework.

The unit tests verify:

  • Correctness of insert, search, and removal operations
  • Handling of collisions
  • Proper resizing behavior
  • Prevention of duplicate insertions
  • Rehashing functionality

What Are Hash Tables?

Hash tables are data structures used to perform insert, search, and delete operations in constant average time — O(1). They are also known as:

  • Associative arrays
  • Hash maps
  • Dictionaries
  • Unordered maps

But before we implement them, we need to understand a few core concepts.


What Are Hashes?

Imagine you're trying to determine if something is unique. A computer can't "look" at something visually like we can — instead, it uses hashing, which maps input values to a unique numerical representation.

There are many ways to hash data. In this guide, we focus on modulo hashing, and later in C++ we'll use std::hash combined with modulo.


Modulo Hashing

The modulo operator (%) gives us the remainder after division.

Example:

If we have:

  • X = 4172025 (some student ID)
  • Table size = 25

Then:

Hash(X) = X % 25 = 4172025 % 25 = 1

Practice Questions:

  1. Table size = 26, value = 1023 → Index?
  2. Table size = 67, value = 643 → Index?
  3. Table size = 21, value = 276 → Index?

What's the Fastest Data Structure?

Hash tables can outperform:

  • Linked lists: O(N) search
  • Balanced BSTs: O(log N) search, insert, delete

With hashing, we can reach O(1) performance in average cases for insert, search, and delete. However, we must handle collisions carefully.


Collisions

A collision happens when two different inputs hash to the same index.

Example:

If our table size is 25:

Hash(4172025) = 1
Hash(1234526) = 1

This is a collision.

There are two primary ways to handle collisions:

  • Open Addressing
  • Closed Addressing (Chaining)

Open Addressing

In open addressing, all elements are stored directly in the array. On collision, we probe to find the next available slot.

Formula (Linear Probing):

Index = (Hash(key) + i) % Table Size

Where:

  • i starts at 0 and increments with each collision

We use tombstones to handle deletions, marking removed slots as REMOVED.


Open Addressing – C++ Implementation

We use an abstract base class HashTable, and then implement OpenAddressingHash that handles:

  • OPEN, FILLED, REMOVED states
  • std::hash as our hash function
  • Collision resolution with linear probing

Open Addressing: Search Pseudocode

Function Search(data):
    Compute hash index = Hash(data) % table size
    For i from 0 to table size:
        index = (hash + i) % table size
        If slot is OPEN:
            Break (data not in table)
        If slot is FILLED and value == data:
            Return value
    Return null

Open Addressing: Removal Pseudocode

Function Removal(data):
    Compute hash index = Hash(data) % table size
    For i from 0 to table size:
        index = (hash + i) % table size
        If slot is OPEN:
            Break (data not in table)
        If slot is FILLED and value == data:
            Mark as REMOVED
            Decrement count
            Return value
    Return null

Open Addressing: Resize Pseudocode

Function resize():
    new_capacity = table.size * 2
    Create new_table of size new_capacity, filled with (default, OPEN)
    For each (value, status) in old table:
        If status is FILLED:
            Compute new index and insert into new_table with linear probing
    Replace table with new_table

Open Addressing: Insert Pseudocode

Function Insert(data):
    If load factor > LOAD_SIZE:
        resize()
    Compute hash index = Hash(data) % table size
    For i from 0 to table size:
        index = (hash + i) % table size
        If FILLED and value == data:
            Return false (duplicate)
        If not FILLED:
            Insert (data, FILLED)
            Increment count
            Return true
    Return false

Closed Addressing (Chaining)

This is what most people think of when they hear "hash table with buckets."

  • Each slot (bucket) in the table holds a data structure (commonly a linked list).
  • This structure can be anything: list, set, even a BST.

Tradeoffs:

Structure in Bucket Worst Case Notes
Linked List O(K) Simple, may degrade to linear
Balanced Set (e.g. std::set) O(log K) Better search but slightly slower inserts

Where K = number of items in a single bucket


Closed Addressing – C++ Implementation

We again extend HashTable<T>, using:

  • std::vector<std::list<T>> as the table
  • std::find to simplify operations

Closed Addressing: Search Pseudocode

Function Search(data):
    hash index = Hash(data) % table size
    bucket = table[hash index]
    If data in bucket:
        Return data
    Else:
        Return null

Closed Addressing: Removal Pseudocode

Function Removal(data):
    hash index = Hash(data) % table size
    bucket = table[hash index]
    If data in bucket:
        Store value
        Remove from bucket
        Decrement count
        Return value
    Else:
        Return null

Closed Addressing: Resize Pseudocode

Function resize():
    newCap = table size * 2
    Create new table of size newCap (vector of lists)
    For each bucket in old table:
        For each item in bucket:
            Compute new index
            Insert into new table
    Replace old table

Closed Addressing: Insert Pseudocode

Function Insert(data):
    If (count / table size) > LOAD_SIZE:
        resize()
    hash index = Hash(data) % table size
    bucket = table[hash index]
    If data not in bucket:
        Add to bucket
        Increment count
        Return true
    Else:
        Return false

Final Thoughts

  • Open Addressing gives tighter memory usage but can suffer from clustering and performance degradation in edge cases.
  • Closed Addressing offers consistent performance, with more flexibility in how buckets store data.
  • Both methods target O(1) average-case time complexity, but worst cases differ depending on the implementation.
  • This is one of the first composite data structures you’ll encounter — combining hashing with internal storage like lists or trees.

Releases

No releases published

Packages

No packages published