Skip to content

Hash Map

A hash map (also known as hash table or dictionary) is a data structure that implements an associative array abstract data type, mapping keys to values. It uses a hash function to compute an index into an array of buckets or slots, from which the desired value can be found.

Features

  • Key-value pair storage - maps keys to values efficiently
  • Fast operations - O(1) average time for insert, delete, and lookup
  • Automatic resizing - maintains performance by resizing when load factor is exceeded
  • Collision resolution - uses separate chaining to handle hash collisions
  • Pluggable hash functions - supports custom hash functions for different use cases
  • Generic type support - type-safe keys and values

Time Complexities

Operation Average Case Worst Case
Insert (put) O(1) O(n)
Search (get) O(1) O(n)
Delete (remove) O(1) O(n)
Contains Key O(1) O(n)
Contains Value O(n) O(n)
Iteration O(n) O(n)

Note: Worst case occurs when all keys hash to the same bucket

Basic Usage

Creating and Initializing

from algo.data_structs.hash_tables.hash_map import HashMap

# Create empty hash map with default settings
hm = HashMap()

# Check if empty
print(hm.is_empty())    # True
print(len(hm))          # 0
print(hm.size())        # 0
print(hm.capacity())    # 16 (default)
print(hm.load_factor()) # 0.0

# Create with custom parameters
hm = HashMap(initial_capacity=32, load_factor=0.5)
print(hm.capacity())    # 32

Adding Key-Value Pairs

# Put key-value pairs
hm.put("name", "Alice")
hm.put("age", 30)
hm.put("city", "New York")
print(hm.size())  # 3

# Dictionary-like assignment
hm["country"] = "USA"
hm["occupation"] = "Engineer"
print(hm.size())  # 5

# Put returns previous value if key existed
old_value = hm.put("age", 31)  # Update existing key
print(old_value)  # 30 (previous value)
print(hm.get("age"))  # 31 (new value)

# Adding multiple pairs
data = {"height": "5'8\"", "weight": "150lbs", "hobby": "reading"}
for key, value in data.items():
    hm.put(key, value)
print(hm.size())  # 8

Accessing Values

# Get values by key
print(hm.get("name"))     # Alice
print(hm.get("age"))      # 31
print(hm.get("city"))     # New York

# Get with default value
print(hm.get("phone"))              # None
print(hm.get("phone", "N/A"))       # N/A

# Dictionary-like access
print(hm["name"])         # Alice
print(hm["country"])      # USA

# Check current stats
print(f"Size: {hm.size()}")
print(f"Capacity: {hm.capacity()}")
print(f"Load Factor: {hm.load_factor():.2f}")

Removing Key-Value Pairs

# Remove by key
removed_value = hm.remove("hobby")
print(removed_value)      # reading
print(hm.size())          # 7

# Remove non-existent key
result = hm.remove("nonexistent")
print(result)             # None

# Dictionary-like deletion
del hm["weight"]
print(hm.size())          # 6

# Clear all elements
hm.clear()
print(hm.size())          # 0
print(hm.is_empty())      # True

Search Operations

hm = HashMap()
hm.put("apple", 1.50)
hm.put("banana", 0.75)
hm.put("cherry", 2.00)
hm.put("date", 3.00)
hm.put("elderberry", 0.75)  # Duplicate value

# Check if key exists
print(hm.contains_key("apple"))     # True
print(hm.contains_key("grape"))     # False

# Check if value exists
print(hm.contains_value(1.50))      # True
print(hm.contains_value(0.75))      # True (multiple keys can have same value)
print(hm.contains_value(5.00))      # False

# Using 'in' operator (checks keys)
print("banana" in hm)               # True
print("grape" in hm)                # False

# Get value with error handling
try:
    price = hm["mango"]  # KeyError for non-existent key
except KeyError:
    print("Mango not found in inventory")

Iteration and Traversal

hm = HashMap()
hm.put("red", "#FF0000")
hm.put("green", "#00FF00")
hm.put("blue", "#0000FF")
hm.put("yellow", "#FFFF00")

# Iterate over keys (default iteration)
print("Keys:")
for key in hm:
    print(key)  # red, green, blue, yellow (order not guaranteed)

# Iterate over keys explicitly
print("Keys (explicit):")
for key in hm.keys():
    print(key)

# Iterate over values
print("Values:")
for value in hm.values():
    print(value)  # #FF0000, #00FF00, #0000FF, #FFFF00

# Iterate over key-value pairs
print("Key-Value pairs:")
for key, value in hm.items():
    print(f"{key}: {value}")

# Convert to lists
keys_list = list(hm.keys())
values_list = list(hm.values())
items_list = list(hm.items())

Working with Different Types

String Keys and Values

str_hm = HashMap()
str_hm.put("greeting", "Hello")
str_hm.put("farewell", "Goodbye")
str_hm.put("thanks", "Thank you")

print(str_hm.get("greeting"))   # Hello
print("thanks" in str_hm)       # True
print(str_hm.size())           # 3

Numeric Keys

num_hm = HashMap()
num_hm.put(1, "one")
num_hm.put(2, "two")
num_hm.put(42, "answer")

print(num_hm.get(1))    # one
print(num_hm.get(42))   # answer
print(num_hm.size())    # 3

Mixed Value Types

mixed_hm = HashMap()
mixed_hm.put("string", "text")
mixed_hm.put("integer", 42)
mixed_hm.put("float", 3.14)
mixed_hm.put("list", [1, 2, 3])
mixed_hm.put("none", None)

print(mixed_hm.get("string"))   # text
print(mixed_hm.get("integer"))  # 42
print(mixed_hm.get("list"))     # [1, 2, 3]
print(mixed_hm.contains_key("none"))  # True (even with None value)

Tuple Keys

tuple_hm = HashMap()
tuple_hm.put((1, 2), "coordinate")
tuple_hm.put(("name", "age"), "person_data")
tuple_hm.put((2023, 12, 25), "Christmas")

print(tuple_hm.get((1, 2)))              # coordinate
print((2023, 12, 25) in tuple_hm)        # True

Custom Objects as Values

class Person:
    def __init__(self, name: str, age: int):
        self.name = name
        self.age = age

    def __str__(self):
        return f"Person({self.name}, {self.age})"

people_hm = HashMap()
people_hm.put("alice", Person("Alice", 30))
people_hm.put("bob", Person("Bob", 25))

alice = people_hm.get("alice")
print(alice)  # Person(Alice, 30)
print(alice.name)  # Alice

Hash Map Resizing and Performance

Automatic Resizing

# Create small hash map to demonstrate resizing
hm = HashMap(initial_capacity=4, load_factor=0.5)
print(f"Initial capacity: {hm.capacity()}")  # 4

# Add elements up to threshold (4 * 0.5 = 2)
hm.put("key1", "value1")
hm.put("key2", "value2")
print(f"Before resize - Capacity: {hm.capacity()}, Load Factor: {hm.load_factor():.2f}")

# Adding one more triggers resize
hm.put("key3", "value3")
print(f"After resize - Capacity: {hm.capacity()}, Load Factor: {hm.load_factor():.2f}")

# Verify all data is still accessible
for i in range(1, 4):
    print(f"key{i}: {hm.get(f'key{i}')}")

Load Factor Management

hm = HashMap(initial_capacity=8, load_factor=0.75)

print(f"Empty - Load Factor: {hm.load_factor()}")  # 0.0

for i in range(6):
    hm.put(f"item{i}", f"value{i}")
    print(f"After {i+1} items - Load Factor: {hm.load_factor():.2f}")

# Load factor stays below 0.75 due to automatic resizing

Hash Functions and Collision Handling

Using Custom Hash Functions

from algo.algorithms.hashing.hash_functions import FNVHash, DJB2Hash, MurmurHash3

# Create hash map with FNV hash function
fnv_hm = HashMap(hash_function=FNVHash())
fnv_hm.put("test", "value")
print(f"Hash function: {type(fnv_hm.get_hash_function()).__name__}")

# Switch hash functions (rehashes all data)
djb2_hash = DJB2Hash()
fnv_hm.set_hash_function(djb2_hash)
print(f"New hash function: {type(fnv_hm.get_hash_function()).__name__}")
print(f"Data still accessible: {fnv_hm.get('test')}")  # value

# Create with MurmurHash3
murmur_hm = HashMap(hash_function=MurmurHash3(seed=42))

Analyzing Hash Distribution

hm = HashMap(initial_capacity=8, load_factor=0.9)

# Add data to potentially cause collisions
for i in range(15):
    hm.put(f"key_{i:02d}", f"value_{i}")

# Get bucket distribution
distribution = hm.get_bucket_distribution()
print(f"Bucket distribution: {distribution}")
print(f"Total items: {sum(distribution)}")

# Get detailed statistics
stats = hm.get_hash_statistics()
print(f"Collisions: {stats['collisions']}")
print(f"Max chain length: {stats['max_chain_length']}")
print(f"Average chain length: {stats['average_chain_length']:.2f}")
print(f"Load factor: {stats['load_factor']:.2f}")

Error Handling

hm = HashMap()

# KeyError for non-existent keys with [] operator
try:
    value = hm["nonexistent"]
except KeyError:
    print("Key not found when using [] operator")

# Safe access with get() method
value = hm.get("nonexistent")  # Returns None
print(value)  # None

value = hm.get("nonexistent", "default")  # Returns default
print(value)  # default

# KeyError when deleting non-existent key
try:
    del hm["nonexistent"]
except KeyError:
    print("Cannot delete non-existent key")

# Safe removal with remove() method
result = hm.remove("nonexistent")  # Returns None
print(result)  # None

# Invalid initialization parameters
try:
    invalid_hm = HashMap(initial_capacity=0)
except ValueError as e:
    print(f"Invalid capacity: {e}")

try:
    invalid_hm = HashMap(load_factor=1.5)
except ValueError as e:
    print(f"Invalid load factor: {e}")

String Representation

hm = HashMap()
hm.put("name", "Alice")
hm.put("age", 30)
hm.put("city", "New York")

# String representation (like dict)
print(str(hm))   # {name: Alice, age: 30, city: New York}

# Detailed representation for debugging
print(repr(hm))  # HashMap(size=3, capacity=16, load_factor=0.19, hash_func=FNVHash)

# Empty hash map representation
empty_hm = HashMap()
print(str(empty_hm))  # {}

Common Patterns

Building from Dictionaries

# From Python dictionary
data = {"a": 1, "b": 2, "c": 3}
hm = HashMap()
for key, value in data.items():
    hm.put(key, value)

# Using update method
hm1 = HashMap()
hm1.put("x", 10)

hm2 = HashMap()
hm2.put("y", 20)
hm2.put("z", 30)

hm1.update(hm2)  # Merge hm2 into hm1
print(hm1.size())  # 3

Caching and Memoization

# Simple cache implementation
cache = HashMap()

def expensive_operation(n):
    if cache.contains_key(n):
        print(f"Cache hit for {n}")
        return cache.get(n)

    # Simulate expensive computation
    result = n * n * n
    cache.put(n, result)
    print(f"Computed and cached {n}")
    return result

print(expensive_operation(5))  # Computed and cached 5 -> 125
print(expensive_operation(5))  # Cache hit for 5 -> 125

Counting Occurrences

def count_words(text):
    word_count = HashMap()
    words = text.lower().split()

    for word in words:
        current_count = word_count.get(word, 0)
        word_count.put(word, current_count + 1)

    return word_count

text = "the quick brown fox jumps over the lazy dog the fox"
counts = count_words(text)

for word, count in counts.items():
    print(f"{word}: {count}")

Grouping Data

# Group people by age
people = [
    ("Alice", 25), ("Bob", 30), ("Charlie", 25),
    ("Diana", 30), ("Eve", 35)
]

age_groups = HashMap()
for name, age in people:
    if not age_groups.contains_key(age):
        age_groups.put(age, [])

    age_groups.get(age).append(name)

for age, names in age_groups.items():
    print(f"Age {age}: {names}")

Configuration Management

# Application configuration
config = HashMap()
config.put("database_url", "postgresql://localhost:5432/mydb")
config.put("debug_mode", True)
config.put("max_connections", 100)
config.put("timeout", 30.0)

def get_config(key, default=None):
    return config.get(key, default)

# Usage
db_url = get_config("database_url")
debug = get_config("debug_mode", False)
port = get_config("port", 8080)  # Uses default

Use Cases

When to Use Hash Maps

Good for:

  • Fast key-value lookups
  • Caching and memoization
  • Counting and frequency analysis
  • Grouping and categorization
  • Configuration storage
  • Database indexing
  • Symbol tables in compilers

Examples:

# User session management
sessions = HashMap()
sessions.put("session_123", {"user_id": 456, "login_time": "2023-12-01"})

# Inventory management
inventory = HashMap()
inventory.put("SKU001", {"name": "Widget", "quantity": 100, "price": 9.99})

# DNS resolution cache
dns_cache = HashMap()
dns_cache.put("example.com", "93.184.216.34")

# Database record cache
record_cache = HashMap()
record_cache.put("user:123", {"name": "Alice", "email": "alice@example.com"})

When NOT to Use Hash Maps

Avoid for:

  • Ordered data requirements (use OrderedDict or specialized structures)
  • Range queries (use trees or sorted structures)
  • Memory-constrained environments with many small mappings
  • When iteration order matters

Better alternatives:

# For ordered data, use specialized ordered structures
from collections import OrderedDict
ordered_data = OrderedDict([("first", 1), ("second", 2)])

# For range queries, use sorted structures
import bisect
sorted_keys = ["a", "c", "e", "g"]
# Find keys in range

# For small, fixed mappings, use namedtuple or dataclass
from collections import namedtuple
Point = namedtuple('Point', ['x', 'y'])

Performance Characteristics

Time Complexity Analysis

  • Average case performance: O(1) for basic operations
  • Worst case performance: O(n) when all keys hash to same bucket
  • Space complexity: O(n) where n is number of key-value pairs
  • Resizing cost: O(n) but amortized to O(1) per operation

Memory Efficiency

# Hash maps use additional memory for:
# - Bucket array
# - Hash table metadata
# - Chain nodes for collision resolution

hm = HashMap(initial_capacity=16)
print(f"Initial capacity: {hm.capacity()}")  # 16 buckets allocated

# Memory grows with load factor
for i in range(100):
    hm.put(f"key{i}", f"value{i}")

print(f"Final capacity: {hm.capacity()}")  # Expanded as needed
print(f"Load factor: {hm.load_factor():.2f}")  # Maintained below threshold

Collision Resolution Performance

# Separate chaining performance depends on chain length
hm = HashMap(initial_capacity=4, load_factor=0.9)  # Force collisions

# Add many items to small hash map
for i in range(20):
    hm.put(f"item{i}", f"value{i}")

stats = hm.get_hash_statistics()
print(f"Max chain length: {stats['max_chain_length']}")
print(f"Average chain length: {stats['average_chain_length']:.2f}")
print(f"Collision rate: {stats['collisions'] / stats['total_elements']:.2f}")

Hash Map Comparison and Equality

# Hash maps are equal if they contain same key-value pairs
hm1 = HashMap()
hm1.put("a", 1)
hm1.put("b", 2)

hm2 = HashMap()
hm2.put("b", 2)  # Different insertion order
hm2.put("a", 1)

print(hm1 == hm2)  # True (same content)

# Different values
hm3 = HashMap()
hm3.put("a", 1)
hm3.put("b", 3)  # Different value

print(hm1 == hm3)  # False

# Different sizes
hm4 = HashMap()
hm4.put("a", 1)

print(hm1 == hm4)  # False (different sizes)

# Comparison with non-HashMap objects
print(hm1 == {"a": 1, "b": 2})  # False (different types)

Advanced Usage

Custom Hash Function Performance Comparison

from algo.algorithms.hashing.hash_functions import (
    FNVHash, DJB2Hash, MurmurHash3, PolynomialHash
)

# Test different hash functions with same data
test_data = [(f"key_{i:03d}", f"value_{i}") for i in range(100)]

hash_functions = [
    ("FNV", FNVHash()),
    ("DJB2", DJB2Hash()),
    ("MurmurHash3", MurmurHash3()),
    ("Polynomial", PolynomialHash())
]

for name, hash_func in hash_functions:
    hm = HashMap(initial_capacity=32, hash_function=hash_func)

    # Add test data
    for key, value in test_data:
        hm.put(key, value)

    # Analyze distribution
    stats = hm.get_hash_statistics()
    print(f"{name} Hash Function:")
    print(f"  Collisions: {stats['collisions']}")
    print(f"  Max chain length: {stats['max_chain_length']}")
    print(f"  Distribution variance: {stats['distribution_variance']:.2f}")
    print()

Hash Map as Building Block

# Implement a simple LRU cache using HashMap
class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.cache = HashMap()
        self.access_order = []

    def get(self, key):
        if self.cache.contains_key(key):
            # Move to end (most recently used)
            self.access_order.remove(key)
            self.access_order.append(key)
            return self.cache.get(key)
        return None

    def put(self, key, value):
        if self.cache.contains_key(key):
            # Update existing
            self.cache.put(key, value)
            self.access_order.remove(key)
            self.access_order.append(key)
        else:
            # Add new
            if len(self.access_order) >= self.capacity:
                # Evict least recently used
                lru_key = self.access_order.pop(0)
                self.cache.remove(lru_key)

            self.cache.put(key, value)
            self.access_order.append(key)

# Usage
lru = LRUCache(3)
lru.put("a", 1)
lru.put("b", 2)
lru.put("c", 3)
lru.put("d", 4)  # Evicts "a"
print(lru.get("a"))  # None (evicted)
print(lru.get("b"))  # 2

Performance Comparison

Operation HashMap Python dict Binary Search Tree Array
Insert O(1) avg O(1) avg O(log n) O(n)
Search O(1) avg O(1) avg O(log n) O(n)
Delete O(1) avg O(1) avg O(log n) O(n)
Ordered Iteration No No (Python 3.7+) Yes Yes
Memory Overhead Medium Low High Low
Cache Locality Poor Good Poor Excellent

API Reference

Hash Map Implementation

A hash map (also known as hash table or dictionary) is a data structure that implements an associative array abstract data type, mapping keys to values. It uses a hash function to compute an index into an array of buckets or slots, from which the desired value can be found.

Key Features:

  • Key-value pair storage
  • Fast insertion, deletion, and lookup operations
  • Automatic resizing to maintain load factor
  • Collision resolution via separate chaining
  • Support for any hashable key type
  • Pluggable hash functions for different use cases

Time Complexities (average case):

  • Insert: O(1)
  • Delete: O(1)
  • Search: O(1)
  • Worst case: O(n) when all keys hash to same bucket

Space Complexity: O(n)

HashMap(initial_capacity=16, load_factor=0.75, hash_function=None)

Bases: Generic[K, V]

Hash Map implementation using separate chaining for collision resolution.

This implementation automatically resizes when the load factor exceeds a threshold to maintain good performance characteristics. It supports pluggable hash functions for different use cases and performance requirements.

Initialize hash map.

Parameters:

Name Type Description Default
initial_capacity int

Initial number of buckets (should be power of 2)

16
load_factor float

Maximum ratio of size to capacity before resizing

0.75
hash_function Optional[HashFunction]

Custom hash function to use (defaults to FNV hash)

None

__contains__(key)

Check if key exists (in operator support).

__delitem__(key)

Delete key-value pair (dict-like access).

__eq__(other)

Check equality with another hash map.

__getitem__(key)

Get value by key (dict-like access).

__iter__()

Return iterator over keys.

__len__()

Return number of key-value pairs.

__repr__()

Return detailed string representation for debugging.

__setitem__(key, value)

Set value by key (dict-like access).

__str__()

Return string representation.

capacity()

Return current capacity (number of buckets).

clear()

Remove all key-value pairs from hash map.

contains_key(key)

Check if key exists in hash map.

Parameters:

Name Type Description Default
key K

Key to check

required

Returns:

Type Description
bool

True if key exists, False otherwise

contains_value(value)

Check if value exists in hash map.

Parameters:

Name Type Description Default
value V

Value to check

required

Returns:

Type Description
bool

True if value exists, False otherwise

get(key, default=None)

Get value associated with key.

Parameters:

Name Type Description Default
key K

Key to look up

required
default Optional[V]

Default value if key not found

None

Returns:

Type Description
Optional[V]

Value associated with key, or default if not found

get_bucket_distribution()

Get distribution of elements across buckets (for debugging/analysis).

Returns:

Type Description
List[int]

List where index i contains count of elements in bucket i

get_hash_function()

Get the current hash function.

Returns:

Type Description
HashFunction

Current hash function instance

get_hash_statistics()

Get detailed statistics about hash function performance.

Returns:

Type Description
dict

Dictionary with hash distribution and collision statistics

is_empty()

Check if hash map is empty.

items()

Return iterator over all key-value pairs.

keys()

Return iterator over all keys.

load_factor()

Return current load factor.

put(key, value)

Insert or update a key-value pair.

Parameters:

Name Type Description Default
key K

Key to insert/update

required
value V

Value to associate with key

required

Returns:

Type Description
Optional[V]

Previous value if key existed, None otherwise

remove(key)

Remove key-value pair from hash map.

Parameters:

Name Type Description Default
key K

Key to remove

required

Returns:

Type Description
Optional[V]

Removed value if key existed, None otherwise

set_hash_function(hash_function)

Change the hash function and rehash all existing elements.

Parameters:

Name Type Description Default
hash_function HashFunction

New hash function to use

required

size()

Return number of key-value pairs.

update(other)

Update hash map with key-value pairs from another hash map.

values()

Return iterator over all values.

HashMapNode(key, value)

Bases: Generic[K, V]

Node class for hash map entries using separate chaining.