Algorithms
Rabin-Karp Algorithm
Overview
Rabin-Karp is a string matching algorithm that uses hashing to find pattern occurrences in text. Instead of comparing characters directly, it compares hash values of the pattern with hash values of text substrings.
Algorithm Steps
- Compute hash of pattern
- Compute hash of first window (first m characters of text)
- Slide the window across text:
- If hashes match: verify with character-by-character comparison (avoid false positives)
- Use rolling hash to update window hash in O(1)
Rolling Hash Formula
Using polynomial rolling hash with base d and modulus q:
- Initial:
hash = (c[0]*d^(m-1) + c[1]*d^(m-2) + ... + c[m-1]) % q - Roll:
new_hash = ((old_hash - c[old]*d^(m-1)) * d + c[new]) % q
Complexity
| Metric | Complexity | Reason |
|---|---|---|
| Time Complexity | Average case; O(nm) worst case with collisions | |
| Space Complexity | Only stores hash values and pointers |
Where n = length of text, m = length of pattern
Implementation
def rabin_karp(text: str, pattern: str) -> list[int]:
n, m = len(text), len(pattern)
if m > n:
return []
d = 256
q = 101
h = pow(d, m - 1, q)
p_hash = 0
t_hash = 0
for i in range(m):
p_hash = (d * p_hash + ord(pattern[i])) % q
t_hash = (d * t_hash + ord(text[i])) % q
result = []
for i in range(n - m + 1):
if p_hash == t_hash:
if text[i:i + m] == pattern:
result.append(i)
if i < n - m:
t_hash = (d * (t_hash - ord(text[i]) * h) + ord(text[i + m])) % q
if t_hash < 0:
t_hash += q
return resultExample
text = "ABABDABABC"
pattern = "ABAB"
result = rabin_karp(text, pattern)With d=256, q=101, m=4:
- Pattern hash: computed once
- Window slides: hash updated in O(1) using rolling formula
- Hash match at i=0: verify "ABAB" == "ABAB" ✓
- Hash match at i=5: verify "ABAB" == "ABAB" ✓
- Result: [0, 5]
Key Points
- When to use: Multiple pattern search, plagiarism detection, finding repeated substrings
- Hash collisions: Always verify matches with direct comparison
- Choice of q: Use prime number to reduce collisions
- Choice of d: Typically 256 (ASCII) or alphabet size
- vs KMP: Rabin-Karp better for multiple patterns; KMP guarantees O(n+m)
- Variants: Can search multiple patterns simultaneously by storing pattern hashes in a set