CP
Algorithms

KMP Algorithm

Overview

KMP (Knuth-Morris-Pratt) is an efficient string matching algorithm that finds occurrences of a pattern in a text in linear time. Unlike naive matching which restarts from scratch after each mismatch, KMP uses information about previously matched characters to skip redundant comparisons.

Algorithm Steps

Phase 1: Build LPS Array

The LPS (Longest Proper Prefix which is also Suffix) array stores, for each position in the pattern, the length of the longest proper prefix that matches a suffix.

  1. Initialize lps[0] = 0 (no proper prefix for single character)
  2. Use two pointers: length (length of previous longest prefix-suffix), i (current position)
  3. For each position i from 1 to m-1:
    • If pattern[i] == pattern[length]: extend the match, lps[i] = length + 1, increment both
    • Else if length > 0: fall back to lps[length - 1] and retry
    • Else: lps[i] = 0, move to next position

Phase 2: Pattern Matching

  1. Use two pointers: i for text, j for pattern
  2. While i < len(text):
    • If characters match: increment both i and j
    • If j == len(pattern): found match at index i - j, use LPS to continue
    • Else if mismatch:
      • If j > 0: use LPS to skip (j = lps[j - 1])
      • Else: increment i

Complexity

MetricComplexityReason
Time ComplexityO(n+m)O(n + m)LPS built in O(m), matching in O(n)
Space ComplexityO(m)O(m)LPS array stores m values

Where n = length of text, m = length of pattern

Implementation

def build_lps(pattern: str) -> list[int]:
    m = len(pattern)
    lps = [0] * m
    length = 0
    i = 1

    while i < m:
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        elif length > 0:
            length = lps[length - 1]
        else:
            lps[i] = 0
            i += 1

    return lps


def kmp_search(text: str, pattern: str) -> list[int]:
    n, m = len(text), len(pattern)
    if m == 0:
        return []

    lps = build_lps(pattern)
    result = []
    i = j = 0

    while i < n:
        if text[i] == pattern[j]:
            i += 1
            j += 1

            if j == m:
                result.append(i - j)
                j = lps[j - 1]
        elif j > 0:
            j = lps[j - 1]
        else:
            i += 1

    return result

Example

LPS Construction for pattern "ABABC"

pattern = "ABABC"
lps = [0, 0, 1, 2, 0]
icharcompare withresultlps[i]
0A--0
1BA0
2AA=1
3BB=2
4CA≠, fallback to 0, ≠0
  • lps[2]=1: "A" is both prefix and suffix of "ABA"
  • lps[3]=2: "AB" is both prefix and suffix of "ABAB"

Pattern Matching

text = "ABABDABABC"
pattern = "ABABC"
lps = [0, 0, 1, 2, 0]

result = kmp_search(text, pattern)
ijtext[i]pattern[j]action
00AAmatch, i=1, j=1
11BBmatch, i=2, j=2
22AAmatch, i=3, j=3
33BBmatch, i=4, j=4
44DCmismatch, j=lps[3]=2
42DAmismatch, j=lps[1]=0
40DAmismatch, i=5
50AAmatch, continue...
...match found at i=10, j=5 → index 5

Key Points

  • When to use KMP: Large texts, long patterns, or repeated searches with same pattern
  • Python's find(): Often sufficient for small patterns (≤10 chars); uses optimized C implementation
  • LPS intuition: Tells us "if we've matched j characters and fail, we've actually matched lps[j-1] characters of a potential new match"
  • Variants: Can be adapted to count occurrences, find all matches, or check if pattern exists
  • Alternative algorithms: Rabin-Karp (rolling hash), Boyer-Moore (skip from end), Z-algorithm

On this page