KMP Algorithm
KMP Algorithm
Algorithm
The problem of String Matching
Given a string ‘S’, the problem of string matching deals with finding
whether a pattern ‘p’ occurs in ‘S’ and if ‘p’ does occur then
returning position in ‘S’ where ‘p’ occurs.
…. a O(mn) approach
String S a b c a b a a b c a b a c
Pattern p a b a a
Step 1:compare p[1] with S[1]
S
a b c a b a a b c a b a c
p a b a a
p a b a a
Step 3: compare p[3] with S[3]
S
a b c a b a a b c a b a c
p a b a a
Mismatch occurs here..
Since mismatch is detected, shift ‘p’ one position to the left and
perform steps analogous to those from step 1 to step 3. At position
where mismatch is detected, shift ‘p’ one position to the right and
repeat matching procedure.
S a b c a b a ab c a b a c
p abaa
Finally, a match would be found after shifting ‘p’ three times to the right side.
Key Differences:
Knuth-Morris-Pratt
Aspect Brute Force
(KMP)
Efficiency O(m⋅n) O(m+n)
Restarts Uses LPS array to
Handling
comparison from skip unnecessary
Mismatches
scratch checks
Requires building
Preprocessing None
an LPS array
Small text and Large text and
Best Use Case
pattern sizes pattern sizes
The Knuth-Morris-Pratt
Algorithm
Knuth, Morris and Pratt proposed a linear time algorithm for the string
matching problem.
A matching time of O(n) is achieved by avoiding comparisons with
elements of ‘S’ that have previously been involved in comparison
with some element of the pattern ‘p’ to be matched. i.e.,
backtracking on the string ‘S’ never occurs
The Knuth-Morris-Pratt
Algorithm
Components of KMP algorithm
• The prefix function, Π
The prefix function,Π for a pattern encapsulates
knowledge about how the pattern matches against
shifts of itself. This information can be used to avoid
useless shifts of the pattern ‘p’. In other words, this
enables avoiding backtracking on the string ‘S’.
• The KMP Matcher
With string ‘S’, pattern ‘p’ and prefix function ‘Π’ as
inputs, finds the occurrence of ‘p’ in ‘S’ and returns
the number of shifts of ‘p’ after which occurrence is
found.
The prefix function, Π
Compute-Prefix-Function (p)
1 m length[p] //’p’ pattern to be matched
2 Π[1] 0
3 k0
4 for q 2 to m
5 do while k > 0 and p[k+1] != p[q]
6 do k Π[k]
7 If p[k+1] = p[q]
8 then k k +1
9 Π[q] k
10 return Π
Example: compute Π for the pattern ‘p’ below:
p
a b a b a c a
Initially: m = length[p] = 7
Π[1] = 0
k=0
q 1 2 3 4 5 6 7
Step 1: q = 2, k=0 p a b a b a c a
Π 0 0
Π[2] = 0
q 1 2 3 4 5 6 7
p a b a b a c a
Step 2: q = 3, k = 0,
Π[3] = 1 Π 0 0 1
q 1 2 3 4 5 6 7
Step 3: q = 4, k = 1 p a b a b a c A
Π[4] = 2 Π 0 0 1 2
Step 4: q = 5, k =2 q 1 2 3 4 5 6 7
Π[5] = 3 p a b a b a c a
Π 0 0 1 2 3
q 1 2 3 4 5 6 7
Step 5: q = 6, k = 3
Π[6] = 1 p a b a b a c a
Π 0 0 1 2 3 0
q 1 2 3 4 5 6 7
Step 6: q = 7, k = 1 p a b a b a c a
Π[7] = 1 Π 0 0 1 2 3 0 1
Note: KMP finds every occurrence of a ‘p’ in ‘S’. That is why KMP does not terminate in step 12, rather it searches
remainder of ‘S’ for any more occurrences of ‘p’.
Illustration: given a String ‘S’ and pattern ‘p’ as follows:
S
b a c b a b a b a b a c a c a
p a b a b a c a
Let us execute the KMP algorithm to find
whether ‘p’ occurs in ‘S’.
For ‘p’ the prefix function, Π was computed previously and is as follows:
q 1 2 3 4 5 6 7
p a b A b a c a
Π 0 0 1 2 3 0 1
Initially: n = size of S = 15;
m = size of p = 7
Step 1: i = 1, q = 0
comparing p[1] with S[1]
S b a c b a b a b a b a c a a b
p a b a b a c a
P[1] does not match with S[1]. ‘p’ will be shifted one position to the right.
Step 2: i = 2, q = 0
comparing p[1] with S[2]
S b a c b a b a b a b a c a a b
p a b a b a c a
P[1] matches S[2]. Since there is a match, p is not shifted.
Step 3: i = 3, q = 1
Comparing p[2] with S[3] p[2] does not match with S[3]
S b a c b a b a b a b a c a a b
p a b a b a c a
Backtracking on p, comparing p[1] and S[3]
Step 4: i = 4, q = 0
comparing p[1] with S[4] p[1] does not match with S[4]
S b a c b a b a b a b a c a a b
p a b a b a c a
Step 5: i = 5, q = 0
comparing p[1] with S[5] p[1] matches with S[5]
S b a c b a b a b a b a c a a b
p a b a b a c a
Step 6: i = 6, q = 1
p a b a b a c a
Step 7: i = 7, q = 2
S b a c b a b a b a b a c a a b
p a b a b a c a
Step 8: i = 8, q = 3
S b a c b a b a b a b a c a a b
p a b a b a c a
Step 9: i = 9, q = 4
p a b a b a c a
Step 10: i = 10, q = 5
S b a c b a b a b a b a c a a b
p a b a b a c a
Backtracking on p, comparing p[4] with S[10] because after mismatch q = Π[5] = 3
Step 11: i = 11, q = 4
S b a c b a b a b a b a c a a b
p a b a b a c a
Step 12: i = 12, q = 5
p a b a b a c a
S b a c b a b a b a b a c a a b
p a b a b a c a
Pattern ‘p’ has been found to completely occur in string ‘S’. The total number of shifts
that took place for the match to be found are: i – m = 13 – 7 = 6 shifts.
Running - time analysis
• KMP Matcher
• Compute-Prefix-Function (Π)
1 n length[S]
1 m length[p] //’p’ pattern to be matched
2 m length[p]
2 Π[1] 0
3 Π Compute-Prefix-Function(p)
3 k0
4q0
4 for q 2 to m
5 for i 1 to n
5 do while k > 0 and p[k+1] != p[q]
6 do while q > 0 and p[q+1] != S[i]
6 do k Π[k]
7 do q Π[q]
7 If p[k+1] = p[q]
8 if p[q+1] = S[i]
8 then k k +1
9 then q q + 1
9 Π[q] k
10 if q = m
10 return Π
11 then print “Pattern occurs with shift” i – m
12 q Π[ q]