Rabin Karp and KMP Algorithm

The document discusses string searching algorithms, specifically the Rabin-Karp and Knuth-Morris-Pratt (KMP) algorithms, highlighting their efficiency in finding substrings within large texts. The Rabin-Karp algorithm utilizes hashing for quick comparisons, while KMP employs a prefix table to avoid redundant checks. Both algorithms have practical applications in areas such as plagiarism detection, DNA analysis, and spam filtering.

Copyright © 2007 Pearson Addison-Wesley. All rights reserved. A. Levitin “Introduction to the Design & Analysis of Algorithms,” 2nd ed., Ch. 1
Introduction

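• String Searching: Find a substring (pattern) in a large text.
• Challenge: Search efficiently in large datasets. Checking every alignment character by character costs O(n × m) for a text of length n and a pattern of length m.
• Rabin-Karp Solution:
  • Uses hashing for efficient matching.
  • Compares hash values first, and compares individual characters only when the hashes are equal.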

Rabin-Karp Algorithm

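• Hash-based string-search algorithm.
• Compares the pattern's hash with the hash of each length-m substring (window) of the text.
• Verifies a candidate match character by character when the hashes are equal, because different strings can share a hash value.

• Key Advantage:
  • Efficient for multiple pattern searches in large datasets: the hashes of several same-length patterns can be stored in a table, and each text window is then checked against all of them with a single lookup.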
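A minimal Python sketch of the multi-pattern idea, assuming all patterns have the same length. The names `poly_hash` and `multi_search`, the base 256, and the prime modulus 1_000_000_007 are illustrative choices, not taken from the slides; every hash hit is still verified character by character.

```python
from typing import Dict, List, Tuple

BASE = 256              # assumed base (one "digit" per byte-sized character)
MOD = 1_000_000_007     # assumed large prime modulus

def poly_hash(s: str) -> int:
    """Polynomial hash: s[0]*BASE^(m-1) + ... + s[m-1], reduced mod MOD."""
    h = 0
    for ch in s:
        h = (h * BASE + ord(ch)) % MOD
    return h

def multi_search(text: str, patterns: List[str]) -> List[Tuple[int, str]]:
    """Return (index, pattern) for every occurrence of any same-length pattern."""
    if not patterns:
        return []
    m = len(patterns[0])
    if any(len(p) != m for p in patterns):
        raise ValueError("this sketch assumes all patterns have the same length")
    n = len(text)
    if m == 0 or n < m:
        return []
    # Group patterns by hash so one lookup checks a window against all of them.
    by_hash: Dict[int, List[str]] = {}
    for p in patterns:
        by_hash.setdefault(poly_hash(p), []).append(p)
    high = pow(BASE, m - 1, MOD)            # weight of the window's first character
    h = poly_hash(text[:m])
    hits: List[Tuple[int, str]] = []
    for s in range(n - m + 1):
        for p in by_hash.get(h, []):
            if text[s:s + m] == p:          # verify: equal hashes may still differ
                hits.append((s, p))
        if s + m < n:                       # roll: drop text[s], append text[s+m]
            h = ((h - ord(text[s]) * high) * BASE + ord(text[s + m])) % MOD
    return hits

print(multi_search("ABABDABACDABABCABAB", ["ABAB", "ABAC", "CABA"]))
```

On the example text this prints `[(0, 'ABAB'), (5, 'ABAC'), (10, 'ABAB'), (14, 'CABA'), (15, 'ABAB')]`: one pass over the text serves all three patterns.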

Steps of Rabin-Karp Algorithm

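1. Compute the hash of the pattern.
2. Compute the hash of the first length-m substring (window) of the text.
3. Compare the pattern hash with the window hash.
4. If the hashes match, verify character by character (to rule out false positives caused by hash collisions).
5. Slide the window right by one character.
6. Use a rolling hash to compute the next window's hash in O(1): remove the outgoing character's contribution and add the incoming character.
7. Repeat until the window reaches the end of the text.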
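The seven steps can be sketched in Python as below. This is a minimal sketch, not the slides' own code: the polynomial hash, the default base b = 256, and the modulus q = 1_000_000_007 are assumptions. With window hash h(s) for the window starting at s, the rolling update in step 6 is h(s+1) = ((h(s) − T[s] · b^(m−1)) · b + T[s+m]) mod q.

```python
from typing import List

def rabin_karp(text: str, pattern: str, base: int = 256, mod: int = 1_000_000_007) -> List[int]:
    """Return the 0-based start index of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or n < m:
        return []
    high = pow(base, m - 1, mod)            # base^(m-1) mod q, weight of the leading char
    # Steps 1-2: hash the pattern and the first window of the text.
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    matches = []
    for s in range(n - m + 1):
        # Steps 3-4: compare hashes; compare characters only on a hash hit.
        if p_hash == t_hash and text[s:s + m] == pattern:
            matches.append(s)
        # Steps 5-6: slide the window and update the hash in O(1).
        if s + m < n:
            t_hash = ((t_hash - ord(text[s]) * high) * base + ord(text[s + m])) % mod
    return matches

print(rabin_karp("ABABDABACDABABCABAB", "ABABCABAB"))   # [10]
```

Passing a tiny modulus (for example `mod=3`) forces many collisions; the results stay correct because step 4 verifies every hash hit, but more time is spent verifying.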

Real-Life Applications

• Plagiarism Detection
• Search Engines
• Intrusion Detection
• DNA Sequence Matching
• Data Deduplication
• Digital Forensics

Complexity of Rabin-Karp Algorithm

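(n = length of the text, m = length of the pattern)

• Best Case: 𝑂(𝑛+𝑚)
No spurious hash matches occur and the pattern occurs rarely, so after the initial 𝑂(𝑚) hashing each window costs 𝑂(1).

• Average Case: 𝑂(𝑛+𝑚)
With a good hash function (e.g., a large prime modulus), spurious matches are rare; when the pattern occurs only a few times, verification adds little work.

• Worst Case: 𝑂(𝑛×𝑚)
Every window needs character-by-character verification, either because of hash collisions or because the pattern genuinely occurs at almost every position (e.g., pattern "aaa" in text "aaaaaaaa").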
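Both worst-case causes can be reproduced with a small instrumented sketch. `rk_verification_cost` is a hypothetical helper that returns the number of matches and the number of characters compared during verification; it assumes base 256 and takes the modulus as a parameter, where `mod=1` is a deliberately degenerate choice that makes every window collide.

```python
from typing import Tuple

def rk_verification_cost(text: str, pattern: str, mod: int) -> Tuple[int, int]:
    """Return (matches, character comparisons spent on verification)."""
    base = 256
    n, m = len(text), len(pattern)
    if m == 0 or n < m:
        return 0, 0
    high = pow(base, m - 1, mod)
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    matches = comparisons = 0
    for s in range(n - m + 1):
        if p_hash == t_hash:
            # Verify character by character, counting each comparison.
            k = 0
            while k < m:
                comparisons += 1
                if text[s + k] != pattern[k]:
                    break
                k += 1
            if k == m:
                matches += 1
        if s + m < n:
            t_hash = ((t_hash - ord(text[s]) * high) * base + ord(text[s + m])) % mod
    return matches, comparisons

print(rk_verification_cost("a" * 20, "aaaaa", 1_000_000_007))  # (16, 80)
print(rk_verification_cost("a" * 20, "aaaab", 1))              # (0, 80)
print(rk_verification_cost("a" * 20, "aaaab", 1_000_000_007))  # (0, 0)
```

The first call is the "genuine matches everywhere" case: (20 − 5 + 1) × 5 = 80 comparisons. The second pays the same cost for pure collisions. The third shows a good modulus filtering out every window before any character is compared.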

Knuth-Morris-Pratt (KMP) Algorithm

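• Finds occurrences of a pattern in a given text.
• Avoids redundant comparisons by using a prefix table (the LPS array).
• Preprocesses the pattern once, in 𝑂(𝑚) time, to speed up the search.
• After a mismatch, shifts the pattern so that the longest proper prefix that is also a suffix of the already-matched part stays aligned, instead of restarting the comparison from scratch.
• Runs in 𝑂(𝑛+𝑚) time in the worst case.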

Steps

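• Preprocessing: Construct the prefix table (LPS array) for the pattern.
• Pattern Matching: Compare the pattern with the text from left to right.
• Mismatch Handling: After a mismatch at pattern index j > 0, continue from pattern index LPS[j − 1]; if j = 0, advance to the next text character.
• Efficient Search: The text index never moves backward, which avoids re-comparing characters already known to match.
• Continue Search: Repeat until the pattern is found or the text is exhausted.
• Final Match: Return the match index if found.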
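The steps above can be sketched in Python as follows. This is a minimal sketch, not the slides' own code: `compute_lps` and `kmp_search` are hypothetical names, and the search returns every match (overlapping ones included) rather than stopping at the first.

```python
from typing import List

def compute_lps(pattern: str) -> List[int]:
    # Preprocessing: lps[i] = longest proper prefix of pattern[:i+1] that is also a suffix.
    lps, length = [0] * len(pattern), 0
    for i in range(1, len(pattern)):
        while length > 0 and pattern[i] != pattern[length]:
            length = lps[length - 1]
        if pattern[i] == pattern[length]:
            length += 1
        lps[i] = length
    return lps

def kmp_search(text: str, pattern: str) -> List[int]:
    """Return the 0-based start index of every occurrence of pattern in text."""
    m = len(pattern)
    if m == 0:
        return []
    lps = compute_lps(pattern)
    matches = []
    j = 0                                   # pattern characters matched so far
    for i, ch in enumerate(text):           # the text index i never moves backward
        # Mismatch handling: shift the pattern using the LPS table.
        while j > 0 and ch != pattern[j]:
            j = lps[j - 1]
        if ch == pattern[j]:
            j += 1
        if j == m:                          # final match ends at text index i
            matches.append(i - m + 1)
            j = lps[j - 1]                  # keep searching for overlapping matches
    return matches

print(kmp_search("ABABDABACDABABCABAB", "ABABCABAB"))   # [10]
```

Resetting `j` to `lps[m - 1]` after a full match, rather than to 0, is what lets overlapping occurrences such as "ABAB" inside "ABABABAB" be found.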

Example

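• Text: ABABDABACDABABCABAB
• Pattern: ABABCABAB

• Steps:

1. Preprocessing Phase (LPS Table)

• Compute the Longest Prefix Suffix (LPS) array for the pattern, where LPS[i] is the length of the longest proper prefix of pattern[0..i] that is also a suffix of pattern[0..i]:
Pattern: ABABCABAB
LPS Table: [0, 0, 1, 2, 0, 1, 2, 3, 4]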
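The LPS table can be checked two ways: with the standard linear-time construction, and with a brute-force version that applies the definition directly. Both helper names are hypothetical. The brute-force version is far slower and is meant only for checking small examples.

```python
from typing import List

def compute_lps(pattern: str) -> List[int]:
    """lps[i] = length of the longest proper prefix of pattern[:i+1] that is also its suffix."""
    lps = [0] * len(pattern)
    length = 0                      # length of the current matched prefix-suffix
    for i in range(1, len(pattern)):
        # On mismatch, fall back to the next-shorter prefix-suffix candidate.
        while length > 0 and pattern[i] != pattern[length]:
            length = lps[length - 1]
        if pattern[i] == pattern[length]:
            length += 1
        lps[i] = length
    return lps

def lps_by_definition(pattern: str) -> List[int]:
    """Brute-force check: try every proper prefix length k directly."""
    result = []
    for i in range(len(pattern)):
        prefix = pattern[:i + 1]
        best = 0
        for k in range(1, len(prefix)):          # proper: k < len(prefix)
            if prefix[:k] == prefix[len(prefix) - k:]:
                best = k
        result.append(best)
    return result

print(compute_lps("ABABCABAB"))        # [0, 0, 1, 2, 0, 1, 2, 3, 4]
print(lps_by_definition("ABABCABAB"))  # [0, 0, 1, 2, 0, 1, 2, 3, 4]
```

The `while` loop in `compute_lps` reuses the table itself to fall back, which is why the whole construction stays 𝑂(𝑚).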

2. Pattern Matching Phase

• Start matching the pattern with the text from left to right (indices are 0-based):
• Compare A (text[0]) with A (pattern[0]) → Match.
• Compare B (text[1]) with B (pattern[1]) → Match.
• Compare A (text[2]) with A (pattern[2]) → Match.
• Compare B (text[3]) with B (pattern[3]) → Match.
• Compare D (text[4]) with C (pattern[4]) → Mismatch.

3. Mismatch Handling (Shifting the Pattern)

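• Use the LPS table to shift the pattern:
• The mismatch occurred at pattern index 4, so we read LPS[3] = 2: the last two matched characters ("AB") are also the first two characters of the pattern.
• The pattern therefore shifts by 4 − 2 = 2 positions (not 1, as the naive method would), and comparison resumes at pattern index 2 against the same text character, D.
• D also mismatches pattern[2] = A; the fallback LPS[1] = 0 returns to pattern index 0, where D mismatches A again, so the search advances to text index 5.
• Continue matching from the shifted position.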

4. Final Match
• Continue matching; the pattern is found at index 10 (0-based) of the text.

• Output: Pattern found at index: 10
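The walkthrough can be checked mechanically with a traced variant of the search. `kmp_trace` is a hypothetical helper that takes the LPS table from the preprocessing step as input and logs each mismatch fallback as (text index, pattern index before, pattern index after). It assumes a non-empty pattern.

```python
from typing import List, Tuple

def kmp_trace(text: str, pattern: str, lps: List[int]) -> Tuple[List[int], List[Tuple[int, int, int]]]:
    """KMP search that also logs each mismatch fallback as (text index, j before, j after)."""
    matches: List[int] = []
    shifts: List[Tuple[int, int, int]] = []
    j = 0                                       # pattern characters matched so far
    for i, ch in enumerate(text):
        while j > 0 and ch != pattern[j]:
            shifts.append((i, j, lps[j - 1]))   # pattern slides right by j - lps[j-1]
            j = lps[j - 1]
        if ch == pattern[j]:
            j += 1
        if j == len(pattern):
            matches.append(i - j + 1)
            j = lps[j - 1]
    return matches, shifts

TEXT = "ABABDABACDABABCABAB"
PATTERN = "ABABCABAB"
LPS = [0, 0, 1, 2, 0, 1, 2, 3, 4]               # table from the preprocessing step

matches, shifts = kmp_trace(TEXT, PATTERN, LPS)
print(matches)      # [10]
print(shifts[0])    # (4, 4, 2): mismatch at text[4], resume at pattern index 2
```

The first logged fallback confirms the 2-position shift at text index 4; the full log is `[(4, 4, 2), (4, 2, 0), (8, 3, 1), (8, 1, 0)]`, after which the match at index 10 is found without any further fallbacks.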

Real-Life Applications

• String Searching: Quickly searches for patterns in long texts.
• Compilers: Used to search for tokens or keywords in source code.
• DNA Analysis: Locates genetic sequences efficiently.
• Spam Filtering: Detects specific spam phrases in messages.

Advantages

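• Efficient and fast: 𝑂(𝑛+𝑚) time in the worst case (𝑂(𝑚) preprocessing plus 𝑂(𝑛) search).
• Optimal: linear time matches the worst-case lower bound, since in general every text character may need to be examined.
• No backtracking: the text index never moves backward, so the text can be read as a stream.
• Reliable: comparisons are exact, so there are no hash collisions or false positives to verify (unlike Rabin-Karp).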
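To make the linear-time claim concrete, the hedged sketch below counts character comparisons for the brute-force method and for KMP on an adversarial input: a run of 1000 "a" characters searched for "aaaaaaaaab". The helper names are hypothetical. The bound being illustrated is the standard one: the KMP search phase makes at most 2n comparisons, because every comparison either ends work on a text character or lowers the matched length, which can only fall as far as it has risen.

```python
from typing import List

def compute_lps(pattern: str) -> List[int]:
    lps, length = [0] * len(pattern), 0
    for i in range(1, len(pattern)):
        while length > 0 and pattern[i] != pattern[length]:
            length = lps[length - 1]
        if pattern[i] == pattern[length]:
            length += 1
        lps[i] = length
    return lps

def naive_comparisons(text: str, pattern: str) -> int:
    count = 0
    for s in range(len(text) - len(pattern) + 1):
        for k in range(len(pattern)):
            count += 1
            if text[s + k] != pattern[k]:
                break
    return count

def kmp_comparisons(text: str, pattern: str) -> int:
    lps, m = compute_lps(pattern), len(pattern)
    count = j = 0
    for ch in text:
        while True:
            count += 1                     # one text-vs-pattern comparison
            if ch == pattern[j]:
                j += 1
                break
            if j == 0:
                break                      # nothing to fall back to: next text char
            j = lps[j - 1]                 # fall back; text position stays put
        if j == m:
            j = lps[j - 1]                 # full match: keep overlapping prefix
    return count

text = "a" * 1000
pattern = "a" * 9 + "b"
print(naive_comparisons(text, pattern))    # 9910 = (1000 - 10 + 1) * 10
print(kmp_comparisons(text, pattern))      # at most 2 * 1000
```

The brute-force count grows with n × m, while the KMP count stays within 2n no matter how long the pattern is.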

Thank You!!!
