0% found this document useful (0 votes)
147 views4 pages

Brute Force Algorithm

The document proposes the Start-End-Mid string searching algorithm as an improvement over the brute-force algorithm. The brute-force algorithm compares characters one by one with quadratic time complexity. The Start-End-Mid algorithm first compares the start, end, and middle characters of the pattern to the text segment before doing full character comparison, avoiding full comparisons when there is an early mismatch. It finds all occurrences of the pattern in the text without preprocessing the pattern or text. The algorithm is explained with steps and a flowchart.

Uploaded by

Shima Fanissa
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
147 views4 pages

Brute Force Algorithm

The document proposes the Start-End-Mid string searching algorithm as an improvement over the brute-force algorithm. The brute-force algorithm compares characters one by one with quadratic time complexity. The Start-End-Mid algorithm first compares the start, end, and middle characters of the pattern to the text segment before doing full character comparison, avoiding full comparisons when there is an early mismatch. It finds all occurrences of the pattern in the text without preprocessing the pattern or text. The algorithm is explained with steps and a flowchart.

Uploaded by

Shima Fanissa
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 4

24 IJCSNS International Journal of Computer Science and Network Security, VOL.11 No.

7, July 2011

An Algorithm for String Searching Based on Brute-Force


Algorithm
Rawan Ali Abdeen

Summary mismatch occurs [9][10]. The Rabin–Karp algorithm


String searching is a very important component of many computes a hash function to seek for a pattern within a
problems, including text editing, text searching and symbol given text [10]. The Boyer-Moore algorithm works by
manipulation. In this paper a string searching algorithm is searching the target string from right to left, while moving
proposed as an improvement of the brute-force searching
it left to right [9]. The Start-To-End algorithm begins the
algorithm. The algorithm is named Start-End-Mid Algorithm.
The proposed algorithm does not preprocess neither the pattern
search process by comparing the first character of the
nor the text to perform searching. pattern with the first character of the segment taken, if
Key words: they match, then it compares the last character of the
String searching, pattern, start-end-mid algorithm. pattern with the last character of the segment, if a match
occurs, then it will allow to perform character by character
matching between the segment and the pattern, for the rest
1. Introduction of the characters that remain without comparing [11].

Although we deal with data in a lot of forms, text remains


the main form to exchange information and take advantage 2. Brute-Force Algorithm
of it.
String searching sometimes called string matching is Brute-force algorithm, which is also called the “naïve” is
concerned in finding the occurrences of a substring the simplest algorithm that can be used in pattern
(called the pattern) of length m in a string (called the searching. It is probably the first algorithm we might think
text) of length n (where n ≥ m) [1-3]. of for solving the pattern searching problem. It requires no
In order to search for a pattern within a string, an preprocessing of the pattern or the text [12].
algorithm is needed to find the pattern as well as to know The idea is that the pattern and text are compared
the locations where it was found in a given sequence of character by character [8][10]; in the case of a mismatch,
characters. the pattern is shifted one position to the right and
A lot of algorithms were created to perform string comparison is repeated, until a match is found or the end
searching. Each algorithm uses a specific strategy to of the text is reached [1].
perform the search. Some need to preprocess the pattern The algorithm works with two pointers; a “text pointer” i
[4-6]. Others need to preprocess the text; also there are and a “pattern pointer” j. For all (n-m) possibly valid
algorithms that require both the pattern and the text to be shifts, pattern and text are compared; while text and
preprocessed before searching [7] and some do not pattern characters are equal, the pattern pointer is
perform preprocessing neither for the text nor for the incremented. If a mismatch occurs, i is incremented, j is
pattern. reset to zero and the comparing process is restarted. In
One of the simplest string searching algorithms is the case a match is found, the algorithm returns the position of
Brute-force algorithm. It is the least efficient way to check the pattern; if not, it returns not found message [9, 12].
whether one string occurs inside another. The worst case will happen if all the characters of the
Various string searching algorithms were created to pattern were matched with the text segment except the last
improve the Brute-Force algorithm. From those one.
algorithms: the Knuth-Morris-Pratt (KMP), Boyer- Referring to the algorithm, the outer for-loop is executed
Moore (BM) and Karp and Rabin algorithms [1][8]. Still at most n-m+1 times and the inner loop is executed at
to determine which of the algorithms is the best to use most m times. Thus, the running time (time complexity) of
depends on the application were the algorithm is to be the brute force algorithm is: O((n-m+1)m) which is
applied. O(nm) [8]. In the worst case, when n and m are equal, this
The Knuth-Morris-Pratt (KMP) algorithm uses algorithm has a quadratic running time [1].
information about the characters of the pattern to
determine how much to move along that string after a

Manuscript received July 5, 2011


Manuscript revised July 20, 2011
IJCSNS International Journal of Computer Science and Network Security, VOL.11 No.7, July 2011 25

3. Start-End-Mid Algorithm to the next step, else if a mismatch occurs, then take the
next segment of the text and go to step 2.
This algorithm finds all the occurrences of the pattern in
Step 5: Perform character by character comparison for the
the text. It does not require performing preprocessing
rest of the characters of the pattern with the rest of the
neither for the text nor for the pattern.
characters of the segment taken. Note that, in this step if
The idea is that the first, last and mid characters of the
the pattern consists of two characters then the first and last
pattern are first compared to the corresponding first, last
characters will not be considered within the comparison,
and mid characters of the segment taken from the text, in
while; if the pattern consists of more than one character
which we start by comparing the first character in the
then the first, last and floor(length of the pattern/2)
pattern with the first character in the segment, if they
characters are not considered in the comparison process. If
match, then we continue by comparing the last character in
a mismatch is encountered while matching in any step of
the pattern with the last character in the text, if a match
the comparison, then we stop comparing and proceed with
occurs then we proceed by comparing the mid character in
the next segment for comparison and go to step 2, else
the pattern with the mid character in the text, if a match
continue comparing. If all the characters of the pattern
occurs, then the rest of the characters of the pattern will be
match with the characters of the segment then signal that
compared with the rest of the characters of the segment
the pattern was found and at which location in the text it
taken, character by character. In the case of a mismatch
was found. After that, proceed with the next segment and
while performing character by character comparison, we
repeat step 2, to search for other occurrences of the pattern
directly take the next segment of the text shifted one
in the text.
position from the previous one, else continue comparing.
If all the characters of the pattern match with the
characters of the segment then signal that the pattern was 4. Results
found and at which location in the text it was found. After
that, proceed with the next segment to find other The proposed algorithm finds all the occurrences of the
occurrences of the pattern in the text. If we have scanned pattern in the text. The improvement process for the Brute-
all of the segments of the text without matching the pattern Force algorithm within the proposed (Start-End-Mid)
with any of the introduced segments a not found signal is algorithm does not require performing preprocessing for
performed. Fig. 1 illustrates the algorithm in a flowchart the pattern as the other algorithms that have improved the
to find the first occurrence of the pattern within the text. Brute-Force algorithm do. Table 1 summaries the
algorithms that has improved the brute-force algorithm
3.1 Algorithm Steps with their time complexity.
The time complexity for the proposed Start-End-Mid
Step 1: Divide the text into segments in which the first algorithm can be detailed as follows:
segment begins from element at index 0, the second
segment begins at the element of index 1 and so on. That ‰ If the first, last and mid characters of the pattern
is each segment to be taken is shifted one character than does not match with the first, last and mid
the previous one. characters of all the segments in the text, then the
Step 2: Compare the first character of the pattern with the time complexity would be:
corresponding first character of the segment taken, if a O( ((n-m)+1) * (m-3) ).
match occurs, then go to the next step. If a mismatch
occurs, then take the next segment and repeat step 2. ‰ If the first character of the pattern does not match
the first character of all the segments in the text,
Step 3: Compare the last character of the pattern with the then the time complexity would be:
corresponding last character of the segment taken, if a O(n-m+1).
match occurs, then go to the next step, else if a mismatch
occurs, then take the next segment of the text and go to Table 2 illustrates the differences in the time complexity
step 2. between the brute-force algorithm and the proposed start-
end-mid algorithm depending on an example where the
Step 4: Check the length of the pattern. If the pattern
number of characters of the text is 14 and of the pattern is
consists of two characters then go to Step 5. While, if it
5.
consists of more than two characters then compare the
floor(length of the pattern/2) character of the pattern
with the corresponding floor(length of the pattern/2)
character of the segment taken, if a match occurs, then go
26 IJCSNS International Journal of Computer Science and Network Security, VOL.11 No.7, July 2011
IJCSNS International Journal of Computer Science and Network Security, VOL.11 No.7, July 2011 27

Table 1: A summary for the algorithms that has improved the brute-force algorithm with their time complexity
Preprocessing
Algorithm Time Complexity
the Pattern
Brute-Force
No preprocessing O((n-m+1)*m)
Algorithm
Start-End-Mid
No preprocessing O(((n-m)+1)*(m-3))
Algorithm
Start-to-End O(((n-m)+1)*(m-
No preprocessing
Algorithm 2))
Rabin-Karp Preprocesses the
O(nm)
Algorithm pattern
Knuth-Morris- Preprocesses the
O(n+m)
Pratt Algorithm pattern
Boyer-Moore Preprocesses the
O(nm)
Algorithm pattern

Table 2: The differences in the time complexity between the brute-force algorithm and the proposed start-end-mid algorithm depending on an example
where the number of characters of the text is 14 and the number of characters of the pattern is 5.
Brute-Force Time Start-to-End Time Start-End-Mid
Description
Complexity Complexity Time Complexity
If the pattern is not found after
performing character by character
matching for all of the segments 44 30 20
of the text with the characters of
the pattern.
If the first character of the pattern
does not match the first character 44 10 10
of all the segments in the text.
5. Conclusion
[6] Bruce W. and Watson E., A Boyer-Moore-style Algorithm
In conclusion, this paper has proposed a string searching for Regular Expression Pattern Matching, Science of
Computer Programming, 2003, 48: 99-117.
algorithm as an improvement for the brute-force algorithm
[7] Fenwick P., Fast string matching for multiple searches,
without the need to preprocess neither the pattern nor the Software–Practice and Experience, 2001, 31(9):815–833.
text. The improvement that this algorithm has offered over [8] Ohdan Masanori, Takeuchi Ryo And Satou Tadamasa, An
the brute-force algorithm is that it does not allow character Evaluation of String Search Algorithms at Users Standing,
by character matching between the segment taken from the Proceedings of the 3rd WSES International Conference on
text and the pattern only after it checks that the first, last Mathematics and Computers in Mechanical Engineering
and mid characters in the pattern match the corresponding (MCME), 2001, pp. 4231-4236, ISBN: 960-8052-35-1.
first, last and mid characters in the segment taken from the [9] Softpanorama, Searching Algorithms, 2001,
text. This process of checking improved the time of http://www.softpanorama.org/Algorithms/searching.shtml.
Accessed on 8 Jan., 2011.
searching of the brute-force algorithm.
[10] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest
and Clifford Stein, Introduction to Algorithms, 3rd edition,
References 2009, MIT Press.
[1] Thierry Lecroq, Experimental Results on String Matching [11] Rawan A. Abdeen, Start-to-End Algorithm for String
Algorithms, SOFTWARE—PRACTICE AND Searching, IJCSNS International Journal of Computer
EXPERIENCE, 1995, VOL. 25(7), 727–765. Science and Network Security, VOL.11 No.2, February
[2] Stephen G., String Searching Algorithms, World Scientific, 2011, pp. 179-182.
Singapore, 1994. [12] Michael T. Goodrich and Roberto Tamassia, Algorithm
[3] Apostolico A. and Galil Z., Pattern Matching Algorithms, Design, 2002, John Wiley and Sons, Inc.
Oxford University Press, 1997. [13] Hume and Sunday, Fast String Searching, SOFTWARE—
[4] Liu Z, Du X,and Ishii N., An improved adaptive string PRACTICE AND EXPERIENCE, 1991, VOL. 21(11),
searching algorithm, Software Practice and Experience, 1221–1248.
1988, 28(2):191–198. Rawan A. Abdeen received the B.S. degree in Information
[5] Sunday D., A very fast substring search algorithm, Technology and M.S. degree in Computer Science from Al-
Communications of the ACM, 1990, 33(8):132–142. Balqa' Applied University.

You might also like