DAA - Notes-Unit-3 and 4
In the above figure -> Bounding function: no girl sits between two boys.
Types of Branch and Bound
1. FIFO Branch and Bound Search: This leads to a breadth-first search.
For this we will use a data structure called Queue. Initially Queue is empty.
2. LIFO Branch and Bound Search: This leads to a depth-first search
For this we will use a data structure called stack. Initially stack is empty.
3. LC (lowest/least cost) Branch and Bound Search: the branch is extended at the node that
adds the lowest additional cost, according to a given cost function (see the sketch below).
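As an illustration (not part of the original notes), here is a minimal Python sketch of how the list of live nodes is managed in each of the three variants. The functions expand(), is_answer() and cost() are hypothetical placeholders for the problem-specific branching, goal test and cost function.

from collections import deque
from itertools import count
import heapq

# FIFO Branch and Bound: live nodes kept in a queue (breadth-first order).
def fifo_branch_and_bound(root, expand, is_answer):
    queue = deque([root])                 # initially the queue holds only the root
    while queue:
        node = queue.popleft()            # take the oldest live node
        if is_answer(node):
            return node
        queue.extend(expand(node))        # children become live nodes
    return None

# LIFO Branch and Bound: live nodes kept on a stack (depth-first order).
def lifo_branch_and_bound(root, expand, is_answer):
    stack = [root]                        # initially the stack holds only the root
    while stack:
        node = stack.pop()                # take the newest live node
        if is_answer(node):
            return node
        stack.extend(expand(node))
    return None

# LC Branch and Bound: live nodes kept in a min-heap keyed by the cost function.
def lc_branch_and_bound(root, expand, is_answer, cost):
    tie = count()                         # tie-breaker so nodes themselves are never compared
    heap = [(cost(root), next(tie), root)]
    while heap:
        _, _, node = heapq.heappop(heap)  # take the least-cost live node
        if is_answer(node):
            return node
        for child in expand(node):
            heapq.heappush(heap, (cost(child), next(tie), child))
    return None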
N-Queens Problem
Nauck first studied the problem on an 8x8 chessboard and found the first feasible solutions. On an
NxN square board, N queens need to be placed subject to three conditions:
• No two queens can be placed in the same row, the same column, or the same diagonal.
The time complexity of the N-Queens problem is O(N!).
Backtracking Algorithm
The idea is to place queens one by one in different columns, starting from the leftmost column.
When we place a queen in a column, we check for clashes with already placed queens. In the
current column, if we find a row for which there is no clash, we mark this row and column as
part of the solution. If we do not find such a row due to clashes then we backtrack and return
false.
1) Start in the leftmost column.
2) If all queens are placed, return true.
3) Try all rows in the current column. Do the following for every tried row:
a) If the queen can be placed safely in this row, then mark this [row, column] as part of
the solution and recursively check whether placing the queen here leads to a solution.
b) If placing the queen in [row, column] leads to a solution, then return true.
c) If placing the queen does not lead to a solution, then unmark this [row, column] (backtrack)
and go to step (a) to try other rows.
4) If all rows have been tried and nothing worked, return false to trigger backtracking.
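A minimal Python sketch of this backtracking procedure (not from the original notes), assuming the board is represented as a list where board[col] holds the row of the queen placed in that column:

def solve_n_queens(n):
    board = [-1] * n                      # board[col] = row of the queen in that column

    def is_safe(row, col):
        for c in range(col):              # check clashes with already placed queens
            r = board[c]
            if r == row or abs(r - row) == abs(c - col):
                return False              # same row or same diagonal
        return True

    def place(col):
        if col == n:                      # all queens are placed
            return True
        for row in range(n):              # try all rows in the current column
            if is_safe(row, col):
                board[col] = row          # mark [row, col] as part of the solution
                if place(col + 1):        # recurse on the next column
                    return True
                board[col] = -1           # backtrack: unmark and try other rows
        return False                      # nothing worked: trigger backtracking

    return board if place(0) else None

print(solve_n_queens(8))   # e.g. [0, 4, 7, 5, 2, 6, 1, 3]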
BOUNDING FUNCTION
Time complexity of the Sum of Subsets problem: O(2^n)
Example 2:
W = {4, 5, 6, 3} and M = 13
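A minimal Python sketch (not from the original notes) of sum-of-subsets backtracking for this example, using the usual bounding function: a node is pruned when the current sum plus the sum of the remaining weights cannot reach M, or when (with the weights sorted in increasing order) adding the next weight would already exceed M.

def sum_of_subsets(weights, m):
    w = sorted(weights)                   # sorting makes the bound valid and sharper
    solutions = []

    def backtrack(i, current, remaining, chosen):
        if current == m:                  # feasible solution found
            solutions.append(list(chosen))
            return
        if i == len(w):
            return
        # Bounding function: prune if even taking all remaining weights cannot
        # reach m, or if the smallest remaining weight already overshoots m.
        if current + remaining < m or current + w[i] > m:
            return
        chosen.append(w[i])               # include w[i]
        backtrack(i + 1, current + w[i], remaining - w[i], chosen)
        chosen.pop()                      # exclude w[i]
        backtrack(i + 1, current, remaining - w[i], chosen)

    backtrack(0, 0, sum(w), [])
    return solutions

print(sum_of_subsets([4, 5, 6, 3], 13))   # [[3, 4, 6]]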
Figure 32.1 An example of the string-matching problem, where we want to find all occurrences of
the pattern P = abaa in the text T = abcabaabcabac. The pattern occurs only once in the text,
at shift s = 3, which we call a valid shift. A vertical line connects each character of the pattern to its
matching character in the text, and all matched characters are shaded.
The naive algorithm finds all valid shifts using a loop that checks the condition
P[1..m] = T[s+1..s+m] for each of the n - m + 1 possible values of s.
Figure 32.4 The operation of the naive string matcher for the pattern P = aab and the text
T = acaabc. We can imagine the pattern P as a template that we slide along the text. (a)-(d) The
four successive alignments tried by the naive string matcher. In each part, vertical lines connect
corresponding regions found to match (shown shaded), and a jagged line connects the first mismatched
character found, if any. The algorithm finds one occurrence of the pattern, at shift s = 2, shown in
part (c).
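A minimal Python sketch (not from the original notes) of the naive matcher described above; strings are 0-indexed here, so the loop checks P[0..m-1] against T[s..s+m-1] for each of the n - m + 1 shifts.

def naive_string_matcher(text, pattern):
    n, m = len(text), len(pattern)
    shifts = []
    for s in range(n - m + 1):            # try every possible shift
        if text[s:s + m] == pattern:      # compare the whole window at shift s
            shifts.append(s)              # s is a valid shift
    return shifts

print(naive_string_matcher("abcabaabcabac", "abaa"))   # [3]  (Figure 32.1)
print(naive_string_matcher("acaabc", "aab"))           # [2]  (Figure 32.4)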
KMP (Knuth Morris Pratt) Pattern Searching
The Naive pattern searching algorithm doesn’t work well in cases where we see many matching characters
followed by a mismatching character. Following are some examples.
txt[] = "AAAAAAAAAAAAAAAAAB"
pat[] = "AAAAB"
txt[] = "ABABABCABABABCABABABC"
pat[] = "ABABAC" (not a worst case, but a bad case for Naive)
The KMP matching algorithm uses the degenerating property of the pattern (the same sub-patterns appearing more
than once in the pattern) and improves the worst-case complexity to O(n). The basic idea
behind KMP's algorithm is: whenever we detect a mismatch (after some matches), we already know some of
the characters in the text of the next window. We take advantage of this information to avoid matching the
characters that we know will match anyway. Let us consider the example below to understand this.
Matching Overview
txt = "AAAAABAAABA"
pat = "AAAA"
Need for Preprocessing
An important question arises from the above explanation: how do we know how many characters to skip?
To know this, we pre-process the pattern and prepare an integer array lps[] that tells us the count of
characters to be skipped.
Preprocessing Overview:
KMP algorithm preprocesses pat[] and constructs an auxiliary lps[] of size m (same as size of pattern)
which is used to skip characters while matching.
The name lps indicates the longest proper prefix which is also a suffix. A proper prefix is a prefix that is
not the whole string. For example, the prefixes of "ABC" are "", "A", "AB" and "ABC"; the proper prefixes are "", "A"
and "AB". The suffixes of the string are "", "C", "BC" and "ABC".
We search for lps values in sub-patterns. More precisely, we focus on sub-strings of the pattern that are both a prefix
and a suffix.
For each sub-pattern pat[0..i] where i = 0 to m-1, lps[i] stores length of the maximum matching proper
prefix which is also a suffix of the sub-pattern pat[0..i].
lps[i] = the longest proper prefix of pat[0..i]
which is also a suffix of pat[0..i].
Note: lps[i] could also be defined as the longest prefix which is also a proper suffix. We need to use "proper" in at least
one of the two places to make sure that the whole substring is not counted.
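A minimal Python sketch (not from the original notes) of the lps[] preprocessing described above:

def compute_lps(pat):
    m = len(pat)
    lps = [0] * m                  # lps[0] is always 0
    length = 0                     # length of the previous longest proper prefix-suffix
    i = 1
    while i < m:
        if pat[i] == pat[length]:  # the current prefix-suffix can be extended
            length += 1
            lps[i] = length
            i += 1
        elif length > 0:           # fall back to a shorter prefix-suffix
            length = lps[length - 1]
        else:                      # no proper prefix-suffix ends at position i
            lps[i] = 0
            i += 1
    return lps

print(compute_lps("AAAA"))     # [0, 1, 2, 3]
print(compute_lps("ABABAC"))   # [0, 0, 1, 2, 3, 0]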
Searching Algorithm:
Unlike the naive algorithm, where we slide the pattern by one position and compare all characters at each shift, we use a
value from lps[] to decide the next characters to be matched. The idea is not to match a character that we
know will match anyway.
How to use lps[] to decide the next positions (or to know the number of characters to be skipped)?
txt[] = "AAAAABAAABA"
pat[] = "AAAA"
lps[] = {0, 1, 2, 3}
i = 0, j = 0
txt[] = "AAAAABAAABA"
pat[] = "AAAA"
txt[i] and pat[j] match, do i++, j++
i = 1, j = 1
txt[] = "AAAAABAAABA"
pat[] = "AAAA"
txt[i] and pat[j] match, do i++, j++
i = 2, j = 2
txt[] = "AAAAABAAABA"
pat[] = "AAAA"
txt[i] and pat[j] match, do i++, j++
i = 3, j = 3
txt[] = "AAAAABAAABA"
pat[] = "AAAA"
txt[i] and pat[j] match, do i++, j++
i = 4, j = 4
Since j == M, print pattern found and reset j,
j = lps[j-1] = lps[3] = 3
i = 4, j = 3
txt[] = "AAAAABAAABA"
pat[] = "AAAA"
txt[i] and pat[j] match, do i++, j++
i = 5, j = 4
Since j == M, print pattern found and reset j,
j = lps[j-1] = lps[3] = 3
i = 5, j = 3
txt[] = "AAAAABAAABA"
pat[] = "AAAA"
txt[i] and pat[j] do NOT match and j > 0, change only j
j = lps[j-1] = lps[2] = 2
i = 5, j = 2
txt[] = "AAAAABAAABA"
pat[] = "AAAA"
txt[i] and pat[j] do NOT match and j > 0, change only j
j = lps[j-1] = lps[1] = 1
i = 5, j = 1
txt[] = "AAAAABAAABA"
pat[] = "AAAA"
txt[i] and pat[j] do NOT match and j > 0, change only j
j = lps[j-1] = lps[0] = 0
i = 5, j = 0
txt[] = "AAAAABAAABA"
pat[] = "AAAA"
txt[i] and pat[j] do NOT match and j is 0, we do i++.
i = 6, j = 0
txt[] = "AAAAABAAABA"
pat[] = "AAAA"
txt[i] and pat[j] match, do i++ and j++
i = 7, j = 1
txt[] = "AAAAABAAABA"
pat[] = "AAAA"
txt[i] and pat[j] match, do i++ and j++
We continue this way...
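Putting the two parts together, a minimal Python sketch (not from the original notes) of the full KMP search, reusing compute_lps() from the preprocessing sketch above:

def kmp_search(txt, pat):
    n, m = len(txt), len(pat)
    lps = compute_lps(pat)         # preprocessing step, O(m)
    found = []
    i = j = 0                      # i indexes txt, j indexes pat
    while i < n:
        if txt[i] == pat[j]:
            i += 1
            j += 1
            if j == m:             # full match: record the shift and reset j via lps
                found.append(i - j)
                j = lps[j - 1]
        elif j > 0:                # mismatch after some matches: change only j
            j = lps[j - 1]
        else:                      # mismatch at the first pattern character: advance i
            i += 1
    return found

print(kmp_search("AAAAABAAABA", "AAAA"))   # [0, 1]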
Knuth–Morris–Pratt algorithm: worst-case time complexity O(n) for searching, plus O(m) to preprocess the pattern.
The Rabin-Karp algorithm
Rabin-Karp is another pattern searching algorithm that finds the pattern in a more efficient way. It
also checks the pattern by sliding a window one position at a time, but instead of comparing all
characters at every position, it first computes a hash value. Only when the hash values match does it
compare the characters one by one. This procedure makes the algorithm more efficient.
The Rabin-Karp algorithm makes use of hash functions and the rolling hash technique. A hash
function is essentially a function that maps one thing to a value. In particular, hashing can map
data of arbitrary size to a value of fixed size.
Hashing: Hashing is a way to associate values, using a hash function to map an input to an
output. The purpose of a hash is to represent a large piece of data in a smaller form.
Example 1:
The Rabin-Karp algorithm. Each character is a decimal digit, and we compute values modulo 13.
(a) A text string. A window of length 5 is shown shaded. The numerical value of the shaded
number, computed modulo 13, yields the value 7. (b) The same text string with values computed
modulo 13 for each possible position of a length-5 window. Assuming the pattern P = 31415, we
look for windows whose value modulo 13 is 7, since 31415 ≡ 7 (mod 13). The algorithm finds
two such windows, shown shaded in the figure. The first, beginning at text position 7, is indeed
an occurrence of the pattern, while the second, beginning at text position 13, is a spurious hit.
Time complexity:
The expected time complexity is O(m + n) when there are few spurious hits; in the worst case
(many spurious hits) it is O(mn).
Example 2:
Text = 31415926535, Pattern = 26
Hash function: 26 mod 13 = 0
A window of length 2 is taken and slid along the text:
31 mod 13 = 5
14 mod 13 = 1
41 mod 13 = 2
15 mod 13 = 2
59 mod 13 = 7
92 mod 13 = 1
26 mod 13 = 0 [non-spurious hit: valid match]
65 mod 13 = 0 [spurious hit: not a valid match]
53 mod 13 = 1
35 mod 13 = 9
Example 3:
Text = cadba (take ASCII values) = 99 97 100 98 97
Pattern = db = 100 98, i.e. 10098
Hash function: 10098 mod 13 = 10
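A minimal Python sketch (not from the original notes) of Rabin-Karp for strings of decimal digits, using base 10 and modulus q = 13 to match the examples above; the rolling hash drops the leading digit and brings in the next one in O(1) per shift.

def rabin_karp(text, pattern, base=10, q=13):
    n, m = len(text), len(pattern)
    if m > n:
        return []
    h = pow(base, m - 1, q)                  # weight of the leading digit, mod q
    p_hash = t_hash = 0
    for i in range(m):                       # hash of the pattern and of the first window
        p_hash = (base * p_hash + int(pattern[i])) % q
        t_hash = (base * t_hash + int(text[i])) % q
    matches = []
    for s in range(n - m + 1):
        if t_hash == p_hash:                 # hash hit: may be spurious
            if text[s:s + m] == pattern:     # verify character by character
                matches.append(s)
        if s < n - m:                        # roll the hash to the next window
            t_hash = (base * (t_hash - int(text[s]) * h) + int(text[s + m])) % q
    return matches

print(rabin_karp("31415926535", "26"))       # [6]  (the hit at 65 is spurious)
print(rabin_karp("99971009897", "10098"))    # [4]  (hash 10098 mod 13 = 10)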
NP-Completeness | Set 1 (Introduction)
We have been writing about efficient algorithms to solve complex problems, like shortest
path, Euler graph, minimum spanning tree, etc. Those were all success stories of algorithm
designers. In this section, failure stories of computer science are discussed.
NP-complete problems are the hardest problems in the NP set. A decision problem L is NP-complete if:
1) L is in NP (any given solution for an NP-complete problem can be verified quickly, but no
efficient solution is known).
2) Every problem in NP is reducible to L in polynomial time (reduction is defined below).
A problem is NP-Hard if it satisfies property 2 above; it does not need to satisfy
property 1. Therefore, the NP-Complete set is also a subset of the NP-Hard set.
NP-completeness applies to the realm of decision problems. It was set up this way
because it's easier to compare the difficulty of decision problems than that of optimization
problems. In reality, though, being able to solve a decision problem in polynomial time will
often permit us to solve the corresponding optimization problem in polynomial time (using a
polynomial number of calls to the decision problem). So, discussing the difficulty of
decision problems is often really equivalent to discussing the difficulty of optimization
problems. (Source: Ref 2)
For example, consider the vertex cover problem (given a graph, find a minimum-sized
vertex set that covers all edges). It is an optimization problem. The corresponding decision
problem is: given an undirected graph G and an integer k, is there a vertex cover of size k?
What is Reduction?
Let L1 and L2 be two decision problems. Suppose algorithm A2 solves L2. That is, if y is an
input for L2 then algorithm A2 will answer Yes or No depending upon whether y belongs to
L2 or not.
The idea is to find a transformation from L1 to L2 so that algorithm A2 can be part of an
algorithm A1 to solve L1.
Learning reduction in general is very important. For example, if we have library functions to
solve a certain problem and we can reduce a new problem to one of the solved problems,
we save a lot of time. Consider the example of a problem where we have to find the minimum
product path in a given directed graph, where the product of a path is the multiplication of the
weights of the edges along the path. If we have code for Dijkstra's algorithm to find the shortest
path, we can take the log of all weights and use Dijkstra's algorithm to find the minimum product
path, rather than writing fresh code for this new problem.
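As a hedged sketch of that reduction (not from the original notes, and assuming all edge weights are at least 1 so that their logarithms are non-negative and Dijkstra's algorithm applies), each weight w is replaced by log(w), the shortest log-sum path is found, and the result is converted back to a product:

import heapq
import math

def min_product_path_cost(graph, src, dst):
    # graph: {u: [(v, weight), ...]} with every weight >= 1 (assumption).
    dist = {src: 0.0}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == dst:
            return math.exp(d)              # log-sum back to a product
        if d > dist.get(u, math.inf):
            continue                        # stale queue entry
        for v, w in graph.get(u, []):
            nd = d + math.log(w)            # sum of logs == log of the product
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return None

# Hypothetical example: the cheapest-product path a -> c is a -> b -> c, product 2 * 3 = 6.
g = {"a": [("b", 2.0), ("c", 10.0)], "b": [("c", 3.0)]}
print(min_product_path_cost(g, "a", "c"))   # approximately 6.0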
From the definition of NP-complete, it appears impossible to prove that a problem L is NP-
Complete. By definition, it requires us to show that every problem in NP is polynomial-time
reducible to L. Fortunately, there is an alternate way to prove it. The idea is to take a
known NP-Complete problem and reduce it to L. If a polynomial-time reduction is possible,
we can prove that L is NP-Complete by transitivity of reduction (if an NP-Complete problem is
reducible to L in polynomial time, then all problems in NP are reducible to L in polynomial time).
It is always useful to know about NP-Completeness, even for engineers. Suppose you are
asked to write an efficient algorithm to solve an extremely important problem for your
company. After a lot of thinking, you can only come up with an exponential-time approach, which is
impractical. If you don't know about NP-Completeness, you can only say that you could not
come up with an efficient algorithm. If you know about NP-Completeness and can prove that the
problem is NP-complete, you can proudly say that a polynomial-time solution is unlikely to
exist. If a polynomial-time solution were possible, that solution would solve a big problem
of computer science that many scientists have been trying to crack for years.
We will soon discuss more NP-Complete problems and their proofs of NP-Completeness.
References:
1. MIT Video Lecture on Computational Complexity.
2. Introduction to Algorithms, 3rd Edition, by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein.
P, NP-Complete, NP, and NP-Hard (in Short)
NP problems have their own significance in programming, but the discussion becomes quite hot
when we deal with differences between NP, P, NP-Complete and NP-hard.
P (Polynomial time): problems which can be solved in polynomial time, taking time
like O(n), O(n²), O(n³). E.g.: finding the maximum element in an array, or checking whether a string
is a palindrome. There are many problems which can be solved in polynomial time.
NP (Non-deterministic Polynomial time): problems for which no polynomial-time solution is
known, like TSP (travelling salesman problem); an easier example is the subset sum problem:
given a set of numbers, does there exist a subset whose sum is zero? NP problems are, however,
checkable in polynomial time: given a candidate solution to a problem, we can check whether
the solution is correct in polynomial time (see the verifier sketch below).
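For instance, here is a minimal Python sketch (not from the original notes) of a polynomial-time verifier for the subset sum decision problem: given a candidate subset, it checks in linear time that the candidate really is a sub-collection of the input and that it sums to zero, even though finding such a subset may take exponential time.

from collections import Counter

def verify_subset_sum(numbers, candidate):
    available = Counter(numbers)
    chosen = Counter(candidate)
    is_sub = all(chosen[x] <= available[x] for x in chosen)   # candidate uses only given numbers
    return len(candidate) > 0 and is_sub and sum(candidate) == 0

print(verify_subset_sum([3, -7, 4, 1, -2], [3, 4, -7]))   # True  (3 + 4 - 7 = 0)
print(verify_subset_sum([3, -7, 4, 1, -2], [3, 1]))       # False (sum is 4)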
Now we will discuss NP-Complete and NP-hard.
Take two problems A and B, both NP problems.
Reducibility: if we can convert any instance of problem A into an instance of problem B (an NP problem), then
A is reducible to B.
NP-hard: now suppose we find that A is reducible to B; this means that B is at least as
hard as A.
NP-Complete: the group of problems which are both in NP and NP-hard are known as NP-
Complete problems.
Now suppose we have an NP-Complete problem R and it is reducible to Q. Then Q is at least as
hard as R, and since R is NP-hard, Q will also be at least NP-hard; it may be NP-Complete as well.
Approximation Algorithm
Given an optimization problem P, an algorithm A is said to be an approximation algorithm for P,
if for any given instance I, it returns an approximate solution that is a feasible solution.
Approximation algorithms
1. Guaranteed to run in polynomial time.
2. Guaranteed to get a solution which is close to the optimal solution.
3. Obstacle: need to prove a solution’s value is close to optimum value, without even knowing
what the optimum value is!
Types of approximation
P -An optimization problem, A -An approximation algorithm, I -An instance of P,
A∗(I) -Optimal value for the instance I, A(I) -Value for the instance I generated by A
1. Absolute approximation: A is an absolute approximation algorithm if there exists a constant k
such that, for every instance I of P, A∗(I) − A(I) ≤ k. Example: Planar graph coloring.
2. Relative approximation: A is a relative approximation algorithm if there exists a constant k
such that, for every instance I of P, max{ A∗(I)/A(I) , A(I)/A∗(I) } ≤ k. Example: Vertex cover (see the sketch below).
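As an illustration of the vertex cover example (not from the original notes), a minimal Python sketch of the standard matching-based algorithm, which is a relative (factor-2) approximation: repeatedly pick an edge that is not yet covered and add both of its endpoints to the cover.

def approx_vertex_cover(edges):
    # edges: iterable of (u, v) pairs of an undirected graph.
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:   # edge not yet covered
            cover.add(u)                        # take both endpoints
            cover.add(v)
    return cover

# Hypothetical example: the path 1-2-3-4 has optimum cover {2, 3} of size 2;
# the approximation returns {1, 2, 3, 4}, at most twice the optimum size.
print(approx_vertex_cover([(1, 2), (2, 3), (3, 4)]))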
The full walk of the above tree would be 1-2-1-4-1-3-1. Apply the Hamiltonian cycle shortcut so that
the output is 1-2-4-3-1, with cost = 10 + 25 + 30 + 15 = 80. In this case, the approximation algorithm
produces the optimal tour, but it may not produce the optimal tour in all cases.
A full walk lists all vertices when they are first visited in preorder, and it also lists vertices when
they are returned to after a subtree is visited. The full walk of the above tree would be 1-2-
1-4-1-3-1.
Following are some important facts that prove that the algorithm is a 2-approximation.
1) The cost of the best possible Travelling Salesman tour is never less than the cost of the MST.
(By definition, the MST is a minimum-cost tree that connects all vertices.)
2) The total cost of the full walk is at most twice the cost of the MST (every edge of the MST is visited at
most twice).
3) The cost of the output of the above algorithm is at most the cost of the full walk. In the above algorithm, we
print the preorder walk as output. In the preorder walk, two or more edges of the full walk are replaced with
a single edge. For example, 2-1 and 1-4 are replaced by the single edge 2-4. So, if the graph satisfies the
triangle inequality, this is always true.
From the above three statements, we can conclude that the cost of the output produced by the
approximation algorithm is never more than twice the cost of the best possible solution.
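A minimal Python sketch (not from the original notes) of this 2-approximation, in the spirit of APPROX-TSP-TOUR: build a minimum spanning tree with Prim's algorithm, take a preorder walk of the tree, and close it into a Hamiltonian cycle. The distance matrix is assumed to be symmetric and to satisfy the triangle inequality.

import math

def approx_tsp_tour(dist, root=0):
    n = len(dist)
    in_tree = [False] * n
    parent = [None] * n
    key = [math.inf] * n
    key[root] = 0
    children = [[] for _ in range(n)]
    for _ in range(n):                        # MST-PRIM, O(V^2)
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: key[v])
        in_tree[u] = True
        if parent[u] is not None:
            children[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v] and dist[u][v] < key[v]:
                key[v] = dist[u][v]
                parent[v] = u
    tour = []
    def preorder(u):                          # preorder walk of the MST
        tour.append(u)
        for c in children[u]:
            preorder(c)
    preorder(root)
    tour.append(root)                         # return to the start to close the cycle
    return tour

# Hypothetical example: four points at the corners of a unit square (Euclidean costs).
s = math.sqrt(2)
d = [[0, 1, s, 1],
     [1, 0, 1, s],
     [s, 1, 0, 1],
     [1, s, 1, 0]]
print(approx_tsp_tour(d))   # [0, 1, 2, 3, 0]; here the walk happens to give the optimal tour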
A Hamiltonian cycle, also called a Hamiltonian circuit, Hamilton cycle, or Hamilton circuit, is a
graph cycle (i.e., closed loop) through a graph that visits each node exactly once. Determining
whether such paths and cycles exist in graphs is the Hamiltonian path problem, which is NP-
complete.
Example 2:
(Figure 35.2, panels (a)-(e), shows the operation of APPROX-TSP-TOUR on an 8-vertex example with vertices a-h; see the caption below.)
Time complexity of APPROX-TSP-TOUR: Θ(V²)
Figure 35.2 The operation of APPROX-TSP-TOUR. (a) A complete undirected graph. Vertices lie
on intersections of integer grid lines. For example, f is one unit to the right and two units up from h.
The cost function between two points is the ordinary Euclidean distance. (b) A minimum spanning
tree T of the complete graph, as computed by MST-PRIM. Vertex a is the root vertex. Only edges
in the minimum spanning tree are shown. The vertices happen to be labeled in such a way that they
are added to the main tree by MST-PRIM in alphabetical order. (c) A walk of T, starting at a. A
full walk of the tree visits the vertices in the order a, b, c, b, h, b, a, d, e, f, e, g, e, d, a. A preorder
walk of T lists a vertex just when it is first encountered, as indicated by the dot next to each vertex,
yielding the ordering a, b, c, h, d, e, f, g. (d) A tour obtained by visiting the vertices in the order
given by the preorder walk, which is the tour H returned by APPROX-TSP-TOUR. Its total cost
is approximately 19.074. (e) An optimal tour H* for the original complete graph. Its total cost is
approximately 14.715.
Recall from Section 12.1 that a preorder tree walk recursively visits every vertex
in the tree, listing a vertex when it is first encountered, before visiting any of its
children.
Figure 35.2 illustrates the operation of APPROX-TSP-TOUR. Part (a) of the figure
shows a complete undirected graph, and part (b) shows the minimum spanning
tree T grown from root vertex a by MST-PRIM. Part (c) shows how a preorder
walk of T visits the vertices, and part (d) displays the corresponding tour, which is
the tour returned by APPROX-TSP-TOUR. Part (e) displays an optimal tour, which
is about 23% shorter.
Vertex Cover Problem
Notes Prepared by
Prof. Chandrakanta Mahanty