This document outlines key concepts and algorithms in the design and analysis of algorithms. It covers asymptotic notations like Big-O, Big-Omega and Big-Theta for analyzing time and space complexity. Methods for solving recurrence relations like substitution, iteration and recursion trees are described. Sorting algorithms and their analysis using divide-and-conquer are discussed. Dynamic programming techniques like Knapsack and LCS are presented to optimize problems with overlapping subproblems. Graph algorithms including minimum spanning trees, shortest paths and graph coloring are covered. NP-completeness and approximation algorithms are also summarized.

Design and Analysis of Algorithm

Unit 1
Asymptotic Notations:
methods for solving recurrence relations:
Brief Review of Graphs:
Sets and Disjoint Sets, Union:
Union-Find in Disjoint-Set Union (DSU):
Sorting Algorithms:
Divide and conquer
Strassen's matrix multiplication algorithm and its analysis:
Unit 2
Greedy
Knapsack Problem
Huffman Coding:
Job Sequencing with Deadlines:
Minimum Spanning Tree (MST)
Properties of Minimum Spanning Tree:
Algorithms for Finding Minimum Spanning Tree:
Steps for Finding Minimum Spanning Tree:
Importance and Applications:
Time Complexity:
Single Source Shortest Paths (SSSP) Problem
Bellman-Ford Algorithm: Shortest Paths with Negative Weights
Backtracking: Exploring Solutions One Path at a Time
The 8 Queens Problem: Backtracking in Action
Graph Coloring with Backtracking: Exploring Colors One Vertex at a Time
Hamiltonian Cycles and analysis using backtracking:
Unit 3
DP
Ingredients of Dynamic Programming



Matrix Chain Multiplication: Optimizing Multiplications with Dynamic Programming
Longest Common Subsequence (LCS): Dynamic Programming for Matching Sequences
Optimal Binary Search Trees (OBST) using Dynamic Programming
0-1 Knapsack Problem: Finding the Max Loot with Dynamic Programming
Traveling Salesman Problem (TSP): Dynamic Programming in a Maze of Routes
Floyd-Warshall Algorithm:
Branch and Bound: Exploring Paths to Optimality with Pruning Power
0/1 Knapsack Problem with Branch and Bound
Branch and Bound for the Traveling Salesperson Problem (TSP): Exploring Routes with Pruning Power
Unit 4
Naïve String Matching Algorithm:
Rabin-Karp Algorithm: Hashing Your Way to Efficient String Matching
Finite Automata for String Matching:
Knuth-Morris-Pratt (KMP) Algorithm:
Demystifying the World of Computational Complexity: Basic Concepts
Polynomial Complexity:
Non-Polynomial Complexity:
NP-Completeness:
NP-Hardness:
approximation algorithms
Flow and Sorting Network
Characteristics of Sorting Networks:
Flow in Sorting Networks:
Flow in Flow Networks:
Importance and Applications:
Ford-Fulkerson Algorithm (Flow Networks):
Maximum bipartite matching
Comparison Network:
Zero-One Principle:
Bitonic Sorting Network:
Merging Network:
NOTE

Unit 1



Asymptotic Notations:
Mathematical tools used to describe the performance of algorithms as input size
grows, focusing on long-term behavior rather than exact execution times.

Common notations: Big-O, Big-Omega, and Big-Theta.

Big-O Notation (O):

Describes the upper bound of an algorithm's complexity, indicating the worst-case scenario.

Notation: f(n) = O(g(n)) means f(n) grows no faster than g(n) asymptotically.

Example: Linear search has a time complexity of O(n), meaning its worst-case
time grows linearly with input size.

Big-Omega Notation (Ω):

Describes the lower bound of an algorithm's complexity, representing the best-case scenario.

Notation: f(n) = Ω(g(n)) means f(n) grows at least as fast as g(n) asymptotically.

Example: Insertion sort takes O(n^2) time in the worst case, but on an already sorted array it finishes in a single pass, so its best-case running time is Ω(n).

Big-Theta Notation (Θ):

Describes the tight bound of an algorithm's complexity, indicating both the upper
and lower bounds are the same.

Notation: f(n) = Θ(g(n)) means f(n) grows at the same rate as g(n)
asymptotically.

Example: Merge sort has a time complexity of Θ(n log n), meaning its average,
best, and worst-case time complexities all grow at this rate.

Time Complexity:

Measures the amount of time an algorithm takes to run as a function of input size.

Expressed using asymptotic notations to compare algorithms for large inputs.



Space Complexity:

Measures the amount of memory an algorithm requires as a function of input size.

Also expressed using asymptotic notations.

Key Points:

Asymptotic notations focus on long-term behavior, not exact execution times for
specific inputs.

Big-O is most commonly used to describe worst-case complexity.

Understanding asymptotic notations is crucial for comparing algorithm performance and choosing efficient algorithms for large-scale problems.

Additional Notes:

Big-O is often used informally to describe average-case complexity, even though technically it only represents the upper bound.

Other less common notations include Little-o and Little-omega.

Example:
Consider the following algorithm to find the maximum value in an array:
Algorithm:

1. Initialize max_value to the first element of the array.

2. Iterate through the rest of the array, comparing each element to max_value .

3. If an element is greater than max_value , update max_value to that element.

4. Return max_value .

Time Complexity: O(n) (linear time)

The algorithm performs a constant amount of work for each element in the
array, leading to a linear relationship between time and input size.

Space Complexity: O(1) (constant space)



The algorithm uses a fixed amount of extra space regardless of input size,
primarily for storing max_value .
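A minimal Python sketch of this maximum-finding algorithm (the function name find_max is my own choice, not from the notes):

def find_max(arr):
    # Assumes a non-empty list; runs in O(n) time with O(1) extra space.
    max_value = arr[0]              # Step 1: initialize with the first element
    for element in arr[1:]:         # Step 2: iterate through the rest
        if element > max_value:     # Step 3: update when a larger value is seen
            max_value = element
    return max_value                # Step 4

print(find_max([3, 7, 2, 9, 4]))    # prints 9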

methods for solving recurrence relations:


1. Substitution Method:

Involves guessing a solution of a specific form and then proving it by mathematical induction.

Steps:

Guess a solution of the form T(n) = θ(n^k) or T(n) = θ(log n).

Substitute the guess into the recurrence and try to solve for the constant
factors.

Use mathematical induction to verify the correctness of the guess.

Example:

Recurrence: T(n) = 2T(n/2) + n

Guess: T(n) = θ(n log n)

Substitution and induction prove the guess is correct.

2. Iteration Method (Unrolling the Recurrence):

Involves repeatedly substituting the recurrence into itself to expand the expression and identify a pattern.

Steps:

Write out the recurrence for a few small values of n.

Look for a pattern in the terms and try to express it as a closed-form solution.

Prove the pattern holds for all n using mathematical induction.

Example:

Recurrence: T(n) = T(n-1) + 1, with T(0) = 0



Iteration reveals T(n) = n, which can be proven by induction.

3. Recursion Tree Method:

Visualizes the recursive calls of an algorithm as a tree to analyze its cost.

Steps:

Draw a tree representing the recursive calls.

Label each node with the cost of the corresponding function call.

Sum the costs at each level of the tree to derive a recurrence for the total
cost.

Solve the recurrence using other methods.

4. Master Theorem:

A powerful tool for solving recurrences of the form T(n) = aT(n/b) + f(n), where a ≥ 1, b > 1, and f(n) is an asymptotically positive function.

Provides a direct solution for common cases based on the relationship between
a, b, and f(n).

Cases:

If f(n) = O(n^(log_b(a) - ε)) for some ε > 0, then T(n) = Θ(n^(log_b(a))).

If f(n) = Θ(n^(log_b(a))), then T(n) = Θ(n^(log_b(a)) log n).

If f(n) = Ω(n^(log_b(a) + ε)) for some ε > 0, and af(n/b) ≤ cf(n) for some
constant c < 1 and all sufficiently large n, then T(n) = Θ(f(n)).
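Example:

Recurrence: T(n) = 2T(n/2) + n (the merge sort recurrence). Here a = 2 and b = 2, so n^(log_b(a)) = n^(log_2(2)) = n. Since f(n) = n = Θ(n^(log_b(a))), case 2 applies and T(n) = Θ(n^(log_b(a)) log n) = Θ(n log n).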

5. Generating Functions:

Advanced technique involving complex analysis and power series to solve certain types of recurrences.

Not typically covered in introductory DAA courses.

Brief Review of Graphs:


What are graphs?



Graphs are non-linear data structures consisting of two main components:

Vertices (nodes): Represent entities or objects.

Edges (links): Represent relationships between the vertices.

Types of graphs:

Directed vs. Undirected:

Directed graphs: Edges have direction (e.g., one-way streets).

Undirected graphs: Edges have no direction (e.g., friendships).

Weighted vs. Unweighted:

Weighted graphs: Edges have associated weights (e.g., distances).

Unweighted graphs: Edges have no weights.

Graph Representations:

Adjacency matrix: A two-dimensional matrix where rows and columns represent vertices, and entries signify presence/absence of edges.

Adjacency list: Each vertex has a list of connected vertices.

Basic Graph Operations:

Traversing a graph: Visiting each vertex once, using techniques like Depth-
First Search (DFS) or Breadth-First Search (BFS).

Finding shortest paths: Determining the path with the least cost (weight)
between two vertices. Dijkstra's algorithm is a popular example.

Minimum spanning tree: Finding a subset of edges connecting all vertices with the minimum total weight. Prim's algorithm and Kruskal's algorithm are common approaches.

Topological sorting: Ordering vertices in a directed acyclic graph such that for
every directed edge (u, v), u comes before v in the ordering.

Applications of Graphs:

Social networks: Modeling relationships between people.



Transportation networks: Representing roads, railways, etc.

Computer networks: Analyzing network connectivity.

Project management: Visualizing project dependencies.

Bioinformatics: Modeling gene interactions.

Further Exploration:

Explore different types of graphs like bipartite graphs, complete graphs, etc.

Learn about advanced graph algorithms like maximum flow, matching algorithms, and graph coloring.

Consider graph libraries in your preferred programming language (e.g., NetworkX in Python).

Sets and Disjoint Sets, Union:


Sets:

A collection of distinct elements without order or repetition.

Represented by curly braces { }.

Example: {1, 3, 5, 7}, {apple, banana, orange}.

Disjoint Sets:

Two sets are disjoint if they have no element in common.

Example: {1, 3, 5} and {2, 4, 6} are disjoint, but {1, 3} and {2, 3} are not.

Union of Sets:

The union of two sets A and B is a new set containing all elements that are in A
or B or in both A and B (no duplicates).

Represented by A ∪ B.
Example: {1, 3, 5} ∪ {2, 4, 6} = {1, 2, 3, 4, 5, 6}.

Union of Disjoint Sets:



When dealing with disjoint sets, there's a slight distinction between the standard
union and the "disjoint union".

Standard union (as described above) would result in duplicate elements if any
exist in both sets.

Disjoint union keeps the elements distinct by adding an indicator to each element, specifying which set it originally belonged to.

This is relevant in data structures like disjoint-set unions (union-find), where maintaining distinct set representations is crucial.

Disjoint-Set Union (Union-Find):

A data structure efficiently managing a collection of disjoint sets.

Supports operations like merging sets, finding representatives of sets, and checking if two elements belong to the same set.

Applications include connected components in graphs, equivalence classes, and network connectivity analysis.

Union-Find in Disjoint-Set Union (DSU):


Key Operations:

1. MakeSet(x): Creates a new singleton set containing only element x.

2. Find(x): Determines the representative element (leader) of the set containing x.

3. Union(x, y): Merges the sets containing elements x and y.

Implementation:

Parent-Pointer Representation:

Each element is assigned a parent pointer.

The representative element of a set is the one whose parent points to itself.

Path compression is crucial for efficiency: During Find, update each node's
parent to point directly to the root, flattening the tree.

Time Complexity (Amortized):



MakeSet: O(1)

Find: O(α(n)), where α(n) is the inverse Ackermann function, practically constant (less than 5 for all practical values of n).

Union: O(α(n))

Path Compression:

A technique that significantly improves the time complexity of Find.

During Find, updates each node's parent to point directly to the root, resulting in
flatter trees and faster subsequent Find operations.

Optimizations:

Union by Rank: Merges the smaller tree into the larger tree, keeping trees
balanced.

Path Compression: As mentioned above, critical for efficient Find operations.

Applications:

Finding connected components in graphs

Kruskal's algorithm for minimum spanning trees

Cycle detection in graphs

Equivalence class determination

Network connectivity analysis

Image processing (connected components labeling)

Social network analysis (clustering)

Many more

Example in Python:

class UnionFind:
    def __init__(self, n):
        self.parent = [i for i in range(n)]
        self.rank = [0] * n

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])  # Path compression
        return self.parent[x]

    def union(self, x, y):
        xroot = self.find(x)
        yroot = self.find(y)
        if xroot != yroot:
            # Union by rank: attach the shorter tree under the taller one
            if self.rank[xroot] < self.rank[yroot]:
                xroot, yroot = yroot, xroot
            self.parent[yroot] = xroot
            if self.rank[xroot] == self.rank[yroot]:
                self.rank[xroot] += 1

Key Points:

Union-Find is an efficient data structure for dynamic maintenance of disjoint sets.

Amortized time complexity ensures excellent performance for most practical applications.

Path compression and union by rank are essential optimizations.

DSU has widespread applications in graph algorithms, network analysis, and various other domains.

Sorting Algorithms:
Bubble Sort:

Time Complexity: O(n^2) (worst and average case), O(n) (best case)

Space Complexity: O(1)

Insertion Sort:

Time Complexity: O(n^2) (worst and average case), O(n) (best case)



Space Complexity: O(1)

Selection Sort:

Time Complexity: O(n^2) (all cases)

Space Complexity: O(1)

Merge Sort:

Time Complexity: O(n log n) (all cases)

Space Complexity: O(n) (due to recursion or auxiliary array)

Quick Sort:

Time Complexity: O(n log n) (average case), O(n^2) (worst case)

Space Complexity: O(log n) (average case), O(n) (worst case)

Heap Sort:

Time Complexity: O(n log n) (all cases)

Space Complexity: O(1) (in-place sorting)

Searching Algorithms:

Linear Search:

Time Complexity: O(n) (worst and average case), O(1) (best case)

Space Complexity: O(1)

Binary Search:

Time Complexity: O(log n) (worst and average case), O(1) (best case)

Space Complexity: O(1)

Key Points:

Time complexity measures the algorithm's execution time as a function of input size.

Space complexity measures the amount of memory used.



Best, average, and worst-case scenarios reflect performance under different
input conditions.

For large datasets, algorithms with O(n log n) or better time complexity are
generally preferred.

Algorithm choice depends on factors like input size, data structure, desired
performance, and memory constraints.

Additional Notes:

Sorting algorithms are often used to prepare data for efficient searching.

Hybrid algorithms combine techniques for better performance in specific cases (e.g., Timsort).

Specialized sorting algorithms exist for particular data types (e.g., Counting Sort
for integers with a limited range).

Here are brief explanations of how each algorithm works:


Sorting Algorithms:

Bubble Sort:

Repeatedly iterates through the array, comparing adjacent elements and swapping them if they are in the wrong order.

Larger elements "bubble" to the end of the array.

Insertion Sort:

Divides the array into sorted and unsorted parts.

Iterates through the unsorted part, inserting each element into its correct
position in the sorted part.

Selection Sort:

Repeatedly finds the minimum element in the unsorted part of the array and
swaps it with the element at the beginning of the unsorted part.

Merge Sort:



Divides the array into halves recursively until each subarray contains a
single element.

Merges the sorted subarrays back together, comparing elements and placing them in the correct order.

Quick Sort:

Chooses a pivot element and partitions the array into elements smaller and
larger than the pivot.

Recursively sorts the subarrays on either side of the pivot.

Heap Sort:

Builds a max-heap from the array, ensuring that each parent node is greater
than or equal to its children.

Repeatedly extracts the maximum element (root of the heap) and places it
at the end of the array, rebuilding the heap after each extraction.

Searching Algorithms:

Linear Search:

Iterates through the array, comparing each element with the target value
until a match is found or the end of the array is reached.

Binary Search:

Requires a sorted array.

Repeatedly divides the search interval in half and compares the middle
element with the target value, narrowing down the search until the target is
found or the interval becomes empty.
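As a concrete illustration of the sorting descriptions above, here is a minimal merge sort sketch in Python (the helper name merge is my own choice):

def merge_sort(arr):
    # Divide: split until subarrays hold a single element.
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])
    right = merge_sort(arr[mid:])
    return merge(left, right)       # Combine the sorted halves

def merge(left, right):
    # Merge two sorted lists into one sorted list.
    result, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result.extend(left[i:])
    result.extend(right[j:])
    return result

print(merge_sort([5, 2, 9, 1, 7]))  # [1, 2, 5, 7, 9]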

Divide and conquer


Divide and conquer is a powerful algorithmic paradigm that involves breaking down a large problem into smaller, similar subproblems, solving these subproblems independently, and then combining the solutions to get the final answer. It's like splitting a big task into smaller, easier-to-manage chunks and then assembling the results to complete the whole job.
Here's a breakdown of the key steps involved:
1. Divide: Split the original problem into smaller subproblems of the same type.
This can be done recursively, meaning each subproblem can be further divided into
even smaller ones.

2. Conquer: Solve the smaller subproblems independently. These subproblems should be easier to solve because they are smaller and simpler versions of the original problem.

3. Combine: Combine the solutions to the subproblems to get the solution to the
original problem. This may involve merging or aggregating the results from each
subproblem.
The power of divide and conquer lies in its ability to handle complex problems by
breaking them down into manageable pieces. This can lead to improved efficiency,
particularly for problems where the cost of solving a subproblem is significantly less
than the cost of solving the original problem directly.
Here are some common examples of divide and conquer algorithms:

Merge Sort: Sorts a list by dividing it in half, recursively sorting each half, and
then merging the sorted halves.

Quick Sort: Chooses a pivot element and partitions the list into elements
smaller and larger than the pivot, then recursively sorts each partition.

Binary Search: Repeatedly divides the search space in half, comparing the
middle element with the target value, until the target is found or the search
space is empty.

Strassen's Matrix Multiplication: Breaks down matrix multiplication into smaller subproblems and combines the results using specific formulas.

Benefits of Divide and Conquer:

Efficiency: Can lead to faster solving times for complex problems.



Parallelization: Many divide and conquer algorithms can be easily parallelized,
meaning they can be executed on multiple processors simultaneously.

Modular design: Makes the code easier to understand and maintain.

Challenges of Divide and Conquer:

Overhead: The repeated division and combination of subproblems can add overhead.

Not all problems: Not all problems are suitable for a divide and conquer
approach.

Overall, divide and conquer is a versatile and powerful algorithmic technique that can be applied to a wide range of problems. It's a valuable tool for any programmer or algorithm designer to have in their toolkit.

Strassen's matrix multiplication algorithm and its analysis:
Theory:

Divide and Conquer Approach: Breaks down matrix multiplication into smaller
subproblems, reducing the number of multiplications required compared to the
standard algorithm.

Recursively Divides Matrices: Divides each of the two n x n matrices A and B into four n/2 x n/2 submatrices.

Special Formulas: Combines the results of smaller matrix multiplications using seven specific formulas, reducing the overall number of multiplications.

Algorithm:

1. Divide: Split matrices A and B into submatrices A11, A12, A21, A22, B11, B12,
B21, B22.

2. Conquer: Recursively compute seven products using Strassen's formulas:

P1 = (A11 + A22) * (B11 + B22)

P2 = (A21 + A22) * B11



P3 = A11 * (B12 - B22)

P4 = A22 * (B21 - B11)

P5 = (A11 + A12) * B22

P6 = (A21 - A11) * (B11 + B12)

P7 = (A12 - A22) * (B21 + B22)

3. Combine: Compute the final result matrix C using the seven products:

C11 = P1 + P4 - P5 + P7

C12 = P3 + P5

C21 = P2 + P4

C22 = P1 - P2 + P3 + P6

Complexity Analysis:

Time Complexity: O(n^2.81) (approximately), significantly faster than the standard algorithm's O(n^3).

Space Complexity: O(n^2) due to the creation of temporary matrices.

Key Points:

First algorithm to achieve sub-cubic time complexity for matrix multiplication.

Not always practical due to overhead and constant factors.

More efficient for large matrices (n > 1000).

Often used in high-performance computing and theoretical research.

Additional Notes:

Strassen's algorithm is suitable for matrices with dimensions that are powers of
2.

It's a recursive algorithm, so it can be applied to matrices of any size by padding them with zeros.



Various extensions and optimizations exist, such as Strassen-Winograd and
Coppersmith-Winograd algorithms.

Conclusion:
Strassen's algorithm demonstrates the power of divide and conquer to achieve
faster algorithms for complex problems, even though its practical application is
limited due to overhead and constant factors. It's a significant milestone in
theoretical computer science and has inspired further research into faster matrix
multiplication algorithms.

Unit 2
Greedy
The Greedy Method is a powerful algorithmic paradigm used to solve optimization
problems by making locally optimal choices at each step, hoping they will
cumulatively lead to the globally optimal solution. It's like navigating a maze by
always choosing the path that seems closest to the exit at each intersection, aiming
to reach the finish line as quickly as possible.
Here's a breakdown of the key features of the Greedy Method:
Key characteristics:

Locally optimal choices: Focuses on maximizing or minimizing immediate gains at each step without considering the long-term impact on the overall solution.

Ignores future consequences: Doesn't backtrack or revise decisions made earlier, even if they lead to a suboptimal final outcome.

Simple and efficient: Often easy to implement and understand, with relatively
low computational cost.

Application areas:

Resource allocation: Scheduling tasks, assigning jobs to machines, allocating bandwidth in networks.



Minimization problems: Finding the shortest path, minimum spanning tree,
optimal knapsack packing.

Maximization problems: Scheduling jobs for maximum profit, selecting investments for maximum return.

Examples of Greedy Algorithms:

Prim's Algorithm: Finds a minimum spanning tree in a weighted graph by choosing the minimum-weight edge at each step, ensuring all vertices are connected with the minimal total weight.

Dijkstra's Algorithm: Finds the shortest path from a source node to all other
nodes in a weighted graph by iteratively relaxing edges and exploring the most
promising paths first.

Huffman Coding: Builds an optimal prefix code for symbols in a message by repeatedly merging the two lowest-frequency nodes, resulting in the shortest average code length.

Challenges and limitations:

Not guaranteed optimal: Greedy choices can sometimes lead to suboptimal or even infeasible solutions for certain problems.

Difficult to prove optimality: Analyzing the optimality of a greedy algorithm for a specific problem can be challenging.

Not applicable to all problems: Works best for problems with well-defined
local optimality criteria and no dependencies between choices.

Overall, the Greedy Method is a valuable tool for tackling optimization problems, offering efficient and often near-optimal solutions. However, it's important to understand its limitations and choose it for suitable problems.

Knapsack Problem
Theory:
The knapsack problem is a classic optimization problem where you must pack items
into a knapsack with a limited capacity (weight or volume) to maximize the total



value of the items included. Each item has its own weight and value, and you can
either take the entire item or leave it behind.

There are different types of knapsack problems, each with slightly different
variations and solution strategies:

0/1 Knapsack Problem: You can only include an item once (either 0 or 1 times)
in the knapsack.

Bounded Knapsack Problem: Each item is available in a limited number of copies; you may take up to that many copies of each item while staying within the weight limit, maximizing the total value.

Unbounded Knapsack Problem: You can include any number of copies of each item in the knapsack, aiming to maximize the total value within the weight limit.

Algorithm:
This section will focus on the Dynamic Programming approach for the 0/1
Knapsack Problem. It's a powerful and efficient technique for solving knapsack
problems of any size.
1. Dynamic Programming Table:
Create a 2D table dp with n rows (number of items) and W columns (knapsack
capacity). Each cell dp[i][j] represents the maximum value achievable using only
the first i items with a weight limit of j .

2. Filling the Table:

Initialize the first row and column to 0: dp[0][j] = 0 and dp[i][0] = 0, representing no items chosen or no capacity available.

For each item i and weight limit j :

If j < w_i (current item's weight exceeds the remaining capacity), dp[i][j] = dp[i-1][j] (maximum value without the current item).

Otherwise, consider two options:

dp[i][j] = dp[i-1][j] (maximum value without the current item).



dp[i][j] = max(dp[i-1][j], dp[i-1][j - w_i] + v_i) (maximum value with
the current item, combining the value of the item v_i with the maximum
value achievable within the remaining capacity j - w_i ).

Fill the table iteratively using this logic.

3. Finding the Optimal Solution:

The final value in the table, dp[n][W] , represents the maximum achievable value
within the knapsack capacity.

To find the actual items included in the optimal solution, backtrack through the
table. Start at dp[n][W] , and for each previous item i :

If dp[i][j] == dp[i-1][j] , the item wasn't included. Move to dp[i-1][j] .

Otherwise, the item was included. Move to dp[i-1][j - w_i] .

Trace back this path to identify the items in the optimal solution.

Example:
Consider a knapsack with a capacity of 5 and three items:

Item Weight Value

A 3 6

B 2 5

C 1 6

Using the dynamic programming table approach, you'll find the optimal solution is to
include items A and C for a total value of 12.
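A short Python sketch of the table-filling logic described above, checked against the example items (variable names are my own choices):

def knapsack_01(weights, values, W):
    n = len(weights)
    # dp[i][j] = best value using the first i items with capacity j
    dp = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(W + 1):
            dp[i][j] = dp[i - 1][j]             # option 1: skip item i
            if weights[i - 1] <= j:             # option 2: take item i, if it fits
                dp[i][j] = max(dp[i][j],
                               dp[i - 1][j - weights[i - 1]] + values[i - 1])
    return dp[n][W]

# Items A, B, C from the table: weights 3, 2, 1; values 6, 5, 6; capacity 5
print(knapsack_01([3, 2, 1], [6, 5, 6], 5))     # 12 (items A and C)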

Complexity Analysis:

Time Complexity: O(nW) due to iterating through the table with n rows and W columns.

Space Complexity: O(nW) for storing the table values.

Additional Resources:

CLRS Introduction to Algorithms (3rd Edition), Chapter 35.



GeeksForGeeks 0/1 Knapsack Problem:
https://www.geeksforgeeks.org/videos/0-1-knapsack-problem-1/

YouTube video explaining Knapsack Problem with Dynamic Programming:


https://www.youtube.com/watch?v=xOlhR_2QCXY

Huffman Coding:
Theory:
Huffman coding is a lossless data compression technique that assigns variable-
length codes to symbols based on their frequency in a message. Frequent symbols
are assigned shorter codes, while less frequent symbols get longer codes. This
reduces the overall size of the encoded message compared to fixed-length coding
schemes.
Key Principles:

Frequency analysis: Analyze the message to determine the frequency of each


symbol.

Tree construction: Build a binary tree where each node represents a symbol
and its frequency.

Code assignment: Assign shorter codes to more frequent symbols by traversing the tree from the root to the leaves (symbols).

Decoding: Use the constructed tree to decode the encoded message by following the paths from the root to the leaves based on the received bit sequence.

Algorithm:

1. Frequency Table: Create a table listing each symbol and its corresponding
frequency in the message.

2. Priority Queue: Build a priority queue with nodes representing each symbol,
with frequency as the priority (lower frequency = higher priority).

3. Tree Building: Repeatedly extract the two highest-priority nodes from the queue. Combine them into a new parent node with a total frequency equal to the sum of its children's frequencies. Add the new parent node to the queue.

4. Code Assignment: Assign codes by traversing the tree from the root to each leaf (symbol), appending a 0 when going left and a 1 when going right; more frequent symbols end up closer to the root and therefore get shorter codes.

5. Encoding: Replace each symbol in the message with its corresponding Huffman code.

Example:
Consider a message with the following symbol frequencies:

A: 3

B: 2

C: 1

D: 4

1. Frequency Table:

   Symbol   Frequency
   A        3
   B        2
   C        1
   D        4

2. Tree Building:

            (10)
           /    \
       D (4)    A+B+C (6)
               /    \
           A (3)    B+C (3)
                   /    \
               B (2)    C (1)



3. Code Assignment:

   Symbol   Huffman Code
   D        0
   A        10
   B        110
   C        111

4. Encoding:

   The original message "ABCD" would be encoded as "10 110 111 0", i.e. "101101110".

Complexity Analysis:

Time Complexity: O(n log n) due to priority queue operations and tree building.

Space Complexity: O(n) for the frequency table and tree.
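A compact Python sketch of the tree-building and code-assignment steps using the standard heapq module (the tuple layout and the function name build_huffman_codes are my own choices):

import heapq

def build_huffman_codes(freqs):
    # Each heap entry: (frequency, tie-breaker, {symbol: code-so-far})
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, codes1 = heapq.heappop(heap)     # two lowest-frequency nodes
        f2, _, codes2 = heapq.heappop(heap)
        # Prefix 0 for symbols in the left subtree, 1 for the right subtree
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]

codes = build_huffman_codes({"A": 3, "B": 2, "C": 1, "D": 4})
print(codes)   # e.g. {'D': '0', 'A': '10', 'C': '110', 'B': '111'}; ties may differ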

Benefits:

Improves compression ratio for messages with non-uniform symbol frequencies.

Can be used with any symbol set.

Relatively simple and efficient to implement.

Limitations:

Requires analyzing the entire message beforehand for frequency estimation.

Not suitable for messages with uniform symbol frequencies.

The performance depends on the actual symbol frequencies in the message.

Further Resources:

Introduction to Algorithms by Cormen et al. (Chapter 17)

Wikipedia article on Huffman coding: https://en.wikipedia.org/wiki/Huffman_coding



https://www.youtube.com/watch?v=co4_ahEDCho&pp=ygUHaHVmZm1hbg%3D%3D

Job Sequencing with Deadlines:


Theory:
Job sequencing with deadlines is a classic scheduling problem where you have a
set of jobs with associated deadlines and processing times. The objective is to
schedule these jobs on a single processor in a way that maximizes the number of
jobs completed before their deadlines.
Challenges:

Deadline constraints: Meeting deadlines is crucial, and exceeding them might result in penalties or wasted resources.

Conflicting priorities: Balancing the desire to maximize completed jobs with the need to adhere to deadlines can be tricky.

NP-hardness: For problems with large numbers of jobs and tight deadlines,
finding the optimal solution becomes computationally difficult.

Algorithms:
Several algorithms can be used to solve the job sequencing with deadlines
problem. Here are two common ones:
1. Greedy Algorithm:

Sort jobs in non-decreasing order of deadlines.

Schedule jobs in the order they appear in the sorted list, as long as they don't
exceed the current processor availability.

This approach is simple and efficient but may not always lead to the optimal
solution.

2. Dynamic Programming:

Build a table that stores the maximum number of jobs that can be completed
within each time slot while respecting deadlines.



Fill the table iteratively, considering all possible job combinations for each time
slot.

This approach is more complex than the greedy algorithm but guarantees an
optimal solution for smaller problems.

Complexity Analysis:

Greedy Algorithm: O(n log n) for sorting jobs and O(n) for scheduling, leading to
a total of O(n log n) time complexity.

Dynamic Programming: O(n^2) time complexity due to iterating over the table
and considering all possible job combinations.
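A small Python sketch of the greedy (earliest-deadline-first) variant described above; the job tuple layout is my own choice, and as noted this greedy rule is not guaranteed to be optimal:

def greedy_schedule(jobs):
    # jobs: list of (name, processing_time, deadline)
    # Consider jobs in non-decreasing deadline order and accept a job
    # only if it can still finish by its deadline.
    scheduled, time = [], 0
    for name, p, d in sorted(jobs, key=lambda job: job[2]):
        if time + p <= d:
            scheduled.append(name)
            time += p
    return scheduled

jobs = [("J1", 2, 3), ("J2", 1, 2), ("J3", 2, 6), ("J4", 3, 5)]
print(greedy_schedule(jobs))  # ['J2', 'J1', 'J3'] (J4 cannot finish by its deadline)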

Additional Considerations:

Job priorities: Some algorithms incorporate job priorities in addition to deadlines for more nuanced scheduling.

Preemption: Allowing jobs to be interrupted and resumed later can improve performance in some cases.

Approximation algorithms: For large-scale problems, approximation algorithms can provide efficient solutions with guarantees on their quality.

Further Resources:

Introduction to Algorithms by Cormen et al. (Chapter 17)

MIT OpenCourseware: Job Sequencing with Deadlines: https://ocw.mit.edu/courses/6-046j-design-and-analysis-of-algorithms-spring-2015/resources/lecture-1-course-overview-interval-scheduling/

GeeksForGeeks: Job Sequencing with Deadlines: https://practice.geeksforgeeks.org/problems/job-sequencing-problem-1587115620/1

Minimum Spanning Tree (MST)


Minimum Spanning Tree (MST) is a fundamental concept in graph theory and computer science used to find the subset of edges in a connected, undirected graph that connects all the vertices together without any cycles and with the minimum possible total edge weight.

Properties of Minimum Spanning Tree:


1. Connectivity: The MST connects all vertices in the graph without forming
cycles.

2. Minimality: The total weight of the edges in the MST is minimized, ensuring the
sum of edge weights is as small as possible.

Algorithms for Finding Minimum Spanning Tree:


1. Kruskal's Algorithm:

It's a greedy algorithm that builds the MST by iteratively adding the shortest
edge that does not form a cycle.

It sorts all edges by weight and adds them one by one, ensuring that no
cycle is formed.

2. Prim's Algorithm:

Another greedy algorithm that starts with an arbitrary vertex and incrementally grows the MST by adding the smallest edge that connects a vertex in the MST to a vertex outside the MST.

It builds the MST one vertex at a time.

Steps for Finding Minimum Spanning Tree:


1. Initialization:

Initialize the MST with an empty set of edges or a single vertex.

2. Edge Selection:

Select the next edge that can be added to the MST without forming a cycle
and has the minimum weight.

3. Adding Edges:



Continuously add selected edges to the MST until all vertices are included
or the MST is complete.

Importance and Applications:


Minimum Spanning Trees have various practical applications, including network
design, circuit design, transportation networks, clustering algorithms, and more.

They help optimize costs and resources by finding the most efficient way to
connect all nodes in a network with minimum total edge weight.

Time Complexity:

Kruskal's Algorithm has a time complexity of O(E log E) using efficient data structures for edge sorting.

Prim's Algorithm has a time complexity of O(V^2) using an adjacency matrix, O(E log V) using a binary heap with an adjacency list, or O(E + V log V) using a Fibonacci heap.

Understanding Minimum Spanning Trees and their associated algorithms is crucial in network design and optimization problems, allowing for efficient connections between various entities while minimizing costs.
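A minimal Kruskal's algorithm sketch in Python, reusing the union-find idea from Unit 1 (the edge-list format and names are my own choices):

def kruskal(num_vertices, edges):
    # edges: list of (weight, u, v); vertices are numbered 0..num_vertices-1
    parent = list(range(num_vertices))

    def find(x):                       # find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst, total = [], 0
    for w, u, v in sorted(edges):      # consider edges by increasing weight
        ru, rv = find(u), find(v)
        if ru != rv:                   # adding this edge creates no cycle
            parent[ru] = rv
            mst.append((u, v, w))
            total += w
    return mst, total

edges = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 1, 3), (5, 2, 3)]
print(kruskal(4, edges))   # ([(0, 1, 1), (1, 3, 2), (1, 2, 3)], 6)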

Single Source Shortest Paths (SSSP) Problem


Given a weighted graph G = (V, E) and a source vertex s ∈ V, find the shortest
paths (i.e., paths with the minimum total weight) from s to all other vertices in
the graph.

Applications: Routing in networks, navigation systems, task scheduling, resource allocation, etc.

Common SSSP Algorithms:

Breadth-First Search (BFS): For unweighted graphs.

Dijkstra's Algorithm: For graphs with non-negative edge weights.

Bellman-Ford Algorithm: For graphs with possibly negative edge weights (but
no negative-weight cycles).



Algorithm: Dijkstra's Algorithm

1. Initialization:

Mark all vertices as unvisited.

Create a distance array dist[] initialized to infinity, except for dist[s] = 0 .

Create a priority queue Q and insert the source vertex s with priority 0.

2. Main Loop:

While Q is not empty:

Extract the vertex u with the minimum distance from Q .

Mark u as visited.

For each neighbor v of u:

If v is not visited and dist[u] + weight(u, v) < dist[v] :

Update dist[v] = dist[u] + weight(u, v) .

Insert v into Q with priority dist[v] .

Example:

Consider a graph representing cities connected by roads with distances (weights).

Find the shortest paths from Delhi to all other cities.

Complexity Analysis:

Time Complexity:

Using a simple priority queue (like an array): O(V^2)

Using a Fibonacci heap: O(E + V log V)

Space Complexity: O(V)

Further Classification:

Variants:

A* Search (heuristic-based)



Johnson's Algorithm (for all-pairs shortest paths)

Applications:

Routing in networks (IP routing, network management)

Navigation systems (GPS, maps)

Graph analysis (social networks, web graphs)

Resource allocation (scheduling, task prioritization)
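A short Python sketch of Dijkstra's algorithm using a binary heap (heapq); the adjacency-list format is my own choice:

import heapq

def dijkstra(graph, source):
    # graph: {vertex: [(neighbor, weight), ...]} with non-negative weights
    dist = {v: float("inf") for v in graph}
    dist[source] = 0
    pq = [(0, source)]                  # priority queue of (distance, vertex)
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:                 # stale queue entry, skip it
            continue
        for v, w in graph[u]:
            if d + w < dist[v]:         # relax edge (u, v)
                dist[v] = d + w
                heapq.heappush(pq, (dist[v], v))
    return dist

graph = {"A": [("B", 4), ("C", 1)], "B": [("D", 1)],
         "C": [("B", 2), ("D", 5)], "D": []}
print(dijkstra(graph, "A"))   # {'A': 0, 'B': 3, 'C': 1, 'D': 4}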

Bellman-Ford Algorithm: Shortest Paths with Negative Weights
Theory:

The Bellman-Ford algorithm finds the shortest paths from a single source vertex
s to all other vertices in a directed graph with potential negative edge
weights.

Unlike Dijkstra's algorithm, it can handle graphs with negative weights, but it
comes at the cost of being slower.

Applications: Network routing protocols, dynamic programming, resource allocation problems with negative costs.

Algorithm:

1. Initialization:

Create a distance array dist[] initialized to infinity for all vertices except
the source s (set dist[s] = 0 ).

Set an iteration counter i to 0.

2. Relaxation Loop:

For i iterations (V-1 iterations at most):

For each edge (u, v) in the graph:

Update dist[v] = min(dist[v], dist[u] + weight(u, v)) .



3. Negative Cycle Detection (Optional):

After relaxation loop, run one more iteration.

If any distance is updated, the graph contains a negative weight cycle.

Example:

Consider a directed graph with vertices A, B, C, D and edges A→B (weight 5), A→C (weight 6), A→D (weight 3), B→C (weight 8), B→D (weight 10).

Find the shortest paths from vertex A to all other vertices.

Solution:

Iteration 1: dist[B] = min(∞, 0 + 5) = 5, dist[C] = min(∞, 0 + 6) = 6, dist[D] = min(∞, 0 + 3) = 3.

Iteration 2: dist[C] = min(6, 5 + 8) = 6 and dist[D] = min(3, 5 + 10) = 3, so nothing changes.

Iteration 3: No updates, indicating no negative cycle.

Therefore, the shortest paths are: A-B (5), A-C (6), A-D (3).
Complexity Analysis:

Time Complexity: O(V * E)

Space Complexity: O(V)

Further Classification:

Variants:

Bellman-Ford with path reconstruction

SPFA (Shortest Path Fast Algorithm) - a faster implementation for sparse graphs

Applications:

Distance-vector routing protocols (RIP)



Stock market analysis

Game theory

Additional Notes:

The Bellman-Ford algorithm can detect negative weight cycles in the graph.

It is slower than Dijkstra's algorithm for graphs with non-negative weights.

However, it is the only viable option for graphs with negative weights (assuming
no negative weight cycles).
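A minimal Bellman-Ford sketch in Python, run here on the corrected example graph above (the edge-list format and function name are my own choices):

def bellman_ford(num_vertices, edges, source):
    # edges: list of (u, v, weight); vertices are numbered 0..num_vertices-1
    dist = [float("inf")] * num_vertices
    dist[source] = 0
    for _ in range(num_vertices - 1):        # V-1 relaxation rounds
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    for u, v, w in edges:                    # one extra pass to detect a negative cycle
        if dist[u] + w < dist[v]:
            raise ValueError("negative weight cycle detected")
    return dist

# A=0, B=1, C=2, D=3 with the edge weights from the example
edges = [(0, 1, 5), (0, 2, 6), (0, 3, 3), (1, 2, 8), (1, 3, 10)]
print(bellman_ford(4, edges, 0))   # [0, 5, 6, 3]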

Backtracking: Exploring Solutions One Path at a Time
Backtracking is a powerful algorithmic technique for solving constraint satisfaction
problems. It works by systematically exploring all possible combinations of choices
to find solutions that satisfy a set of constraints.

General Method:

1. Start with a candidate solution: Begin with an initial partial solution (or no
solution at all).

2. Make a choice: Explore one possible option to extend the current solution
further.

3. Recursively explore: If the chosen option doesn't violate any constraints, continue recursively with steps 2 and 3 to build upon the solution.

4. Backtrack: If the chosen option leads to a dead end (violates constraints), undo the choice and explore alternative options from the previous step.

5. Success or failure: Keep exploring branches until a complete solution satisfying all constraints is found (success), or all possibilities have been exhausted (failure).

Key Aspects:

Choices and constraints: The algorithm needs clearly defined choices (e.g., placing a queen on a chessboard) and constraints (e.g., queens cannot attack each other).

Recursive exploration: Backtracking is often implemented as a recursive function that explores each branch of the solution space.

Dead ends: Recognizing dead ends early helps avoid unnecessary exploration
and improves efficiency.

Data structures: Stacks and queues are commonly used to manage the
exploration state and backtrack when needed.

Examples:

N-Queens problem: Placing N queens on a chessboard such that no two queens can attack each other.

Maze solving: Finding a path from the start to the end of a maze without hitting
walls.

Sudoku: Filling in the missing numbers in a Sudoku puzzle such that each row,
column, and 3x3 subgrid contains all digits from 1 to 9 exactly once.

General Method Variations:

Depth-first search: Explores all possibilities down one branch before backtracking and trying another.

Breadth-first search: Explores all options at the current level before going
deeper.

Branch and bound: Uses additional heuristics to prune less promising branches early on.

Overall, backtracking is a versatile and powerful technique for tackling complex problems with logical rules and constraints. Understanding its general method and common variations can equip you to solve a wide range of challenging problems!

The 8 Queens Problem: Backtracking in Action



The 8 queens problem is a classic example of constraint satisfaction and a perfect
demonstration of the backtracking algorithm in action. Let's delve into its details:

The Problem:
Place 8 queens on an 8x8 chessboard so that no two queens can attack each other
(diagonally, horizontally, or vertically).

Solution Using Backtracking:

1. Start with an empty board: This is your initial partial solution.

2. Place the first queen: Pick any square on the board (e.g., row 1, column 1).

3. Recursively explore:

Place the second queen on a valid square: Any square that doesn't attack
the first queen (no row, column, or diagonal overlap).

Repeat step 3 for queens 3 to 8, always ensuring no queen attacks existing ones.

4. Backtrack: If placing a queen leads to a situation where no valid squares remain or queens attack each other, undo the last placement and try a different square for the previous queen.

5. Success or failure: If you successfully place all 8 queens without violating constraints, you've found a solution! Otherwise, backtrack until all possibilities have been exhausted (no solutions possible).

Example:

Place the first queen in (1,1).

For the second queen, consider (2,3) (valid), (3,1) (invalid - attacks first queen),
etc. until you find a valid placement.

Continue placing queens recursively, backtracking when you hit dead ends.

If you successfully place all 8 queens without conflicts, you've found a solution!

Implementation Tips:



Use a boolean 2D array to track occupied squares and avoid placing queens on
them.

Employ techniques like pruning invalid squares beforehand to improve efficiency.

Variations:

Try finding all possible solutions for the 8 queens problem (there are 92!).

Modify the problem to accommodate different board sizes or queen numbers.

The 8 queens problem showcases the power of backtracking in a clear and engaging way. Understanding its solution approach can equip you to tackle other constraint satisfaction problems with different rules and complexities.

The backtracking algorithm for the 8 queens problem exhibits both strengths and
weaknesses in terms of its time and space complexity:

Time Complexity:

The worst-case scenario involves exhaustive exploration of all invalid and valid placements before finding a solution. On an n x n chessboard with one queen per row there are n^n candidate placements (n! if each queen is also forced into a distinct column), so the worst-case time complexity is exponential in n.

However, in practice, finding a solution usually takes far fewer attempts, especially with optimizations like pruning invalid squares early, so the observed running time is typically much lower than the worst-case bound.

Space Complexity:

Backtracking typically uses a stack to store the states (placements) during exploration.

In the worst case, the entire solution path up to the final queen needs to be
stored, leading to a space complexity of O(n).

However, as solutions are usually found earlier, the actual space usage is often
less and depends on the specific exploration path.
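A compact backtracking sketch for the N queens problem in Python (representation and names are my own choices: queens[r] holds the column of the queen placed in row r):

def solve_n_queens(n, row=0, queens=None):
    # Returns one valid placement as a list of column indices, or None.
    if queens is None:
        queens = []
    if row == n:                       # all queens placed: success
        return list(queens)
    for col in range(n):
        # A square is safe if no earlier queen shares its column or a diagonal.
        if all(col != c and abs(col - c) != row - r
               for r, c in enumerate(queens)):
            queens.append(col)         # make a choice
            result = solve_n_queens(n, row + 1, queens)
            if result is not None:
                return result
            queens.pop()               # backtrack: undo the choice
    return None                        # dead end for this branch

print(solve_n_queens(8))   # e.g. [0, 4, 7, 5, 2, 6, 1, 3]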



Graph Coloring with Backtracking: Exploring Colors
One Vertex at a Time
Graph coloring is a fascinating problem where you assign colors to the vertices of a
graph such that no two adjacent vertices share the same color. In this specific
example, we'll explore how backtracking can be used to efficiently solve this
problem.

Algorithm:

1. Prepare: Define the graph with its vertices and edges. Choose a set of
available colors (m colors).

2. Explore: Recursively visit each vertex in the graph:

For the current vertex, try each available color (1 to m):

Check if assigning the chosen color is valid (no adjacent vertices with
the same color).

If valid, recursively color the remaining vertices.

If the recursive coloring of the remaining vertices fails, undo the assignment and try another color for the current vertex.

If no valid color can be assigned to the current vertex, backtrack and try
different colorings for the previous vertex.

3. Success or Failure:

If all vertices are colored without violating constraints, you've found a solution!

If all options have been exhausted without finding a valid coloring, the graph
is not m-colorable.

Example:
Consider a graph with 4 vertices and 3 available colors (red, green, blue). You start with the first vertex and try each color, checking if its adjacent vertices already have that color. If not, you recursively color the rest of the graph with this chosen color. If everything works out, you've found a solution. Otherwise, backtrack and try different colors for the first vertex and repeat the process until you either find a solution or exhaust all possibilities.
Analysis:

Time Complexity: O(m^V) in the worst case, where m is the number of colors and V is the number of vertices, since in the worst case every color combination for every vertex is tried. However, practical cases often find solutions much faster with backtracking's ability to prune invalid options early.

Space Complexity: O(V) due to the stack used to store the current state (color
assignments) during exploration.

Strengths: Backtracking is easy to understand and implement, flexible for different color sets and graph sizes, and can help detect if a graph is even m-colorable.

Weaknesses: Can be computationally expensive for large graphs or high color counts due to the exponential worst-case complexity. Alternative algorithms like greedy coloring or simulated annealing might offer better efficiency for specific cases.

Further Notes:

Backtracking can be combined with heuristics to prioritize promising color combinations and improve search efficiency.

Additional constraints like specific color preferences for certain vertices can be
incorporated into the coloring process.

Graph coloring applications range from scheduling problems to resource allocation and register allocation in compilers.
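A small Python sketch of the m-coloring backtracking described above (the adjacency-list format and names are my own choices; vertices are assumed to be numbered 0..V-1):

def graph_coloring(adj, m, colors=None, vertex=0):
    # adj: {vertex: set_of_neighbors}; returns {vertex: color} or None.
    if colors is None:
        colors = {}
    if vertex == len(adj):                  # every vertex colored: success
        return dict(colors)
    for color in range(m):
        # Valid only if no already-colored neighbor uses this color.
        if all(colors.get(nb) != color for nb in adj[vertex]):
            colors[vertex] = color          # choose
            result = graph_coloring(adj, m, colors, vertex + 1)
            if result is not None:
                return result
            del colors[vertex]              # backtrack
    return None

# A 4-cycle 0-1-2-3-0 is 2-colorable
adj = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(graph_coloring(adj, 2))   # {0: 0, 1: 1, 2: 0, 3: 1}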

Hamiltonian Cycles and analysis using backtracking:


Hamiltonian Cycle:

A Hamiltonian cycle in a graph is a closed loop that visits every vertex exactly
once and returns to the starting vertex.



Applications: Traveling salesman problem (TSP), circuit design, job sequencing,
DNA sequencing.

Algorithm (Backtracking):

1. Initialization:

Mark all vertices as unvisited.

Create an empty path to store the vertices visited so far.

2. Recursive Exploration:

If all vertices are visited and the current vertex is adjacent to the starting
vertex, a Hamiltonian cycle is found.

Mark the current vertex as visited.

Add the current vertex to the path.

For each unvisited neighbor of the current vertex:

Recursively call the function to explore further.

Backtrack:

Remove the current vertex from the path.

Mark the current vertex as unvisited.

Example:

Consider a graph with vertices {A, B, C, D} and edges {(A, B), (A, C), (B, C), (B, D),
(C, D)}.

Starting from A, the algorithm might explore paths like A-B-D-C-A (a Hamiltonian cycle) or A-B-C-D (a dead end, since there is no edge from D back to A).

Complexity Analysis:

Time Complexity: O(n!), where n is the number of vertices. In the worst case,
the algorithm tries all possible permutations of vertices.

Space Complexity: O(n), due to the recursive calls and the path array.
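A minimal backtracking sketch for Hamiltonian cycle detection in Python, using the example graph above (the adjacency-set format and names are my own choices):

def hamiltonian_cycle(adj, path=None):
    # adj: {vertex: set_of_neighbors}; returns a cycle as a vertex list, or None.
    vertices = list(adj)
    if path is None:
        path = [vertices[0]]                # fix an arbitrary starting vertex
    if len(path) == len(vertices):
        # All vertices visited: check the closing edge back to the start.
        return path + [path[0]] if path[0] in adj[path[-1]] else None
    for nxt in adj[path[-1]]:
        if nxt not in path:                 # visit each vertex exactly once
            path.append(nxt)
            result = hamiltonian_cycle(adj, path)
            if result is not None:
                return result
            path.pop()                      # backtrack
    return None

adj = {"A": {"B", "C"}, "B": {"A", "C", "D"},
       "C": {"A", "B", "D"}, "D": {"B", "C"}}
print(hamiltonian_cycle(adj))   # e.g. ['A', 'B', 'D', 'C', 'A']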

Limitations and Considerations:



Backtracking can be computationally expensive for large graphs due to the
exponential worst-case time complexity.

Because it explores all possibilities, backtracking is guaranteed to find a Hamiltonian cycle if one exists, but it may spend a long time exploring parts of the graph that don't lead to a solution.

For large graphs, it's often impractical to find Hamiltonian cycles using
backtracking. Heuristic or approximation algorithms might be more suitable for
real-world applications.

Further Notes:

Backtracking can be optimized by pruning branches early based on graph properties and heuristics.

Other approaches for Hamiltonian cycle detection include dynamic programming and approximation algorithms.

The Hamiltonian cycle problem is NP-complete, meaning there's no known polynomial-time algorithm to solve it efficiently for all graphs.

Unit 3
DP
Dynamic programming (DP) is a powerful technique for solving optimization
problems that exhibit optimal substructure and overlapping subproblems.
Here's a breakdown of the key concepts:
What is Dynamic Programming?

A technique for solving complex problems by breaking them down into smaller,
overlapping subproblems and storing the solutions to these subproblems to
avoid redundant computation.

Imagine climbing stairs: instead of calculating how many ways to reach each
step from the bottom each time, DP stores the solutions for each step, making
computations for higher steps efficient.

Key Characteristics:



Optimal substructure: The overall solution can be constructed from optimal
solutions to its subproblems.

Overlapping subproblems: Subproblems are reused multiple times during the


solution process.

Benefits of DP:

Reduced time complexity: Memoizing subproblem solutions avoids repetitive calculations, leading to efficient solutions for large problems.

Clear and elegant solutions: DP often leads to intuitive and easy-to-understand solutions compared to other methods.

Examples of DP Algorithms:

Fibonacci sequence: Calculating subsequent terms by storing previously calculated ones.

Shortest path problems: Finding the shortest path between two points in a
graph by storing minimum distances to intermediate nodes.

Knapsack problem: Deciding which items to fill a knapsack with limited capacity for maximum value, by storing optimal values for sub-capacities.

Key Steps in DP:

1. Identify subproblems: Break down the original problem into smaller, overlapping subproblems.

2. Define memoization table: Create a table to store the solutions to the subproblems.

3. Build the table: Fill the table bottom-up or top-down, solving subproblems and
utilizing values already stored.

4. Construct the solution: Use the stored subproblem solutions to build the
optimal solution for the original problem.

Overall, dynamic programming is a valuable tool for solving a wide range of optimization problems with overlapping substructure. Its ability to memoize subproblems and avoid redundant computations makes it an efficient and elegant approach to complex challenges.
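A tiny Python example of the memoization idea described above, applied to the Fibonacci sequence:

def fib(n, memo=None):
    # Memoization: each subproblem F(k) is computed once and then reused.
    if memo is None:
        memo = {}
    if n <= 1:                    # base cases F(0) = 0, F(1) = 1
        return n
    if n not in memo:
        memo[n] = fib(n - 1, memo) + fib(n - 2, memo)
    return memo[n]

print(fib(40))   # 102334155, computed in O(n) time instead of O(2^n)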

Ingredients of Dynamic Programming


When crafting a dish called "Dynamic Programming," several essential ingredients
make up its recipe for success:
1. Optimal Substructure:

This is the bedrock of DP. The problem must be able to be decomposed into
smaller subproblems, where the optimal solution for the whole can be
assembled by combining the optimal solutions of its parts. Think of a jigsaw
puzzle; each piece contributes to the complete picture.

2. Overlapping Subproblems:

Imagine making many batches of cookies - you wouldn't re-invent the wheel for
each batch. Similarly, subproblems in DP often appear repeatedly throughout
the solution process. Recognizing and efficiently storing their solutions avoids
redundant calculations, saving time and effort.

3. Memoization/Tabulation:

This acts as the pantry of DP - a data structure holding previously calculated solutions. Instead of recomputing these subproblems, we retrieve them from the
"pantry" for swift reuse. Memoization stores solutions based on specific inputs,
while tabulation builds the table of solutions for all possible subproblems
beforehand.

4. Recursion or Iterative Approach:

These are the cooking methods of DP. Recursion breaks down the problem into
smaller calls of itself, eventually reaching the base case and building up the
solution. Iteration, on the other hand, systematically fills the memoization table
or calculates solutions for all subproblems in a loop.

5. Base Cases:



These are the pre-made ingredients, the simplest subproblems for which
solutions are readily known. Having defined base cases allows the recursion or
iteration to proceed smoothly, reaching the final product from these building
blocks.

6. Efficiency Trade-Offs:

While DP avoids redundant computations, it comes at the cost of additional memory to store solutions. Choosing the right memoization technique and
optimizing space usage are crucial for maintaining efficiency.

By carefully blending these ingredients, you can concoct a powerful DP algorithm that tackles complex optimization problems with elegance and efficiency.
Remember, understanding these core elements is key to mastering the art of
dynamic programming!

Matrix Chain Multiplication: Optimizing Multiplications with Dynamic Programming
The matrix chain multiplication problem aims to find the most efficient way to
multiply a sequence of matrices, minimizing the total number of scalar
multiplications performed. Enter dynamic programming (DP) to the rescue! Here's
how it goes:
Problem:

Given a sequence of matrices A₁ through Aₙ, with dimensions (m₁xn₁) for A₁,
(n₁xn₂) for A₂, and so on, find the order in which to multiply them such that the
total number of scalar multiplications is minimized.
Naive Approach:

Multiplying matrices in a fixed left-to-right order can lead to unnecessary computations. For example, with A₁ (m₁ x n₁), A₂ (n₁ x n₂) and A₃ (n₂ x n₃), computing (A₁A₂)A₃ costs m₁·n₁·n₂ + m₁·n₂·n₃ scalar multiplications, while A₁(A₂A₃) costs n₁·n₂·n₃ + m₁·n₁·n₃. In some cases, these orders can differ significantly in their number of multiplications.

Dynamic Programming Solution:



1. Define subproblems: Consider subproblems M[i,j] representing the minimum
number of multiplications needed to multiply matrices Ai through Aj (inclusive).

2. Base cases: M[i,i] = 0 (single matrix doesn't require multiplication).

3. Recurrence relation: For i < j, consider all possible split points k (i ≤ k < j) and define:
M[i,j] = min over k of ( M[i,k] + M[k+1,j] + pᵢ₋₁ · pₖ · pⱼ )

This takes the minimum cost of multiplying Aᵢ through Aₖ and Aₖ₊₁ through Aⱼ separately, then adds the pᵢ₋₁ · pₖ · pⱼ scalar multiplications needed to multiply the two resulting matrices together.

4. Memoization: Store the calculated M[i,j] values in a table to avoid redundant computations.

5. Construct the optimal order: Backtrack through the table from M[1,n] to identify the split points k that contributed to the minimum cost. This reveals the optimal order of multiplication.
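A minimal Python sketch of this recurrence, assuming the dimensions are supplied as a list p where matrix Aᵢ is pᵢ₋₁ × pᵢ (function and variable names are illustrative):

```python
import sys

def matrix_chain_order(p):
    """p[i-1] x p[i] is the dimension of matrix A_i; returns (min cost, split table)."""
    n = len(p) - 1                                 # number of matrices
    M = [[0] * (n + 1) for _ in range(n + 1)]      # M[i][j]: min scalar multiplications
    split = [[0] * (n + 1) for _ in range(n + 1)]  # split[i][j]: best k, for reconstruction
    for length in range(2, n + 1):                 # chain length
        for i in range(1, n - length + 2):
            j = i + length - 1
            M[i][j] = sys.maxsize
            for k in range(i, j):                  # try every split point
                cost = M[i][k] + M[k + 1][j] + p[i - 1] * p[k] * p[j]
                if cost < M[i][j]:
                    M[i][j], split[i][j] = cost, k
    return M[1][n], split

def print_order(split, i, j):
    if i == j:
        return f"A{i}"
    k = split[i][j]
    return f"({print_order(split, i, k)}{print_order(split, k + 1, j)})"

cost, split = matrix_chain_order([10, 100, 5, 50])
print(cost, print_order(split, 1, 3))   # 7500 ((A1A2)A3)
```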

Complexity Analysis:

Time complexity: O(n³) due to iterating through all subproblems.

Space complexity: O(n²) for storing the M table.

Benefits:

Finds the optimal multiplication order for any sequence of matrices.

Efficient solution compared to naive approaches, especially for large matrices.

Easy to understand and implement with clear subproblems and recurrence relations.

Extensions:

Strassen's algorithm offers potentially faster multiplication for specific matrix sizes.

Parallelization techniques can optimize computations further.



Overall, dynamic programming provides a powerful and efficient approach to
solving the matrix chain multiplication problem. Its ability to identify and
reuse optimal solutions for subproblems makes it a valuable tool for tackling
a wide range of optimization problems.

Longest Common Subsequence (LCS): Dynamic Programming for Matching Sequences
The Longest Common Subsequence (LCS) problem seeks to find the longest
sequence of characters that appears in two given strings while maintaining their
relative order. Dynamic programming (DP) proves itself again as a powerful tool for
solving this efficiently.
Problem:

Given two strings X and Y, find the longest subsequence that is common to both.

Naive Approach:
Brute-forcing all possible subsequences of X and comparing them to Y is inefficient
and impractical for long strings. This approach leads to exponential time complexity.

DP Solution:

1. Define subproblems: Let L(i,j) represent the length of the LCS for the prefixes
of X up to index i and Y up to index j.

2. Base cases: L(i,0) = L(0,j) = 0 for all i and j (an empty prefix has no common subsequence with anything).

3. Recurrence relation: Consider the characters X[i] and Y[j]:

If X[i] == Y[j]: L(i,j) = L(i-1,j-1) + 1 (include the matching character).

If X[i] != Y[j]: L(i,j) = max(L(i-1,j), L(i,j-1)) (take the longer subsequence from
either neglecting X[i] or Y[j]).

4. Memoization: Store the calculated L(i,j) values in a table to avoid redundant computations.

5. Construct the LCS: Backtrack through the table from L(m,n) (where m and n are the lengths of X and Y, respectively) to identify the matching characters that contribute to the LCS. This reveals the longest common subsequence.
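A compact sketch of the table-filling and backtracking steps above (function names and the sample strings are illustrative):

```python
def lcs(X: str, Y: str) -> str:
    m, n = len(X), len(Y)
    # L[i][j] = length of LCS of X[:i] and Y[:j]; row/column 0 are the empty-prefix base cases.
    L = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    # Backtrack from L[m][n] to recover one LCS.
    out, i, j = [], m, n
    while i > 0 and j > 0:
        if X[i - 1] == Y[j - 1]:
            out.append(X[i - 1]); i -= 1; j -= 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))

print(lcs("ABCBDAB", "BDCABA"))  # one LCS of length 4, e.g. "BCBA"
```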



Complexity Analysis:

Time complexity: O(mn) due to iterating through all subproblems in the table.

Space complexity: O(mn) for storing the L table.

Benefits:

Finds the optimal LCS length and sequence for any pair of strings.

Efficient compared to the naive approach, especially for large strings.

Easy to understand and implement with clear subproblems and recurrence relations.

Extensions:

Finding multiple LCSs of equal length.

Incorporating weights on characters for weighted LCS.

Applying suffix arrays or other data structures for space optimization.

Overall, DP offers a fast and effective solution for the LCS problem. Its ability to dynamically solve subproblems and reuse their solutions makes it a powerful tool for string comparison and related optimization tasks.

Optimal Binary Search Trees (OBST) using Dynamic Programming
Problem:

Given a set of keys with associated search probabilities, construct a binary search
tree (BST) that minimizes the average search cost (expected number of
comparisons needed to find a key).
Dynamic Programming Approach:

1. Define Subproblems:

Let E[i, j] represent the minimum expected search cost for a BST
containing keys i to j (inclusive).

2. Base Cases:



E[i, i-1] = 0 (an empty subtree incurs no search cost); a single-key subtree then costs E[i, i] = p(i), the search probability of key i.

3. Recurrence Relation:

For i ≤ j:
E[i, j] = min over roots r (i ≤ r ≤ j) of { E[i, r-1] + E[r+1, j] + W(i, j) }

W(i, j) is the sum of the search probabilities of keys i through j; adding it at each level accounts for every key in the subtree sitting one level below the chosen root.

The formula tries every key r as the root of the subtree and chooses the one minimizing the overall expected cost.

4. Memoization Table:

Store E[i, j] values to avoid redundant computations.

5. Construction:

Backtrack through the table to determine the optimal root choices for each
subproblem, constructing the OBST.
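A small sketch of this recurrence, assuming only key probabilities p[1..n] with no dummy keys and the convention E[i, i-1] = 0 (names are illustrative):

```python
def optimal_bst(p):
    """p[i] = search probability of key i (1-indexed; p[0] is unused)."""
    n = len(p) - 1
    INF = float("inf")
    # E[i][j]: expected cost for keys i..j; E[i][i-1] = 0 handles empty subtrees.
    E = [[0.0] * (n + 2) for _ in range(n + 2)]
    W = [[0.0] * (n + 2) for _ in range(n + 2)]   # W[i][j]: sum of p[i..j]
    root = [[0] * (n + 2) for _ in range(n + 2)]
    for length in range(1, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            W[i][j] = W[i][j - 1] + p[j]
            E[i][j] = INF
            for r in range(i, j + 1):             # try every key as root
                cost = E[i][r - 1] + E[r + 1][j] + W[i][j]
                if cost < E[i][j]:
                    E[i][j], root[i][j] = cost, r
    return E[1][n], root

cost, root = optimal_bst([0, 0.15, 0.10, 0.05, 0.10, 0.20])
print(round(cost, 2), root[1][5])   # expected search cost and the root of the whole tree
```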

Complexity Analysis:

Time Complexity: O(n³) due to iterating through all subproblem combinations.

Space Complexity: O(n²) for storing the E table.

Additional Considerations:

Weighted External Path Length: Sum of path lengths from the root to each
external node, weighted by their search probabilities.

Variations:

Weighted internal path length for different operation costs.

Constraints on tree structure or node placement.

Benefits of Dynamic Programming for OBST:

Efficiently finds the optimal tree structure for any set of keys and probabilities.

Avoids exhaustive search through all possible BSTs.

Provides a clear and structured approach to the problem.



Applications:

Data structures, information retrieval, code optimization, compiler design, and more.

Overall, dynamic programming offers a powerful solution to the OBST problem, demonstrating its effectiveness in tackling optimization problems with overlapping subproblems and optimal substructure.

0-1 Knapsack Problem: Finding the Max Loot with Dynamic Programming
The 0-1 knapsack problem is a classic optimization challenge where you must fill a
knapsack with maximum value items, subject to a weight limit. Each item has its
own weight and value, and you can either include it or exclude it completely (hence
the "0-1" designation). Dynamic programming comes to the rescue in finding the
optimal solution efficiently.

Problem:
Given a set of N items with weights w[i] and values v[i] , and a knapsack capacity
of W , find the subset of items that maximizes the total value within the weight
constraint.

Dynamic Programming Solution:

1. Define Subproblems: Let dp[i][j] represent the maximum value achievable using the first i items with a remaining capacity of j.

2. Base Cases:

dp[0][j] = 0 (no items used).

dp[i][0] = 0 (no capacity available).

3. Recurrence Relation:

Consider item i :

dp[i][j] = max(dp[i-1][j], dp[i-1][j-w[i]] + v[i]) if w[i] ≤ j; otherwise dp[i][j] = dp[i-1][j]

The first term represents ignoring item i.

The second term includes item i when its weight fits into the remaining capacity (w[i] ≤ j) and adds its value.

4. Memoization Table:

Store dp[i][j] values to avoid redundant calculations.

5. Optimal Solution:

dp[N][W] holds the maximum achievable value.

Backtracking (Optional):

Traverse the table backwards to identify the chosen items in the optimal
solution.
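A minimal sketch of this table and the optional backtracking step (the item weights, values, and capacity are illustrative):

```python
def knapsack_01(weights, values, W):
    n = len(weights)
    # dp[i][j]: best value using the first i items with capacity j.
    dp = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(W + 1):
            dp[i][j] = dp[i - 1][j]                      # exclude item i
            if weights[i - 1] <= j:                      # include item i if it fits
                dp[i][j] = max(dp[i][j], dp[i - 1][j - weights[i - 1]] + values[i - 1])
    # Optional backtracking: recover which items were chosen.
    chosen, j = [], W
    for i in range(n, 0, -1):
        if dp[i][j] != dp[i - 1][j]:
            chosen.append(i - 1)
            j -= weights[i - 1]
    return dp[n][W], list(reversed(chosen))

print(knapsack_01([2, 3, 4, 5], [3, 4, 5, 6], 5))  # (7, [0, 1]): items 0 and 1
```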

Complexity Analysis:

Time Complexity: O(NW) due to iterating through the subproblems.

Space Complexity: O(NW) for storing the dp table.

Benefits:

Efficiently finds the optimal solution compared to brute-forcing all possible combinations.

Easy to understand and implement with clear subproblems and recurrence relations.

Extensible to variations like fractional knapsack problems.

Applications:

Resource allocation, project scheduling, investment planning, cargo loading, and more.

Overall, dynamic programming proves its power in tackling the 0-1 knapsack
problem. Its ability to optimize subproblems and reuse their solutions makes
it a valuable tool for a wide range of optimization challenges.

Traveling Salesman Problem (TSP): Dynamic Programming in a Maze of Routes



The traveling salesman problem (TSP) is a notorious optimization challenge: Given
a set of cities and the distances between them, find the shortest possible route that
visits each city exactly once and returns to the starting point. Dynamic programming (DP) can solve TSP exactly, but only in exponential time, so it is practical for modest numbers of cities while still providing insight into the problem's structure.
Challenges in Applying DP to TSP:

NP-hard problem: Solving TSP exactly becomes computationally intractable for large sets of cities, since the number of possible routes grows factorially.

Careful state design is needed: simply choosing the shortest sub-route between a subset of cities does not necessarily lead to the shortest overall tour, so the DP state must record both the set of visited cities and the current endpoint.

Approaches using DP:

1. Held-Karp Algorithm: Solves TSP exactly with DP over subsets of cities. The state C(S, j) is the cheapest cost of a path that starts at the origin, visits every city in subset S exactly once, and ends at city j; memoization avoids recomputing states. It runs in O(n²·2ⁿ) time and O(n·2ⁿ) space - far better than brute-force O(n!), but still exponential.

2. Subtour Elimination Techniques: Augment DP with rules to avoid forming subtours (cycles that visit some but not all cities). This helps prevent suboptimal choices that lead to invalid or unnecessarily long paths.
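A compact sketch of the Held-Karp recurrence over bitmask subsets, assuming 0-based cities with city 0 as the start and an illustrative distance matrix:

```python
from itertools import combinations

def held_karp(dist):
    """Exact TSP via DP over subsets: O(n^2 * 2^n) time, O(n * 2^n) space."""
    n = len(dist)
    # C[(mask, j)]: cheapest cost to start at 0, visit exactly the cities in mask, and end at j.
    C = {(1 << j, j): dist[0][j] for j in range(1, n)}
    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            mask = sum(1 << j for j in subset)
            for j in subset:
                prev_mask = mask ^ (1 << j)
                C[(mask, j)] = min(C[(prev_mask, k)] + dist[k][j]
                                   for k in subset if k != j)
    full = (1 << n) - 2                      # every city except the start
    return min(C[(full, j)] + dist[j][0] for j in range(1, n))

dist = [[0, 10, 15, 20],
        [10, 0, 35, 25],
        [15, 35, 0, 30],
        [20, 25, 30, 0]]
print(held_karp(dist))   # 80 for this classic 4-city example
```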

Analysis:

Efficiency: DP provides significant improvements over brute-forcing all possible routes, especially for smaller city sets.

Approximation: For instances too large for exact DP, DP-inspired heuristics and bounds can still deliver near-optimal tours within an acceptable time frame.

Insights: Studying how DP algorithms build solutions can reveal patterns and
properties of the TSP landscape, aiding in the development of heuristic or
approximation algorithms.



Limitations:

Computational complexity: Even with DP, solving TSP for large graphs
remains computationally expensive.

Exponential resource use: Held-Karp's O(n·2ⁿ) memory requirement, on top of its exponential running time, limits exact DP to instances with a few dozen cities.

Overall, dynamic programming offers a powerful, exact approach to the traveling salesman problem for modest instance sizes. By optimizing subproblems and avoiding redundant computations, DP identifies optimal tours and gives valuable insight into the structure of TSP solutions. For large-scale instances, however, heuristics, approximation algorithms, or branch and bound are usually more practical.

Floyd-Warshall Algorithm:
Problem: Finds all-pairs shortest paths in a weighted graph, meaning the shortest
path between every pair of vertices.
Key Features:

Handles both directed and undirected graphs.

Can accommodate negative edge weights (but not negative cycles).

Efficient dynamic programming approach with time complexity O(n³).

Algorithm:

1. Initialization:

Create a distance matrix D where D[i][j] represents the current shortest distance from vertex i to vertex j. Set D[i][i] = 0, use the direct edge weight when an edge (i, j) exists, and infinity for unreachable pairs.

2. Iterative Updates:

For each intermediate vertex k (from 0 to n-1):

For every pair of vertices i and j (from 0 to n-1):



Check if going through k improves the distance:

D[i][j] = min(D[i][j], D[i][k] + D[k][j])
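A direct sketch of this triple loop, assuming the graph is given as an adjacency matrix that uses float('inf') for missing edges:

```python
def floyd_warshall(graph):
    """graph[i][j]: edge weight or float('inf'); graph[i][i] should be 0."""
    n = len(graph)
    D = [row[:] for row in graph]          # copy so the input is not modified
    for k in range(n):                     # intermediate vertex
        for i in range(n):
            for j in range(n):
                if D[i][k] + D[k][j] < D[i][j]:
                    D[i][j] = D[i][k] + D[k][j]
    # Negative-cycle check: a vertex that can reach itself with negative total cost.
    has_negative_cycle = any(D[i][i] < 0 for i in range(n))
    return D, has_negative_cycle

INF = float("inf")
G = [[0, 3, INF, 7],
     [8, 0, 2, INF],
     [5, INF, 0, 1],
     [2, INF, INF, 0]]
D, neg = floyd_warshall(G)
print(D[0][2], neg)   # 5 (path 0 -> 1 -> 2), False
```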

Analysis:

Time Complexity: O(n³), due to iterating through all triples of vertices.

Space Complexity: O(n²), for storing the distance matrix.

Key Points:

It gradually improves the shortest paths by considering intermediate vertices.

It can detect negative cycles if any D[i][i] becomes negative during the
process.

It's often a preferred choice for dense graphs with many edges.

Applications:

Route planning in transportation networks.

Finding arbitrage opportunities in financial markets.

Detecting transitive relationships in social networks.

Computing reachability in graph analysis.

Strengths:

Efficient for dense graphs with positive or negative edge weights.

Finds all-pairs shortest paths in a single algorithm run.

Detects negative cycles.

Limitations:

Time complexity can become significant for large graphs.

For very sparse graphs, running a single-source algorithm (such as Dijkstra) from every vertex is often faster.

In conclusion, the Floyd-Warshall algorithm is a valuable tool for solving all-pairs shortest path problems in weighted graphs. Its dynamic programming approach effectively leverages intermediate vertices to find optimal paths, making it applicable to a wide range of optimization and graph analysis tasks. Understanding its strengths and limitations is crucial for choosing the appropriate algorithm for specific graph problems.

Branch and Bound: Exploring Paths to Optimality with Pruning Power
Branch and Bound (B&B) is a powerful optimization technique used to solve
complex problems with well-defined search spaces and objective functions. It works
by systematically exploring the search space, dividing it into smaller subproblems,
and discarding non-promising branches based on bounds or estimates of their
potential solutions.

Key Features:

Divide-and-conquer approach: Breaks down the main problem into smaller, easier-to-handle subproblems.

Bounds and pruning: Utilizes upper and lower bounds on the objective function to estimate the potential solutions of subproblems and discard those unlikely to yield the optimal result.

Exploration and backtracking: Explores promising subproblems recursively and backtracks if they turn out to be worse than previously found solutions.

Method:

1. Start with the original problem: Define the search space, objective function,
and initial bounds.

2. Divide the search space: Generate subproblems by branching (e.g., splitting a continuous space into intervals, choosing options for discrete variables).

3. Calculate bounds: Evaluate the objective function or estimate upper and lower bounds for each subproblem.

4. Prune non-promising branches: Discard subproblems whose bounds cannot improve the current best solution.

5. Explore promising branches: Explore subproblems with tighter bounds or potential for improvement recursively.

6. Backtrack and update: Backtrack if a subproblem doesn't lead to a better solution and update the current best solution as needed.

7. Stop when there are no more promising branches: The current best solution is the optimal solution (within the accuracy of the bounds).

Complexity Analysis:
The complexity of B&B depends heavily on the problem, the effectiveness of its
branching and bounding strategies, and the size of the search space.

Worst-case: O(bⁿ), where b is the branching factor (number of subproblems created per level) and n is the depth of the search tree. This can be exponential for problems with large branching factors and depths.

Best-case: dramatically smaller - potentially polynomial in the input size - when tight bounds prune most of the search space early.

Strengths:

Guaranteed optimal solution: If implemented correctly, B&B finds the globally optimal solution within the accuracy of the bounds.

Flexibility: Applicable to various problem types with defined search spaces and
objective functions.

Heuristics can improve efficiency: Combining B&B with good heuristics can
significantly reduce the search space and improve efficiency.

Limitations:

Can be computationally expensive: Worst-case complexity can be prohibitive for large problems.

Effectiveness depends on bounds: Tight bounds are crucial for efficient pruning and finding the optimal solution.

May not find feasible solutions: For problems with no feasible solutions, B&B
can get stuck exploring infeasible branches.



Overall, Branch and Bound is a versatile and powerful optimization technique
that can find optimal solutions for complex problems. Its ability to
systematically explore the search space and discard non-promising branches
makes it a valuable tool for a wide range of optimization challenges.

0/1 Knapsack Problem with Branch and Bound


The 0/1 knapsack problem involves finding the subset of items with maximum value
that can fit within a knapsack with a limited weight capacity. Each item can either be
included or excluded entirely (hence the "0/1" designation). Branch and Bound
offers a powerful approach to solving this problem by systematically exploring the
solution space and pruning non-promising branches.

Branching Strategy:

Each level of the branch and bound tree represents a decision point: include or
exclude an item.

The left branch corresponds to including the item and the right branch to
excluding it.

Bounding Function:

To prune branches effectively, we need an upper bound on the value still achievable from any subtree (an optimistic estimate).

A simple bound adds the total value of the remaining, not-yet-considered items to the value collected so far.

A tighter bound comes from the fractional knapsack relaxation, which allows taking a fraction of one item; tighter bounds prune more branches.

Pruning Criteria:

A branch can be pruned if its upper bound is no greater than the current best solution (maximum value found so far).

This guarantees that no further exploration down that branch can lead to a better solution.

Algorithm Outline:



1. Initialize the current best solution to 0 and the remaining capacity to the
knapsack capacity.

2. Explore the topmost level of the branch and bound tree, creating two branches
for each item.

3. For each branch:

Add the item's value to the current value if included.

Decrease the remaining capacity by the item's weight.

Update the upper bound using the chosen strategy.

If the upper bound is no greater than the current best solution, prune the branch.

Otherwise, recursively explore the sub-branches (children) of the current branch.

4. Backtrack when all branches at a level have been explored.

5. The current best solution at the end of the exploration is the optimal solution for
the knapsack problem.
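A small sketch of this outline, assuming the fractional-knapsack relaxation as the upper bound and a depth-first exploration order (item data are illustrative):

```python
def knapsack_bnb(items, capacity):
    """items: list of (value, weight); returns the best total value via DFS branch and bound."""
    # Sort by value density so the fractional bound is easy to compute.
    items = sorted(items, key=lambda vw: vw[0] / vw[1], reverse=True)

    def upper_bound(i, value, room):
        # Optimistic estimate: fill the remaining room greedily, allowing one fractional item.
        for v, w in items[i:]:
            if w <= room:
                value, room = value + v, room - w
            else:
                return value + v * room / w
        return value

    best = 0
    def dfs(i, value, room):
        nonlocal best
        best = max(best, value)
        if i == len(items) or upper_bound(i, value, room) <= best:
            return                              # prune: the bound cannot beat the incumbent
        v, w = items[i]
        if w <= room:
            dfs(i + 1, value + v, room - w)     # branch: include item i
        dfs(i + 1, value, room)                 # branch: exclude item i
    dfs(0, 0, capacity)
    return best

print(knapsack_bnb([(60, 10), (100, 20), (120, 30)], 50))   # 220 (items worth 100 and 120)
```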

Advantages:

Guarantees finding the optimal solution within the bounds used.

Can be efficient with good bounding functions and early pruning.

Disadvantages:

Can be computationally expensive for large problems with many items.

Depends on the effectiveness of the chosen bounding function.

Overall, Branch and Bound offers a powerful approach to solving the 0/1
knapsack problem, especially for smaller instances or with good bounding
functions. Its ability to systematically explore and prune the search space
helps guarantee optimality, making it a valuable tool for discrete optimization
problems.



Branch and Bound for the Traveling Salesperson
Problem (TSP): Exploring Routes with Pruning
Power
The traveling salesperson problem (TSP) poses a notorious optimization challenge:
finding the shortest route that visits each city in a network exactly once and returns
to the starting point. Branch and Bound (B&B) offers a powerful approach: run to completion it is guaranteed to return an optimal tour, though possibly only after exponential work, and stopped early it returns the best tour found so far.
Branching Strategy:

Each level of the B&B tree represents a partial tour visited so far.

Branches are created by choosing the next city to visit from the unvisited ones.

This leads to an exponential explosion of branches as the number of cities increases.

Bounding Function:

Estimating the remaining distance to complete the tour is crucial for pruning
non-promising branches.

Commonly used bounds include:

Minimum spanning tree (MST) bound: the weight of a minimum spanning tree over the unvisited cities is a lower bound on the distance still needed to connect them.

Christofides' heuristic: builds a tour from an MST plus a minimum-weight perfect matching on the MST's odd-degree vertices; on metric instances its tour is at most 1.5× the optimum, which makes it a useful initial incumbent (upper bound).

Pruning Criteria:

A branch is pruned if the distance travelled so far plus a lower bound on the remaining distance is already at least the length of the current best tour (shortest tour found so far).

This avoids exploring potentially worse paths and speeds up the search.

Algorithm Outline:



1. Initialize the current best solution to an initial solution (e.g., first city to all other
cities and back).

2. Explore the topmost level of the B&B tree, creating branches for each remaining
city.

3. For each branch:

Add the distance to the chosen city to the current distance.

Update the lower bound using the chosen strategy.

If the lower bound is greater than the current best solution, prune the
branch.

Otherwise, recursively explore the sub-branches (children) representing visiting further cities.

4. Backtrack when all branches at a level have been explored.

5. The current best solution at the end of the exploration is the optimal solution
found using B&B.

Advantages:

Guarantees an optimal tour if allowed to run to completion; stopped early, it still returns the best tour found so far.

Effective when paired with strong lower bounds and a good initial solution (for example, one produced by Christofides' heuristic on metric instances).

Disadvantages:

Computationally expensive due to the vast search space and numerous potential branches.

Worst-case running time is exponential, so large instances may not finish within a practical time budget.

Overall, Branch and Bound, coupled with effective bounding functions and
pruning strategies, can be a valuable tool for tackling the TSP, especially for
smaller instances or finding good approximate solutions. For larger
problems, alternative algorithms like heuristics or metaheuristics might be
more efficient for finding acceptable solutions within a reasonable time frame.



Unit 4
Naïve String Matching Algorithm:
Problem:

Finds all occurrences of a pattern string (P) within a text string (T).

Algorithm:

1. Slide a window of length P along T:

Align the start of P with the current position in T.

2. Compare characters:

For each character in both P and T within the window:

If characters match, continue to the next pair.

If characters mismatch, shift the window one position to the right.

3. Match Found:

If all characters in P match those in the corresponding window in T, report a match at the current position.

4. Repeat:

Continue sliding and comparing until the end of T is reached.
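A direct sketch of this sliding-window comparison (the sample strings are illustrative):

```python
def naive_match(text: str, pattern: str):
    """Return the starting indices of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    matches = []
    for s in range(n - m + 1):             # each possible window position
        if text[s:s + m] == pattern:       # character-by-character comparison
            matches.append(s)
    return matches

print(naive_match("AABAACAADAABAABA", "AABA"))   # [0, 9, 12]
```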

Analysis:

Time Complexity: O(mn), where m is the length of P and n is the length of T. Each character in T is compared with characters in P in the worst case.

Space Complexity: O(1), as only a constant amount of extra space is used for
variables and pointers.

Key Points:

Simple and easy to understand, but can be inefficient for large texts and
patterns.



Often used as a baseline for comparison with more advanced string matching
algorithms.

Strengths:

Simple implementation.

No preprocessing required for the pattern or text.

Limitations:

Inefficient for large texts and patterns due to repeatedly comparing characters.

Doesn't leverage any knowledge about the pattern or text structure to optimize
comparisons.

In conclusion, the naïve string matching algorithm offers a straightforward approach to finding pattern occurrences within a text. However, its simplicity comes at the cost of efficiency, especially for large inputs. Understanding its strengths and limitations is crucial for choosing the appropriate algorithm for specific string matching tasks.

Rabin-Karp Algorithm: Hashing Your Way to Efficient String Matching
The Rabin-Karp algorithm is a powerful string matching technique that utilizes
hashing to find all occurrences of a pattern string within a text string. Its efficient
approach makes it a popular choice for situations where large texts and long
patterns are involved.
Problem:
Find all occurrences of a pattern string P within a text string T.

Algorithm:

1. Define a rolling hash function: Assign a numerical value (hash) to each length-m window of T and to the pattern P itself. Popular hash functions include the sum of character codes or a polynomial rolling hash.


2. Initial comparison: Calculate the hash values for the first P characters of T
and the pattern P. Compare the two hashes.

3. Rolling hash and comparison: If the hashes match, compare the characters
individually to confirm an actual match. Otherwise, shift the window by one
character in T, recalculate the hash for the new window (using the rolling hash
property), and compare it to the pattern's hash.

4. Repeat: Continue shifting, hashing, and comparing until the end of T is reached.
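A sketch of this scheme using a polynomial rolling hash modulo a large prime; the base and modulus are illustrative choices:

```python
def rabin_karp(text: str, pattern: str, base: int = 256, mod: int = 10**9 + 7):
    n, m = len(text), len(pattern)
    if m > n:
        return []
    high = pow(base, m - 1, mod)           # weight of the window's leading character
    p_hash = t_hash = 0
    for i in range(m):                     # hash the pattern and the first window
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    matches = []
    for s in range(n - m + 1):
        # Only verify character by character when the hashes agree.
        if p_hash == t_hash and text[s:s + m] == pattern:
            matches.append(s)
        if s < n - m:                      # roll the hash to the next window in O(1)
            t_hash = ((t_hash - ord(text[s]) * high) * base + ord(text[s + m])) % mod
    return matches

print(rabin_karp("AABAACAADAABAABA", "AABA"))   # [0, 9, 12]
```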

Analysis:

Time Complexity: O(n + m) on average and in the best case; the worst case degrades to O(nm) when many hash collisions force character-by-character verification. The rolling hash makes each window update O(1), avoiding the redundant comparisons of the naive algorithm.

Space Complexity: O(1), as only a constant amount of extra space is used for
variables and temporary values.

Key Points:

Offers significant efficiency improvement over the naive algorithm for large texts
and patterns.

Requires careful choice of a good hash function to avoid collisions (different strings having the same hash).

Probabilistic, meaning a collision could lead to a false positive match, requiring further character-by-character verification.

Strengths:

Efficient for large datasets.

Simple implementation.

Good average-case performance.

Limitations:

Requires choosing a good hash function to avoid collisions.



Probabilistic nature potentially leads to false positives.

Not as efficient as the KMP algorithm in the worst case.

Overall, the Rabin-Karp algorithm provides a powerful and efficient approach to string matching, especially for large datasets. Its probabilistic nature requires caution, but its advantages often outweigh its limitations. While alternative algorithms like KMP may offer optimal worst-case performance, Rabin-Karp shines in practical scenarios and offers a valuable tool for various string search tasks.

Finite Automata for String Matching:


Concept:

String matching with finite automata involves constructing a deterministic finite automaton (DFA) that recognizes a specific pattern string.

The DFA transitions between states based on the characters it reads in the text
string.

A match is found when the DFA reaches an accepting state, indicating the
pattern has been found within the text.

Construction:

1. States: Create states for each possible prefix of the pattern, including an initial
state and an accepting state for the full pattern.

2. Transitions: For every state and every character of the alphabet, define where the automaton goes next:

If the character extends the current prefix of the pattern, transition to the state representing that longer prefix.

If it does not, transition to the state for the longest prefix of the pattern that is still a suffix of the text read so far (possibly the initial state).

Algorithm:

1. Start: Begin in the initial state of the DFA.



2. Read Characters: For each character in the text string:

Follow the corresponding transition in the DFA based on the current state
and the character.

3. Match Found: If the DFA reaches the accepting state, a match is found at the
current position in the text.

4. Repeat: Continue reading characters and following transitions until the end of
the text is reached.
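A sketch that first builds the full transition table and then scans the text; restricting the alphabet to the characters that actually occur is an illustrative simplification:

```python
def build_dfa(pattern: str, alphabet):
    m = len(pattern)
    # dfa[state][c] = next state after reading character c in state `state`
    # (a state is the number of pattern characters matched so far).
    dfa = [dict() for _ in range(m + 1)]
    for state in range(m + 1):
        for c in alphabet:
            k = min(m, state + 1)
            # Longest prefix of the pattern that is a suffix of (matched text + c).
            while k > 0 and pattern[:k] != (pattern[:state] + c)[-k:]:
                k -= 1
            dfa[state][c] = k
    return dfa

def dfa_match(text: str, pattern: str):
    alphabet = set(text) | set(pattern)
    dfa = build_dfa(pattern, alphabet)
    matches, state = [], 0
    for i, c in enumerate(text):
        state = dfa[state][c]                  # one table lookup per text character
        if state == len(pattern):              # accepting state: full pattern seen
            matches.append(i - len(pattern) + 1)
    return matches

print(dfa_match("AABAACAADAABAABA", "AABA"))   # [0, 9, 12]
```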

Analysis:

Time Complexity: O(n), where n is the length of the text string. Each character
is processed only once.

Space Complexity: O(m), where m is the length of the pattern string, to store
the DFA.

Strengths:

Efficient linear-time matching for fixed patterns.

DFA can be pre-computed for known patterns, making subsequent searches fast.

Limitations:

Constructing the DFA can be complex for long patterns.

Not as flexible as algorithms like KMP or Boyer-Moore for handling multiple patterns or variable-length patterns.

In conclusion, finite automata provide a powerful and efficient approach to string matching for fixed patterns. Their ability to process text in linear time makes them valuable for various string search tasks, especially when patterns are known beforehand. However, understanding their construction complexity and limitations is crucial for choosing the appropriate algorithm for specific string matching problems.

Knuth-Morris-Pratt (KMP) Algorithm:



Problem: Finds all occurrences of a pattern string P within a text string T.

Key Idea: Preprocesses the pattern to create a "partial match" table (also called
"failure function"), guiding efficient shifts during the search to avoid redundant
comparisons.

Algorithm:

1. Preprocessing (Building the Partial Match Table):

Create an array lps (longest proper prefix which is also a suffix) of length
m (pattern length).

Initialize lps[0] = 0 .

For i = 1 to m-1, maintain len, the length of the previous longest proper prefix that is also a suffix (initially len = lps[i-1]):

If pattern[i] == pattern[len], set lps[i] = len + 1.

Otherwise, fall back by setting len = lps[len-1] and compare again; if len reaches 0 and the characters still differ, set lps[i] = 0.

2. Searching for Matches:

Initialize i = 0 (index for text) and j = 0 (index for pattern).

While i < n (text length):

If characters at i and j match:

Increment both i and j.

If j == m (the whole pattern has matched), report a match at i-m and set j = lps[m-1] to keep searching.

Else (mismatch):

If j > 0, set j = lps[j-1] (shift pattern using partial match info).

Else, increment i (no match for first character).
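A sketch of both phases; the table-building loop mirrors the fallback rule described above (names are illustrative):

```python
def build_lps(pattern: str):
    """lps[i] = length of the longest proper prefix of pattern[:i+1] that is also its suffix."""
    m = len(pattern)
    lps = [0] * m
    length = 0                              # length of the previous longest prefix-suffix
    for i in range(1, m):
        while length > 0 and pattern[i] != pattern[length]:
            length = lps[length - 1]        # fall back instead of restarting
        if pattern[i] == pattern[length]:
            length += 1
        lps[i] = length
    return lps

def kmp_search(text: str, pattern: str):
    n, m = len(text), len(pattern)
    lps = build_lps(pattern)
    matches, j = [], 0                      # j = pattern characters matched so far
    for i in range(n):
        while j > 0 and text[i] != pattern[j]:
            j = lps[j - 1]                  # shift the pattern using the table
        if text[i] == pattern[j]:
            j += 1
        if j == m:                          # full match ending at position i
            matches.append(i - m + 1)
            j = lps[j - 1]
    return matches

print(kmp_search("ABABDABACDABABCABAB", "ABABCABAB"))   # [10]
```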

Analysis:

Time Complexity: O(n + m) in the worst case, where n is the text length and m
is the pattern length. Preprocessing takes O(m), and searching takes O(n) due

Design and Analysis of Algorithm 63


to efficient shifts.

Space Complexity: O(m) to store the lps array.

Key Advantages:

Efficient worst-case time complexity compared to naive string matching (O(nm)).

Avoids redundant comparisons by using the partial match table to guide shifts.

Well-suited for patterns with repeated substrings.

Strengths:

Best-case time complexity of O(n) when no mismatches occur.

Optimal worst-case time complexity for string matching.

Space complexity independent of text length.

Limitations:

Preprocessing overhead for building the partial match table.

Can be less efficient than Boyer-Moore in certain practical scenarios, especially for patterns with rare characters.

In conclusion, the KMP algorithm stands out as a highly efficient and elegant
approach to string matching, offering optimal worst-case time complexity and
clever strategies to avoid redundant comparisons. Its ability to leverage
pattern structure makes it a valuable tool for various text search tasks,
especially when time efficiency is critical.

Demystifying the World of Computational Complexity: Basic Concepts
Computational complexity delves into the heart of efficiency, analyzing how
resources like time and memory are consumed by algorithms. This fascinating field
equips us with tools to understand, compare, and design efficient algorithms for
solving diverse computational problems. Here's a breakdown of some fundamental
concepts:



1. Big O Notation:

This powerful notation quantifies the upper bound of an algorithm's resource consumption (usually time or space) as the input size grows.

Common complexity classes include:

O(1): Constant time - independent of input size (e.g., accessing an element in an array by its index).

O(n): Linear time - increases proportionally with input size (e.g., traversing a list).

O(n²): Quadratic time - increases quadratically with input size (e.g., nested loops iterating over all pairs in a list).

O(log n): Logarithmic time - increases logarithmically with input size (e.g., binary search).

O(2ⁿ): Exponential time - increases exponentially with input size (e.g., brute-force search over all subsets).

2. Worst-Case, Average-Case, and Best-Case Complexity:

These terms assess an algorithm's performance under different scenarios:

Worst-case: Analyzes the maximum resource consumption regardless of the specific input (e.g., a sorting algorithm taking O(n²) time for a specific input pattern).

Average-case: Considers the average resource consumption across all possible inputs (e.g., a sorting algorithm taking O(n log n) time on average).

Best-case: Examines the minimum resource consumption for the most favorable input (e.g., a sorting algorithm taking O(n) time when the input is already sorted).

3. Space Complexity:

Just like time, we also analyze the memory requirements of algorithms using
Big O notation.

Common space complexity classes include:



O(1): Constant space - independent of input size (e.g., swapping two
variables).

O(n): Linear space - increases proportionally with input size (e.g., storing
every element in a list).

O(log n): Logarithmic space - increases logarithmically with input size (e.g.,
recursion with constant additional memory per level).

4. NP-Completeness and NP-Hardness:

These concepts categorize problems within the realm of computational complexity.

NP-Complete: Problems that are in NP (a proposed solution can be verified in polynomial time) and to which every problem in NP can be reduced in polynomial time - the hardest problems within NP.

NP-Hard: Problems at least as difficult as NP-Complete problems (every NP-Complete problem can be reduced to them in polynomial time); they need not themselves be in NP.

For these classes, no polynomial-time algorithms are known for all instances, although specialized methods may work for specific cases.

Understanding these fundamental concepts equips you with a powerful lens to analyze algorithms, compare their efficiencies, and choose the best-suited ones for specific tasks. As you delve deeper into the world of computational complexity, you'll discover intricate relationships between algorithms, problem structures, and resource limitations, unlocking new perspectives on the capabilities and limitations of computing.

Polynomial Complexity:
Algorithms with a time or space complexity that grows as a polynomial function
of the input size (e.g., O(n), O(n^2), O(log n)).

Key Characteristics:

Resource consumption increases relatively slowly as input size grows.

Generally considered "tractable" or "efficient" problems, solvable within reasonable time and resources for practical input sizes.



Examples:

Sorting algorithms like merge sort and quicksort (O(n log n)).

Graph algorithms like depth-first search (O(V + E)).

Linear programming (O(n^3.5)).

Non-Polynomial Complexity:
Algorithms with a time or space complexity that grows faster than any
polynomial function of the input size (e.g., O(2^n), O(n!), O(n^n)).

Key Characteristics:

Resource consumption increases very rapidly as input size grows.

Often considered "intractable" or "hard" problems, becoming impractical to solve for even moderate input sizes.

Examples:

Brute-force search over all subsets of a set (O(2ⁿ)).

Traveling salesperson problem by brute force (worst-case O(n!)).

Brute-force solving of generalized n×n×n puzzles such as the Rubik's Cube (exponential in n).

Key Differences:

| Feature | Polynomial Complexity | Non-Polynomial Complexity |
| --- | --- | --- |
| Growth Rate | Bounded by a polynomial | Grows faster than any polynomial |
| Practicality | Tractable for most inputs | Often intractable for larger inputs |
| Examples | Sorting, graph algorithms | Brute-force search, NP-hard problems |

Implications:

Problem Solving: Polynomial-time algorithms are generally preferred for practical applications due to their scalability and efficiency.

Algorithm Design: Researchers strive to find polynomial-time algorithms for challenging problems, or, if that is not possible, develop efficient approximations or heuristics.

Computational Limits: Non-polynomial complexity suggests limitations to what computers can solve efficiently, leading to exploration of alternative approaches like approximation algorithms and quantum computing.

Understanding these distinctions is crucial for:

Selecting appropriate algorithms for specific tasks based on input size and
resource constraints.

Appreciating the challenges and limitations of problem-solving in computing.

Guiding research efforts towards efficient algorithms and alternative computational approaches.

In the realm of computational complexity, NP-hard and NP-complete classes occupy a fascinating and challenging space. Here's a deeper dive into their intricacies:

NP-Completeness:
Imagine problems where verifying a proposed solution is easy in polynomial
time, but finding that solution in the first place seems much harder. NP-
complete problems fall into this category.

Key Characteristics:

Belonging to the NP complexity class (where solutions can be verified in polynomial time).

Every other NP problem can be reduced to an NP-complete problem in polynomial time. This means if you can efficiently solve an NP-complete problem, you can efficiently solve any problem in NP.

Examples:

Boolean Satisfiability Problem (SAT).

Traveling Salesperson Problem (TSP).

Vertex Cover Problem.



NP-Hardness:
Think of problems that are at least as difficult as NP-complete problems, even
though they might not themselves be in the NP class.

Key Characteristics:

Any NP-complete problem can be reduced to an NP-hard problem in polynomial time.

Not necessarily in the NP class (verification might be harder than finding a solution).

Examples:

Hamiltonian Circuit Problem, Subset Sum Problem, and 3-SAT (all of these are NP-complete, and therefore also NP-hard).

The halting problem (NP-hard but not in NP, since it is undecidable).

Relationship between NP-Completeness and NP-Hardness:

Every NP-complete problem is also NP-hard.

All problems in NP can be reduced to any NP-complete problem in polynomial time.

The big question is: are there any NP-complete problems that are also easy
to solve (P = NP)? This remains an unsolved problem in theoretical computer
science, with significant implications for the field.

Implications of NP-Hardness and NP-Completeness:

Understanding these classes helps us identify problems that are likely to be computationally difficult.

It forces us to seek alternative approaches like approximation algorithms, heuristics, or parallel computing methods for tackling such problems.

Research in this area focuses on proving whether specific problems are NP-
complete, finding efficient approximations, or even proving P = NP (which would
overturn current complexity assumptions).



Exploring these concepts further:

Specific examples of NP-complete and NP-hard problems, and their reductions.

Techniques for approximating NP-hard problems.

Recent advancements and open questions in NP-completeness research.

Remember, NP-hard and NP-complete problems represent a fascinating frontier in computational complexity, pushing the boundaries of what we can efficiently solve with computers. Understanding them leads to deeper appreciation for the challenges and potential of computation itself!

Approximation Algorithms
When Optimal Solutions Are Elusive:

For some computationally challenging problems, finding the exact optimal solution in a reasonable time becomes impractical, especially for large inputs.

Approximation algorithms emerge as a powerful alternative, aiming to find solutions that are "good enough" but not necessarily perfect.

Trading Perfection for Efficiency:

Key Idea: Accept a slight deviation from the optimal solution in exchange for
significantly faster computation time.

Focus on finding solutions that are guaranteed to be within a certain factor (approximation ratio) of the optimal solution.

Prioritize efficiency and scalability over absolute accuracy.

Working with Guarantees:

Approximation algorithms offer performance guarantees - bounds on how far their solutions deviate from the optimal:

Approximation Ratio: Expresses the worst-case difference between the approximate solution and the optimal solution as a multiplicative factor (e.g., a 2-approximation algorithm guarantees a solution no more than twice as bad as the optimal; a small sketch after the list of common techniques below illustrates this).


Additive Error: Specifies the maximum difference between the
approximate and optimal solutions as a fixed constant (e.g., an algorithm
with additive error of 5 might produce a solution 5 units worse than the
optimal).

Common Techniques:

Greedy Algorithms: Make locally optimal choices at each step, hoping to reach a good overall solution (e.g., Prim's algorithm for minimum spanning trees).

Dynamic Programming: Break problems into smaller subproblems, solving them optimally and combining solutions (e.g., knapsack problem with bounded weights).

Linear Programming Relaxation: Relax integrality constraints in a problem's integer programming formulation, allowing for efficient LP solutions that can be rounded to approximate the original problem (e.g., set cover problem).

Randomization: Introduce randomness to diversify solution exploration and potentially find better solutions (e.g., randomized rounding for certain graph problems).
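As a tiny illustration of an approximation ratio, here is a sketch of the classic maximal-matching heuristic for vertex cover, which returns a cover at most twice the optimal size (the edge list is illustrative):

```python
def vertex_cover_2approx(edges):
    """Greedy 2-approximation: take both endpoints of any uncovered edge."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:   # edge not yet covered
            cover.update((u, v))                # both endpoints join the cover
    return cover

edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
print(vertex_cover_2approx(edges))   # a cover of size at most 2 * optimum
```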

Applications:

NP-Hard Problems: Vertex cover, set cover, traveling salesperson problem, knapsack problem.

Machine Learning: Clustering, dimensionality reduction, feature selection.

Scheduling and Resource Allocation: Task scheduling, load balancing, network optimization.

Approximation algorithms provide a practical toolkit for solving difficult problems in various domains. They strike a balance between optimality and efficiency, opening up possibilities for solving problems that would otherwise be computationally intractable.

Flow and Sorting Network



A sorting network is a specific type of network designed to sort a sequence of
elements. It consists of comparators arranged in a specific pattern, ensuring that
any input sequence passes through these comparators and gets sorted by the time
it reaches the output.

Characteristics of Sorting Networks:


1. Fixed Structure: Sorting networks have a fixed structure that doesn’t change
based on input data. The arrangement of comparators is predetermined and
doesn't depend on the values being sorted.

2. Parallel Comparisons: Elements are compared simultaneously in parallel, making sorting networks suitable for hardware implementations where parallel operations are efficient.

3. Deterministic Sorting: Due to the fixed structure, sorting networks guarantee that any input sequence will be sorted correctly when it reaches the output.

Flow in Sorting Networks:


The flow of elements through a sorting network happens by systematically
comparing and swapping elements as per the predefined structure. Each
comparator compares two elements and potentially swaps them to ensure the
correct ordering.

For example, in a 4-element sorting network:

Comparator 1 compares elements 1 and 2, potentially swapping them.

Comparator 2 compares elements 3 and 4, potentially swapping them.

Comparator 3 compares elements 1 and 3, then 2 and 4, and again 1 and 2 to ensure the correct sorting order.

This process continues until all necessary comparisons are made, guaranteeing the
correct sorting of the input sequence.

Flow in Flow Networks:



A flow network is a directed graph where each edge has a capacity and represents
the flow that can pass through it. It's often used to model systems like transportation
networks or data flow.
The flow in a flow network follows specific paths, respecting edge capacities and
meeting certain constraints (like conservation of flow at nodes). Algorithms like the
Ford-Fulkerson method or the Edmonds-Karp algorithm determine the maximum
flow through the network.
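A compact sketch of the Edmonds-Karp variant (Ford-Fulkerson with BFS-chosen augmenting paths) on a capacity matrix; the example graph is illustrative:

```python
from collections import deque

def edmonds_karp(cap, s, t):
    """cap[u][v]: edge capacity (0 if absent); returns the maximum s-t flow."""
    n = len(cap)
    flow = 0
    residual = [row[:] for row in cap]
    while True:
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:            # BFS for a shortest augmenting path
            u = q.popleft()
            for v in range(n):
                if residual[u][v] > 0 and parent[v] == -1:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:                     # no augmenting path remains
            return flow
        # Find the bottleneck capacity along the path, then push that much flow.
        bottleneck, v = float("inf"), t
        while v != s:
            u = parent[v]
            bottleneck = min(bottleneck, residual[u][v])
            v = u
        v = t
        while v != s:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck        # residual (backward) capacity
            v = u
        flow += bottleneck

cap = [[0, 16, 13, 0, 0, 0],
       [0, 0, 10, 12, 0, 0],
       [0, 4, 0, 0, 14, 0],
       [0, 0, 9, 0, 0, 20],
       [0, 0, 0, 7, 0, 4],
       [0, 0, 0, 0, 0, 0]]
print(edmonds_karp(cap, 0, 5))   # 23 for this classic CLRS example
```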

Importance and Applications:


Sorting networks are useful in hardware design and parallel computing due to
their fixed structure and ability to sort in a predictable manner.

Flow networks are crucial in modeling and solving flow-related problems such
as network flow optimization, circulation, and transportation problems.

Understanding the flow and operations within sorting networks aids in hardware
optimization, while comprehending flow networks is essential for solving various
flow-related optimization problems in different domains.
The Ford-Fulkerson algorithm and its connection to sorting networks are two
intriguing topics in algorithm design and analysis. Here's a breakdown of each:

Ford-Fulkerson Algorithm (Flow Networks):


This algorithm finds the maximum flow that can pass through a flow network, a
representation of directed connections with capacities. It iteratively increases the
flow by finding augmenting paths, paths that can carry additional flow from the
source to the sink without exceeding any capacity constraints. It offers:

Efficiency: Runs in O(mf) time, where m is the number of edges and f is the
maximum flow value.

Simplicity: Relatively easy to understand and implement.

Guarantees: Finds a maximum flow in any flow network.

Limitations:

Can be slow for large networks or with many augmenting paths.



Doesn't provide intermediate flow values (only the final maximum).

Connections to Sorting Networks:


Interestingly, the Ford-Fulkerson algorithm can be used to design sorting networks.
These are networks of comparators (represented by edges) that can sort a
sequence of numbers by pairwise comparisons. By associating input values with
flow in a network and directing it towards "correct" outputs, the Ford-Fulkerson
algorithm can be used to find paths that sort the entire sequence.

Benefits of Sorting Networks:

Hardware implementation: Can be implemented directly in hardware due to their parallel nature.

Efficient for small data: Can be faster than software sorting algorithms for small
datasets.

Challenges:

Scalability: Designing efficient networks for large data sizes can be complex.

Difficulty in optimization: Finding the optimal network for a specific sorting problem can be computationally expensive.

Overall, the Ford-Fulkerson algorithm and its connection to sorting networks highlight the fascinating interplay between different areas of computer science. Its ability to tackle both flow optimization and sorting problems showcases its versatility and effectiveness. Understanding these connections can lead to interesting insights into algorithm design and the potential for hardware-optimized computation.

Maximum bipartite matching


Maximum bipartite matching is a fascinating problem in graph theory with diverse
applications, from scheduling tasks to assigning students to schools. Here's a
comprehensive breakdown:
Problem:



Given a bipartite graph (a graph where edges only connect vertices from different
"sides"), find the largest set of edges where each vertex is incident to at most one
edge. These edges represent pairings or assignments between entities on the two
sides of the graph.

Examples:

Matching workers to jobs.

Assigning students to courses.

Pairing socks in a drawer.

Algorithms:

Several efficient algorithms solve the maximum bipartite matching problem:

Augmenting-path algorithms: Start from an initial (often greedily built) matching and repeatedly find augmenting paths to enlarge it; the Hungarian algorithm extends this idea to the weighted assignment problem.

Network flow algorithms: Convert the matching problem into a flow network problem and utilize flow optimization techniques. The Ford-Fulkerson algorithm can be adapted for this purpose.

Specialized matching algorithms: Algorithms designed specifically for bipartite matching, like the Hopcroft-Karp algorithm, offer the best known time complexity of O(E√V).
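A short sketch of the augmenting-path idea (Kuhn's algorithm), assuming an adjacency list from left vertices to right vertices; the example graph is illustrative:

```python
def max_bipartite_matching(adj, n_right):
    """adj[u] = list of right-side vertices reachable from left vertex u."""
    match_right = [-1] * n_right            # match_right[v] = left vertex matched to v

    def try_augment(u, seen):
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            # v is free, or its current partner can be re-matched elsewhere.
            if match_right[v] == -1 or try_augment(match_right[v], seen):
                match_right[v] = u
                return True
        return False

    matching = 0
    for u in range(len(adj)):
        if try_augment(u, set()):           # one augmenting-path search per left vertex
            matching += 1
    return matching

# 4 workers (left) and 4 jobs (right); an edge means "worker can do job".
adj = [[0, 1], [0], [1, 2], [2, 3]]
print(max_bipartite_matching(adj, 4))   # 4: everyone gets a distinct job
```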

Properties:

Kőnig's Theorem: The size of the maximum matching equals the minimum
size of a vertex cover in the graph (a set of vertices that includes at least one
endpoint of every edge). This provides a valuable connection between matching
and covering problems.

Hall's Marriage Theorem: A bipartite graph has a perfect matching (all vertices
are matched) if and only if every subset of one side has at least as many
neighbors on the other side. This theorem allows for efficient checks for the
existence of perfect matchings.

Applications:



Maximum bipartite matching finds applications in various domains:

Resource allocation: Scheduling tasks to processors, assigning jobs to workers, resource allocation in distributed systems.

Scheduling and optimization: Course scheduling, conference speaker assignment, transportation planning.

Cryptography: Used in key exchange protocols and other cryptographic algorithms.

Complexity:

Finding the maximum matching in a bipartite graph can be done in O(E√V) time with the Hopcroft-Karp algorithm, which is O(V^2.5) for dense graphs, where V is the number of vertices and E is the number of edges.

Conclusion:
Maximum bipartite matching is a powerful tool for solving real-world problems
involving assignments and pairings. With diverse algorithms, theoretical
connections to other graph problems, and numerous applications, it's a valuable
concept in theoretical and practical computer science.

Comparison Network:
A comparison network is a network of comparators used to sort elements. It
represents a sorting algorithm with a fixed topology of comparators that compare
pairs of elements. These comparators perform comparisons and swaps to achieve
the sorted order.

Properties:

Fixed Structure: Similar to sorting networks, a comparison network has a fixed structure regardless of the input data.

Parallel Comparisons: Elements are compared in parallel through simultaneous comparisons in the network.

Deterministic Sorting: The network guarantees correct sorting of any input sequence.

Example: For n elements, any comparison network that sorts must contain Ω(n log n) comparators, matching the lower bound for comparison-based sorting algorithms like Merge Sort or Quick Sort.

Zero-One Principle:
The zero-one principle is a concept used in sorting networks. It states that for a
sorting network to correctly sort all inputs, it’s sufficient to test it with all possible
combinations of 0s and 1s (binary inputs) within a fixed size. If it correctly sorts all
possible binary inputs, it will sort all inputs, confirming its correctness.

Application: Verifying the correctness of sorting networks by testing all possible combinations of binary inputs within a certain range.

Bitonic Sorting Network:


A bitonic sorting network is a specific type of sorting network used to sort
sequences that can be divided into two parts, one of which is sorted in ascending
order, followed by a part sorted in descending order.

Structure: It's constructed using a series of compare-exchange operations to sort bitonic sequences - sequences that first increase, then decrease.

Size and Complexity: The number of comparators required in a bitonic sorting network is O(n log² n) for sequences of size n.

Applications: Commonly used in parallel computing and hardware design due to its structured and predictable nature.

Merging Network:
A merging network is a network designed to merge or combine two sorted
sequences into a single sorted sequence. It uses a series of comparators to merge
the sequences by comparing elements and arranging them in sorted order.

Structure: The structure of a merging network facilitates the merging of two sorted sequences by comparing elements from both sequences and merging them in order.



Efficiency: Merging networks are efficient and commonly used in sorting
algorithms like Merge Sort, where they are crucial for merging smaller sorted
sequences into larger ones.

Understanding these network-based sorting and merging principles is vital for designing efficient algorithms and networks for sorting, merging, and related parallel computing or hardware design applications.
alternative on these 4 topics if needed:
Comparison Networks:

Structure: A fixed network of wires and comparators, designed to sort a fixed number of inputs.

Comparators: Elements that compare two inputs and swap them if they are in
the wrong order.

Functionality: Inputs enter at the top, flow through comparators, and exit
sorted at the bottom.

Properties:

Non-adaptive: Comparison sequence is fixed, independent of input values.

Depth determines time complexity: The number of comparator layers dictates the number of comparison steps.

Zero-One Principle:

Key Idea: A comparison network works correctly for all inputs if and only if it
works correctly for all inputs consisting of only 0s and 1s.

Implication: Simplifies testing and design of comparison networks by focusing on binary inputs.

Proof: Relies on the fact that comparators only consider the relative order of
inputs, not their specific values.

Bitonic Sorting Network:

Structure: A specific type of comparison network that sorts inputs using a divide-and-conquer approach based on bitonic sequences.



Bitonic Sequence: A sequence that is first monotonically increasing and then
monotonically decreasing (or vice versa).

Steps:

Divides inputs into smaller bitonic sequences.

Sorts each bitonic sequence using a merging network.

Merges sorted sequences to achieve complete sorting.

Efficiency: Can achieve O(log^2 n) depth for sorting n elements.

Merging Network:

Purpose: Combines two sorted sequences into a single sorted sequence.

Structure: A specific type of comparison network designed for merging.

Example: The simplest merging network, a butterfly network, merges two sorted sequences of length two.

Building Blocks: Merging networks are often used as building blocks for larger
sorting networks, including bitonic sorting networks.

Key Applications:

Hardware Implementation: Comparison networks are well-suited for hardware implementation due to their fixed structure and parallel nature.

Parallel Sorting: Bitonic sorting networks and other comparison networks can
be efficiently parallelized for sorting on multi-processor systems.

Specialized Sorting: They can be optimized for specific input distributions or hardware constraints.

Additional Insights:

Limitations: Comparison networks are generally less efficient than adaptive sorting algorithms for software implementation.

Design Challenges: Designing optimal comparison networks for larger input sizes is a complex task.



Research Areas: Ongoing research explores alternative network structures,
optimization techniques, and applications in parallel computing and hardware
design.

NOTE
made with Gemini Pro, so content may be wrong :)

report errors on Telegram @yash707; there may be some errors, especially in the examples!
(GPT's markdown for complexity analysis was not formatting properly on Notion)

updated:

https://yashnote.notion.site/DAA-1082ac510a41474dbcac15d4923aef46?pvs=4

