Chapter 2
Complexity Analysis
A data structure is a way to store and organize data so that it can be used efficiently. The
name indicates that the data is organized and stored in memory. Data structures are
widely used in almost every area of computer science, e.g. operating systems, compiler
design, artificial intelligence, graphics, and many more.
Need for data structures
Processor speed: Data is growing day by day into billions of records per entity, and the
processor alone may not be able to handle that much data fast enough.
Data Search: Consider an inventory of 10⁶ items in a store; if our application needs to
search for a particular item, it has to traverse all 10⁶ items every time, which slows
down the search process.
Multiple requests: If thousands of users search the data simultaneously on a web server,
then there is a chance that even a very large server fails under the load. Data structures
are used to solve the above problems.
Traversing: Every data structure contains a set of data elements. Traversing the data
means visiting each element of the data structure. If we want to calculate the average of
a student's marks in 6 subjects, we need to traverse the complete array of marks and
calculate the total sum.
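As an illustrative sketch (the array name and marks values below are hypothetical), traversal in
C++ visits every element of the marks array once to build the total sum and then the average:

#include <iostream>

int main() {
    int marks[6] = {70, 65, 80, 90, 75, 85};   // hypothetical marks in 6 subjects
    int sum = 0;
    for (int i = 0; i < 6; i++)                // traversal: visit each element exactly once
        sum += marks[i];
    double average = sum / 6.0;                // total sum divided by the number of subjects
    std::cout << "Average = " << average << std::endl;
    return 0;
}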
Insertion: Insertion can be defined as the process of adding elements to the data
structure at any location. If the data structure already contains its maximum of n
elements, no further element can be inserted; attempting to do so causes an overflow.
Deletion: The process of removing an element from the data structure is called Deletion.
We can delete an element from the data structure at any random location. If we try to
delete an element from an empty data structure then underflow occurs.
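A minimal sketch of insertion and deletion on a fixed-size structure (the names capacity, count,
and data are illustrative assumptions): insertion into a full structure is an overflow, and deletion
from an empty structure is an underflow.

const int capacity = 5;        // hypothetical maximum size of the structure
int data[capacity];
int count = 0;                 // number of elements currently stored

bool insertAtEnd(int value) {
    if (count == capacity) return false;   // structure full: overflow, insertion rejected
    data[count++] = value;
    return true;
}

bool deleteFromEnd() {
    if (count == 0) return false;          // structure empty: underflow, deletion rejected
    count--;
    return true;
}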
Searching: The process of finding the location of an element within the data structure
is called Searching. Two commonly used searching algorithms are Linear Search and
Binary Search; we will discuss each of them later in this tutorial.
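As a sketch of the simpler of the two, linear search scans the elements one by one until the key
is found or the end of the array is reached (function and parameter names are illustrative):

int linearSearch(const int arr[], int n, int key) {
    for (int i = 0; i < n; i++) {
        if (arr[i] == key)
            return i;      // key found: return its location (index)
    }
    return -1;             // key is not present in the array
}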
Sorting: The process of arranging the data structure in a specific order is known as
Sorting. Many algorithms can be used to perform sorting, for example, insertion sort,
selection sort, bubble sort, etc.
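For example, insertion sort, one of the algorithms named above, can be sketched as follows
(array and parameter names are illustrative): each element is shifted into its correct position
within the already-sorted prefix of the array.

void insertionSort(int arr[], int n) {
    for (int i = 1; i < n; i++) {
        int key = arr[i];                  // element to place into the sorted prefix arr[0..i-1]
        int j = i - 1;
        while (j >= 0 && arr[j] > key) {
            arr[j + 1] = arr[j];           // shift larger elements one position to the right
            j--;
        }
        arr[j + 1] = key;
    }
}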
Merging: When two lists, List A of size M and List B of size N, holding elements of the
same type, are joined to produce a third list, List C of size (M+N), the process is
called merging.
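A common form of merging combines two sorted lists into one sorted list. A minimal sketch,
assuming both input arrays are already sorted and that the output array c has room for M + N
elements (all names are illustrative):

void merge(const int a[], int m, const int b[], int n, int c[]) {
    int i = 0, j = 0, k = 0;
    while (i < m && j < n)                 // repeatedly take the smaller front element of A or B
        c[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
    while (i < m) c[k++] = a[i++];         // copy any remaining elements of A
    while (j < n) c[k++] = b[j++];         // copy any remaining elements of B
}                                          // c now holds all M + N elements in order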
A primitive (built-in, e.g. int, char, float) data structure is a fundamental type of data
structure that stores data of only one type, whereas a non-primitive data structure is a
type of data structure that is user-defined and stores data of different types in a single
entity.
Linear Data Structures: A data structure is called linear if all of its elements are arranged
in a linear order. In linear data structures, the elements are stored in a non-hierarchical way
where each element has a successor and a predecessor, except the first element (no
predecessor) and the last element (no successor).
Types of Linear Data Structures are given below:
Arrays: An array is a collection of data items of the same type, and each data item is called
an element of the array. The data type of the elements may be any valid data type such as
char, int, float, or double. For example, the individual elements of an array age of 100
elements are: age[0], age[1], age[2], age[3], …, age[98], age[99].
Linked List: A linked list is a linear data structure that is used to maintain a list in memory.
It can be seen as a collection of nodes stored at non-contiguous memory locations.
Each node of the list contains a pointer to its adjacent node.
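A minimal sketch of a singly linked list node (struct and field names are illustrative): each node
stores a value and a pointer to the next node, so the nodes need not occupy contiguous memory.

struct Node {
    int data;        // the value stored in this node
    Node* next;      // pointer to the adjacent (next) node; nullptr marks the end of the list
};

Node* buildTwoNodeList() {
    Node* second = new Node{20, nullptr};   // nodes may live at non-contiguous addresses
    Node* head   = new Node{10, second};    // head -> 10 -> 20 -> nullptr
    return head;
}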
Stack: A stack is a linear list in which insertions and deletions are allowed only at one
end, called the top.
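A minimal array-based stack sketch (class name and fixed capacity are illustrative assumptions):
both push and pop operate only at the top end.

class Stack {
    int items[100];     // hypothetical fixed capacity
    int top = -1;       // index of the top element; -1 means the stack is empty
public:
    bool push(int x) {
        if (top == 99) return false;   // overflow: the stack is full
        items[++top] = x;
        return true;
    }
    bool pop(int& x) {
        if (top == -1) return false;   // underflow: the stack is empty
        x = items[top--];
        return true;
    }
};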
Queue: A queue is a linear list in which elements can be inserted only at one end,
called the rear, and deleted only at the other end, called the front. Because the queue is
open at both ends, it follows the First-In-First-Out (FIFO) principle for storing data
items.
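A minimal array-based queue sketch (class name and fixed capacity are illustrative assumptions):
elements enter at the rear and leave from the front, so the element inserted first is removed first.

class Queue {
    int items[100];            // hypothetical fixed capacity
    int front = 0, rear = 0;   // front: next element to remove; rear: next free slot
public:
    bool enqueue(int x) {
        if (rear == 100) return false;     // queue is full
        items[rear++] = x;                 // insert at the rear end
        return true;
    }
    bool dequeue(int& x) {
        if (front == rear) return false;   // queue is empty
        x = items[front++];                // delete from the front end
        return true;
    }
};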
Non-Linear Data Structures: This data structure does not form a sequence i.e. each item
or element is connected with two or more other items in a non-linear arrangement. The data
elements are not arranged in sequential structure.
Trees: Trees are multilevel data structures with a hierarchical relationship among their
elements, known as nodes. The bottommost nodes in the hierarchy are called leaf
nodes, while the topmost node is called the root node.
Graphs: A graph can be defined as the pictorial representation of a set of elements
(represented by vertices) connected by links known as edges.
2.1.1 Algorithm
What are algorithms? Why is the study of algorithms useful? What is the role of algorithms
relative to other technologies used in computers? In this chapter, we will answer these
questions.
Definition and characteristics of an Algorithm
Definition: An algorithm is a finite sequence of well-defined, unambiguous steps that takes
zero or more inputs and terminates after a finite number of steps, producing the desired output.
Characteristics of an algorithm: An algorithm should accept zero or more inputs and produce
at least one output; every step must be precisely defined (definiteness); it must terminate after
a finite number of steps (finiteness); and every step must be basic enough to be carried out
exactly (effectiveness).
An algorithm can be expressed in several forms, as described below.
Design of Algorithms: This consists of the steps a programmer should take before starting
to code the program in a specific language. Proper program design helps other programmers
maintain the program in the future. There are different ways to represent an algorithm,
such as a programming language, natural language, pseudocode, and a flowchart. The most
common representations are the flowchart and pseudocode.
2.2 Computational and asymptotic complexity
Computational Complexity
This refers to the study of the amount of resources, typically time and space, required by an
algorithm to solve a given problem. It's a measure of how efficient an algorithm is in practice.
Time Complexity: This measures how much time an algorithm takes to execute as a
function of the input size. It is usually denoted T(n), where n is the input size.
Space Complexity: This measures how much memory an algorithm uses as a function of
the input size. It is denoted S(n).
Asymptotic Complexity
Asymptotic complexity focuses on the behavior of an algorithm as the input size approaches
infinity. It's a theoretical measure that provides a simplified way to compare algorithms without
worrying about constant factors or lower-order terms.
To measure the time efficiency of an algorithm, you can write a program based on the
algorithm, execute it, and measure the time it takes to run.
The execution time that you measure in this case would depend on a number of factors
such as:
Compiler
Operating system
Programming language
Input data
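As an illustration of this empirical approach, the following sketch times a run of an algorithm
with the standard <chrono> library (the function algorithmUnderTest is a placeholder); the value
it reports still depends on the compiler, operating system, language, and input data listed above.

#include <chrono>
#include <iostream>

void algorithmUnderTest() { /* placeholder for the algorithm being measured */ }

int main() {
    auto start = std::chrono::steady_clock::now();
    algorithmUnderTest();                            // execute the program based on the algorithm
    auto end = std::chrono::steady_clock::now();
    auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
    std::cout << "Elapsed time: " << ms << " ms\n";  // the measured execution time
    return 0;
}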
However, we would like to determine how the execution time is affected by the nature of
the algorithm.
The rate at which the running time of an algorithm increases as a result of an increase in
the volume of input data is called the order of growth of the algorithm.
The big O notation has been accepted as a fundamental technique for describing the
efficiency of an algorithm.
The different orders of growth and their corresponding big O notations are:
Constant - O(1)
Logarithmic - O(log n)
Linear - O(n)
Quadratic - O(n²)
Cubic - O(n³)
According to their orders of growth, the big O notations can be arranged in an increasing
order as:
O(1) < O(log n) < O(n) < O(n log n) < O(n²) < O(n³) < O(2ⁿ) < O(10ⁿ)
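To make these orders of growth concrete, here is an illustrative sketch of fragments with
constant, linear, and quadratic running times (the function names are hypothetical).

int constantTime(const int arr[]) {
    return arr[0];                        // O(1): one operation regardless of the input size
}

long linearTime(const int arr[], int n) {
    long sum = 0;
    for (int i = 0; i < n; i++)           // O(n): the loop body runs n times
        sum += arr[i];
    return sum;
}

long quadraticTime(const int arr[], int n) {
    long count = 0;
    for (int i = 0; i < n; i++)           // O(n²): the nested loops run n * n times
        for (int j = 0; j < n; j++)
            if (arr[i] == arr[j]) count++;
    return count;
}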
Big O gives an upper bound on the order of growth: a function f(n) is O(g(n)) if there exist
positive constants c and n₀ such that f(n) ≤ cg(n) for all n ≥ n₀.
Example: Show that f(n) = 2n² + 3n + 1 is O(n²).
1. Choose Constants:
o c = 6: We select a constant c = 6.
o n₀ = 1: We choose a value n₀ = 1.
2. Verify Inequality:
o We need to show that 2n² + 3n + 1 ≤ 6n² for all n ≥ 1.
3. Proof:
o For n ≥ 1, we have 3n ≤ 3n² and 1 ≤ n², so
2n² + 3n + 1 ≤ 2n² + 3n² + n² = 6n²
o Since 6n² ≥ 2n² + 3n + 1 for all n ≥ 1, the inequality holds and f(n) is O(n²).
Conclusion
In simpler terms:
The example demonstrates that the function's growth is at most quadratic, meaning it
increases proportionally to the square of the input size.
The higher-order term (2n²) dominates the growth, and the lower-order terms (3n and 1)
become insignificant as n gets larger.
This means that the function's running time will increase at a rate that is proportional to the
square of the input size. For example, if the input size doubles, the running time will increase by
a factor of 4.
Ω Notation is used to provide a lower bound on the growth rate of a function. It indicates that a
function's growth is at least proportional to another function.
Formal Definition: A function f(n) is said to be Ω(g(n)) if there exist positive constants c and n₀
such that f(n) ≥ cg(n) for all n ≥ n₀.
Interpretation:
Lower Bound: Ω(g(n)) means that f(n) will grow at least as fast as g(n) for sufficiently
large values of n.
Constant Factor: The constant c allows for a certain degree of flexibility in the
comparison, as the exact growth rate may differ by a constant factor.
Example: f(n) = 2n² + 3n + 1 is Ω(n²), since 2n² + 3n + 1 ≥ 2n² for all n ≥ 1 (take c = 2
and n₀ = 1).
Common uses:
Best-Case Analysis: Ω notation is often used to analyze the best-case running time of
algorithms.
Lower Bounds: It helps in understanding the fundamental limitations of certain
problems or algorithms.
Algorithm Comparison: It can be used to compare the lower bounds of different
algorithms and identify the most efficient ones.
In Summary: Ω notation is a valuable tool for analyzing the growth rate of functions and
understanding the lower bounds of algorithms. It provides a useful perspective on the
performance of algorithms and helps in making informed decisions about their suitability for
different applications.
Θ Notation
Θ Notation provides both an upper bound and a lower bound on the growth rate of a function.
It indicates that a function's growth is exactly proportional to another function.
Formal Definition: A function f(n) is said to be Θ(g(n)) if there exist positive constants c₁, c₂,
and n₀ such that c₁g(n) ≤ f(n) ≤ c₂g(n) for all n ≥ n₀.
Interpretation:
Tight Bound: Θ(g(n)) means that f(n) grows neither faster nor slower than g(n) by more
than a constant factor for sufficiently large values of n.
Exact Proportionality: The constants c₁ and c₂ ensure that f(n) is tightly bounded by
g(n) within a constant multiplicative factor.
Example: f(n) = 2n² + 3n + 1 is Θ(n²), since 2n² ≤ 2n² + 3n + 1 ≤ 6n² for all n ≥ 1 (take
c₁ = 2, c₂ = 6, and n₀ = 1).
In Summary: Θ notation is a powerful tool for analyzing the growth rate of functions and
understanding the exact behavior of algorithms. It provides a precise and informative way to
compare the efficiency of different algorithms and make informed decisions about their
suitability for various applications.
Little-o Notation (o(n)) is used to indicate that a function grows strictly slower than another
function. It implies that the function's growth is asymptotically negligible compared to the other
function.
Formal Definition: A function f(n) is said to be o(g(n)) if for any positive constant c, there
exists a positive integer n₀ such that f(n) < cg(n) for all n ≥ n₀.
Interpretation:
Negligible Growth: o(g(n)) means that f(n) grows so much slower than g(n) that, no
matter how small the positive constant c is, f(n) is strictly less than cg(n) for
sufficiently large values of n.
Dominance: g(n) dominates f(n) in terms of growth rate.
Example:
f(n) = n is o(n²) because for any c > 0 we can choose n₀ = ⌊1/c⌋ + 1; then for all
n ≥ n₀ we have cn > 1, and therefore n < cn².
In Summary: Little-o notation is a valuable tool for analyzing the growth rate of functions and
understanding the relative dominance of functions. It provides a precise way to describe the
asymptotic behavior of functions and is essential for understanding the complexity of algorithms
and data structures.
ω (Little-omega) Notation
Little-omega (ω) notation is a less commonly used notation that indicates that a function grows
strictly faster than another function. It implies that the function's growth is asymptotically
dominant compared to the other function.
Formal Definition: A function f(n) is said to be ω(g(n)) if for any positive constant c, there
exists a positive integer n₀ such that f(n) > cg(n) for all n ≥ n₀.
Interpretation:
Dominant Growth: ω(g(n)) means that f(n) grows so much faster than g(n) that, no
matter how large the constant c is, f(n) is strictly greater than cg(n) for sufficiently
large values of n.
Dominance: f(n) dominates g(n) in terms of growth rate.
Example:
f(n) = n² is OO(n) because for any c > 0, we can choose n₀ = ⌈1/c⌉, and we have n² > cn
for all n ≥ n₀.
In Summary: Little-omega (ω) notation is less frequently used than Big-O, Big-Theta, and
Big-Omega. It provides a way to express the asymptotic dominance of one function over another,
and can be useful in contexts where a precise understanding of the relative growth rates
of functions is required.
Key Points:
Big-O, Ω, and Θ notations are used to describe the asymptotic behavior of functions, while
little-o and little-omega (ω) notations are used to describe the relative growth rates of functions.
Big-O is often used to analyze the worst-case time complexity of algorithms, while Ω is
used to analyze the best-case time complexity.
Θ notation is used to describe the tight bound on the growth rate of a function.
Little-o is used to indicate that a function is asymptotically negligible compared to another
function, and little-omega (ω) is used to indicate that a function is asymptotically dominant
compared to another function.
Common Complexity Classes
Complexity classes are fundamental categories used to classify problems based on the resources
(typically time and space) required to solve them.
When analyzing the performance of an algorithm, we often consider three different scenarios:
1. Best-Case Complexity
This refers to the minimum amount of time or space the algorithm will take to complete its task.
It's usually the most optimistic scenario and may not be representative of typical performance.
Example: In bubble sort, the best-case scenario occurs when the input array is already sorted.
In this case, the algorithm only needs to make one pass through the array to determine that it's
sorted, resulting in a time complexity of O(n).
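A sketch of bubble sort with an early-exit flag (names are illustrative) makes this best case
visible: when the input is already sorted, the first pass performs no swaps and the algorithm
stops after that single O(n) pass.

void bubbleSort(int arr[], int n) {
    for (int i = 0; i < n - 1; i++) {
        bool swapped = false;
        for (int j = 0; j < n - 1 - i; j++) {
            if (arr[j] > arr[j + 1]) {
                int tmp = arr[j]; arr[j] = arr[j + 1]; arr[j + 1] = tmp;   // swap neighbours
                swapped = true;
            }
        }
        if (!swapped) break;   // no swaps in this pass: the array is sorted, stop early
    }
}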
2. Average-Case Complexity
This refers to the typical amount of time or space the algorithm will take to complete its task. It's
often calculated by analyzing the performance over a large number of random inputs.
Example: In quicksort, the average-case complexity is O(n log n). This assumes that the pivot
element chosen at each partitioning step is a random element of the array.
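A compact quicksort sketch with a randomly chosen pivot (names are illustrative) matches this
assumption; with random pivots the expected work is proportional to n log n.

#include <cstdlib>
#include <utility>

void quickSort(int arr[], int low, int high) {
    if (low >= high) return;
    int pivotIndex = low + std::rand() % (high - low + 1);   // pick a random pivot, as assumed above
    std::swap(arr[pivotIndex], arr[high]);                    // move the pivot to the end
    int pivot = arr[high];
    int i = low;
    for (int j = low; j < high; j++)                          // partition around the pivot
        if (arr[j] < pivot) std::swap(arr[i++], arr[j]);
    std::swap(arr[i], arr[high]);                             // place the pivot in its final position
    quickSort(arr, low, i - 1);                               // recursively sort the left part
    quickSort(arr, i + 1, high);                              // recursively sort the right part
}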
3. Worst-Case Complexity
This refers to the maximum amount of time or space the algorithm will take to complete its task.
It's usually the most pessimistic scenario and is often used to determine the algorithm's suitability
for large datasets.
Example: In bubble sort, the worst-case scenario occurs when the input array is sorted in
reverse order. In this case, the algorithm needs to make n-1 passes through the array, resulting in
a time complexity of O(n²).
Amortized Complexity
Key Idea:
Instead of analyzing the cost of each individual operation, we analyze the average cost of
a sequence of operations.
If most operations have a low cost, and only a few operations have a high cost, the
average cost can be significantly lower than the worst-case cost.
1. Potential Method:
o Define a potential function Φ that maps each state of the data structure to a
non-negative value, with the initial state having potential zero.
o For each operation, the amortized cost is its actual cost plus the change in potential.
o Because the potential never drops below its initial value, the total actual cost of a
sequence of operations is bounded above by the total amortized cost.
2. Aggregate Method:
o Analyze the total cost of a sequence of n operations.
o If the total cost is bounded by a function f(n), the amortized cost of each operation
is f(n)/n.
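As an illustration of the aggregate method, consider a simplified dynamic array that doubles its
capacity when full (a sketch; std::vector behaves similarly): a single append may cost O(n) when
it triggers a copy, but any sequence of n appends costs O(n) in total, so the amortized cost per
append is O(1).

#include <cstring>

class DynamicArray {
    int* data = new int[1];
    int capacity = 1;
    int size = 0;
public:
    void append(int x) {
        if (size == capacity) {                             // expensive case: grow by doubling
            int* bigger = new int[capacity * 2];
            std::memcpy(bigger, data, size * sizeof(int));  // copying size elements costs O(size)
            delete[] data;
            data = bigger;
            capacity *= 2;
        }
        data[size++] = x;                                   // cheap case: O(1)
    }
    // Aggregate argument: over n appends the total copying work is 1 + 2 + 4 + ... + n/2 < n,
    // so the total cost is O(n) and the amortized cost of each append is O(n)/n = O(1).
};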