
Chapter 2.

Complexity Analysis

2. Introduction to Data Structures and Algorithm Analysis


2.1 Data structure

A Data Structure is a way to store and organize data so that it can be used efficiently. The
name indicates that the data is organized in memory. Data structures are widely used in
almost every area of Computer Science, such as operating systems, compiler design,
artificial intelligence, graphics, and many more.
 Need for data structures

Processor speed: Data is growing day by day, up to billions of records per organization, and a
processor on its own may not be able to deal with that much data quickly enough.

Data Search: Consider an inventory of 10⁶ items in a store; if our application needs to
search for a particular item, it has to traverse all 10⁶ items every time, which slows
down the search process.

Multiple requests: If thousands of users are searching the data simultaneously on a web
server, there is a chance that even a very large server can fail under the load. Data
structures are used to solve these problems.

Operations on data structures

Traversing: Every data structure contains a set of data elements. Traversing the data
means visiting each element of the data structure. For example, if we want to calculate the
average of a student's marks in 6 subjects, we need to traverse the complete array of marks
and calculate the total sum.
Insertion: Insertion is the process of adding an element to the data structure at any
location. If the data structure is already full (for example, an array of capacity n that
already holds n elements), inserting another element causes an overflow.
Deletion: The process of removing an element from the data structure is called deletion.
We can delete an element from the data structure at any location. If we try to delete an
element from an empty data structure, underflow occurs.

Searching: The process of finding the location of an element within the data structure
is called searching. Two common searching algorithms are Linear Search and Binary
Search; each of them is discussed later.
Sorting: The process of arranging the elements of a data structure in a specific order is
known as sorting. Many algorithms can be used to perform sorting, for example insertion
sort, selection sort, and bubble sort.
Merging: When two lists, List A of size M and List B of size N, containing elements of
the same type, are joined to produce a third list, List C of size (M + N), this process is
called merging. A short sketch of traversing and merging follows this list.
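
As an illustration, here is a minimal Python sketch of two of these operations: traversing an array of marks to compute an average, and merging two lists. The variable names (marks, list_a, list_b) are illustrative and not taken from this chapter.

# Traversing: visit every element once to compute the average of a student's marks.
marks = [72, 85, 64, 90, 78, 69]        # marks in 6 subjects
total = 0
for mark in marks:                      # each element is visited exactly once
    total += mark
average = total / len(marks)
print("Average mark:", average)

# Merging: join List A (size M) and List B (size N) into List C (size M + N).
list_a = [1, 3, 5]
list_b = [2, 4, 6, 8]
list_c = list_a + list_b                # concatenation produces a list of size M + N
print("Merged list:", list_c, "size:", len(list_c))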

Types of data structures

A primitive (built-in) data structure, such as int, char, or float, is a fundamental type that
stores data of only one kind, whereas a non-primitive data structure is user-defined and can
store data of different types in a single entity.

Linear Data Structures: A data structure is called linear if all of its elements are arranged
in linear order. In linear data structures, the elements are stored in a non-hierarchical way
where each element has a successor and a predecessor, except for the first element (no
predecessor) and the last element (no successor).
Types of Linear Data Structures are given below:

Arrays: An array is a collection of data items of the same type, and each data item is called
an element of the array. The data type of the elements may be any valid data type such as
char, int, float, or double. For example, the individual elements of a 100-element array age
are age[0], age[1], age[2], ..., age[98], age[99].
Linked List: A linked list is a linear data structure that is used to maintain a list in memory.
It can be seen as a collection of nodes stored at non-contiguous memory locations, where
each node of the list contains a pointer to its adjacent node.
Stack: A stack is a linear list in which insertions and deletions are allowed only at one
end, called the top.
Queue: A queue is a linear list in which elements can be inserted only at one end, called
the rear, and deleted only at the other end, called the front. Because the two ends are used
for different operations, the queue follows the First-In-First-Out (FIFO) discipline for
storing data items. A short stack and queue sketch follows below.
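
As a minimal sketch of these two structures (using a Python list for the stack and collections.deque for the queue; the variable names are illustrative):

from collections import deque

# Stack: insertions and deletions happen only at one end, the top (LIFO).
stack = []
stack.append(1)        # push 1
stack.append(2)        # push 2
stack.append(3)        # push 3
print(stack.pop())     # prints 3: the most recently pushed element leaves first

# Queue: insert at the rear, delete at the front (FIFO).
queue = deque()
queue.append("a")      # enqueue at the rear
queue.append("b")
queue.append("c")
print(queue.popleft()) # prints "a": the first element inserted leaves first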

Non-Linear Data Structures: These data structures do not form a sequence; an item or
element may be connected to two or more other items in a non-linear arrangement, so the
data elements are not stored sequentially.

Types of Non-Linear Data Structures are given below:

Trees: Trees are multilevel data structures with a hierarchical relationship among their
elements, known as nodes. The bottommost nodes in the hierarchy are called leaf nodes,
while the topmost node is called the root node.
Graphs: A graph can be defined as the pictorial representation of a set of elements
(represented by vertices) connected by links known as edges.

2.1.1 Algorithm
What are algorithms? Why is the study of algorithms useful? What is the role of algorithms
relative to other technologies used in computers? In this chapter, we will answer these
questions.
 Definition and characteristics of an algorithm

Definition:

An Algorithm is a method of representing the step-by-step procedure for solving a
problem. It is a method of finding the correct answer to a problem by breaking the
problem down into simple cases.

 Characteristics of algorithm
An algorithm has the following characteristics:

Finiteness: An algorithm should terminate in a finite number of steps.

Definiteness: Each step of the algorithm must be precisely (clearly) stated.


Effectiveness: Each step must be effective. i.e.; it should be easily convertible into
a program statement and can be performed exactly in a finite amount of time.
Generality: The algorithm should be complete in itself so that it can be used to solve
all problems of a given type for any input data.
Input/output: Each algorithm must take zero, one, or more quantities as input data and
give one or more output values. An algorithm can be written in English-like sentences
or in any standard notation. These are the characteristics that an algorithm should have
in order to be useful.

Two commonly used techniques for designing algorithms are:


 Divide and conquer approach
 Greedy approach
Divide and conquer is a powerful approach for solving conceptually difficult problems.
The divide and conquer approach requires you to find a way of:
Breaking the problem into sub-problems
Solving the trivial cases
Combining the solutions to the sub-problems to solve the original problem (see the sketch after this list)
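
A minimal Python sketch of the divide and conquer approach, using merge sort purely as an illustration (the function name and the sample list are not taken from this chapter):

def merge_sort(items):
    # Trivial case: a list of 0 or 1 elements is already sorted.
    if len(items) <= 1:
        return items
    # Break the problem into two sub-problems.
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])
    # Combine the solutions to the sub-problems.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

print(merge_sort([5, 2, 9, 1, 7]))   # [1, 2, 5, 7, 9]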
Algorithms based on the greedy approach are used for solving optimization problems,
where you need to maximize profits or minimize costs under a given set of conditions.
Some examples of optimization problems are listed below, followed by a short coin-change sketch:
Finding the shortest distance from an originating city to a set of destination
cities, given the distances between the pairs of cities.
Finding the minimum number of currency notes required for an amount, where
an arbitrary number of notes of each denomination is available.
Selecting items with maximum value from a given set of items, where the total
weight of the selected items cannot exceed a given value.
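
A minimal Python sketch of a greedy algorithm for the currency-note example. The denominations used here are an assumption for illustration only, and a greedy choice is guaranteed to be optimal only for certain denomination systems:

def min_notes(amount, denominations=(100, 50, 20, 10, 5, 1)):
    # Greedy choice: take as many of the largest remaining denomination as possible.
    notes = {}
    for d in denominations:              # assumed to be in decreasing order
        count, amount = divmod(amount, d)
        if count:
            notes[d] = count
    return notes

print(min_notes(289))   # {100: 2, 50: 1, 20: 1, 10: 1, 5: 1, 1: 4}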
Recursion:
Refers to the technique of defining a process in terms of itself
Is used to solve complex programming problems that are repetitive in nature
Is implemented in a program by using a recursive procedure or function; a
recursive procedure or function is a function that invokes itself
Is useful in writing clear, short, and simple programs (see the sketch after this list)
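
A minimal Python sketch of a recursive function, using the factorial only as a familiar illustration:

def factorial(n):
    # Base case stops the recursion.
    if n <= 1:
        return 1
    # The function invokes itself on a smaller instance of the problem.
    return n * factorial(n - 1)

print(factorial(5))   # 120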
Determining the Efficiency of an Algorithm
Factors that affect the efficiency of a program include:
Speed of the machine
Compiler
Operating system
Programming language
Size of the input
In addition to these factors, the way the data of a program is organized and the algorithm
used to solve the problem also have a significant impact on the efficiency of a program.

 Algorithm representation techniques

Design of Algorithms: this consists of the steps a programmer should take before starting
to code the program in a specific language. Proper program design helps other programmers
to maintain the program in the future. There are different ways to represent an algorithm,
such as a programming language, natural language, pseudocode, or a flowchart. The most
common representations are the flowchart and pseudocode.
2.2 Computational and asymptotic complexity

Computational and Asymptotic Complexity

Computational Complexity and Asymptotic Complexity are fundamental concepts in computer
science, particularly in the analysis of algorithms. They help us understand how efficient an
algorithm is in terms of time and space resources as the input size grows.

Computational Complexity

This refers to the study of the amount of resources, typically time and space, required by an
algorithm to solve a given problem. It's a measure of how efficient an algorithm is in practice.

 Time Complexity: This measures how much time an algorithm takes to execute as a
function of the input size. It is usually denoted T(n), where n is the input size.
 Space Complexity: This measures how much memory an algorithm uses as a function of
the input size. It's denoted by "S(n)".

Asymptotic Complexity

Asymptotic complexity focuses on the behavior of an algorithm as the input size approaches
infinity. It's a theoretical measure that provides a simplified way to compare algorithms without
worrying about constant factors or lower-order terms.

2.3 Big-O, Ω, Θ, little-o and OO notations

Method for Determining Efficiency

To measure the time efficiency of an algorithm, you can write a program based on the
algorithm, execute it, and measure the time it takes to run.

The execution time that you measure in this case would depend on a number of factors
such as:

Speed of the machine

Compiler

Operating system

Programming language

Input data
However, we would like to determine how the execution time is affected by the nature of
the algorithm.

The execution time of an algorithm is directly proportional to the number of key
comparisons involved in the algorithm and is a function of n, where n is the size of the
input data.

The rate at which the running time of an algorithm increases as a result of an increase in
the volume of input data is called the order of growth of the algorithm.

The order of growth of an algorithm is defined by using the big O notation.

The big O notation has been accepted as a fundamental technique for describing the
efficiency of an algorithm.

The different orders of growth and their corresponding big O notations are:

Constant - O(1)

Logarithmic - O(log n)

Linear - O(n)

Loglinear - O(n log n)

Quadratic - O(n²)

Cubic - O(n³)

Exponential - O(2ⁿ), O(10ⁿ)

According to their orders of growth, the big O notations can be arranged in an increasing
order as:

O(1) < O(log n) < O(n) < O(n log n) < O(n²) < O(n³) < O(2ⁿ) < O(10ⁿ)
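
As a rough illustration of some of these orders of growth, here is a Python sketch with toy functions written for this chapter (they are not standard library routines):

def constant(items):          # O(1): one step regardless of the input size
    return items[0]

def linear_sum(items):        # O(n): a single pass over the input
    total = 0
    for x in items:
        total += x
    return total

def quadratic_pairs(items):   # O(n²): a nested pass over the input
    pairs = []
    for a in items:
        for b in items:
            pairs.append((a, b))
    return pairs

def halving_steps(n):         # O(log n): the problem size halves at every step
    steps = 0
    while n > 1:
        n //= 2
        steps += 1
    return steps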

Big-O Notation (O(n))

 Definition: It provides an upper bound on the growth rate of a function.
 Meaning: The function's growth is at most proportional to another function.
 Notation: f(n) is O(g(n)) if there exist positive constants c and n₀ such that f(n) ≤ cg(n)
for all n ≥ n₀.
 Example: f(n) = 2n² + 3n + 1 is O(n²) because 2n² + 3n + 1 ≤ 6n² for all n ≥ 1.

Understanding O(n²) Notation


 O(n²) means that the function's growth rate is bounded above by a constant multiple of
n². In other words, there exists a constant c and a value n₀ such that f(n) ≤ c * n² for all n
≥ n₀.
 This indicates that the function's growth is at most quadratic, meaning it increases
proportionally to the square of the input size.

Analyzing the Example

1. Choose Constants:
o c = 6: We select a constant c = 6.
o n₀ = 1: We choose a value n₀ = 1.
2. Verify Inequality:
o We need to show that 2n² + 3n + 1 ≤ 6n² for all n ≥ 1.
3. Proof:
o For n ≥ 1 we have 3n ≤ 3n² and 1 ≤ n², so
 2n² + 3n + 1 ≤ 2n² + 3n² + n² = 6n²
o Hence 2n² + 3n + 1 ≤ 6n² for all n ≥ 1, and the inequality holds.

Conclusion

 We have successfully shown that f(n) = 2n² + 3n + 1 ≤ 6n² for all n ≥ 1.
 Therefore, f(n) is O(n²), indicating that its growth rate is bounded above by a quadratic
function.

In simpler terms:

 The example demonstrates that the function's growth is at most quadratic, meaning it
increases proportionally to the square of the input size.
 The higher-order term (2n²) dominates the growth, and the lower-order terms (3n and 1)
become insignificant as n gets larger.

This means that the function's running time will increase at a rate that is proportional to the
square of the input size. For example, if the input size doubles, the running time will increase by
a factor of 4.
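
A quick numerical check of this bound and of the doubling behaviour (a Python sketch; the particular values of n are arbitrary):

def f(n):
    return 2 * n**2 + 3 * n + 1

# Check the Big-O bound f(n) <= 6n² for a range of n >= 1.
assert all(f(n) <= 6 * n**2 for n in range(1, 10000))

# Doubling n roughly quadruples f(n) once the n² term dominates.
for n in (100, 200, 400, 800):
    print(n, f(n), round(f(2 * n) / f(n), 3))   # the ratio approaches 4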

Ω Notation is used to provide a lower bound on the growth rate of a function. It indicates that a
function's growth is at least proportional to another function.

Formal Definition: A function f(n) is said to be Ω(g(n)) if there exist positive constants c and n₀
such that f(n) ≥ cg(n) for all n ≥ n₀.

Interpretation:
 Lower Bound: Ω(g(n)) means that f(n) will grow at least as fast as g(n) for sufficiently
large values of n.
 Constant Factor: The constant c allows for a certain degree of flexibility in the
comparison, as the exact growth rate may differ by a constant factor.

Example:

 f(n) = 2n² + 3n + 1 is Ω(n²) because for n ≥ 1, we can choose c = 1 and n₀ = 1, and we
have 2n² + 3n + 1 ≥ 1 * n² for all n ≥ 1.

Common Uses of Ω Notation:

 Best-Case Analysis: Ω notation is often used to analyze the best-case running time of
algorithms.
 Lower Bounds: It helps in understanding the fundamental limitations of certain
problems or algorithms.
 Algorithm Comparison: It can be used to compare the lower bounds of different
algorithms and identify the most efficient ones.

In Summary: Ω notation is a valuable tool for analyzing the growth rate of functions and
understanding the lower bounds of algorithms. It provides a useful perspective on the
performance of algorithms and helps in making informed decisions about their suitability for
different applications.

Θ Notation

Θ Notation provides both an upper bound and a lower bound on the growth rate of a function.
It indicates that a function's growth is exactly proportional to another function.

Formal Definition: A function f(n) is said to be Θ(g(n)) if there exist positive constants c₁, c₂,
and n₀ such that c₁g(n) ≤ f(n) ≤ c₂g(n) for all n ≥ n₀.

Interpretation:

 Tight Bound: Θ(g(n)) means that f(n) grows neither faster nor slower than g(n) by more
than a constant factor for sufficiently large values of n.
 Exact Proportionality: The constants c₁ and c₂ ensure that f(n) is tightly bounded by
g(n) within a constant multiplicative factor.

Example:

 f(n) = 2n² + 3n + 1 is Θ(n²) because for n ≥ 1, we can choose c₁ = 1, c₂ = 6, and n₀ = 1,
and we have 1 * n² ≤ 2n² + 3n + 1 ≤ 6 * n² for all n ≥ 1.

Common Uses of Θ Notation:


 Average-Case Analysis: Θ notation is often used to analyze the average-case running
time of algorithms.
 Exact Growth Rates: It provides a precise characterization of the growth rate of
functions.
 Algorithm Comparison: It can be used to directly compare the growth rates of different
algorithms.

In Summary: Θ notation is a powerful tool for analyzing the growth rate of functions and
understanding the exact behavior of algorithms. It provides a precise and informative way to
compare the efficiency of different algorithms and make informed decisions about their
suitability for various applications.

Little-o Notation (o(n))

Little-o Notation (o(n)) is used to indicate that a function grows strictly slower than another
function. It implies that the function's growth is asymptotically negligible compared to the other
function.

Formal Definition: A function f(n) is said to be o(g(n)) if for any positive constant c, there
exists a positive integer n₀ such that f(n) < cg(n) for all n ≥ n₀.

Interpretation:

 Negligible Growth: o(g(n)) means that f(n) grows so much slower than g(n) that the
constant factor c can always be made small enough to ensure that f(n) is strictly less than
cg(n) for sufficiently large values of n.
 Dominance: g(n) dominates f(n) in terms of growth rate.

Example:

 f(n) = n is o(n²) because for any c > 0, we can choose n₀ = ⌈1/c⌉ + 1 (so that n₀ > 1/c), and
we then have n < cn² for all n ≥ n₀.

Common Uses of Little-o Notation:

 Asymptotic Negligibility: It helps in identifying functions that are asymptotically
negligible compared to others.
 Algorithm Analysis: It can be used to analyze the asymptotic behavior of algorithms and
identify the dominant terms in their running time.
 Complexity Classes: It also appears in complexity theory, for example in statements
where one resource bound grows strictly more slowly than another.

In Summary: Little-o notation is a valuable tool for analyzing the growth rate of functions and
understanding the relative dominance of functions. It provides a precise way to describe the
asymptotic behavior of functions and is essential for understanding the complexity of algorithms
and data structures.
OO Notation

OO Notation, more commonly written as little-omega (ω(g(n))), is a less commonly used
notation that indicates that a function grows strictly faster than another function. It implies
that the function's growth is asymptotically dominant compared to the other function.

Formal Definition: A function f(n) is said to be OO(g(n)) if for any positive constant c, there
exists a positive integer n₀ such that f(n) > cg(n) for all n ≥ n₀.

Interpretation:

 Dominant Growth: OO(g(n)) means that f(n) grows so much faster than g(n) that, no
matter how large the constant c is chosen, f(n) is strictly greater than cg(n) for all
sufficiently large values of n.
 Dominance: f(n) dominates g(n) in terms of growth rate.

Example:

 f(n) = n² is OO(n) because for any c > 0, we can choose n₀ = ⌈c⌉ + 1 (so that n₀ > c), and
we then have n² > cn for all n ≥ n₀.

Common Uses of OO Notation:

 Asymptotic Dominance: It helps in identifying functions that are asymptotically
dominant compared to others.
 Algorithm Analysis: It can be used to analyze the asymptotic behavior of algorithms and
identify the dominant terms in their running time.
 Complexity Classes: It is used in defining complexity classes and understanding the
relative growth rates of functions.

In Summary: OO notation is a less frequently used notation compared to Big-O, Big-Theta, and
Big-Omega. It provides a way to express the asymptotic dominance of one function over another,
and can be useful in certain contexts where a precise understanding of the relative growth rates
of functions is required.

Key Points:

 Big-O, Ω, and Θ notations are used to describe the asymptotic behavior of functions, while
little-o and OO notations are used to describe the relative growth rates of functions.
 Big-O is often used to analyze the worst-case time complexity of algorithms, while Ω is
used to analyze the best-case time complexity.
 Θ notation is used to describe the tight bound on the growth rate of a function.
 Little-o is used to indicate that a function is asymptotically negligible compared to another
function, and OO is used to indicate that a function is asymptotically dominant compared
to another function.
Common Complexity Classes

Complexity classes are fundamental categories used to classify problems based on the resources
required to solve them. Here are some of the most common complexity classes:

Time Complexity Classes

 P (Polynomial Time): Problems that can be solved by a deterministic algorithm in
polynomial time. These problems are considered to be efficiently solvable. Examples
include sorting, matrix multiplication, and shortest path problems.
 NP (Non-deterministic Polynomial Time): Problems where a solution can be verified in
polynomial time, but finding a solution is believed to be difficult. Many optimization and
decision problems fall into this category, such as the traveling salesman problem, the
knapsack problem, and satisfiability (SAT).
 NP-complete: The hardest problems within NP. If a polynomial-time algorithm were
found for any NP-complete problem, it would mean that all problems in NP could be solved
in polynomial time. Examples include the 3-SAT problem and the clique problem.
 NP-hard: Problems that are at least as hard as any problem in NP. They may or may not
be in NP themselves. Examples include the halting problem and the subset sum problem.
 EXPTIME: Problems that can be solved by a deterministic algorithm in exponential time.
These problems are considered to be computationally intractable for large inputs.

Space Complexity Classes

 L (Logarithmic Space): Problems that can be solved by a deterministic algorithm using
only a logarithmic amount of space. These problems are considered to be very
space-efficient.
 NL (Non-deterministic Logarithmic Space): Problems that can be solved by a
non-deterministic algorithm using only a logarithmic amount of space.
 PSPACE (Polynomial Space): Problems that can be solved by a deterministic algorithm
using polynomial space. These problems are considered to be relatively space-efficient.

Best, average and worst-case complexity

When analyzing the performance of an algorithm, we often consider three different scenarios:

1. Best-Case Complexity

This refers to the minimum amount of time or space the algorithm will take to complete its task.
It's usually the most optimistic scenario and may not be representative of typical performance.

Example: In bubble sort, the best-case scenario occurs when the input array is already sorted.
In this case, the algorithm only needs to make one pass through the array to determine that it's
sorted, resulting in a time complexity of O(n).
2. Average-Case Complexity

This refers to the typical amount of time or space the algorithm will take to complete its task. It's
often calculated by analyzing the performance over a large number of random inputs.

Example: In quicksort, the average-case complexity is O(n log n). This assumes that the pivot
element chosen at each partitioning step is a random element of the array.

3. Worst-Case Complexity

This refers to the maximum amount of time or space the algorithm will take to complete its task.
It's usually the most pessimistic scenario and is often used to determine the algorithm's suitability
for large datasets.

Example: In bubble sort, the worst-case scenario occurs when the input array is sorted in
reverse order. In this case, the algorithm needs to make n-1 passes through the array, resulting in
a time complexity of O(n²).
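
A minimal Python sketch of bubble sort with the early-exit check that produces the O(n) best case; the implementation details are illustrative rather than prescribed by the chapter:

def bubble_sort(items):
    n = len(items)
    for i in range(n - 1):                  # up to n-1 passes (worst case: reverse-sorted input)
        swapped = False
        for j in range(n - 1 - i):
            if items[j] > items[j + 1]:
                items[j], items[j + 1] = items[j + 1], items[j]
                swapped = True
        if not swapped:                     # no swaps: already sorted, so stop (best case O(n))
            break
    return items

print(bubble_sort([5, 1, 4, 2, 8]))         # [1, 2, 4, 5, 8]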

Amortized Complexity

Amortized complexity is a method of analyzing the average cost per operation over a sequence of
operations on a data structure. It's particularly useful when individual operations can have widely
varying costs, but the average cost over a sequence of operations is more predictable.

Key Idea:

 Instead of analyzing the cost of each individual operation, we analyze the average cost of
a sequence of operations.
 If most operations have a low cost, and only a few operations have a high cost, the
average cost can be significantly lower than the worst-case cost.

Common Amortized Analysis Techniques:

1. Accounting (Potential) Method:
o Overcharge cheap operations and store the surplus as credit (potential) associated
with the data structure.
o For each operation, its amortized cost is its actual cost plus the change in stored credit.
o If the stored credit never becomes negative, the total actual cost of any sequence of
operations is bounded by the total amortized cost.
2. Aggregate Method:
o Analyze the total cost of a sequence of n operations.
o If the total cost is bounded by a function f(n), the amortized cost of each operation
is f(n)/n.

Example: Dynamic Array


 Push: If there is enough space, a push is a constant-time operation. If the array is full, a
resizing operation is needed, which takes time linear in the current size.
 Worst-case: Each individual push can therefore cost up to O(n), so a naive bound on a
sequence of n pushes is O(n²).
 Amortized Analysis: Using the aggregate method (assuming the capacity is doubled
whenever the array is full), we can show that a sequence of n pushes has a total cost of
O(n). This means the amortized cost of each push is O(1). A sketch follows this list.
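
A minimal Python sketch of a dynamic array with the doubling strategy. Python lists already resize themselves internally; the class below re-implements the idea only to make the resize cost visible, and its names are illustrative:

class DynamicArray:
    def __init__(self):
        self.capacity = 1
        self.size = 0
        self.data = [None] * self.capacity

    def push(self, value):
        if self.size == self.capacity:       # full: resize (costly, but rare)
            self.capacity *= 2               # doubling keeps the total copying work O(n)
            new_data = [None] * self.capacity
            for i in range(self.size):       # copy the existing elements
                new_data[i] = self.data[i]
            self.data = new_data
        self.data[self.size] = value         # the common case: O(1)
        self.size += 1

arr = DynamicArray()
for x in range(10):
    arr.push(x)
print(arr.size, arr.capacity)                # 10 16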
