
UNIT I

Introduction to Data Structures and Algorithms


A programming language is a set of instructions and syntax used to create software programs.
A good program is one that
• runs correctly,
• is easy to read and understand,
• is easy to debug, and
• is easy to modify.
A program should undoubtedly give correct results, but along with that it should also run
efficiently. A program is said to be efficient when it executes in minimum time and with
minimum memory space. In order to write efficient programs we need to apply certain data
management concepts.
The concept of data management is a complex task that includes activities like data collection,
organization of data into appropriate structures, and developing and maintaining routines for
quality assurance. Data structure is a crucial part of data management.
A data structure is basically a group of data elements that are put together under one name, and
which defines a particular way of storing and organizing data in a computer so that it can be
used efficiently.
Data structures are used in almost every program or software system. Some common examples
of data structures are arrays, linked lists, queues, stacks, binary trees, and hash tables. Data
structures are widely applied in the following areas:
• Compiler design
• Operating system
• Statistical analysis package
• Numerical analysis
• Artificial intelligence
• DBMS
• Simulation
• Graphics
Data structures are the building blocks of a program. A program built using improper data
structures may not work as expected. So, as a programmer, it is essential to choose the most
appropriate data structures for a program.
The term data means a value or set of values. It specifies either the value of a variable or a
constant (e.g., marks of students, name of an employee, address of a customer, value of pi,
etc.).
While a data item that does not have subordinate data items is categorized as an elementary
item, the one that is composed of one or more subordinate data items is called a group item.
For example, a student’s name may be divided into three sub-items—first name, middle name,
and last name—but his roll number would normally be treated as a single item.
A record is a collection of data items. For example, the name, address, course, and marks
obtained are individual data items. But all these data items can be grouped together to form a
record.
A file is a collection of related records. For example, if there are 60 students in a class, then
there are 60 records of the students. All these related records are stored in a file. Similarly, we
can have a file of all the employees working in an organization, a file of all the customers of a
company, a file of all the suppliers, so on and so forth.

CLASSIFICATION OF DATA STRUCTURES


Data structures are generally categorized into two classes: primitive and non-primitive data
structures.
Primitive and Non-primitive Data Structures
Primitive data structures are the fundamental data types which are supported by a
programming language. Some basic data types are integer, real, character, and boolean. The
terms ‘data type’, ‘basic data type’, and ‘primitive data type’ are often used interchangeably.
Non-primitive data structures are those data structures which are created using primitive data
structures. Examples of such data structures include linked lists, stacks, trees, and graphs.
Non-primitive data structures can further be classified into two categories: linear and non-
linear data structures.
Linear and Non-linear Structures
If the elements of a data structure are stored in a linear or sequential order, then it is a linear
data structure. Examples include arrays, linked lists, stacks, and queues. Linear data structures
can be represented in memory in two different ways. One way is to have a linear relationship
between elements by means of sequential memory locations. The other way is to have a linear
relationship between elements by means of links.
However, if the elements of a data structure are not stored in a sequential order, then it is a non-
linear data structure. The relationship of adjacency is not maintained between elements of a
non-linear data structure. Examples include trees and graphs.
Arrays
An array is a collection of similar data elements. These data elements have the same data type.
The elements of the array are stored in consecutive memory locations and are referenced by an
index (also known as the subscript).
In C, arrays are declared using the following syntax:
type name[size];
For example,
int marks[10];
The above statement declares an array marks that contains 10 elements. In C, the array index
starts from zero. This means that the array marks will contain 10 elements in all. The first
element will be stored in marks[0], second element in marks[1], so on and so forth. Therefore,
the last element, that is the 10th element, will be stored in marks[9]. In the memory, the array
will be stored as shown in Fig. 2.1.
Arrays are generally used when we want to store a large amount of data of the same type. But they
have the following limitations:
• Arrays are of fixed size.
• Data elements are stored in contiguous memory locations which may not be always
available.
• Insertion and deletion of elements can be problematic because of shifting of elements
from their positions.
Linked Lists
A linked list is a very flexible, dynamic data structure in which elements (called nodes) form a
sequential list. In contrast to static arrays, a programmer need not worry about how many
elements will be stored in the linked list. This feature enables the programmers to write robust
programs which require less maintenance.
In a linked list, each node is allocated space as it is added to the list. Every node in the list
points to the next node in the list. Therefore, in a linked list, every node contains the following
two types of data:
• The value of the node or any other data that corresponds to that node
• A pointer or link to the next node in the list
The last node in the list contains a NULL pointer to indicate that it is the end or tail of the list.
Since the memory for a node is dynamically allocated when it is added to the list, the total
number of nodes that may be added to a list is limited only by the amount of memory available.
Figure 2.2 shows a linked list of seven nodes.
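As a rough illustration (not taken from the text above), a node of a singly linked list of integers can be declared in C as shown below; the names node, data, next, and create_node are only illustrative.

#include <stdlib.h>

/* A node stores a value and a pointer (link) to the next node. */
struct node
{
    int data;              /* the value held by the node          */
    struct node *next;     /* link to the next node; NULL at tail */
};

/* Create a node dynamically; memory is allocated only when the
   node is actually added to the list. */
struct node *create_node(int value)
{
    struct node *n = (struct node *)malloc(sizeof(struct node));
    if (n != NULL)
    {
        n->data = value;
        n->next = NULL;    /* a new node is initially the tail */
    }
    return n;
}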

Stack
A stack is a linear data structure in which insertion and deletion of elements are done at only
one end, which is known as the top of the stack. Stack is called a last-in, first-out (LIFO)
structure because the last element which is added to the stack is the first element which is
deleted from the stack.
In the computer’s memory, stacks can be implemented using arrays or linked lists. Figure 2.3
shows the array implementation of a stack. Every stack has a variable top associated with it.
top is used to store the address of the topmost element of the stack. It is this position from
where the element will be added or deleted. There is another variable MAX, which is used to
store the maximum number of elements that the stack can store.
If top = NULL, then it indicates that the stack is empty and if top = MAX–1, then the stack is
full.

In Fig. 2.3, top = 4, so insertions and deletions will be done at this position. Here, the stack can
store a maximum of 10 elements where the indices range from 0–9. In the above stack, five
more elements can still be stored.
A stack supports three basic operations: push, pop, and peep.
• The push operation adds an element to the top of the stack.
• The pop operation removes the element from the top of the stack.
• And the peep operation returns the value of the topmost element of the stack (without
deleting it).
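A minimal array-based sketch of these operations is given below. It follows the description above (a stack array, a variable top, and a maximum size MAX), except that an empty stack is marked here by top = -1, which is the usual convention in C code.

#include <stdio.h>

#define MAX 10              /* maximum number of elements in the stack */

int stack[MAX];
int top = -1;               /* -1 marks an empty stack in this sketch */

/* push: add an element at the top of the stack */
void push(int value)
{
    if (top == MAX - 1)
        printf("\n Stack overflow");
    else
        stack[++top] = value;
}

/* pop: remove and return the topmost element */
int pop(void)
{
    if (top == -1)
    {
        printf("\n Stack underflow");
        return -1;          /* sentinel value for an empty stack */
    }
    return stack[top--];
}

/* peep: return the topmost element without deleting it */
int peep(void)
{
    if (top == -1)
    {
        printf("\n Stack is empty");
        return -1;
    }
    return stack[top];
}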
Queues
A queue is a first-in, first-out (FIFO) data structure in which the element that is inserted first is
the first one to be taken out. The elements in a queue are added at one end called the rear and
removed from the other end called the front. Like stacks, queues can be implemented by using
either arrays or linked lists.
Every queue has front and rear variables that point to the position from where deletions and
insertions can be done, respectively. Consider the queue shown in Fig. 2.4

Here, front = 0 and rear = 5. If we want to add one more value to the list, say, if we want to add
another element with the value 45, then the rear would be incremented by 1 and the value would
be stored at the position pointed by the rear.
• A queue is full when rear = MAX – 1, where MAX is the size of the queue, that is MAX
specifies the maximum number of elements in the queue.
• Note that we have written MAX – 1 because the index starts from 0.
• Similarly, before deleting an element from the queue, we must check for underflow
conditions.
• An underflow condition occurs when we try to delete an element from a queue that is
already empty.
• If front = NULL and rear = NULL, then there is no element in the queue.
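A minimal array-based sketch of insertion and deletion is given below. It uses the front and rear variables described above; here both are initialized to -1 to mark an empty queue, which is a common convention in C code.

#include <stdio.h>

#define MAX 10                  /* maximum number of elements in the queue */

int queue[MAX];
int front = -1, rear = -1;      /* -1 in both marks an empty queue */

/* insert: add an element at the rear of the queue */
void insert(int value)
{
    if (rear == MAX - 1)
        printf("\n Queue overflow");
    else
    {
        if (front == -1)        /* first element being inserted */
            front = 0;
        queue[++rear] = value;
    }
}

/* delete: remove and return the element at the front of the queue */
int delete_element(void)
{
    if (front == -1 || front > rear)
    {
        printf("\n Queue underflow");   /* queue is already empty */
        return -1;
    }
    return queue[front++];
}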
Trees
A tree is a non-linear data structure which consists of a collection of nodes arranged in a
hierarchical order. One of the nodes is designated as the root node, and the remaining nodes
can be partitioned into disjoint sets such that each set is a sub-tree of the root.
The simplest form of a tree is a binary tree. A binary tree consists of a root node and left and
right sub-trees, where both sub-trees are also binary trees. Each node contains a data element,
a left pointer which points to the left sub-tree, and a right pointer which points to the right sub-
tree. The root element is the topmost node which is pointed by a ‘root’ pointer. If root = NULL
then the tree is empty.
Figure 2.7 shows a binary tree, where R is the root node and T1 and T2 are the left and right
sub trees of R. If T1 is non-empty, then T1 is said to be the left successor of R. Likewise, if T2
is non-empty, then it is called the right successor of R.

In Fig. 2.7, node 2 is the left child and node 3 is the right child of the root node 1. Note that
the left sub-tree of the root node consists of the nodes 2, 4, 5, 8, and 9. Similarly, the right
sub-tree of the root node consists of the nodes 3, 6, 7, 10, 11, and 12.
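As a sketch, a binary tree node carrying an integer data element together with its left and right pointers can be declared in C as follows; the names are illustrative.

/* A binary tree node: a data element, a left pointer to the
   left sub-tree, and a right pointer to the right sub-tree. */
struct tree_node
{
    int data;
    struct tree_node *left;    /* NULL if there is no left sub-tree  */
    struct tree_node *right;   /* NULL if there is no right sub-tree */
};

/* The tree is accessed through a 'root' pointer;
   root == NULL means the tree is empty. */
struct tree_node *root = NULL;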
Graphs
A graph is a non-linear data structure which is a collection of vertices (also called nodes) and
edges that connect these vertices. A graph is often viewed as a generalization of the tree
structure, where instead of a purely parent-to-child relationship between tree nodes, any kind
of complex relationships between the nodes can exist.
In a tree structure, nodes can have any number of children but only one parent; a graph, on the
other hand, relaxes all such restrictions. Figure 2.8 shows a graph with five nodes.
A node in the graph may represent a city and the edges connecting the nodes can represent
roads. A graph can also be used to represent a computer network where the nodes are
workstations and the edges are the network connections.
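One common way to store such a graph in memory (not described in the text above, shown here only as an illustration) is an adjacency matrix, where entry adj[i][j] is 1 if an edge connects vertices i and j and 0 otherwise. For the city/road example, a 1 means a road directly connects the two cities.

#define N 5                     /* number of vertices, as in the five-node example */

int adj[N][N];                  /* adj[i][j] = 1 if vertices i and j are connected */

/* Add an undirected edge between vertices u and v. */
void add_edge(int u, int v)
{
    adj[u][v] = 1;
    adj[v][u] = 1;
}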
OPERATIONS ON DATA STRUCTURES
The different operations that can be performed on the various data structures previously
mentioned are given below.
Traversing It means to access each data item exactly once so that it can be processed. For
example, to print the names of all the students in a class.
Searching It is used to find the location of one or more data items that satisfy the given
constraint. Such a data item may or may not be present in the given collection of data items.
For example, to find the names of all the students who secured 100 marks in mathematics.
Inserting It is used to add new data items to the given list of data items. For example, to add
the details of a new student who has recently joined the course.
Deleting It means to remove (delete) a particular data item from the given collection of data
items. For example, to delete the name of a student who has left the course.
Sorting Data items can be arranged in some order like ascending order or descending order
depending on the type of application. For example, arranging the names of students in a class
in an alphabetical order, or calculating the top three winners by arranging the participants’
scores in descending order and then extracting the top three.
Merging Lists of two sorted data items can be combined to form a single list of sorted data
items.
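A short C sketch of merging two sorted arrays is given below; the function name merge and its parameters are illustrative, not prescribed by the text.

/* Merge two sorted arrays a[] (size m) and b[] (size n) into a
   single sorted array c[] (size at least m + n). */
void merge(int a[], int m, int b[], int n, int c[])
{
    int i = 0, j = 0, k = 0;

    while (i < m && j < n)          /* copy the smaller front element */
        c[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];

    while (i < m)                   /* copy any remaining elements of a[] */
        c[k++] = a[i++];
    while (j < n)                   /* copy any remaining elements of b[] */
        c[k++] = b[j++];
}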
ABSTRACT DATA TYPE
Data type: The data type of a variable is the set of values that the variable can take. We have
already seen that the basic data types in C include int, char, float, and double.
When we talk about a primitive type (built-in data type), we actually consider two things: a
data item with certain characteristics and the permissible operations on that data. For example,
an int variable can contain any whole-number value from –32768 to 32767 and can be operated
with the operators +, –, *, and /.
Abstract The word ‘abstract’ in the context of data structures means considered apart from the
detailed specifications or implementation.
In C, an abstract data type can be a structure considered without regard to its implementation.
It can be thought of as a ‘description’ of the data in the structure with a list of operations that
can be performed on the data within that structure.
For example, when we use a stack or a queue, the user is concerned only with the type of data
and the operations that can be performed on it.

Fundamentals of Algorithmic problem solving


ALGORITHMS
The typical definition of algorithm is ‘a formally defined procedure for performing some
calculation’. If a procedure is formally defined, then it can be implemented using a formal
language, and such a language is known as a programming language.
In general terms, an algorithm provides a blueprint to write a program to solve a particular
problem. It is considered to be an effective procedure for solving a problem in finite number
of steps.
DIFFERENT APPROACHES TO DESIGNING AN ALGORITHM
Algorithms are used to manipulate the data contained in data structures. When working with
data structures, algorithms are used to perform operations on the stored data.
A complex algorithm is often divided into smaller units called modules. This process of
dividing an algorithm into modules is called modularization. The key advantages of
modularization are as follows:
• It makes the complex algorithm simpler to design and implement.
• Each module can be designed independently. While designing one module, the details
of other modules can be ignored, thereby enhancing clarity in design which in turn
simplifies implementation, debugging, testing, documenting, and maintenance of the
overall algorithm.
There are two main approaches to design an algorithm—top-down approach and bottom-up
approach, as shown in Fig. 2.9.

Top-down approach: A top-down design approach starts by dividing the complex algorithm
into one or more modules. These modules can further be decomposed into one or more sub-
modules, and this process of decomposition is iterated until the desired level of module
complexity is achieved. Top-down design method is a form of stepwise refinement where we
begin with the topmost module and incrementally add modules that it calls.
Bottom-up approach: A bottom-up approach is just the reverse of the top-down approach. In the
bottom-up design, we start with designing the most basic or concrete modules and then proceed
towards designing higher level modules. The higher level modules are implemented by using
the operations performed by lower level modules. Thus, in this approach sub-modules are
grouped together to form a higher level module. All the higher level modules are clubbed
together to form even higher level modules. This process is repeated until the design of the
complete algorithm is obtained.

CONTROL STRUCTURES USED IN ALGORITHMS


• An algorithm has a finite number of steps. Some steps may involve decision-making
and repetition.
• An algorithm may employ one of the following control structures: (a) sequence, (b)
decision, and (c) repetition.
Sequence
By sequence, we mean that each step of an algorithm is executed in a specified order. Let us
write an algorithm to add two numbers. This algorithm performs the steps in a purely sequential
order, as shown in Fig. 2.10.
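Since Fig. 2.10 is not reproduced here, a minimal C version of such a purely sequential algorithm (read two numbers, add them, and print the sum) is sketched below; the variable names are illustrative.

#include <stdio.h>

int main()
{
    int num1, num2, sum;

    scanf("%d", &num1);         /* Step 1: read the first number  */
    scanf("%d", &num2);         /* Step 2: read the second number */
    sum = num1 + num2;          /* Step 3: add the two numbers    */
    printf("Sum = %d\n", sum);  /* Step 4: print the result       */

    return 0;
}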

Decision
Decision statements are used when the execution of a process depends on the outcome of
some condition. For example, if x = y, then print EQUAL. So the general form of the IF construct
can be given as:
IF condition Then process
A condition in this context is any statement that may evaluate to either a true value or a false
value. In the above example, a variable x can be either equal to y or not equal to y. However,
it cannot be both true and false. If the condition is true, then the process is executed.
A decision statement can also be stated in the following manner:
IF condition
Then process1
ELSE
process2
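In C, assuming x and y are already declared as int variables, this construct corresponds to an if–else statement such as the sketch below.

if (x == y)                    /* the condition evaluates to true or false       */
    printf("EQUAL");           /* process1: executed when the condition is true  */
else
    printf("NOT EQUAL");       /* process2: executed when the condition is false */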

Repetition
Repetition, which involves executing one or more steps a number of times, can be
implemented using constructs such as the while, do–while, and for loops. These loops execute one
or more steps as long as some condition is true.
Figure 2.12 shows an algorithm that prints the first 10 natural numbers.
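Figure 2.12 is not reproduced here; a minimal C version of an algorithm that prints the first 10 natural numbers is sketched below.

#include <stdio.h>

int main()
{
    int i;

    /* The loop body repeats as long as the condition i <= 10 holds. */
    for (i = 1; i <= 10; i++)
        printf("%d ", i);       /* prints 1 2 3 ... 10 */

    return 0;
}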
TIME AND SPACE COMPLEXITY
• Analysing an algorithm means determining the amount of resources (such as time and
memory) needed to execute it.
• Algorithms are generally designed to work with an arbitrary number of inputs, so the
efficiency or complexity of an algorithm is stated in terms of time and space complexity.
• The time complexity of an algorithm is basically the running time of a program as a
function of the input size.
• Similarly, the space complexity of an algorithm is the amount of computer memory that
is required during the program execution as a function of the input size.
Worst-case, Average-case, Best-case, and Amortized Time Complexity
Worst-case running time: This denotes the behaviour of an algorithm with respect to the worst
possible case of the input instance. The worst-case running time of an algorithm is an upper
bound on the running time for any input. Therefore, having the knowledge of worst-case
running time gives us an assurance that the algorithm will never go beyond this time limit.
Average-case running time: The average-case running time of an algorithm is an estimate of
the running time for an ‘average’ input. It specifies the expected behaviour of the algorithm
when the input is randomly drawn from a given distribution. Average-case running time
assumes that all inputs of a given size are equally likely.
Best-case running time: The term ‘best-case performance’ is used to analyse an algorithm under
optimal conditions. For example, the best case for a simple linear search on an array occurs
when the desired element is the first in the list. However, while developing and choosing an
algorithm to solve a problem, we hardly base our decision on the best-case performance. It is
always recommended to improve the average performance and the worst-case performance of
an algorithm.
Amortized running time: Amortized running time refers to the time required to perform a
sequence of (related) operations averaged over all the operations performed. Amortized
analysis guarantees the average performance of each operation in the worst case.

Algorithm Efficiency
• If a function is linear (without any loops or recursions), the efficiency of that algorithm
or the running time of that algorithm can be given as the number of instructions it
contains.
• However, if an algorithm contains loops, then the efficiency of that algorithm may vary
depending on the number of loops and the running time of each loop in the algorithm.
Linear Loops
To calculate the efficiency of an algorithm that has a single loop, we need to first determine the
number of times the statements in the loop will be executed.
This is because the number of iterations is directly proportional to the loop factor: the greater the
loop factor, the greater the number of iterations.
For example, consider the loop given below:
for(i=0;i<100;i++)
statement block;
Here, 100 is the loop factor. We have already said that efficiency is directly proportional to the
number of iterations. Hence, the general formula in the case of linear loops may be given as
f(n) = n

Logarithmic Loops
We have seen that in linear loops, the loop update statement either adds to or subtracts from the
loop-controlling variable. However, in logarithmic loops, the loop-controlling variable is either
multiplied or divided during each iteration of the loop. For example, look at the loops given
below:
for(i=1;i<1000; i*=2)
statement block;
for(i=1000;i>=1;i/=2)
statement block;
• Consider the first for loop in which the loop-controlling variable i is multiplied by 2.
The loop will be executed only 10 times and not 1000 times because in each iteration
the value of i doubles.
• Now, consider the second loop in which the loop-controlling variable i is divided by 2.
In this case also, the loop will be executed 10 times.
• Thus, the number of iterations is a function of the number by which the loop-controlling
variable is divided or multiplied. In the examples discussed, it is 2. That is, when n =
1000, the number of iterations can be given by log2 1000, which is approximately equal
to 10.
• Therefore, putting this analysis in general terms, we can conclude that the efficiency of
loops in which iterations divide or multiply the loop-controlling variables can be given
as
f(n) = log n
Arrays
An array is a collection of similar data elements. These data elements have the same data type.
The elements of the array are stored in consecutive memory locations and are referenced by an
index (also known as the subscript). The subscript is an ordinal number which is used to
identify an element of the array.
DECLARATION OF ARRAYS
We have already seen that every variable must be declared before it is used. The same concept
holds true for array variables. An array must be declared before being used. Declaring an array
means specifying the following:
• Data type - the kind of values it can store, for example, int, char, float, double.
• Name - to identify the array.
• Size - the maximum number of values that the array can hold.
Arrays are declared using the following syntax:
type name[size];
The type can be either int, float, double, char, or any other valid data type. The number within
brackets indicates the size of the array, i.e., the maximum number of elements that can be stored
in the array. For example, if we write,
int marks[10];
then the statement declares marks to be an array containing 10 elements. In C, the array index
starts from zero.
• The first element will be stored in marks[0], second element in marks[1], and so on.
• Therefore, the last element, that is the 10th element, will be stored in marks[9].
• Note that 0, 1, 2, 3 written within square brackets are the subscripts. In the memory, the
array will be stored as shown in Fig. 3.2.

Figure 3.3 shows how different types of arrays are declared.


ACCESSING THE ELEMENTS OF AN ARRAY
To access all the elements, we must use a loop. That is, we can access all the elements of an
array by varying the value of the subscript into the array. As shown in Fig. 3.2, the first element
of the array marks[10] can be accessed by writing marks[0]. Now to process all the elements
of the array, we use a loop as shown in Fig. 3.4.

Figure 3.5 shows the result of the code shown in Fig. 3.4. The code accesses every individual
element of the array and sets its value to –1.
In the for loop, first the value of marks[0] is set to –1, then the value of the index (i) is
incremented and the next value, that is, marks[1] is set to –1.
The procedure continues until all the 10 elements of the array are set to –1.
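Since Figs 3.4 and 3.5 are not reproduced here, the loop they describe can be sketched in C as follows.

int main()
{
    int i, marks[10];

    /* Access every element of the array by varying the subscript i
       and set its value to -1. */
    for (i = 0; i < 10; i++)
        marks[i] = -1;

    return 0;
}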

Programming Examples
#include <stdio.h>

int main()
{
    int i, n, arr[20];

    printf("\n Enter the number of elements in the array : ");
    scanf("%d", &n);

    /* Read n elements into the array */
    for(i = 0; i < n; i++)
    {
        printf("\n arr[%d] = ", i);
        scanf("%d", &arr[i]);
    }

    /* Print the array elements */
    printf("\n The array elements are ");
    for(i = 0; i < n; i++)
        printf("\t %d", arr[i]);

    return 0;
}

Output
Enter the number of elements in the array : 5
arr[0] = 1
arr[1] = 2
arr[2] = 3
arr[3] = 4
arr[4] = 5
The array elements are 1 2 3 4 5
TWO-DIMENSIONAL ARRAYS
A two-dimensional array is specified using two subscripts where the first subscript denotes
the row and the second denotes the column. The C compiler treats a two-dimensional array as
an array of one-dimensional arrays. Figure 3.26 shows a two-dimensional array which can be
viewed as an array of arrays.

Declaring Two-dimensional arrays


Any array must be declared before being used. The declaration statement tells the compiler the
name of the array, the data type of each element in the array, and the size of each dimension. A
two-dimensional array is declared as:
data_type array_name[row_size][column_size];
Therefore, a two-dimensional m × n array is an array that contains m × n data elements and
each element is accessed using two subscripts, i and j, where i <= m and j <= n.
For example, if we want to store the marks obtained by three students in five different
subjects, we can declare a two dimensional array as:
int marks[3][5];
In the above statement, a two-dimensional array called marks has been declared that has m(3)
rows and n(5) columns. The first element of the array is denoted by marks[0][0], the second
element as marks[0][1], and so on. Here, marks[0][0] stores the marks obtained by the first
student in the first subject, marks[1][0] stores the marks obtained by the second student in the
first subject.
The pictorial form of a two-dimensional array is shown in Fig. 3.27.
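A short sketch of how the marks[3][5] array described above might be read and printed is given below; the loop bounds follow the 3 × 5 example and the variable names are illustrative.

#include <stdio.h>

int main()
{
    int marks[3][5];            /* 3 students (rows) x 5 subjects (columns) */
    int i, j;

    /* Read the marks: marks[i][j] holds the marks of student i in subject j. */
    for (i = 0; i < 3; i++)
        for (j = 0; j < 5; j++)
            scanf("%d", &marks[i][j]);

    /* Print the marks row by row. */
    for (i = 0; i < 3; i++)
    {
        for (j = 0; j < 5; j++)
            printf("%d\t", marks[i][j]);
        printf("\n");
    }

    return 0;
}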
MULTI-DIMENSIONAL ARRAYS
A multi-dimensional array in simple terms is an array of arrays. Just as we have one index in a
one-dimensional array and two indices in a two-dimensional array, we have n indices in an
n-dimensional or multi-dimensional array.
• In other words, an n-dimensional array is specified using n indices.
• An n-dimensional m1 x m2 x m3 x ... x mn array is a collection of m1 x m2 x m3 x ... x mn
elements.
• In a multi-dimensional array, a particular element is specified by using n subscripts as
A[I1][I2][I3]...[In], where I1 <= M1, I2 <= M2, I3 <= M3, ..., In <= Mn.
A multi-dimensional array can contain as many indices as needed, although the memory
requirement increases with the number of indices used.
Figure 3.33 shows a three-dimensional array. The array has three pages, four rows, and two
columns.
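For a three-dimensional array with three pages, four rows, and two columns (as in Fig. 3.33), the declaration and element access in C might look like the following sketch.

int main()
{
    int arr[3][4][2];           /* 3 pages, 4 rows, 2 columns */

    /* arr[page][row][column] selects one element; for example,
       arr[2][3][1] is the element in the last page, last row,
       and last column (indices start from zero). */
    arr[2][3][1] = 10;

    return 0;
}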
