UNIT I
UNIT I
Stack
A stack is a linear data structure in which insertion and deletion of elements are done at only
one end, which is known as the top of the stack. Stack is called a last-in, first-out (LIFO)
structure because the last element which is added to the stack is the first element which is
deleted from the stack.
In the computer’s memory, stacks can be implemented using arrays or linked lists. Figure 2.3
shows the array implementation of a stack. Every stack has a variable top associated with it.
top is used to store the address of the topmost element of the stack. It is this position from
where the element will be added or deleted. There is another variable MAX, which is used to
store the maximum number of elements that the stack can store.
If top = NULL, then it indicates that the stack is empty and if top = MAX–1, then the stack is
full.
In Fig. 2.3, top = 4, so insertions and deletions will be done at this position. Here, the stack can
store a maximum of 10 elements where the indices range from 0–9. In the above stack, five
more elements can still be stored.
A stack supports three basic operations: push, pop, and peep.
• The push operation adds an element to the top of the stack.
• The pop operation removes the element from the top of the stack.
• And the peep operation returns the value of the topmost element of the stack (without
deleting it).
Queues
A queue is a first-in, first-out (FIFO) data structure in which the element that is inserted first is
the first one to be taken out. The elements in a queue are added at one end called the rear and
removed from the other end called the front. Like stacks, queues can be implemented by using
either arrays or linked lists.
Every queue has front and rear variables that point to the position from where deletions and
insertions can be done, respectively. Consider the queue shown in Fig. 2.4
Here, front = 0 and rear = 5. If we want to add one more value to the list, say, if we want to add
another element with the value 45, then the rear would be incremented by 1 and the value would
be stored at the position pointed by the rear.
• A queue is full when rear = MAX – 1, where MAX is the size of the queue, that is MAX
specifies the maximum number of elements in the queue.
• Note that we have written MAX – 1 because the index starts from 0.
• Similarly, before deleting an element from the queue, we must check for underflow
conditions.
• An underflow condition occurs when we try to delete an element from a queue that is
already empty.
• If front = NULL and rear = NULL, then there is no element in the queue.
Trees
A tree is a non-linear data structure which consists of a collection of nodes arranged in a
hierarchical order. One of the nodes is designated as the root node, and the remaining nodes
can be partitioned into disjoint sets such that each set is a sub-tree of the root.
The simplest form of a tree is a binary tree. A binary tree consists of a root node and left and
right sub-trees, where both sub-trees are also binary trees. Each node contains a data element,
a left pointer which points to the left sub-tree, and a right pointer which points to the right sub-
tree. The root element is the topmost node which is pointed by a ‘root’ pointer. If root = NULL
then the tree is empty.
Figure 2.7 shows a binary tree, where R is the root node and T1 and T2 are the left and right
sub trees of R. If T1 is non-empty, then T1 is said to be the left successor of R. Likewise, if T2
is non-empty, then it is called the right successor of R.
In Fig. 2.7, node 2 is the left child and node 3 is the right child of the root node 1. Note that
the left sub-tree of the root node consists of the nodes 2, 4, 5, 8, and 9. Similarly, the right
sub-tree of the root node consists of the nodes 3, 6, 7, 10, 11, and 12.
Graphs
A graph is a non-linear data structure which is a collection of vertices (also called nodes) and
edges that connect these vertices. A graph is often viewed as a generalization of the tree
structure, where instead of a purely parent-to-child relationship between tree nodes, any kind
of complex relationships between the nodes can exist.
In a tree structure, nodes can have any number of children but only one parent, a graph on the
other hand relaxes all such kinds of restrictions. Figure 2.8 shows a graph with five nodes.
A node in the graph may represent a city and the edges connecting the nodes can represent
roads. A graph can also be used to represent a computer network where the nodes are
workstations and the edges are the network connections.
OPERATIONS ON DATA STRUCTURES
The different operations that can be performed on the various data structures previously
mentioned are given below.
Traversing It means to access each data item exactly once so that it can be processed. For
example, to print the names of all the students in a class.
Searching It is used to find the location of one or more data items that satisfy the given
constraint. Such a data item may or may not be present in the given collection of data items.
For example, to find the names of all the students who secured 100 marks in mathematics.
Inserting It is used to add new data items to the given list of data items. For example, to add
the details of a new student who has recently joined the course. Deleting It means to remove
(delete) a particular data item from the given collection of data items. For example, to delete
the name of a student who has left the course.
Sorting Data items can be arranged in some order like ascending order or descending order
depending on the type of application. For example, arranging the names of students in a class
in an alphabetical order, or calculating the top three winners by arranging the participants’
scores in descending order and then extracting the top three.
Merging Lists of two sorted data items can be combined to form a single list of sorted data
items.
ABSTRACT DATA TYPE
Data type: Data type of a variable is the set of values that the variable can take. We have already
read the basic data types in C include int, char, float, and double.
When we talk about a primitive type (built-in data type), we actually consider two things: a
data item with certain characteristics and the permissible operations on that data. For example,
an int variable can contain any whole-number value from –32768 to 32767 and can be operated
with the operators +, –, *, and /.
Abstract The word ‘abstract’ in the context of data structures means considered apart from the
detailed specifications or implementation.
In C, an abstract data type can be a structure considered without regard to its implementation.
It can be thought of as a ‘description’ of the data in the structure with a list of operations that
can be performed on the data within that structure.
For example, when we use a stack or a queue, the user is concerned only with the type of data
and the operations that can be performed on it.
Top-down approach: A top-down design approach starts by dividing the complex algorithm
into one or more modules. These modules can further be decomposed into one or more sub-
modules, and this process of decomposition is iterated until the desired level of module
complexity is achieved. Top-down design method is a form of stepwise refinement where we
begin with the topmost module and incrementally add modules that it calls.
Bottom-up approach A bottom-up approach is just the reverse of top-down approach. In the
bottom-up design, we start with designing the most basic or concrete modules and then proceed
towards designing higher level modules. The higher level modules are implemented by using
the operations performed by lower level modules. Thus, in this approach sub-modules are
grouped together to form a higher level module. All the higher level modules are clubbed
together to form even higher level modules. This process is repeated until the design of the
complete algorithm is obtained.
Decision
Decision statements are used when the execution of a process depends on the outcome of
some condition. For example, if x = y, then print EQUAL. So the general form of IF construct
can be given as:
IF condition Then process
A condition in this context is any statement that may evaluate to either a true value or a false
value. In the above example, a variable x can be either equal to y or not equal to y. However,
it cannot be both true and false. If the condition is true, then the process is executed.
A decision statement can also be stated in the following manner:
IF condition
Then process1
ELSE
process2
Repetition
Repetition, which involves executing one or more steps for a number of times, can be
implemented using constructs such as while, do–while, and for loops. These loops execute one
or more steps until some condition is true.
Figure 2.12 shows an algorithm that prints the first 10 natural numbers.
TIME AND SPACE COMPLEXITY
• Analysing an algorithm means determining the amount of resources (such as time and
memory) needed to execute it.
• Algorithms are generally designed to work with an arbitrary number of inputs, so the
efficiency or complexity of an algorithm is stated in terms of time and space complexity.
• The time complexity of an algorithm is basically the running time of a program as a
function of the input size.
• Similarly, the space complexity of an algorithm is the amount of computer memory that
is required during the program execution as a function of the input size.
Worst-case, Average-case, Best-case, and Amortized Time Complexity
Worst-case running time: This denotes the behaviour of an algorithm with respect to the worst
possible case of the input instance. The worst-case running time of an algorithm is an upper
bound on the running time for any input. Therefore, having the knowledge of worst-case
running time gives us an assurance that the algorithm will never go beyond this time limit.
Average-case running time: The average-case running time of an algorithm is an estimate of
the running time for an ‘average’ input. It specifies the expected behaviour of the algorithm
when the input is randomly drawn from a given distribution. Average-case running time
assumes that all inputs of a given size are equally likely.
Best-case running time: The term ‘best-case performance’ is used to analyse an algorithm under
optimal conditions. For example, the best case for a simple linear search on an array occurs
when the desired element is the first in the list. However, while developing and choosing an
algorithm to solve a problem, we hardly base our decision on the best-case performance. It is
always recommended to improve the average performance and the worst-case performance of
an algorithm.
Amortized running time: Amortized running time refers to the time required to perform a
sequence of (related) operations averaged over all the operations performed. Amortized
analysis guarantees the average performance of each operation in the worst case.
Algorithm Efficiency
• If a function is linear (without any loops or recursions), the efficiency of that algorithm
or the running time of that algorithm can be given as the number of instructions it
contains.
• However, if an algorithm contains loops, then the efficiency of that algorithm may vary
depending on the number of loops and the running time of each loop in the algorithm.
Linear Loops
To calculate the efficiency of an algorithm that has a single loop, we need to first determine the
number of times the statements in the loop will be executed.
This is because the number of iterations is directly proportional to the loop factor. Greater the
loop factor, more is the number of iterations.
For example, consider the loop given below:
for(i=0;i<100;i++)
statement block;
Here, 100 is the loop factor. We have already said that efficiency is directly proportional to the
number of iterations. Hence, the general formula in the case of linear loops may be given as
f(n) = n
Logarithmic Loops
We have seen that in linear loops, the loop updation statement either adds or subtracts the loop-
controlling variable. However, in logarithmic loops, the loop-controlling variable is either
multiplied or divided during each iteration of the loop. For example, look at the loops given
below:
for(i=1;i<1000; i*=2)
statement block;
for(i=1;i=1;i/=2)
statement block;
• Consider the first for loop in which the loop-controlling variable i is multiplied by 2.
The loop will be executed only 10 times and not 1000 times because in each iteration
the value of i doubles.
• Now, consider the second loop in which the loop-controlling variable i is divided by 2.
In this case also, the loop will be executed 10 times.
• Thus, the number of iterations is a function of the number by which the loop-controlling
variable is divided or multiplied. In the examples discussed, it is 2. That is, when n =
1000, the number of iterations can be given by log 1000 which is approximately equal
to 10.
• Therefore, putting this analysis in general terms, we can conclude that the efficiency of
loops in which iterations divide or multiply the loop-controlling variables can be given
as
f(n) = log n
Arrays
An array is a collection of similar data elements. These data elements have the same data type.
The elements of the array are stored in consecutive memory locations and are referenced by an
index (also known as the subscript). The subscript is an ordinal number which is used to
identify an element of the array.
DECLARATION OF ARRAYS
We have already seen that every variable must be declared before it is used. The same concept
holds true for array variables. An array must be declared before being used. Declaring an array
means specifying the following:
• Data type - the kind of values it can store, for example, int, char, float, double.
• Name - to identify the array.
• Size - the maximum number of values that the array can hold.
Arrays are declared using the following syntax:
type name[size];
The type can be either int, float, double, char, or any other valid data type. The number within
brackets indicates the size of the array, i.e., the maximum number of elements that can be stored
in the array. For example, if we write,
int marks[10];
then the statement declares marks to be an array containing 10 elements. In C, the array index
starts from zero.
• The first element will be stored in marks[0], second element in marks[1], and so on.
• Therefore, the last element, that is the 10th element, will be stored in marks[9].
• Note that 0, 1, 2, 3 written within square brackets are the subscripts. In the memory, the
array will be stored as shown in Fig. 3.2.
Figure 3.5 shows the result of the code shown in Fig. 3.4. The code accesses every individual
element of the array and sets its value to –1.
In the for loop, first the value of marks[0] is set to –1, then the value of the index (i) is
incremented and the next value, that is, marks[1] is set to –1.
The procedure continues until all the 10 elements of the array are set to –1.
Programming Examples
#include <stdio.h>
#include <conio.h>
int main()
{
int i, n, arr[20];
clrscr();
printf("\n Enter the number of elements in the array : ");
scanf("%d", &n);
for(i=0;i<n;i++)
{
printf("\n arr[%d] = ", i);
scanf("%d",&arr[i]);
}
printf("\n The array elements are ");
for(i=0;i<n;i++)
printf("\t %d", arr[i]);
return 0;
}
Output
Enter the number of elements in the array : 5
arr[0] = 1
arr[1] = 2
arr[2] = 3
arr[3] = 4
arr[4] = 5
The array elements are 1 2 3 4 5
TWO-DIMENSIONAL ARRAYS
A two-dimensional array is specified using two subscripts where the first subscript denotes
the row and the second denotes the column. The C compiler treats a two-dimensional array as
an array of one-dimensional arrays. Figure 3.26 shows a two-dimensional array which can be
viewed as an array of arrays.