Data Structures and Algorithms

The document provides an overview of data structures and algorithms, detailing data types, including simple and abstract data types, and the implementation of data structures in programming. It classifies data structures into linear and non-linear types, explaining specific structures like arrays, lists, stacks, and queues, along with their properties and operations. Additionally, it discusses the importance of choosing appropriate data structures for efficient program design and data management.
DATA STRUCTURES AND ALGORITHMS

DATA TYPE

• A data type is a defined kind of data, that is, a set of possible values and the basic operations on those values. In a programming language, a data type defines a set of values and the allowable operations on those values.
• A data type consists of:
– a domain (a set of values)
– a set of operations that may be applied to the values.

SIMPLE DATA TYPES

• The simple data types are classified as follows:
a. Character
b. Numeric integer
c. Numeric real
d. Boolean (logical).

Abstract Data Type

• An Abstract Data Type, commonly referred to as an ADT, is a collection of data objects characterized by how the objects are accessed.

Data Structure

• A data structure is the implementation of an abstract data type in a particular programming language. Data structures are also referred to as “data aggregates”.
• A carefully chosen data structure allows the most efficient algorithm to be used.
• Thus, a well-designed data structure allows a variety of critical operations to be performed using as few resources, in both execution time and memory space, as possible.

Classification of Data Structures

Data structures are broadly divided into two:
• Linear Data Structures
• Non-Linear Data Structures.

Linear Data Structures

• Linear data structures are data structures in which individual data elements are stored and accessed linearly in the computer memory.
• For the purpose of this course, the following linear data structures will be studied: lists, stacks, queues and arrays.

Non-Linear Data Structures

• A non-linear data structure, as the name implies, is a data structure in which the data items are not stored linearly in the computer memory, but can be processed using some techniques or rules. Typical non-linear data structures to be studied in this course are trees and graphs.

Data Structures and Programmes

• The structure of data in the computer is very important in software programmes, especially where the set of data is very large.
• When data is properly structured and stored in the computer, the data is easier to access, the programme routines that work with the data are simpler, and time and storage space are reduced.
• In the design of many types of programmes, the choice of data structures is a primary design consideration, as experience in building large systems has shown that the difficulty of implementation and the quality and performance of the final result depend heavily on choosing the best data structure.

ARRAYS

• An array is a data structure consisting of a group of elements that are accessed by indexing.
• Each data item of an array is known as an element, and the elements are referenced by a common name known as the array name.

• Before an array is used, memory locations have to be reserved for storage of its elements. This is often done using the DIM statement of the BASIC programming language or the DIMENSION statement of the FORTRAN programming language.
• For example, the instruction DIM LAGOS (45) will create 45 memory locations for storage of the elements of the array called LAGOS.
• In most programming languages, each element has the same data type.
• An array occupies a contiguous area of storage.
• Most programming languages have a built-in array data type.

Declaration of Arrays

• Variables normally store only a single value but, in some situations, it is useful to have a variable that can store a series of related values, using an array. For example, suppose a programme is required that will calculate the average age among a group of six students. The ages of the students could be stored in six integer variables in C:
• int age1;
• int age2;
• int age3;
• int age4;
• int age5;
• int age6;
• However, a better solution would be to declare a six-element array:
• int age[6];
• This creates a six-element array; in C, the elements can be accessed as age[0] through age[5].
• A two-dimensional array (in which the elements are arranged into rows and columns) declared by, say, DIM X(3,4) can be stored as a linear array in the computer memory, whose size is the product of the dimensions.
• The above can thus be expressed as DIM X (3 * 4) or DIM X (12).
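
The two ideas on this slide can be sketched in C as follows. This is only an illustration: it assumes row-major order (rows stored one after another), which the slide does not specify, and the names ROWS, COLS, flat_index and average_age are made up for the example.

```c
#include <stddef.h>

#define ROWS 3
#define COLS 4

/* Position of element (row, col) inside a linear array of ROWS * COLS
   elements, assuming row-major order (an assumption of this sketch). */
size_t flat_index(size_t row, size_t col)
{
    return row * COLS + col;
}

/* Average of n ages held in one array instead of n separate variables. */
double average_age(const int age[], size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += age[i];
    return n ? (double)sum / (double)n : 0.0;
}
```

With these definitions, flat_index(2, 3) is 11, the last of the 12 positions of the linear array.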

Multi-dimensional Arrays

• Multi-dimensional arrays can be stored as linear arrays in order to reduce computation time and memory.
• Ordinary arrays are indexed by a single integer. Also useful, particularly in numerical and graphics applications, is the concept of a multi-dimensional array, in which we index into the array using an ordered list of integers, such as in a[3,1,5].
• The number of integers in the list used to index into the multi-dimensional array is always the same and is referred to as the array's dimensionality, and the bounds on each of these are called the array's dimensions. An array with dimensionality k is often called k-dimensional.

Suppose we want to represent a simple two-dimensional array:

[Figure: a simple two-dimensional array]

It is most common to index this array using the RC-convention, where elements are referred to in row, column fashion:

[Figure: the same array with each element labelled by row and column]

Multi-dimensional arrays are often represented by one-dimensional arrays of references (Iliffe vectors) to other one-dimensional arrays. The sub-arrays can be either the rows or the columns.

Classification of Arrays

• Arrays can be classified as static arrays (i.e. arrays whose size cannot change once their storage has been allocated) or dynamic arrays, which can be resized.

Applications of Arrays

• Arrays are employed in many computer applications in which data items need to be saved in the computer memory for subsequent reprocessing. Due to their performance characteristics, arrays are used to implement other data structures, such as heaps, hash tables, deques, queues, stacks and strings.

LIST DATA STRUCTURE

• Lists are among the most fundamental data structures: many other data structures are built upon them and many algorithms operate on them. It’s not hard to find examples of lists in the real world: shopping lists, to-do lists, train timetables, order forms, even this “list of lists.”
• A list data structure is a sequential data structure, i.e. a collection of items accessible one after the other, beginning at the head and ending at the tail.
• It is a widely used data structure for applications which do not need random access.
• Lists differ from the stack and queue data structures in that additions and removals can be made at any position in the list.
• Lists also preserve insertion order so that, assuming there are no intervening modifications, a given list will always return the same value for the same position.
• Like arrays, lists make no attempt to preserve the uniqueness of values, meaning a list may contain duplicate values.

Elements of a List

The sentence “Dupe is not a boy” can be written as a list as follows:
DUPE IS NOT A BOY

We regard each word in the sentence above as a data item or datum, which is linked to the next datum by a pointer. Datum plus pointer make one node of a list. The last pointer in the list is called a terminator. It is often convenient to speak of the first item as the head of the list, and the remainder of the list as the tail.
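
The node described above (datum plus pointer) might be written in C roughly as follows. This is a sketch under assumptions: the names struct node and list_length are hypothetical, and a NULL pointer is used as the terminator.

```c
#include <stddef.h>

/* One node of the list: a datum plus a pointer to the next node. */
struct node {
    const char  *datum;
    struct node *next;   /* NULL acts as the terminator */
};

/* Count the nodes by following the pointers from the head
   until the terminator is reached. */
size_t list_length(const struct node *head)
{
    size_t n = 0;
    for (const struct node *p = head; p != NULL; p = p->next)
        n++;
    return n;
}
```

Five such nodes, with the node for BOY holding the NULL terminator, represent the sentence above; list_length applied to the DUPE node then counts five nodes.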

Operations

The main primitive operations of a list are known as:
i. Add: adds a new node
ii. Set: updates the contents of a node
iii. Remove: removes a node
iv. Get: returns the value at a specified index
v. IndexOf: returns the index in the list of a specified element

Additional primitives can be defined:
i. IsEmpty: reports whether the list is empty
ii. IsFull: reports whether the list is full
iii. Initialise: creates/initialises the list, i.e. ensures that the structure exists but contains no elements; e.g. Initialise(L) creates a new empty list named L
iv. Destroy: deletes the contents of the list (may be implemented by re-initialising the list)

List Implementation

• The two most common list implementations are an array-based implementation and a linked list.
1. Array List: As the name suggests, an array list uses an array to hold the values.
2. Linked List: A linked list, conversely, is a chain of elements in which each item has a reference (or link) to the next (and optionally previous) element.

Array Lists

• As the name suggests, an array list uses an array as the underlying mechanism for storing elements.
• Because you can index directly into an array, implementing access to elements is very easy.
• It also makes an array list the fastest implementation for indexed and sequential access.

Properties of Array List:

1. The position of each element is given by an index from 0 to n-1, where n is the number of elements.
2. Given any index, the element with that index can be accessed in constant time, i.e. the time to access does not depend on the size of the list.
3. To add an element at the end of the list, the time taken does not depend on the size of the list (provided the array has spare capacity). However, the time taken to add an element at any other point in the list does depend on the size of the list, as all subsequent elements must be shifted up. Additions near the start of the list take longer than additions near the middle or end.
4. When an element is removed, subsequent elements must be shifted down, so removals near the start of the list take longer than removals near the middle or end.
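
The shifting described in properties 3 and 4 can be sketched in C as below. The fixed capacity and the names array_list, al_insert and al_remove are assumptions made for this illustration, not part of the slides.

```c
#include <stddef.h>
#include <string.h>

#define CAPACITY 8   /* fixed capacity chosen for this sketch */

struct array_list {
    int    items[CAPACITY];
    size_t size;          /* number of elements currently stored */
};

/* Insert value at index, shifting later elements up by one.
   Returns 0 on success, -1 if the list is full or index is out of range. */
int al_insert(struct array_list *l, size_t index, int value)
{
    if (l->size == CAPACITY || index > l->size)
        return -1;
    memmove(&l->items[index + 1], &l->items[index],
            (l->size - index) * sizeof l->items[0]);
    l->items[index] = value;
    l->size++;
    return 0;
}

/* Remove the element at index, shifting later elements down by one.
   Returns 0 on success, -1 if index is out of range. */
int al_remove(struct array_list *l, size_t index)
{
    if (index >= l->size)
        return -1;
    memmove(&l->items[index], &l->items[index + 1],
            (l->size - index - 1) * sizeof l->items[0]);
    l->size--;
    return 0;
}
```

The number of elements moved by memmove is largest when index is 0 and zero when the operation is at the end, which is why operations near the start take longest.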

Linked List

• A linked list is stored as a sequence of linked nodes. Rather than use an array to hold the elements, a linked list contains individual elements with links between them.

Properties of Linked-list

• The list can grow and shrink as needed.
• The position of each element is given by an index from 0 to n-1, where n is the number of elements.
• Given any index, the time taken to access the element with that index depends on the index. This is because the elements of the list must be traversed one by one until the required index is reached.
• The time taken to add an element at any point in the list does not depend on the size of the list, as no shifts are required. It does, however, depend on the index.
• Additions near the end of the list take longer than additions near the middle or start. The same applies to the time taken to remove an element.
• A list needs a reference to the front node.

i. Singly Linked Lists

• A singly linked list is a data structure in which the data items are chained (linked) in one direction. Figure 1 shows an example of a singly linked list.

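[Figure 1: a singly linked list. The header points to the first node a1, which links to a2 and so on up to the last node an, the tail.]

ii. Circularly Linked Lists

In a circularly linked list, the tail of the list always points to the head of the list.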

iii. Doubly Linked Lists

• In a doubly linked list, each node is linked to both the next and the previous node. This double linking makes it possible to scan, search or traverse the elements in either direction. It also makes insertion and deletion much simpler than they are for an array list.
iv. Sorted Lists: Lists can be designed to be maintained in a given order. In this case, the Add method will search for the correct place in the list to insert a new data item.

THE STACK DATA STRUCTURE

• A stack is a linear data structure in which all insertions and deletions of data are made only at one end of the stack, often called the top of the stack. For this reason, a stack is referred to as a LIFO (last-in-first-out) structure.

• A frequently used metaphor is the idea of a stack of plates in a spring-loaded cafeteria stack.
• In such a stack, only the top plate is visible and accessible to the user; all other plates remain hidden.
• As new plates are added, each new plate becomes the top of the stack, hiding each plate below and pushing the stack of plates down.
• As the top plate is removed from the stack, the plates pop back up, and the second plate becomes the top of the stack.

Application of Stacks

• Stacks are used extensively at every level of a modern computer system.
• For example, a modern PC uses stacks at the architecture level, where they are used in the basic design of an operating system for interrupt handling and operating system function calls.
• Another common use of stacks at the architecture level is as a means of allocating and accessing memory.

Operations on a Stack

• The stack is usually implemented with two basic operations known as "push" and "pop". Thus, two operations applicable to all stacks are:
• A push operation, in which a data item is placed at the location pointed to by the stack pointer and the address in the stack pointer is adjusted by the size of the data item. Push adds a given node to the top of the stack, leaving previous nodes below.
• A pop or pull operation, in which the data item at the current location pointed to by the stack pointer is removed, and the stack pointer is adjusted by the size of the data item. Pop removes and returns the current top node of the stack.
• The main primitives of a stack are known as:
– Push: adds a new node
– Pop: removes a node

Operations on a Stack Contd.

• Additional primitives can be defined:
– IsEmpty: reports whether the stack is empty
– IsFull: reports whether the stack is full
– Initialise: creates/initialises the stack
– Destroy: deletes the contents of the stack (may be implemented by re-initialising the stack)
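
The stack primitives listed above could look like this in C when the stack is held in a fixed-size array. This is a minimal sketch; the capacity STACK_MAX and the function names are assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>

#define STACK_MAX 4   /* small fixed capacity, chosen for this sketch */

struct stack {
    int    items[STACK_MAX];
    size_t top;              /* number of items; items[top - 1] is the top */
};

void stack_init(struct stack *s)           { s->top = 0; }
bool stack_is_empty(const struct stack *s) { return s->top == 0; }
bool stack_is_full(const struct stack *s)  { return s->top == STACK_MAX; }

/* Push: place value on top of the stack. Returns false if the stack is full. */
bool stack_push(struct stack *s, int value)
{
    if (stack_is_full(s))
        return false;
    s->items[s->top++] = value;
    return true;
}

/* Pop: remove the top item and store it in *out. Returns false if empty. */
bool stack_pop(struct stack *s, int *out)
{
    if (stack_is_empty(s))
        return false;
    *out = s->items[--s->top];
    return true;
}
```

Pushing 1, 2 and 3 and then popping returns 3 first, which is the last-in-first-out behaviour described earlier. Here Destroy can simply be another call to stack_init.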

Stack Storage Modes

A stack can be stored in two ways: in a static data structure or in a dynamic data structure.
• Static Data Structures
These define collections of data which are fixed in size when the programme is compiled.
• Dynamic Data Structures
These define collections of data which are variable in size and structure. They are created as the programme executes, and grow and shrink to accommodate the data being stored.

THE QUEUE DATA STRUCTURE

• Queues are an essential part of algorithms that manage the allocation and scheduling of work, events, or messages to be processed.
• They are often used as a way of enabling different processes, either on the same or different machines, to communicate with one another.
• More often than not, the order of retrieval is the same as the order of insertion (also known as first-in-first-out, or FIFO), but there are other possibilities as well.

Bounded or unbounded Queue

• Queues can be either bounded or unbounded.
• Bounded queues have limits placed on the number of items that can be held at any one time.
• Unbounded queues are free to grow in size as the limits of the hardware allow.

Application of Queues

• Queues are very important structures in computer simulations, data processing, information management, and in operating systems.
• In simulations, queue structures are used to represent real-life events such as car queues at traffic light junctions and petrol filling stations, queues of people at the check-out point in supermarkets, queues of bank customers, etc.
• In operating systems, queue structures are used to represent different programmes in the computer memory in the order in which they are executed.
• For example, if a programme J is submitted before programme K, then programme J is queued before programme K in the computer memory and programme J is executed before programme K.

Operations on a Queue

The main primitive operations on a queue are known as:
• Enqueue: Stores a value in the queue. The size of the queue will increase by one.
• Dequeue: Retrieves and removes the value at the head of the queue. The size of the queue will decrease by one. Throws EmptyQueueException if there are no more items in the queue.
• Clear: Deletes all elements from the queue. The size of the queue will be reset to zero (0).
• Size: Obtains the number of elements in the queue.
• IsEmpty: Determines whether the queue is empty (size() = 0) or not.
• IsFull: Reports whether the queue is full.
• Initialise: Creates/initialises the queue.

Storing a Queue in a Static Data Structure

• This implementation stores the queue in an array.
• The array indices at which the head and tail of the queue are currently stored must be maintained.
• The head of the queue is not necessarily at index 0.
• The array can be a “circular array” in which the queue “wraps round” if the last index of the array is reached.
• The following figure is an example of storing a queue in an array of length 5:

[Figure: a queue stored in an array of length 5]
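
A circular-array queue of length 5 might be sketched in C as follows. As a simplification, this sketch keeps the head index and the number of stored items and derives the tail index from them, rather than storing both indices; the names are assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_LEN 5   /* array length, as in the example above */

struct queue {
    int    items[QUEUE_LEN];
    size_t head;   /* index of the item at the head of the queue */
    size_t size;   /* number of items currently stored */
};

void queue_init(struct queue *q) { q->head = 0; q->size = 0; }

/* Enqueue: store value at the tail, wrapping round past the last index. */
bool queue_enqueue(struct queue *q, int value)
{
    if (q->size == QUEUE_LEN)
        return false;                           /* queue is full */
    size_t tail = (q->head + q->size) % QUEUE_LEN;
    q->items[tail] = value;
    q->size++;
    return true;
}

/* Dequeue: remove the item at the head and store it in *out. */
bool queue_dequeue(struct queue *q, int *out)
{
    if (q->size == 0)
        return false;                           /* queue is empty */
    *out = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_LEN;
    q->size--;
    return true;
}
```

After five enqueues and two dequeues, the next enqueue lands at index 0: the queue has wrapped round, and its head is at index 2 rather than 0.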

Storing a Queue in a Dynamic Data Structure

• A queue requires a reference to the head node and a reference to the tail node. The following diagram describes the storage of a queue called Queue. Each node consists of data (DataItem) and a reference (NextNode).

[Figure: the queue Queue stored as a chain of linked nodes]

• The first node is accessed using the name Queue.Head.
• Its data is accessed using Queue.Head.DataItem
• The second node is accessed using Queue.Head.NextNode
• The last node is accessed using Queue.Tail

Adding a Node (Add)
• The new node is to be added at the tail of the queue. The NextNode reference of the node previously at the tail of the queue should point to the new node, and the reference Queue.Tail should then be updated to point to the new node. If the queue was empty, Queue.Head should also point to the new node.
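
The Add operation on a linked queue can be sketched in C as below. The struct and function names are hypothetical; the fields mirror DataItem, NextNode, Queue.Head and Queue.Tail from the slides, and a removal function is included only so that the nodes can be freed again.

```c
#include <stdbool.h>
#include <stdlib.h>

struct qnode {
    int           data_item;   /* DataItem */
    struct qnode *next_node;   /* NextNode */
};

struct linked_queue {
    struct qnode *head;        /* Queue.Head */
    struct qnode *tail;        /* Queue.Tail */
};

/* Add: append a new node at the tail. Returns false if allocation fails. */
bool lq_add(struct linked_queue *q, int value)
{
    struct qnode *n = malloc(sizeof *n);
    if (n == NULL)
        return false;
    n->data_item = value;
    n->next_node = NULL;
    if (q->tail != NULL)
        q->tail->next_node = n;   /* old tail now links to the new node */
    else
        q->head = n;              /* queue was empty */
    q->tail = n;
    return true;
}

/* Remove the head node and store its data in *out. Returns false if empty. */
bool lq_remove(struct linked_queue *q, int *out)
{
    struct qnode *n = q->head;
    if (n == NULL)
        return false;
    *out = n->data_item;
    q->head = n->next_node;
    if (q->head == NULL)
        q->tail = NULL;           /* queue became empty */
    free(n);
    return true;
}
```

Because the queue keeps a reference to the tail, adding a node does not require walking the list from the head.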

Blocking Queues

• Queues are often used in multi-threaded environments as a form of interprocess communication. Unfortunately, a plain FIFO queue is unsafe for use in situations where multiple consumers access it concurrently.
• Instead, a blocking queue is one way to provide a thread-safe implementation, ensuring that all access to the data is correctly synchronized.
• The first main enhancement that a blocking queue offers over a regular queue is that it can be bounded (that is, given a limit on the number of items it can hold).

TREES DATA STRUCTURE
• A tree is often used to represent a hierarchy. This is because the relationships between the items in the hierarchy suggest the branches of a botanical tree.
• In computer science, a tree is a widely used data structure that emulates a hierarchical tree structure with a set of linked nodes.

[Figure: a simple unordered tree. In this diagram, the node labeled 7 has two children, labeled 2 and 6, and one parent, labeled 2.]

• Each node in a tree has zero or more child nodes, which are below it in the tree.
• By convention, trees are drawn growing downwards.
• A node that has a child is called the child's parent node (or ancestor node, or superior).
• A node has at most one parent. Nodes that do not have any children are called leaf nodes. They are also referred to as terminal nodes.
• The height of a node is the length of the longest downward path to a leaf from that node. The height of the root is the height of the tree.
• The depth of a node is the length of the path to its root (i.e., its root path).

• The topmost node in a tree is called the root node. Being the topmost node, the root node does not have a parent.
• It is the node at which operations on the tree commonly begin (although some algorithms begin with the leaf nodes and work up, ending at the root).
• All other nodes can be reached from it by following edges or links. (In the formal definition, each such path is also unique.) In diagrams, it is typically drawn at the top.
• An internal node or inner node is any node of a tree that has child nodes and is thus not a leaf node.
• Similarly, an external node or outer node is any node that does not have child nodes and is thus a leaf.

Binary Trees
The simplest form of tree is a binary tree. A binary tree consists of
• a node (called the root node) and
• left and right sub-trees.
• Both the sub-trees are themselves binary trees.

Properties of Binary Tree

In an ordered binary tree,
• the keys of all the nodes in the left sub-tree are less than that of the root,
• the keys of all the nodes in the right sub-tree are greater than that of the root,
• the left and right sub-trees are themselves ordered binary trees.
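
The ordering property above is what makes searching efficient. A minimal C sketch of insertion and search in an ordered binary tree follows; the names tnode, bst_insert and bst_contains are assumptions, and ignoring duplicate keys is a choice made for this example.

```c
#include <stdbool.h>
#include <stdlib.h>

struct tnode {
    int           key;
    struct tnode *left;
    struct tnode *right;
};

/* Insert key into the ordered binary tree rooted at root and return the
   (possibly new) root. Duplicate keys are ignored in this sketch, and if
   allocation fails the tree is returned unchanged. */
struct tnode *bst_insert(struct tnode *root, int key)
{
    if (root == NULL) {
        struct tnode *n = malloc(sizeof *n);
        if (n != NULL) {
            n->key = key;
            n->left = n->right = NULL;
        }
        return n;
    }
    if (key < root->key)
        root->left = bst_insert(root->left, key);
    else if (key > root->key)
        root->right = bst_insert(root->right, key);
    return root;
}

/* Search: smaller keys are to the left, larger keys to the right. */
bool bst_contains(const struct tnode *root, int key)
{
    while (root != NULL) {
        if (key == root->key)
            return true;
        root = (key < root->key) ? root->left : root->right;
    }
    return false;
}
```

Each comparison discards one whole sub-tree, so a search follows a single path from the root downwards.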

Traversal methods

• There are many different applications of trees. As a result, there are many different algorithms for manipulating them.
• However, many of the different tree algorithms have in common the characteristic that they systematically visit all the nodes in the tree.
• That is, the algorithm walks through the tree data structure and performs some computation at each node in the tree. This process of walking through the tree is called a tree traversal.
• Stepping through the items of a tree, by means of the connections between parents and children, is called walking the tree, and the action is a walk of the tree.
• Often, an operation might be performed when a pointer arrives at a particular node.

• There are three types of walk, or traversal, of a tree: pre-order, in-order and post-order.
• Pre-order walk: each parent node is traversed before its children.
• Post-order walk: the children are traversed before their respective parents.
• In-order walk: a node's left subtree is traversed first, then the node itself, and finally its right subtree.

Preorder traversal

• Preorder traversal gets its name from the fact that it visits the root first. That is,
1. Visit the root first; and then
2. Traverse the left subtree; and then
3. Traverse the right subtree.

Postorder traversal

To do a postorder traversal of a binary tree:
1. Traverse the left subtree; and then
2. Traverse the right subtree; and then
3. Visit the root.

Inorder traversal
Inorder traversal visits the root in between visiting the left and right subtrees, that is,
1. Traverse the left subtree; and then
2. Visit the root; and then
3. Traverse the right subtree.
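
The three traversals differ only in where the visit to the root is placed, as this C sketch tries to show. Here "visiting" a node is assumed to mean appending its key to an output array; the struct and function names are made up for the illustration.

```c
#include <stddef.h>

struct tnode {
    int           key;
    struct tnode *left;
    struct tnode *right;
};

/* Each function appends the visited keys to out[], starting at position n,
   and returns the new count. The caller must make out[] large enough. */
size_t preorder(const struct tnode *t, int out[], size_t n)
{
    if (t == NULL) return n;
    out[n++] = t->key;                   /* 1. visit the root    */
    n = preorder(t->left, out, n);       /* 2. left subtree      */
    return preorder(t->right, out, n);   /* 3. right subtree     */
}

size_t inorder(const struct tnode *t, int out[], size_t n)
{
    if (t == NULL) return n;
    n = inorder(t->left, out, n);        /* 1. left subtree      */
    out[n++] = t->key;                   /* 2. visit the root    */
    return inorder(t->right, out, n);    /* 3. right subtree     */
}

size_t postorder(const struct tnode *t, int out[], size_t n)
{
    if (t == NULL) return n;
    n = postorder(t->left, out, n);      /* 1. left subtree      */
    n = postorder(t->right, out, n);     /* 2. right subtree     */
    out[n++] = t->key;                   /* 3. visit the root    */
    return n;
}
```

For a root with key 2, left child 1 and right child 3, the preorder result is 2, 1, 3, the inorder result is 1, 2, 3 and the postorder result is 1, 3, 2.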

Common operations on Trees

i. Enumerating all the items
ii. Enumerating a section of a tree
iii. Searching for an item
iv. Adding a new item at a certain position on the tree
v. Deleting an item
vi. Removing a whole section of a tree (called pruning)
vii. Adding a whole section to a tree (called grafting)
viii. Finding the root for any node

Common uses of Trees

• Manipulate hierarchical data
• Make information easy to search
• Manipulate sorted lists of data
• As a workflow for compositing digital images for visual effects

Key Terms

• A B-tree of order m is an m-way tree in which
a) all leaves are on the same level and
b) all nodes except for the root and the leaves have at least m/2 children (rounded up) and at most m children. The root has at least 2 children (unless it is itself a leaf) and at most m children.
• A B+-tree is a variation of the B-tree in which the keys in all nodes except the leaves serve as dummies, used only to guide the search. All keys are duplicated in the leaves. This has the advantage that, as all the leaves are linked together sequentially, the entire tree may be scanned without visiting the higher nodes at all. In other words, it is a B-tree in which all the leaves are linked to facilitate fast in-order traversal.

AVL tree

• An AVL tree is a balanced binary search tree. Named after their inventors, Adelson-Velskii and Landis, AVL trees were the first dynamically balanced trees to be proposed.
• An AVL tree is a self-balancing Binary Search Tree (BST) in which the difference between the heights of the left and right subtrees cannot be more than one for any node.
• An AVL tree is a binary search tree which has the following properties:
i. The sub-trees of every node differ in height by at most one.
ii. Every sub-tree is an AVL tree.
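
The height condition in properties i and ii can be checked recursively, as in this C sketch. It tests only the height property, not the ordering of the keys, and it counts height in nodes (an empty tree has height 0), which is an assumption of the example; the names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

struct tnode {
    int           key;
    struct tnode *left;
    struct tnode *right;
};

/* Height in nodes of the tree rooted at t, or -1 if some node's
   sub-trees differ in height by more than one. An empty tree has height 0. */
static int checked_height(const struct tnode *t)
{
    if (t == NULL)
        return 0;
    int lh = checked_height(t->left);
    int rh = checked_height(t->right);
    if (lh < 0 || rh < 0)
        return -1;                      /* a sub-tree is already unbalanced */
    int diff = lh - rh;
    if (diff < -1 || diff > 1)
        return -1;                      /* this node violates property i */
    return 1 + (lh > rh ? lh : rh);
}

/* True if every node satisfies the AVL height property. */
bool is_height_balanced(const struct tnode *t)
{
    return checked_height(t) >= 0;
}
```

A chain of three nodes linked only through right children fails the check, which is the kind of shape an AVL tree rebalances to avoid.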

COMPARISON BETWEEN B-TREE AND B+ TREE

B-TREE | B+ TREE
Has lower fan-out compared with B+ trees | Has very high fan-out (number of pointers to child nodes in a node)
Leaf nodes have no linkage (i.e. no leaf node points to another leaf node) | Leaf nodes are linked with each other
B-trees contain data with each key | B+ trees don't have data associated with interior nodes

Storage Management

An executing program uses memory (storage) for many different purposes, such as to store:
1. the machine instructions that represent the executable part of the program,
2. the values of data objects, and
3. the return location for a function invocation.

Static Memory Management

• When memory is allocated at compilation time, it is called ‘Static Memory Management’.
• This memory is fixed and cannot be increased or decreased after allocation.
• If more memory is allocated than is required, then memory is wasted.
• If less memory is allocated than is required, then the program will not run successfully.
• Hence, exact memory requirements must be known in advance.

Dynamic Memory Management

• When memory is allocated at run (execution) time, it is called ‘Dynamic Memory Management’.
• This memory is not fixed and is allocated according to requirements.
• Thus, memory is not wasted through over-allocation.
• Hence, there is no need to know exact memory requirements in advance.

Phases of Storage Management

• In general, storage is managed in three phases:
1) allocation, in which needed storage is found from available (unused) storage and assigned to the program;
2) recovery, in which storage that is no longer needed is made available for reuse; and
3) compaction, in which blocks of storage that are in use but are separated by blocks of unused storage are moved together in order to provide larger blocks of available storage.
• Although compaction is desirable, it is usually difficult or impossible to do in practice, so it is not often done.
• These three phases may be repeated many times during the execution of a program.

HASHING

• Hashing is a technique for performing insertion, deletion and find operations in almost constant time.

HASH FUNCTIONS

• A hash function is a function which employs some algorithm to compute a key K, of a fixed size, for each data element in the set U.
• The same key K can be used to map data to a hash table, and all the operations like insertion, deletion and searching should be possible.
• The values returned by a hash function are also referred to as hash values, hash codes, hash sums, or hashes.
• Some relatively simple hash functions that have been used are:
– Division-remainder method
– Folding method
– Radix transformation method
– Digit rearrangement method

HASH TABLES

• A hash table is a collection of items which are stored in such a way as to make it easy to find them later.
• Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0.
• For example, we will have a slot named 0, a slot named 1, a slot named 2, and so on.
• Initially, the hash table contains no items, so every slot is empty.
• The figure below shows a hash table of size m = 11. In other words, there are m slots in the table, named 0 through 10.

[Figure: an empty hash table with 11 slots, named 0 through 10]

The mapping between an item and the slot where that item belongs in the hash table is called the hash function.

Collision
• A collision occurs when two or more data elements hash to the same location or slot.
• Example: Assuming the letters of the alphabet are assigned values from 1 for a through 26 for z, explain how to hash the following strings: “scorpion”, “snake”, “elvis” and “lives”.
• Solution:

Scorpion: s + c + o + r + p + i + o + n = 19 + 3 + 15 + 18 + 16 + 9 + 15 + 14 = 109
Snake: s + n + a + k + e = 19 + 14 + 1 + 11 + 5 = 50
Elvis: e + l + v + i + s = 5 + 12 + 22 + 9 + 19 = 67
Lives: l + i + v + e + s = 12 + 9 + 22 + 5 + 19 = 67

Looking at the generated values, you can see that the string “scorpion” would be placed into an array at position 109, “snake” at position 50, “elvis” at position 67 and “lives” also at position 67.
Note that the strings are not stored in any particular order.

• There are two major problems with this approach.
• Take another look at the generated values. If these were used as index positions within an array, then the array would need to be big enough to accommodate the largest position, 109. Having filled at most 4 of the 110 positions available (that is, 0 to 109), you would still have at least 106 empty ones.
• One way to solve this problem is to modify the hash function to produce only values within a certain range.
• If the size of the hash table was restricted to, for example, ten positions, then the hash function could be modified to take the original result and apply a modulus, that is, take the remainder after division by 10, as shown below:

• Scorpion: s + c + o + r + p + i + o + n = 19 + 3 + 15 + 18 + 16 + 9 + 15 + 14 = 109; 109 mod 10 = 9
• Snake: s + n + a + k + e = 19 + 14 + 1 + 11 + 5 = 50; 50 mod 10 = 0
• Elvis: e + l + v + i + s = 5 + 12 + 22 + 9 + 19 = 67; 67 mod 10 = 7
• Lives: l + i + v + e + s = 12 + 9 + 22 + 5 + 19 = 67; 67 mod 10 = 7

• Now the addresses fall within the range 0 to 9 and can be stored in a hash table of size 10.

• Unfortunately, there is still one more problem with the hash function as described: it suffers from a high rate of collisions, that is, different values hashing to the same address. Here, “elvis” and “lives” produce the same address.
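
The letter-sum hash and the modulus step from the worked example can be sketched in C as follows. The function names are hypothetical, and the code assumes an ASCII-compatible character set in which the letters a to z are contiguous.

```c
#include <ctype.h>

/* Sum of letter values, with a = 1 through z = 26 (case-insensitive).
   Characters that are not letters a-z are ignored in this sketch. */
unsigned letter_sum(const char *s)
{
    unsigned sum = 0;
    for (; *s != '\0'; s++) {
        int c = tolower((unsigned char)*s);
        if (c >= 'a' && c <= 'z')
            sum += (unsigned)(c - 'a' + 1);
    }
    return sum;
}

/* Division-remainder step: reduce the sum to a slot in 0 .. table_size - 1. */
unsigned hash_slot(const char *s, unsigned table_size)
{
    return letter_sum(s) % table_size;
}
```

With a table size of 10, hash_slot gives 9 for "scorpion", 0 for "snake" and 7 for both "elvis" and "lives", reproducing the collision above.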

• The first approach to collision resolution is known as linear probing. Linear probing is a very simple technique that, on detecting a collision, searches linearly for the next available slot.
• The second approach to collision resolution involves the use of hash buckets to store more than one item at each position. Each bucket holds zero or more items that hash to the same value.
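
Linear probing, the first approach, might look like this in C. The sketch reuses the letter-sum hash of the worked example with a table of 10 slots, assumes lower-case ASCII keys, wraps round at the end of the table, and does not support deletion; all names are assumptions made for the illustration.

```c
#include <stddef.h>
#include <string.h>

#define TABLE_SIZE 10

/* Letter-sum hash (a = 1 ... z = 26, lower-case ASCII input assumed),
   reduced modulo the table size. */
static size_t slot_for(const char *s)
{
    size_t sum = 0;
    for (; *s != '\0'; s++)
        sum += (size_t)(*s - 'a' + 1);
    return sum % TABLE_SIZE;
}

/* Insert key into table (an array of TABLE_SIZE pointers, NULL = empty).
   On a collision, step to the next slot, wrapping round at the end.
   Returns the slot used, or -1 if the table is full. */
int probe_insert(const char *table[], const char *key)
{
    size_t start = slot_for(key);
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        size_t slot = (start + i) % TABLE_SIZE;
        if (table[slot] == NULL) {
            table[slot] = key;
            return (int)slot;
        }
    }
    return -1;
}

/* Look key up by probing from its home slot until it, or an empty slot,
   is found. Returns the slot holding key, or -1 if it is not present. */
int probe_find(const char *table[], const char *key)
{
    size_t start = slot_for(key);
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        size_t slot = (start + i) % TABLE_SIZE;
        if (table[slot] == NULL)
            return -1;                    /* reached a gap: not present */
        if (strcmp(table[slot], key) == 0)
            return (int)slot;
    }
    return -1;
}
```

Inserting "scorpion", "snake", "elvis" and "lives" in that order places "elvis" in slot 7 and, because slot 7 is then occupied, "lives" in the next free slot, 8.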