0% found this document useful (0 votes)

13 views23 pages

Hashing

Uploaded by

Ak221197

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

13 views23 pages

Hashing

Uploaded by

Ak221197

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 23

Data Structures Notes

Farhan Sufyan
Contents

1 Hashing 1
1.1 Direct Addressing or Direct Access Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 Advantages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.2 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Hashing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2.1 Hash Function and Hash Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2.2 Complexity of Opertaions in Hash Table . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.3 Limitations of Hash Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.4 Types of Hash Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 Collision Resolution Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3.1 Separate Chaining (Open Hashing or Closed Addressing) . . . . . . . . . . . . . . . . . . 9
1.3.2 Open Addressing (Closed Hashing) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.3.2.1 Linear Probing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.3.2.2 Quadratic Probing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.3.2.3 Double Hashing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.3.3 Comparison of Probing Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.4 Separate Chaining vs Open Addressing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Chapter 1

Hashing

1.1 Direct Addressing or Direct Access Table

• Searching algorithms like linear and binary search trees are totally dependent on number of elements and
many comparisons.
• Suppose we have 0 − (n − 1) and all of them are unique. We can take array of size n and store the records in
that array based on the condition that key and array index are same.

• For example: Suppose we have to store the records of 15 students and each record consist of roll number and
name of the student. The roll number are in range 0 to 14 and they are taken as keys. The record with key
(roll number) 0 is stored at array index 0 and so on. Hence, we can access any record in constant time and no
comparison is involved.
• The above method is called Direct Addressing or Key-indexed search, but it is only useful only when the
set of possible key value is small.
• Direct Address Table is a data structure that has the capability of mapping records to their corresponding
keys using arrays. In direct address tables, records are placed using their key values directly as indexes. They
facilitate fast searching, insertion and deletion operations.

• Consider the case where we have to store 500 employees ofcompany and their mobile number is taken as key
to identify the individual. The mobile number can be from 0000000000 to 9999999999, so here the set of
possible key value is much more than the number of key.
• Time complexity wise this solution is the best among all, we can do all operations in O(1) time. However,
problem with this solution is extra space required is huge.

1
• Due to above limitations Direct Access Table cannot always be used. Hashing is the solution that can be
used in almost all such situations and performs extremely well compared to above data structures like Array,
Linked List, Balanced BST in practice.
• With hashing we get O(1) search time on average (under reasonable assumptions) and O(n) in worst case.

• Hashing can overcome these limitations of direct address tables.

1.1.1 Advantages
• Searching in O(1) Time: Direct address tables use arrays which are random access data structure, so, the
key values (which are also the index of the array) can be easily used to search the records in O(1) time.
• Insertion in O(1) Time: We can easily insert an element in an array in O(1) time. The same thing follows
in a direct address table also.

• Deletion in O(1) Time: Deletion of an element takes O(1) time in an array. Similarly, to delete an element
in a direct address table we need O(1) time.

1.1.2 Limitations
• Prior knowledge of maximum key value
• Practically useful only if the maximum value is very less.
• It causes wastage of memory space if there is a significant difference between total records and maximum
value.

1.2 Hashing
• Hashing is an improvement over Direct Access Table. The idea is to use hash function that converts a given
phone number or any other key to a smaller number and uses the small number as index in a table called
hash table.
• Hashing is one of the searching techniques that uses a constant time. The time complexity in hashing is O(1).
Till now, we read the two techniques for searching, i.e., linear search and binary search. The worst time
complexity in linear search is O(n), and O(log n) in binary search.
• In both the searching techniques, the searching depends upon the number of elements but we want the
technique that takes a constant time. So, hashing technique came that provides a constant time.
• In Hashing technique, the hash table and hash function are used.

• Using the hash function, we can calculate the address at which the value can be stored.
• The main idea behind the hashing is to create the (key/value) pairs. If the key is given, then the algorithm
computes the index at which the value would be stored. It can be written as:
hash(key) = index
where hash is a hash function, key is the key and index is the hash value of key key. Now the key key is stored
at array index index, also known as it home address.

2
1.2.1 Hash Function and Hash Table
• A hash table is a type of data structure that stores key-value pairs. The key is sent to a hash function that
performs arithmetic operations on it to produce the hash value or hash, which is the index of the key-value
pair in the hash table. Hence, the process of generating the address from keys is called hashing and the array
in which insertion, deletion and searching perform through hashing is called hash table.
• A hash function is a function that can be used to map data of arbitrary size to fixed-size values . The
fixed-size values are usually used to index a fixed-size table called a hash table.
• The values returned by a hash function are called hash values, hash codes, message digests, or simply
hashes.
• Keys may be of different types like string, integer, etc but hash value will always be integer.

3
• To achieve a good hashing mechanism, It is important to have a good hash function with the following basic
requirements:
– Easy to compute: It should be easy to compute and must not become an algorithm in itself.
– Uniform distribution: It should provide a uniform distribution across the hash table and should not
result in clustering.
– Less collisions: Collisions occur when pairs of elements are mapped to the same hash value. These
should be avoided.

Here, searching will take O(n) time (where n is the number of strings) to access a specific string. This shows
that the hash function is not a good hash function.

By using a good hash function, hashing can work well. Under reasonable assumptions, the average time
required to search for an element in a hash table is O(1).

4
1.2.2 Complexity of Opertaions in Hash Table

1.2.3 Limitations of Hash Function

• Since a hash function gets us a small number for a big key, there is possibility that two keys result in same
value. The situation where a newly inserted key maps to an already occupied slot in hash table is called
collision.
• When the hash function generates the same index for multiple keys, there will be a conflict (what value to be
stored in that index). This is called a hash collision.

5
1.2.4 Types of Hash Function
There are three ways of calculating the hash function:
• Folding method

• Mid square method

6
• Division method (Modulo-Division)

7
1.3 Collision Resolution Techniques
While hashing, two or more key points to the same hash table index is called as collision. Irrespective of how
good a hash function is, collisions are bound to occur. Therefore, to maintain the performance of a hash table, it is
important to manage collisions through various collision resolution techniques.
• Separate Chaining (Open Hashing or Closed Addressing)
• Open Addressing (Closed Hashing)

– Linear Probing
– Quadratic Probing
– Double Hashing

8
1.3.1 Separate Chaining (Open Hashing or Closed Addressing)
• Open hashing is a collision avoidance method which uses array of linked list to resolve the collision.
• In separate chaining, each element of the hash table is a linked list.
• It is also known as the separate chaining method (each linked list is considered as a chain).

• Chaining is simple, but requires additional memory outside the table.

• For separate chaining, the worst-case scenario is when all the entries are inserted into the same linked list.
The lookup procedure may have to scan all its entries, so the worst-case cost is proportional to the number
(N) of entries in the table.

9
• Let us consider a simple hash function as key mod 7 and sequence of keys as 50, 700, 76, 85, 92, 73, 101.
• Draw an empty hash table. For the given hash function, the possible range of hash values is [0, 6]. So, draw
an empty hash table consisting of 7 buckets

• The first key to be inserted in the hash table is 50. Bucket of the hash table to which key 50 maps as
50 mod 7 = 1. So, key 50 will be inserted in bucket-1 of the hash table.
• Key 700 maps as 700 mod 7 = 0. So, key 700 will be inserted in bucket-0 of the hash table. Key 76 maps as
76 mod 7 = 6. So, key 76 will be inserted in bucket-6 of the hash table.
• Key 85 maps as 85 mod 7 = 1. Since bucket-1 is already occupied, so collision occurs. Separate chaining
handles the collision by creating a linked list to bucket-1. So, key 85 will be inserted in bucket-1 of the hash
table and so on.

• Advantages
– Simple to implement.
– Hash table never fills up, we can always add more elements to the chain.
– Less sensitive to the hash function or load factors.
– It is mostly used when it is unknown how many and how frequently keys may be inserted or deleted.
• Disadvantages
– Cache performance of chaining is not good as keys are stored using a linked list.
– Wastage of Space (Some parts of hash table are never used).
– If the chain becomes long, then search time can become O(n) in the worst case.
– Uses extra space for links.

10
1.3.2 Open Addressing (Closed Hashing)
• Unlike chaining, it does not insert elements to some other data-structures. It inserts the data into the hash
table itself.
• The size of the hash table should be larger or atleast equal to the number of keys i.e. it has at most one
element per bucket.

• In open addressing, all elements are stored in the hash table itself.
• When a new entry has to be inserted, the hash index of the hashed value is computed and then the array is
examined (starting with the hashed index). If the slot at the hashed index is unoccupied, then the entry record
is inserted in slot at the hashed index else it proceeds in some probe sequence until it finds an unoccupied
slot.

• The probe sequence is the sequence that is followed while traversing through entries. In different probe
sequences, you can have different intervals between successive entry slots or probes.
• When searching for an entry, the array is scanned in the same sequence until either the target element is
found or an unused slot is found.

• The name open addressing refers to the fact that the location or address of the item is not determined by its
hash value.

1.3.2.1 Linear Probing

• When collision occurs, in linear probing, we keep probing linearly for the next bucket until an empty bucket
is found.

• The probing sequence for linear probing will be:

Step 1 - Calculate the hash:

index = key%hashTableSize
Step 2 - If hashTable[index] is empty, store the value directly hashTable[index] = key.
Step 3 - If the hash index already has some value, check for next index.
index = (index + 1)%hashTableSize
If the next index is available hashTable[index], store the value. Otherwise try for next index using
index = (index + 2)%hashTableSize
index = (index + 3)%hashTableSize
and so on until an empty bucket is found.

• The formula for linear probing can be written as:

H(key, i) = (hash(key) + i) mod hashTableSize
where i = 0, 1, ....., (hashTableSize − 1)
• The search for empty location will be in sequence
H(key, i), H(key, i) + 1, H(key, i) + 2, H(key, i) + 3.......

11
• Example
Initial Hash Table

Insert 13

12
Insert 1

Insert 6

1%5 = 1
6%5 = 1
Both 1 and 6 points the same index under modulo 5. So that we have placed 6 in arr[2] which is next available
index.

13
Insert 11

1%5 = 1
6%5 = 1
11%5 = 1
1, 6 and 11 points the same index under modulo 5. So that we have placed 11 in arr[4] which is next available
index.
Insert 10

14
Insert 15

15
Let us consider a simple hash function as key mod 7 and a sequence of keys as 50, 700, 76, 85, 92, 73, 101.

• Advantages

– Easy to compute.
– Achieves good cache performance since the probing sequence is linear in memory
• Disadvantages
– A problem tends to create long sequences of occupied buckets. The existing chain will act as a "net"
and catch many of the new keys, which will be appended to the chain and exacerbate the problem.
– Key is storing far away from its home address, which is bad for performance since the average number
of buckets scanned during insert and lookup or search increases. The phenomenon is called primary
clustering or just clustering.
– Worst time to search an element in linear probing is O(N), where N is the size of the table.

16
1.3.2.2 Quadratic Probing
• In Quadratic Probing, the problem of primary clustering is solved by solved by storing the colliding keys
away from the initial collision point.
• Quadratic probing is an open-addressing scheme where we look for i2 -th slot in it h iteration if the given hash
value x collides in the hash table.

• The probing sequence for quadratic probing will be:

Step 1 - Calculate the hash:

index = key%hashTableSize
Step 2 - If hashTable[index] is empty, store the value directly hashTable[index] = key.
Step 3 - If the hash index already has some value, check for next index.
index = (index + i2 )%hashTableSize, where i = 0, 1, ..., (hashTableSize − 1)
i.e.
index = (index + 12 )%hashTableSize = (index + 1)%hashTableSize
index = (index + 22 )%hashTableSize = (index + 4)%hashTableSize
index = (index + 32 )%hashTableSize = (index + 9)%hashTableSize
and so on until an empty bucket is found.

• The formula for quadratic probing can be written as:

H(key, i) = (hash(key) + i2 ) mod hashTableSize
where i = 0, 1, ....., (hashTableSize − 1)
• The search for empty location will be in sequence:
H(key, i), H(key, i) + 1, H(key, i) + 4, H(key, i) + 9.......
• Advantages - This creates larger and larger gaps in the search sequence and avoids primary clustering.
• Disadvantage - If one key hashes to the same bucket as another key, the search sequence for the second key
will go in the footsteps of the first one. If this happens repeatedly (for example due to a poorly implemented
hash function) long chains will still form, and cause performance to degrade. The phenomenon is called
secondary clustering.

17
• Example

18
1.3.2.3 Double Hashing
• Double hashing uses the idea of applying a second hash function to key when a collision occurs.
• First hash function is
hash(key) or index = key%hashTableSize

• A popular second hash or secondary hash function is:

hash0 (key) = PRIME − (key%PRIME),
where PRIME is a prime smaller than the hashTableSize.
• The secondary hash function should have several characteristics:

– It should never yield an index of zero

– It should cycle through the whole table
– It should be very fast to compute

The formula for double hashing can be written as:

H(key, i) = (hash(key) + ihash0 (key))
where i = 0, 1, ....., (hashTableSize − 1)

• The search for empty location will be in sequence:

H(key, i), H(key, i) + hash0 (key), H(key, i) + 2hash0 (key), H(key, i) + 3hash0 (key), .......

19
• Advantage
– It does not suffer from any clustering.
– Although the computational cost may be high, double hashing can find the next free slot faster than the
linear probing approach.
• Disadvantages
– It requires more computation time as two hash functions need to be computed.

1.3.3 Comparison of Probing Techniques

• Linear Probing has the best cache performance but suffers from primary clustering. Linear probing is also
easy to compute.
• Quadratic probing lies between the two in terms of cache performance and secondary clustering.

• Double caching has poor cache performance but no clustering. Double hashing requires more computation
time as two hash functions need to be computed.

20
1.4 Separate Chaining vs Open Addressing

Static and Dynamic Hashing
No ratings yet
Static and Dynamic Hashing
12 pages
DSA Theory Final
No ratings yet
DSA Theory Final
8 pages
Unit 1 Dsa Hashing 2022 Compressed 1
No ratings yet
Unit 1 Dsa Hashing 2022 Compressed 1
115 pages
Chapter One - Hashing PDF
No ratings yet
Chapter One - Hashing PDF
30 pages
Hashing Unit 1
No ratings yet
Hashing Unit 1
91 pages
UNIT 1 - Hashing
No ratings yet
UNIT 1 - Hashing
118 pages
DSA Unit 1
No ratings yet
DSA Unit 1
144 pages
SHA - 256 Algorithm
No ratings yet
SHA - 256 Algorithm
3 pages
Task 3 Hashing Quadratic Probing
No ratings yet
Task 3 Hashing Quadratic Probing
7 pages
11 Hash Tables Slides
No ratings yet
11 Hash Tables Slides
34 pages
SHA 256 Algorithm
No ratings yet
SHA 256 Algorithm
33 pages
Hashing New
No ratings yet
Hashing New
48 pages
UNIT 1 - Hashing
No ratings yet
UNIT 1 - Hashing
118 pages
Ds 17hashing
No ratings yet
Ds 17hashing
27 pages
Week 9 - Hash Functions and Collision
No ratings yet
Week 9 - Hash Functions and Collision
73 pages
SORTING PROGRAMS - Counting + Bucket + Heap
No ratings yet
SORTING PROGRAMS - Counting + Bucket + Heap
27 pages
Hashing
No ratings yet
Hashing
37 pages
Hashing
No ratings yet
Hashing
37 pages
Hashing PPT For Student
No ratings yet
Hashing PPT For Student
53 pages
Hashing in Data Structure
No ratings yet
Hashing in Data Structure
43 pages
Hashing
No ratings yet
Hashing
30 pages
Lec12 Hash Tables 09092024 090609pm
No ratings yet
Lec12 Hash Tables 09092024 090609pm
48 pages
Hashing
No ratings yet
Hashing
44 pages
DSA Unit VI Hashing and File Organization
No ratings yet
DSA Unit VI Hashing and File Organization
56 pages
06 - APS - Hash Table
No ratings yet
06 - APS - Hash Table
28 pages
Lecture 27 - Hashing
No ratings yet
Lecture 27 - Hashing
48 pages
March 2024 DSA InSem Solution
No ratings yet
March 2024 DSA InSem Solution
26 pages
DS Module-X
No ratings yet
DS Module-X
74 pages
09 Hashtable
No ratings yet
09 Hashtable
53 pages
Lect Hashing
No ratings yet
Lect Hashing
36 pages
What Is Hashing
No ratings yet
What Is Hashing
11 pages
GROUP 15.Pptx Presentation
No ratings yet
GROUP 15.Pptx Presentation
29 pages
Dsa 4
No ratings yet
Dsa 4
55 pages
Hashing
No ratings yet
Hashing
30 pages
Unit 4 Hashing
No ratings yet
Unit 4 Hashing
35 pages
Module 5
No ratings yet
Module 5
33 pages
Vestel 17mb30-2 SCH
No ratings yet
Vestel 17mb30-2 SCH
11 pages
Hashing
No ratings yet
Hashing
25 pages
ACN LineeGuida Password
No ratings yet
ACN LineeGuida Password
19 pages
Hashing
No ratings yet
Hashing
20 pages
Hashing RPK
No ratings yet
Hashing RPK
61 pages
Lecture 13 - Hash Tables
No ratings yet
Lecture 13 - Hash Tables
51 pages
13 Cryptographic Hash Functions
No ratings yet
13 Cryptographic Hash Functions
47 pages
05 Hashing
No ratings yet
05 Hashing
47 pages
DS Lecture - 6 (Hashing)
No ratings yet
DS Lecture - 6 (Hashing)
26 pages
Hashing
No ratings yet
Hashing
42 pages
Hashing
No ratings yet
Hashing
23 pages
Hash Tables
No ratings yet
Hash Tables
21 pages
Hash-Data Structure
No ratings yet
Hash-Data Structure
16 pages
CSC 302 - Hashing Techniques
No ratings yet
CSC 302 - Hashing Techniques
19 pages
3 Hashing
No ratings yet
3 Hashing
20 pages
Hash
No ratings yet
Hash
7 pages
12 Message Integrity and Authentication
No ratings yet
12 Message Integrity and Authentication
41 pages
Dsa Hashing (21CS32)
No ratings yet
Dsa Hashing (21CS32)
16 pages
Hashing Algorithms
No ratings yet
Hashing Algorithms
22 pages
Hashing: Data Structure
No ratings yet
Hashing: Data Structure
17 pages
Hash Tables: Dr. Dibakar Saha
No ratings yet
Hash Tables: Dr. Dibakar Saha
26 pages
Unit-5 2
No ratings yet
Unit-5 2
9 pages
DSA Lab 11 Hashing
No ratings yet
DSA Lab 11 Hashing
9 pages
Lab 7 - Hash Table
No ratings yet
Lab 7 - Hash Table
4 pages
Assignment 2
No ratings yet
Assignment 2
6 pages
Dsa 5
No ratings yet
Dsa 5
22 pages
Introduce To Sha-512: Subject: Cryptography Theory Lecturer: Assoc. Prof. Do Trong Tuan Presentation: Team 01
No ratings yet
Introduce To Sha-512: Subject: Cryptography Theory Lecturer: Assoc. Prof. Do Trong Tuan Presentation: Team 01
30 pages
Hashing ClassNotes
No ratings yet
Hashing ClassNotes
8 pages
CME 2201-Lab6
No ratings yet
CME 2201-Lab6
7 pages
Overview of Merkle Trees
No ratings yet
Overview of Merkle Trees
4 pages
Hash Table: Didih Rizki Chandranegara
No ratings yet
Hash Table: Didih Rizki Chandranegara
33 pages
Assignment 2 - Group Assignment
No ratings yet
Assignment 2 - Group Assignment
6 pages
Hashing
No ratings yet
Hashing
4 pages
Hashing
No ratings yet
Hashing
56 pages
Hashing PDF
No ratings yet
Hashing PDF
56 pages
Hashing
No ratings yet
Hashing
4 pages
Done DS GTU Study Material Presentations Unit-4 13032021035653AM
No ratings yet
Done DS GTU Study Material Presentations Unit-4 13032021035653AM
24 pages
Topic 1: Hashing - Introduction: Hashing Is A Method of Storing and Retrieving Data From A Database Efficiently
No ratings yet
Topic 1: Hashing - Introduction: Hashing Is A Method of Storing and Retrieving Data From A Database Efficiently
31 pages
Hashing 1
No ratings yet
Hashing 1
26 pages
DS Lecture - 6 (Hashing)
No ratings yet
DS Lecture - 6 (Hashing)
32 pages
Hashing and Graphs
No ratings yet
Hashing and Graphs
28 pages
SIGNATURES-4.1.0
No ratings yet
SIGNATURES-4.1.0
2 pages
Unit28 Hashing1
No ratings yet
Unit28 Hashing1
19 pages
COMP2119 Assignment 4
No ratings yet
COMP2119 Assignment 4
4 pages
Introduction To Hashing & Hashing Techniques: Review of Searching Techniques
No ratings yet
Introduction To Hashing & Hashing Techniques: Review of Searching Techniques
19 pages
Hashing: Amar Jukuntla
No ratings yet
Hashing: Amar Jukuntla
22 pages
Badan Siber Dan Sandi Negara: Website
No ratings yet
Badan Siber Dan Sandi Negara: Website
6 pages
DS Lecture - 6 (Hashing)
No ratings yet
DS Lecture - 6 (Hashing)
27 pages
Cryptography and Network Security: Fifth Edition by William Stallings
No ratings yet
Cryptography and Network Security: Fifth Edition by William Stallings
23 pages
Whattomine Com Asic
No ratings yet
Whattomine Com Asic
4 pages
CrackStation - Online Password Hash Cracking - MD5, SHA1, Linux, Rainbow Tables, Etc
No ratings yet
CrackStation - Online Password Hash Cracking - MD5, SHA1, Linux, Rainbow Tables, Etc
2 pages
Hash Functions: Boban Spasic
No ratings yet
Hash Functions: Boban Spasic
10 pages
The Tech Interview Playbook: From DSA to System Design
From Everand
The Tech Interview Playbook: From DSA to System Design
Chinmoy Mukherjee
No ratings yet
Hashing
From Everand
Hashing
Prakash Hegade
No ratings yet

Uploaded by

Uploaded by

Data Structures Notes

1.1 Direct Addressing or Direct Access Table

• Hashing can overcome these limitations of direct address tables.

1.2.3 Limitations of Hash Function

• Mid square method

• Chaining is simple, but requires additional memory outside the table.

1.3.2.1 Linear Probing

• The probing sequence for linear probing will be:

Step 1 - Calculate the hash:

• The formula for linear probing can be written as:

• The probing sequence for quadratic probing will be:

Step 1 - Calculate the hash:

• The formula for quadratic probing can be written as:

• A popular second hash or secondary hash function is:

– It should never yield an index of zero

The formula for double hashing can be written as:

• The search for empty location will be in sequence:

1.3.3 Comparison of Probing Techniques

You might also like