Hashing

The document discusses hash tables and hashing. It defines a hash table as a data structure that uses a hash function to map keys to values for fast lookup. The key components are the hash function, which maps keys to indices in a table, and the table itself, which stores key-value pairs. Common operations are insertion, deletion, and search. Collisions can occur when different keys map to the same index, and techniques like chaining and open addressing are used to resolve collisions. Chaining stores colliding items in linked lists, while open addressing probes for empty slots elsewhere in the table. The document covers factors in hash table design like the hash function, table size, and collision handling schemes.


Hashing & Hash Tables

Cpt S 223. School of EECS, WSU 1


Overview
- Hash Table Data Structure: Purpose
  - To support insertion, deletion and search in average-case constant time
  - Assumption: Order of elements is irrelevant
    - ==> the data structure is *not* useful if you want to maintain and retrieve some kind of order of the elements
- Hash function
  - Hash["string key"] ==> integer value
- Hash table ADT
  - Implementations, Analysis, Applications
Cpt S 223. School of EECS, WSU 2
Hash table: Main components

[Diagram: a key (e.g., "john") is fed into the hash function h, producing a hash index h("john") in the range [0, TableSize); the hash table itself (implemented as a vector) stores the key-value pair at that index. How to determine each component ... ?]

Cpt S 223. School of EECS, WSU 3
Hash Table Operations

[Diagram: key -> hash function -> hashed index; the data record is stored at T[h(key)].]

- Insert
  - T[h("john")] = <"john", 25000>
- Delete
  - T[h("john")] = NULL
- Search
  - T[h("john")] returns the element hashed for "john"

What happens if h("john") == h("joe")?
==> "collision"
Cpt S 223. School of EECS, WSU 5
Factors affecting Hash Table Design
- Hash function
- Table size
  - Usually fixed at the start
- Collision handling scheme
Cpt S 223. School of EECS, WSU 6


Hash Function
- A hash function is one which maps an element's key into a valid hash table index
  - h(key) => hash table index
- Note that this is (slightly) different from saying h(string) => int
  - Because the key can be of any type
  - E.g., "h(int) => int" is also a hash function!
  - But also note that any type can be converted into an equivalent string form
Cpt S 223. School of EECS, WSU 7
h(key) ==> hash table index

Hash Function Properties

- A hash function maps a key to an integer
  - Constraint: The integer should be in [0, TableSize-1]
- A hash function can result in a many-to-one mapping (causing collisions)
  - A collision occurs when the hash function maps two or more keys to the same array index
  - Collisions cannot be avoided, but their chances can be reduced using a "good" hash function

Cpt S 223. School of EECS, WSU 8


h(key) ==> hash table index

Hash Function Properties

- A "good" hash function should have these properties:
  1. Reduced chance of collision
     - Different keys should ideally map to different indices
     - Distribute keys uniformly over the table
  2. Should be fast to compute

Cpt S 223. School of EECS, WSU 9


Hash Function - Effective use of table size
- Simple hash function (assume integer keys)
  - h(Key) = Key mod TableSize (a small sketch follows below)
- For random keys, h() distributes keys evenly over the table
- What if TableSize = 100 and the keys are ALL multiples of 10?
  - Better if TableSize is a prime number

Cpt S 223. School of EECS, WSU 10
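A minimal sketch of this simple modulo hash in C++ (function and parameter names are ours, not from the slides):

#include <cstddef>

// h(Key) = Key mod TableSize, for non-negative integer keys.
// If tableSize = 100 and every key is a multiple of 10, only the
// indices 0, 10, 20, ..., 90 are ever produced; choosing a prime
// tableSize spreads such keys over the whole table.
std::size_t hash_int(unsigned int key, std::size_t tableSize)
{
    return key % tableSize;
}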


Different Ways to Design a Hash Function for String Keys
- A very simple function to map strings to integers:
  - Add up character ASCII values (0-255) to produce integer keys
  - E.g., "abcd" = 97+98+99+100 = 394
  - ==> h("abcd") = 394 % TableSize
- Potential problems:
  - Anagrams will map to the same index: h("abcd") == h("dbac")
  - Small strings may not use all of the table: Strlen(S) * 255 < TableSize
  - Time proportional to the length of the string

Cpt S 223. School of EECS, WSU 11
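A sketch of this additive scheme (names are ours), which makes the anagram problem concrete:

#include <string>
#include <cstddef>

// Approach 1: sum the character values, then reduce mod tableSize.
// hash_sum("abcd", m) == hash_sum("dbac", m), since addition ignores order.
std::size_t hash_sum(const std::string& key, std::size_t tableSize)
{
    std::size_t sum = 0;
    for (char ch : key)
        sum += static_cast<unsigned char>(ch);
    return sum % tableSize;
}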


Different Ways to Design a Hash Function for String Keys
- Approach 2
  - Treat the first 3 characters of the string as a base-27 integer (26 letters plus space)
    - Key = S[0] + (27 * S[1]) + (27^2 * S[2])
  - Better than approach 1 because ... ?
- Potential problems:
  - Assumes the first 3 characters are randomly distributed
    - Not true of English: e.g., Apple, Apply, Appointment, Apricot share their leading characters and tend to collide

Cpt S 223. School of EECS, WSU 12


Different Ways to Design a Hash Function for String Keys
- Approach 3
  - Use all N characters of the string as an N-digit base-K number
  - Choose K to be a prime number larger than the number of different digits (characters), i.e., K = 29, 31, 37
  - If L = length of string S, then

      h(S) = ( Σ_{i=0..L-1} S[L-i-1] * 37^i ) mod TableSize

  - Use Horner's rule to compute h(S) (see the sketch below)
- Problems:
  - Potential overflow
  - Larger runtime for long strings (limit L)
Cpt S 223. School of EECS, WSU 13
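A sketch of the base-37 string hash computed with Horner's rule (unsigned overflow simply wraps around; the function name is ours):

#include <string>
#include <cstddef>

std::size_t hash_horner(const std::string& key, std::size_t tableSize)
{
    std::size_t hashVal = 0;
    for (char ch : key)                                        // Horner's rule:
        hashVal = 37 * hashVal + static_cast<unsigned char>(ch);
    return hashVal % tableSize;                                // reduce to a valid index
}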


Techniques to Deal with Collisions
("Collision resolution techniques")
- Chaining
- Open addressing
- Double hashing
- Etc.
Cpt S 223. School of EECS, WSU 14
Resolving Collisions
- What happens when h(k1) = h(k2)?
  - ==> collision!
- Collision resolution strategies
  - Chaining
    - Store colliding keys in a linked list at the same hash table index
  - Open addressing
    - Store colliding keys elsewhere in the table
Cpt S 223. School of EECS, WSU 15


Chaining
Collision resolution technique #1

Cpt S 223. School of EECS, WSU 16


Chaining strategy: maintain a linked list at every hash index for collided elements

Insertion sequence: { 0 1 4 9 16 25 36 49 64 81 }

- Hash table T is a vector of linked lists
  - Insert elements at the head (as shown in the slide's figure) or at the tail
- Key k is stored in the list at T[h(k)]
- E.g., TableSize = 10
  - h(k) = k mod 10
  - Insert the first 10 perfect squares
Cpt S 223. School of EECS, WSU 17


Implementation of Chaining Hash Table

[Code figure from the slides, showing: a vector of linked lists (this is the main hash table), the current #elements in the hash table, and hash functions for integer and string keys. A sketch along these lines follows below.]
Cpt S 223. School of EECS, WSU 18
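Since the original slide only shows a screenshot of the class, here is a minimal chaining sketch along the same lines (vector of linked lists, a current-size counter, and a hash function for integer keys); class and member names are ours:

#include <vector>
#include <list>
#include <algorithm>
#include <cstddef>

class ChainingHashTable
{
public:
    explicit ChainingHashTable(std::size_t tableSize = 101)
        : lists(tableSize), currentSize(0) {}

    void insert(int x)
    {
        std::list<int>& whichList = lists[hash(x)];
        if (std::find(whichList.begin(), whichList.end(), x) == whichList.end())
        {
            whichList.push_back(x);   // could also insert at the head with push_front
            ++currentSize;
        }
    }

    bool contains(int x) const
    {
        const std::list<int>& whichList = lists[hash(x)];
        return std::find(whichList.begin(), whichList.end(), x) != whichList.end();
    }

    void remove(int x)
    {
        std::list<int>& whichList = lists[hash(x)];
        std::list<int>::iterator it = std::find(whichList.begin(), whichList.end(), x);
        if (it != whichList.end())
        {
            whichList.erase(it);
            --currentSize;
        }
    }

private:
    std::vector<std::list<int> > lists;   // the main hash table: vector of linked lists
    std::size_t currentSize;              // current #elements in the hash table

    std::size_t hash(int x) const         // hash function for integer keys
    {
        return static_cast<std::size_t>(x) % lists.size();
    }
};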
Collision Resolution by Chaining: Analysis
- Load factor λ of a hash table T is defined as follows:
  - N = number of elements in T ("current size")
  - M = size of T ("table size")
  - λ = N/M ("load factor")
  - i.e., λ is the average length of a chain
- Unsuccessful search time: O(λ)
  - Same for insert time
- Successful search time: O(λ/2)
- Ideally, want λ ≤ 1 (not a function of N)

Cpt S 223. School of EECS, WSU 23
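As a quick worked example (numbers are ours, not from the slide): a chaining table with M = 101 slots holding N = 55 elements has λ = 55/101 ≈ 0.54, so by the estimates above an unsuccessful search traverses about 0.54 nodes on average and a successful search about 0.27.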


Potential disadvantages of Chaining
- Linked lists could get long
  - Especially when N approaches M
  - Longer linked lists could negatively impact performance
- More memory because of pointers
- Absolute worst case (even if N << M):
  - All N elements in one linked list!
  - Typically the result of a bad hash function

Cpt S 223. School of EECS, WSU 24


Open Addressing
Collision resolution technique #2

Cpt S 223. School of EECS, WSU 25


An "in-place" approach

Collision Resolution by Open Addressing
- When a collision occurs, look elsewhere in the table for an empty slot
- Advantages over chaining
  - No need for list structures
  - No need to allocate/deallocate memory during insertion/deletion (slow)
- Disadvantages
  - Slower insertion - may need several attempts to find an empty slot
  - Table needs to be bigger (than a chaining-based table) to achieve average-case constant-time performance
    - Load factor λ ≈ 0.5
Cpt S 223. School of EECS, WSU 26
Collision Resolution by Open Addressing
- A "probe sequence" is the sequence of slots in the hash table visited while searching for an element x
  - h0(x), h1(x), h2(x), ...
  - Needs to visit each slot exactly once
  - Needs to be repeatable (so we can find/delete what we've inserted)
- Hash function
  - hi(x) = (h(x) + f(i)) mod TableSize
  - f(0) = 0 ==> position for the 0th probe
  - f(i) is "the distance to be traveled relative to the 0th probe position, during the ith probe"
Cpt S 223. School of EECS, WSU 27
Linear Probing
- f(i) is a linear function of i, e.g., f(i) = i
  - i-th probe index = 0th probe index + i
  - hi(x) = (h(x) + i) mod TableSize
  - Probe sequence: +0, +1, +2, +3, +4, ...
- Continue until an empty slot is found, and populate x there (a sketch follows below)
- #failed probes is a measure of performance

[Figure: 0th, 1st, 2nd, 3rd probes stepping past occupied slots until an unoccupied slot is reached.]
Cpt S 223. School of EECS, WSU 28
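A sketch of insertion and search with linear probing (assumes the table never becomes completely full, so the probe loop terminates; names are ours):

#include <vector>
#include <cstddef>

struct LinearProbingTable
{
    enum SlotState { EMPTY, OCCUPIED };

    std::vector<int>       keys;
    std::vector<SlotState> state;

    explicit LinearProbingTable(std::size_t tableSize = 11)
        : keys(tableSize), state(tableSize, EMPTY) {}

    std::size_t hash(int x) const { return static_cast<std::size_t>(x) % keys.size(); }

    // hi(x) = (h(x) + i) mod TableSize, i = 0, 1, 2, ...
    std::size_t findPos(int x) const
    {
        std::size_t pos = hash(x);
        while (state[pos] == OCCUPIED && keys[pos] != x)
            pos = (pos + 1) % keys.size();     // f(i) = i: move to the next slot
        return pos;                            // empty slot, or the slot holding x
    }

    void insert(int x)
    {
        std::size_t pos = findPos(x);
        keys[pos]  = x;
        state[pos] = OCCUPIED;
    }

    bool contains(int x) const
    {
        std::size_t pos = findPos(x);
        return state[pos] == OCCUPIED && keys[pos] == x;
    }
};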
Linear Probing Example
- Insert sequence (over time): 89, 18, 49, 58, 69
- #unsuccessful probes per insert: 0, 0, 1, 3, 3 — total: 7

Cpt S 223. School of EECS, WSU 30
Linear Probing: Issues
- Probe sequences can get longer with time
- Primary clustering
  - Keys tend to cluster in one part of the table
  - Keys that hash into the cluster will be added to the end of the cluster (making it even bigger)
  - Side effect: other keys could also get affected if they map to a crowded neighborhood
Cpt S 223. School of EECS, WSU 31
Random Probing: Analysis
- Random probing does not suffer from clustering
- Expected number of probes for insertion or unsuccessful search:

      (1/λ) ln( 1/(1-λ) )

- Example
  - λ = 0.5: 1.4 probes
  - λ = 0.9: 2.6 probes

Cpt S 223. School of EECS, WSU 33


Linear vs. Random Probing

[Plot: expected # probes vs. load factor λ, for linear probing and random probing; curves shown for U - unsuccessful search, S - successful search, I - insert. Low λ is "good", high λ is "bad", and linear probing degrades faster as λ grows.]
Cpt S 223. School of EECS, WSU 34
Quadratic Probing
- f(i) is quadratic in i, e.g., f(i) = i^2
  - hi(x) = (h(x) + i^2) mod TableSize
  - Probe sequence: +0, +1, +4, +9, +16, ...
- Avoids primary clustering
- Continue until an empty slot is found (sketch below)
- #failed probes is a measure of performance

[Figure: 0th, 1st, 2nd, 3rd probes jumping over occupied slots by successive squares.]
Cpt S 223. School of EECS, WSU 35
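Relative to linear probing only the probe step changes; a sketch of the quadratic step (names are ours, same assumptions as the linear-probing sketch above):

#include <cstddef>

// i-th probe for quadratic probing: hi(x) = (h(x) + i*i) mod tableSize,
// where home = h(x) is the 0th probe position.
std::size_t quadraticProbe(std::size_t home, std::size_t i, std::size_t tableSize)
{
    return (home + i * i) % tableSize;
}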
Quadratic Probing Example
- Insert sequence: 89, 18, 49, 58, 69
- Probe offsets used per insert: +0^2, then +1^2 and +2^2 on collisions
- #unsuccessful probes per insert: 0, 0, 1, 2, 2 — total: 5

Q) Delete(49), Find(69) - is there a problem?

Cpt S 223. School of EECS, WSU 37
Quadratic Probing
- May cause "secondary clustering"
- Deletion
  - Emptying slots can break the probe sequence and could cause find to stop prematurely
  - Lazy deletion (sketch below)
    - Differentiate between empty and deleted slots
    - When searching, skip and continue beyond deleted slots
    - If you hit a non-deleted empty slot, stop the find procedure and return "not found"
  - May need compaction at some point

Cpt S 223. School of EECS, WSU 39
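A sketch of lazy deletion using a three-way slot state as described above (probing is linear here for brevity; assumes at least one EMPTY slot always remains so the probe loop terminates; names are ours):

#include <vector>
#include <cstddef>

struct LazyDeleteTable
{
    enum SlotState { EMPTY, ACTIVE, DELETED };

    std::vector<int>       keys;
    std::vector<SlotState> state;

    explicit LazyDeleteTable(std::size_t tableSize = 11)
        : keys(tableSize), state(tableSize, EMPTY) {}

    std::size_t hash(int x) const { return static_cast<std::size_t>(x) % keys.size(); }

    // Probe, skipping DELETED slots; stop at an EMPTY slot or at x itself.
    std::size_t findPos(int x) const
    {
        std::size_t pos = hash(x);
        while (state[pos] != EMPTY && !(state[pos] == ACTIVE && keys[pos] == x))
            pos = (pos + 1) % keys.size();
        return pos;
    }

    bool contains(int x) const
    {
        std::size_t pos = findPos(x);
        return state[pos] == ACTIVE && keys[pos] == x;
    }

    void remove(int x)
    {
        std::size_t pos = findPos(x);
        if (state[pos] == ACTIVE && keys[pos] == x)
            state[pos] = DELETED;   // mark, do not empty: keeps probe sequences intact
    }
};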
Double Hashing: keep two hash functions h1 and h2
- Use a second hash function for all tries i other than 0: f(i) = i * h2(x)
- Good choices for h2(x)?
  - Should never evaluate to 0
  - h2(x) = R - (x mod R)
    - R is a prime number less than TableSize
- Previous example with R = 7
  - h0(49) = (h(49) + f(0)) mod 10 = 9 (X: occupied)
  - h1(49) = (h(49) + 1*(7 - 49 mod 7)) mod 10 = 6

Cpt S 223. School of EECS, WSU 45
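A sketch of the probe computation for double hashing with h2(x) = R - (x mod R); with R = 7 and TableSize = 10 it reproduces the slide's h0(49) = 9 and h1(49) = 6 (names are ours):

#include <cstddef>

std::size_t h2(int x, std::size_t R)      // R - (x mod R); never evaluates to 0
{
    return R - static_cast<std::size_t>(x) % R;
}

// i-th probe: hi(x) = (h(x) + i * h2(x)) mod tableSize
std::size_t doubleHashProbe(int x, std::size_t i, std::size_t tableSize, std::size_t R)
{
    std::size_t home = static_cast<std::size_t>(x) % tableSize;   // h(x)
    return (home + i * h2(x, R)) % tableSize;
}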


Double Hashing Example

Cpt S 223. School of EECS, WSU 46


Probing Techniques - review

[Figure: side-by-side probe sequences starting from slot i. Linear probing tries consecutive slots (i, i+1, i+2, i+3, ...); quadratic probing tries i, i+1, i+4, i+9, ...; double hashing jumps in steps whose size is determined by a second hash function.]

Cpt S 223. School of EECS, WSU 48
Rehashing
- Increase the size of the hash table when the load factor becomes "too high" (defined by a cutoff)
  - Anticipating that prob(collisions) would become higher
- Typically expand the table to twice its size (but still prime)
- Need to reinsert all existing elements into the new hash table (sketch below)

Cpt S 223. School of EECS, WSU 49
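A sketch of how rehashing could be wired into the chaining table shown earlier (the nextPrime helper is written here just for this sketch; any "next prime at least twice the old size" routine works):

#include <vector>
#include <list>
#include <cstddef>

// Smallest prime >= n (trial division is fine for table sizes).
static bool isPrime(std::size_t n)
{
    if (n < 2) return false;
    for (std::size_t d = 2; d * d <= n; ++d)
        if (n % d == 0) return false;
    return true;
}
static std::size_t nextPrime(std::size_t n)
{
    while (!isPrime(n)) ++n;
    return n;
}

// Rehash a chaining table: roughly double the size (still prime) and
// reinsert every existing element under the new table size.
void rehash(std::vector<std::list<int> >& lists)
{
    std::vector<std::list<int> > oldLists = lists;
    lists.assign(nextPrime(2 * oldLists.size()), std::list<int>());
    for (std::size_t i = 0; i < oldLists.size(); ++i)
        for (std::list<int>::iterator it = oldLists[i].begin(); it != oldLists[i].end(); ++it)
            lists[static_cast<std::size_t>(*it) % lists.size()].push_back(*it);
}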


Rehashing Example

[Figure: a table with h(x) = x mod 7 at λ = 0.57; after Insert(23) the load factor reaches λ = 0.71, which triggers rehashing into a larger table with h(x) = x mod 17, bringing the load factor down to λ = 0.29.]
Cpt S 223. School of EECS, WSU 50


Rehashing Analysis
- Rehashing takes O(N) time, since all N elements must be reinserted
  - Therefore it should be done infrequently
- Specifically
  - There must have been N/2 insertions since the last rehash
  - Amortizing the O(N) cost over the N/2 prior insertions yields only constant additional time per insertion
Cpt S 223. School of EECS, WSU 51
Rehashing Implementation
- When to rehash
  - When the load factor reaches some threshold (e.g., λ ≥ 0.5), OR
  - When an insertion fails
- Applies across collision handling schemes
Cpt S 223. School of EECS, WSU 52
Hash Tables in C++ STL
- Hash tables are not part of the C++ Standard Library
- Some implementations of STL have hash tables (e.g., SGI's STL)
  - hash_set
  - hash_map
Cpt S 223. School of EECS, WSU 55


Hash Set in STL

#include <hash_set>   // SGI STL extension (not part of the C++ standard)
#include <iostream>
#include <cstring>
using namespace std;

struct eqstr
{
  bool operator()(const char* s1, const char* s2) const
  {
    return strcmp(s1, s2) == 0;
  }
};

// Template arguments: Key, Hash fn, Key equality test
void lookup(const hash_set<const char*, hash<const char*>, eqstr>& Set,
            const char* word)
{
  hash_set<const char*, hash<const char*>, eqstr>::const_iterator it
      = Set.find(word);
  cout << word << ": "
       << (it != Set.end() ? "present" : "not present")
       << endl;
}

int main()
{
  hash_set<const char*, hash<const char*>, eqstr> Set;
  Set.insert("kiwi");
  lookup(Set, "kiwi");
}

Cpt S 223. School of EECS, WSU 56
Hash Map in STL

#include <hash_map>   // SGI STL extension (not part of the C++ standard)
#include <iostream>
#include <cstring>
using namespace std;

struct eqstr
{
  bool operator() (const char* s1, const char* s2) const
  {
    return strcmp(s1, s2) == 0;
  }
};

int main()
{
  // Template arguments: Key, Data, Hash fn, Key equality test
  hash_map<const char*, int, hash<const char*>, eqstr> months;

  // operator[] is internally treated like insert
  // (or overwrite if the key is already present)
  months["january"] = 31;
  months["february"] = 28;
  …
  months["december"] = 31;

  cout << "january -> " << months["january"] << endl;
}
Cpt S 223. School of EECS, WSU 57


Problem with Large Tables
- What if the hash table is too large to store in main memory?
- Solution: Store the hash table on disk
  - Minimize disk accesses
- But...
  - Collisions require disk accesses
  - Rehashing requires a lot of disk accesses
- Solution: Extendible Hashing
Cpt S 223. School of EECS, WSU 58
Hash Table Applications
- Symbol table in compilers
- Accessing tree or graph nodes by name
  - E.g., city names in Google maps
- Maintaining a transposition table in games
  - Remember previous game situations and the move taken (avoid re-computation)
- Dictionary lookups
  - Spelling checkers
  - Natural language understanding (word sense)
- Heavily used in text processing languages
  - E.g., Perl, Python, etc.
Cpt S 223. School of EECS, WSU 59


Summary
- Hash tables support fast insert and search
  - O(1) average-case performance
- Deletion is possible, but degrades performance
- Not suited if the ordering of elements is important
- Many applications
Cpt S 223. School of EECS, WSU 60
