0% found this document useful (0 votes)

38 views44 pages

IN3020/4020 - Database Systems Spring 2020, Week 3.1 Indexing

Uploaded by

Jaime sayago heredia

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

38 views44 pages

IN3020/4020 - Database Systems Spring 2020, Week 3.1 Indexing

Uploaded by

Jaime sayago heredia

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 44

IN3020/4020 – Database Systems

Spring 2020, Week 3.1

INDEXING

Dr. M. Naci Akkøk, Chief Architect, Oracle Nordics

Based upon slides by E. Thorstensen from Spring 2019
We´ll be looking at indexes & indexing
o ConvenHonal indexes
o B-rees and hashing
o MulHdimensional indexes
o Tree structures
o Hash-like structures
o Bitmap indexes
Index

o An index on an attribute A is a data structure that facilitates

finding the elements with a certain value in A ( the search
key)
o The simplest type is probably the hash-index
o There are other (more advanced) indexing techniques based
upon binary search
Search keys are created to search for a single entry or a set of

Index
entries in an index. Search keys may only be constructed for
the key columns in the index, and may contain one or more
column values.

o An index on an attribute A is a data structure that facilitates finding the

elements with a certain values in A ( the search key)
o The index is sorted on the search key
o For each value of the search key, the index has a list of pointers to the
corresponding records
o More than one index in the same file means
o Faster search
o More complexity (changes will lead to updated indexes as well)
o Increased storage requirement, larger files
Different types of indexes
o Dense vs. sparse indexes
o Primary Index
o The data file is sorted (physically) on the search key
o Maximum one data entry for each search-key value
o Cluster index
o The data file is (sHll) sorted (physically) on the search key
o Allows more than one data entries with same search-key value
o Secondary index
The data file is NOT sorted (physically) on the related search key
A candidate key is a combination of attributes

Overview of indexes that uniquely identify a database record without

referring to any other data. Each table may have
one or more candidate. One of these candidate
keys is selected as the table primary key.

Search key is a candidate Search key is not a

Data ﬁle
key candidate key
Primary index dense or
Sorted on search key clustering dense or sparse
sparse
Not sorted on search key Secondary index dense Secondary index dense

Quick overview:
hVps://en.wikipedia.org/wiki/Database_index
Less quick but more detailed overview (for geeks):
hVps://www.geeksforgeeks.org/indexing-in-databases-set-1/
Primary index:
Dense vs. Sparse Indexes
A dense index has one
lookup for each value of the
search key

A sparse index has one

lookup for each data block
• Assume that we have

A simple comparison
• 1.000.000 entries of 300B,4B search key, 4B pointers
• 4K block size, average 5,6ms for fetching a block
• 13,6 records per block, i.e., 76924. blocks with data
• 512 indexes per block, i.e., 1954 blocks for a dense
o Without index: index and 151 blocks for a sparse index
o 76924/2 = 38462 average block access.
o Takes 38462 * 5,6 ms = 215,4 s
o With dense index and binary search:
o [log2(1954)] + 1 = 11 + 1 = 12 block accesses (max.),
o Takes 12 * 5,6 ms = 67,2 ms
o 3205 Hmes faster than the one without indexes!
o With sparse index and binary search:
o [log2(151)] + 1 = 8 + 1 = 9 block accesses (max.),
Takes 9 * 5,6 ms = 50,4 ms
o 4272 Hmes faster as compared to no index, and 1,33 Hmes faster than dense index
A simple comparison
o An index can occupy several blocks
o A mulH-level index (i.e., an index on an
index) can improve performance
o ConHnuing the example:
o Needs only 1954 / 512 = 4 blocks for 2. level h
o [log2(4)] + 1 + 1 = 2 + 1 + 1 = 4 block access,
takes 4 * 5,6 ms = 22,4 ms
o 2,25 faster than a simple sparse index,
3 Hmes faster than a dense index
o One can in principle have any number of
indices
Deletion in the case of
sparse index
o Delete entry where a = 60
o No change necessary for the
index

o Delete entry with a = 40

o The first entry in the block
is updated, which means
that the index needs to be
updated
Deletion in the case of
dense index
o Delete entry where a = 60
o Delete entry where a = 40

o One ususally prefers to

compress the data in the
blocks
o One can also compress the
whole data set, but we usually
like to keep free space for
future inserts
Insertion in the case of
sparse index
o Insert entry with a = 60
o Lucky! Place available exactly
where we need it!
o Insert entry with a = 25
o Must move entry with a = 30 to
next block to make space for 25
o The first entry on block two is
changed, and the index must be
updated
o NOTE: We could have inserted a
new block or overflow block
InserQon in the case of sparse
(and dense) index
o Insert entry with a = 95
o No space. Insert a new/overflow
block
o Overflow block: No need to do
much with the indexes. Need only
a pointer to the main blocks
o Overflow block: The index needs
to be updated too
o Insertion in the case of dense
indexes is the same, except that the
index must be updated every time
Cluster Index - Duplicate (or Multiple)
Search Keys
o If the file is sorted, a cluster
index can be used even if the
search key is not uniqe
o Example 1 – dense index:
o One index field per entry
o Makes it easy to find entries
and how there are of each
o Too many fields? More than
necessary? 🧐
Cluster Index – Dense Index
o Only one index field per
uniqe search key
o Less index – faster search
☺
o Finding subsequent
(intermediary) records is
more complex 🙃
Cluster Index – Sparse Index
o Index fields point to the
first entry in each block

o Even fewer indexes – faster

search ☺
o Finding records is more
complex 🙃
Unsorted Files and Secondary Indices
o One can use a secondary
index if the files is
unsorted (or sorted on
another attribute)
o Sorted on the search
key: fast search
o First level is always
dense, higher levels are
sparse
o Duplicates are allowed
Dense vs. Sparse Indices

DENSE SPARSE
Space required One index field per entry One index field per data block
Block access “Many” “Feẘ”
Access to entry Direct access Must search in the data block
Must always access the data
Exists-queries Uses only the index
block
Usage All cases Not on unsorted elements
Always updated if entry Updated only if the first entry of
Updates/changes
sequence is changed the data block is changed
Inverted Indices

o What if we want to search for items within an attribute?

o SELECT * from R where a like ´%cat%´
o Search for documents that contain certain keywords, e.g.
search engines like Google, Altavista, Excite, Lycos,
AllTheWeb etc.
Inverted Indices
(called “inverted” because…)
o Direct Index: Entries of the form
(id1: document1), (id2: document2), ...
o Look up an ID, access the document
o Inverted index: Entries of
the form
(computer: [id1]),
(disk: [id2, id3]), ...
o Look up a keyword,
access all relevant
document IDs
Data Structures for Indices

o So, how do we represent an index?

o Obvious idea: Sorted list.

o BeVer ideas?
B+ Trees (Balanced Binary Trees)
o The nodes are blocks. All leaf nodes are at the same level. Each node has n search keys and n + 1
pointers.
o Inner node: all pointers are to sub-nodes.
o Leaf node: n data-pointers and 1 next-pointer.

o All nodes must contain a certain amount of

search keys / pointers
o Inner node: at least ⌈ (n + 1) / 2 ⌉ pointers to sub-nodes.
o Leaf node: at least ⌊ (n + 1) / 2 ⌋ data pointers.
B+ Trees: Efficiency
o B+ trees:
☹ A search must always start from the root to a leaf node, i.e., the number of block accesses is
equal to the height of the tree plus access to the records themselves in the data file.
☺ The number of levels is usually very low (typically 3)
☺ Interval search is fast
☺ For large n, it is rarely necessary to split / merge nodes
☺ Disk I / O can be reduced by keeping some of the index blocks in memory
o Example: 4B search keys, 8B pointers, 4KB blocks
o How many values can be stored in each node?
4𝑛 + 8 𝑛 + 1 ≤ 4096 → 𝑛 = 𝟑𝟒𝟎
o The nodes are on the average 75% full. How many records can a 3-level B+ tree contain?
340 × 75% 3 = 16679103,875 ≈ 16,6 million records!
Hash Tables
o Uses a hash function on the search key to an array
index with points to which bucket possibly contains
information about the current record
o Each bucket is a block (with support for overflow
blocks)
o Array size is usually a prime number
o Important for a hash-function!
o Fast
o A good distribution of the search keys to
buckets
o Example: Array size B = 5; h(key) = mod(key, B)
Hash Table: Efficiency
o Ideally, the array size is large enough so that all elements of
one hash value fit into one bucket block.

☺ Then we get signiﬁcantly fewer disk operaHons than with

regular indices and B-trees
☺ Fast search for specific search key
☹ MulHple entries can lead to more blocks per bucket
☹ Poor on interval search
Dynamic Hash Tables
o Difficult to keep all items for one hash value within a bucket
block if the number of records increases while the hash
table remains staHc
o Dynamic hash tables allow the array size to vary so that it is
sufficient with one block per bucket
o Extensible hashing
o Linear hashing
Sequential vs. Hash Indices
o Sequential indexes such as B trees are good at interval
search:

select * from R where a > 5

o Hash indexes are good when searching for a specific key:

select * from R where a = 5

Indices in SQL
o Syntaks (DBMS-dependent):
o create index name on relaHonName (aVribute)
o create unique index name on relaHonName (aVribute)
o drop index name

o NOTE: Not all DBMSs allow us to specify

o type of index, e.g. B-three, hashing, etc.
o parameters such as load factor, hash size etc.
Indices in PSQL
Default index in Postgres is B-tree.
“B-trees can handle equality and range queries on data that
can be sorted into some ordering”
https://www.postgresql.org/docs/9.2/static/indexes-
types.html

Works for =, <, <=, BETWEEN and IN.

Also for LIKE, but limited.
Indices in PSQL

o There is also something called GiST. This is a mechanism for

implementing indexes for particular data types.

o Can be used in many cases. Some is included in postgres by

default.
Queries with several condiQons
select ... from R where a = 30 and b < 5

oStrategy 1:
o Use an index, f.ex. on a
o Find and fetch all records with a = 30.
o Search through these records to ﬁnd records with b < 5.
☺ Simple strategy
☹ Risks reading many unnecessary records from disk (storage)
Several conditions: Strategy 2
select c from R where a = 30 and b = ´x´

o Use two dense indexes: One

for a, one for b
o Find all pointers to records
with a = 30.
o Find all pointers to entries with
b=‘x’.
o Compare (intersect) pointers
and fetch the relevant records
o Pick the relevant attributes
Multidimensional indices
o A multidimensional index combines multiple dimensions in
one index
o A simple tree-like approach:
Multidimensional indices: Example (with
dense indices)
select c from R where a = 30 and b = ´x´

o Search key = (30, x)

o Read the a dimension.
o Look for 30, ﬁnd the
related index fo the b-
dimension
o Search for x, read relevant disk
block and fetch the record
o Pick ou the relevant aVributes
Multidimensional indices: For which queries
is this a good index to use?
😊 Find records with a = 10 and b = ´x´
😊 Find records with a = 10 and b >= ´x´
☹ Find records with a = 10
☹ Find records with b = ´x´
🤔 Find records with a >= 10 and b = ´x´
o Risks searching through many indexes in the next dimension
o Would have been betrer if the dimension order could be changed
o But there are many other alternaHves...
o Other tree-like structures,
o Hash-like structures
o Bitmap indices
Map View

o We can visulize a multidimensional

index with two dimensions as a map

o Search is then equivalent to a serach

on the map for:
o Points: a1 and b1
o Lines: a2 and < b2, b3 >
o Areas: < a3, a4 > and < b4, b5 >
Tree structures
o There are many tree-like structures that correspond to searching for map
areas:
o k-d trees (a binary search tree in which every leaf node is a k-dimensional
point. Very useful for range and nearest neighbor searches, and for
multidimensional search keys)
o Quad-trees (a tree data structure in which each internal node has exactly
four children. Often used to partition a two-dimensional space by
recursively subdividing it into four quadrants or regions. Used in image
compression, also in spatial search etc.)
o R-trees (tree data structures used for spatial access methods, i.e., for
indexing multi-dimensional information such as geographical coordinates,
rectangles or polygons. )
Tree structures - characterisQcs

o All these tree structures must saHsfy at least one of the

following characterisHcs of B trees:
o Balancing - all leaf nodes are at the same level
(a B-tree is a self-balancing tree)
o Correspondence between tree nodes and disk blocks
o Performance for update operaHons.
R-trees
o The basic idea:
Group geometric objects that are close to each other
o Nodes are blocks. All leaf nodes are at the same level
o Inner node: The smallest rectangle that covers all the
objects in the subtree
o Leaf node: An object
o All inner nodes must contain a certain number of pointers

o Search (intersecHon, contained in, nearest neighbor) is easy

o InserHon is challenging
o The tree should be completely balanced
o Rectangles should not contain too much space
o Rectangles with common parent node should not overlap
o May need to delete and re-insert objects for beVer
placement
Hash-like structures: Grid files
Grid files extend traditional hash indices to
multiple dimensions
o Hashes values for each attribute in a multi-
dimensional index
o Does not usually hash to individual values
but to regions: h(key) = <x, y>
o Grid lines partition areas into stripes
Example (2 dimensions):
Find record with (a, b) = (22, 31)
h1(22) = <ax, ay>
h2(31) = <bm, bn>
→ enter f
Grid files
o Grid files are good for finding records/entries with
o key1 = Vi and key2 =Xj
o key1 = Vi
o key2 = Vj
o key1 >= Vi and key2 < Xj
o Grid files:
☺ Are good for searches with mulHple keys
☹ Use too much space, and require some ornagizing
Bitmap indices
o StarHng point: Each recorder is assigned an unchanging,
unambiguous number
o Numbering from 1 to n
o The number can be considered a record ID and cannot be reused
even if the record is deleted
o Select the field F to be indexed
o For each value v used for F in one of the records, create a bit
vector bv of length n
o If record nr. i has F = v, let bv[i]=1
o If record nr. i has F ≠ v, let bv[i]=0
Bitmap indices: Example
Bitmap index characterisQcs
o Space requirements:
o Total number of bits is #records * #values
o In the worst case, n2 bits is needed (but then each bit vector has only one 1-bit)
o Bit vectors can be compressed; there is never more than n 1-bits total in the bit
vectors
o Effective for:
o Partial match queries (= state values for some fields, find all that have given
values)
o Calculate bit by bit and across the bitmap indexes for the relevant attributes
o Range queries (= enter intervals for some fields, find all that have values within
the ranges)
o Calculate bit by bit within the intervals

CS2202_IndexingHashing
No ratings yet
CS2202_IndexingHashing
83 pages
Indexing_Hashing_Files
No ratings yet
Indexing_Hashing_Files
68 pages
Unit Iv Indexing and Hashing: Basic Concepts
No ratings yet
Unit Iv Indexing and Hashing: Basic Concepts
35 pages
DBMS Indexing Methods
No ratings yet
DBMS Indexing Methods
33 pages
Index Architecture: Febriliyan Samopa
No ratings yet
Index Architecture: Febriliyan Samopa
110 pages
Chapter 11: Indexing and Hashing
No ratings yet
Chapter 11: Indexing and Hashing
47 pages
Indexing
No ratings yet
Indexing
24 pages
Memoryhierarchy Indexing
No ratings yet
Memoryhierarchy Indexing
9 pages
CO3-Session-09 & 10
No ratings yet
CO3-Session-09 & 10
41 pages
INDEXING
No ratings yet
INDEXING
10 pages
Co3 Session 21
No ratings yet
Co3 Session 21
53 pages
DINLect1.pptx
No ratings yet
DINLect1.pptx
69 pages
Indexing - II
No ratings yet
Indexing - II
57 pages
PPT-203105251-3
No ratings yet
PPT-203105251-3
35 pages
Aplikasi DB-MKG 7
No ratings yet
Aplikasi DB-MKG 7
22 pages
Indexing
No ratings yet
Indexing
8 pages
Indexing and Hashing: B.Ramamurthy
No ratings yet
Indexing and Hashing: B.Ramamurthy
24 pages
Index Structures
No ratings yet
Index Structures
34 pages
Index and Hashing
No ratings yet
Index and Hashing
82 pages
IT3020 L06 Indexing
No ratings yet
IT3020 L06 Indexing
41 pages
L4 Indexing
No ratings yet
L4 Indexing
56 pages
CH 12 Updated
No ratings yet
CH 12 Updated
55 pages
03 UW Indexing (1)
No ratings yet
03 UW Indexing (1)
97 pages
UNIT-5: Indexing and Hashing
No ratings yet
UNIT-5: Indexing and Hashing
78 pages
File Organization and Indexing (1)
No ratings yet
File Organization and Indexing (1)
38 pages
Lec20Indexing_v1
No ratings yet
Lec20Indexing_v1
57 pages
CSE 544: Lecture 11 Storing Data, Indexes: Monday, 5/1/2006
No ratings yet
CSE 544: Lecture 11 Storing Data, Indexes: Monday, 5/1/2006
52 pages
index1 (5)
No ratings yet
index1 (5)
25 pages
Indexing Files: Last Time
No ratings yet
Indexing Files: Last Time
5 pages
UNIT IV
No ratings yet
UNIT IV
29 pages
Module 4 Indexing
No ratings yet
Module 4 Indexing
20 pages
Data Indexing Presentation
No ratings yet
Data Indexing Presentation
38 pages
APznzaau2 Qp6kQQWFsnXwvNI9mRcLmOzCEkKD6XNs8L1jR BhV1PFdRVVjZd8UbQlgVD2em6PSiesHntJxnE6ihEzMeDrE4RhBGR5X9KstSsrDfvlwogTn9 FGEx0uSBSqEuhwJ 7XtrewN6wGhq1Q0hThOfEbaC 2lntBPcupU2TlkQP FEFF0tzLTzzZTo6he
No ratings yet
APznzaau2 Qp6kQQWFsnXwvNI9mRcLmOzCEkKD6XNs8L1jR BhV1PFdRVVjZd8UbQlgVD2em6PSiesHntJxnE6ihEzMeDrE4RhBGR5X9KstSsrDfvlwogTn9 FGEx0uSBSqEuhwJ 7XtrewN6wGhq1Q0hThOfEbaC 2lntBPcupU2TlkQP FEFF0tzLTzzZTo6he
22 pages
Indexing
No ratings yet
Indexing
11 pages
Index and Hashing 2017 Combined
No ratings yet
Index and Hashing 2017 Combined
60 pages
Indexing in Database
No ratings yet
Indexing in Database
33 pages
File Organization
No ratings yet
File Organization
41 pages
Indexing Structures For Files: Database Design Database Design
No ratings yet
Indexing Structures For Files: Database Design Database Design
9 pages
Module Iippt
No ratings yet
Module Iippt
27 pages
Indexing: Contents
No ratings yet
Indexing: Contents
13 pages
Indexes
No ratings yet
Indexes
70 pages
Ch14, Veiws, Normalization_summary.pptx
No ratings yet
Ch14, Veiws, Normalization_summary.pptx
68 pages
Indexing and Hashing
No ratings yet
Indexing and Hashing
20 pages
Screenshot 2025-03-12 at 9.41.04 AM
No ratings yet
Screenshot 2025-03-12 at 9.41.04 AM
41 pages
File Organizations and Indexing: R&G Chapter 8
No ratings yet
File Organizations and Indexing: R&G Chapter 8
40 pages
02 - Indices
No ratings yet
02 - Indices
208 pages
11.2 Indexing
No ratings yet
11.2 Indexing
26 pages
7 Indexing
No ratings yet
7 Indexing
13 pages
02 Blocking - Addional
No ratings yet
02 Blocking - Addional
74 pages
Lecture12(CNC 312)
No ratings yet
Lecture12(CNC 312)
36 pages
DBMS Unit9 (2)
No ratings yet
DBMS Unit9 (2)
44 pages
Chapter 12: Indexing and Hashing
No ratings yet
Chapter 12: Indexing and Hashing
31 pages
DBMS-Unit5-PPT (1)
No ratings yet
DBMS-Unit5-PPT (1)
40 pages
Indexing
No ratings yet
Indexing
62 pages
DBMS Unit-4
No ratings yet
DBMS Unit-4
9 pages
Storage and Indexing
No ratings yet
Storage and Indexing
41 pages
File Organizations and Indexing: R&G Chapter 8
No ratings yet
File Organizations and Indexing: R&G Chapter 8
26 pages

Uploaded by

Uploaded by

IN3020/4020 – Database Systems

Spring 2020, Week 3.1

Dr. M. Naci Akkøk, Chief Architect, Oracle Nordics

o An index on an attribute A is a data structure that facilitates

o An index on an attribute A is a data structure that facilitates finding the

Overview of indexes that uniquely identify a database record without

Search key is a candidate Search key is not a

A sparse index has one

o Delete entry with a = 40

o One ususally prefers to

o Even fewer indexes – faster

o What if we want to search for items within an attribute?

o So, how do we represent an index?

o Obvious idea: Sorted list.

o All nodes must contain a certain amount of

☺ Then we get signiﬁcantly fewer disk operaHons than with

select * from R where a > 5

o Hash indexes are good when searching for a specific key:

select * from R where a = 5

o NOTE: Not all DBMSs allow us to specify

Works for =, <, <=, BETWEEN and IN.

o There is also something called GiST. This is a mechanism for

o Can be used in many cases. Some is included in postgres by

o Use two dense indexes: One

o Search key = (30, x)

o We can visulize a multidimensional

o Search is then equivalent to a serach

o All these tree structures must saHsfy at least one of the

o Search (intersecHon, contained in, nearest neighbor) is easy

You might also like