File Structure, Data Storage, Query Evaluation, Indexing and Hashing

The document discusses file organization and query processing. It covers fixed-length records; variable-length records represented using byte strings, pointers, and reserved space; and sequential file organization, including deletion, insertion, and the need to reorganize files over time. Query processing involves parsing, optimization (choosing the most efficient way to evaluate a query), and evaluation of the chosen query execution plan.

File Organization

Chapter 6

File structure
Data storage
Query Evaluation
Indexing and hashing

Fixed-Length Records
Suppose we have a table with the following record type:

type deposit = record
    branch-name : char(22);     -- 22 bytes
    account-number : char(10);  -- 10 bytes
    balance : real;             -- 8 bytes
end

Assumptions: if each character occupies 1 byte and a real occupies 8 bytes, then this record occupies 40 bytes. The first record occupies the first 40 bytes of the file, the second record occupies the next 40 bytes, and so on.
Fixed-Length Records

Problems with this approach:

It is difficult to delete a record, because there is no easy way to mark the deleted slot and reuse it for another record. Two simple solutions: move every following record up one slot ("record 2 deleted and all records moved"), or move only the final record into the freed slot ("record 2 deleted and final record moved").

Unless the record size evenly divides the block size, some records will cross block boundaries, and reading or writing such a record would require two block accesses.
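As a minimal sketch of this layout, the fixed-length file above can be modeled in Python with an in-memory bytearray standing in for the disk file. The class and method names are hypothetical; deletion uses the "final record moved" strategy from the slide.

```python
import struct

RECORD_SIZE = 40
FMT = "<22s10sd"  # char(22), char(10), real (8-byte double), no padding

class FixedLengthFile:
    def __init__(self):
        self.data = bytearray()  # stands in for the disk file

    def num_records(self):
        return len(self.data) // RECORD_SIZE

    def insert(self, branch, account, balance):
        # struct pads/truncates the strings to their fixed widths
        self.data += struct.pack(FMT, branch.encode(), account.encode(), balance)

    def read(self, i):
        branch, account, balance = struct.unpack_from(FMT, self.data, i * RECORD_SIZE)
        return branch.rstrip(b"\0").decode(), account.rstrip(b"\0").decode(), balance

    def delete(self, i):
        # "Record i deleted and final record moved": copy the last record
        # into slot i, then shrink the file by one record.
        last = self.num_records() - 1
        start = i * RECORD_SIZE
        self.data[start:start + RECORD_SIZE] = self.data[last * RECORD_SIZE:]
        del self.data[last * RECORD_SIZE:]
```

With three 40-byte records, deleting slot 1 moves the final record into it and the file shrinks from 120 to 80 bytes.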
Variable-Length Records

Variable-length records arise in database systems in several ways:

Storage of multiple record types in a file.
Record types that allow variable lengths for one or more fields.
Record types that allow repeating fields (used in some older data models).

Different methods to represent variable-length records:

Byte string representation
Slotted page structure
Fixed-length representation

Byte string representation: attach an end-of-record (^) control character to the end of each record. Difficulties: deletion and growth (how do we reuse deleted space?), and there is, in general, no space for a record to grow.
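The byte string representation can be sketched as follows; this is a simplified illustration that assumes the '^' symbol never appears inside a record (a real system would use an unprintable sentinel or a length prefix).

```python
EOR = "^"  # end-of-record control character

def pack_records(records):
    """Concatenate records, attaching the end-of-record symbol to each."""
    return "".join(r + EOR for r in records)

def unpack_records(byte_string):
    """Split the byte string back into records by scanning for EOR."""
    return byte_string.split(EOR)[:-1]  # drop the empty tail after the final EOR

packed = pack_records(["Perryridge|A-102|400", "Round Hill|A-305|350"])
```

Note that locating the i-th record now requires scanning from the start, which is one reason deletion and reuse of space are awkward in this scheme.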

Variable-Length Records

Use one or more fixed-length records:

Reserved space – can use fixed-length records of a known maximum length; unused space in shorter records is filled with a null or end-of-record symbol. Disadvantage: this is useful when most records have lengths near the maximum; otherwise space is wasted.

Pointer method – a variable-length record is represented by a list of fixed-length records, chained together via pointers. This can be used even if the maximum record length is not known. Disadvantage of the pointer structure: space is wasted in all records except the first in a chain.
Variable-Length Records

Pointer method (cont.): to avoid wasting space in all records except the first in a chain, allow two kinds of block in the file:

Anchor block – contains the first record of each chain.
Overflow block – contains records other than the first records of chains.

Sequential File Organization

Suitable for applications that require sequential processing of the entire file.
The records in the file are ordered by a search-key.
Example: account ( account-number, branch-name, balance )

Deletion – use pointer chains.
Insertion – locate the position where the record is to be inserted:
if there is free space, insert there;
if there is no free space, insert the record in an overflow block.
In either case, the pointer chain must be updated.
The file needs to be reorganized from time to time to restore sequential order.
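The insertion rule above (insert in place if there is room, otherwise into an overflow area) can be sketched as follows. This is a simplified in-memory model, assuming fixed-capacity blocks and a merge-on-read stand-in for periodic reorganization; the class and method names are hypothetical.

```python
class SequentialFile:
    def __init__(self, block_capacity=2):
        self.blocks = [[]]              # main area, kept in search-key order
        self.overflow = []              # overflow block for records that do not fit
        self.capacity = block_capacity

    def insert(self, key, record):
        # Locate the block where the record belongs (linear scan for brevity).
        target = self.blocks[-1]
        for block in self.blocks:
            if block and key <= block[-1][0]:
                target = block
                break
        if len(target) < self.capacity:
            target.append((key, record))
            target.sort()               # keep the block in search-key order
        else:
            self.overflow.append((key, record))  # no free space: overflow block

    def scan(self):
        # Reorganization would merge the overflow area back into sequence;
        # here we simply merge when reading.
        return sorted([r for b in self.blocks for r in b] + self.overflow)
```

With block capacity 2, a third insertion lands in the overflow block, yet a scan still yields the records in search-key order.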

Query Evaluation

Query Processing & Optimization

What is Query Processing?
The steps required to transform a high-level SQL query into a correct and "efficient" strategy for execution and retrieval.

What is Query Optimization?
The activity of choosing a single "efficient" execution strategy (from hundreds) as determined by database catalog statistics.

Three Major Steps of Processing

1. Parsing and translation
2. Optimization
3. Evaluation

Parsing and Translation
Translate the query into its internal form; this is then translated into relational algebra.
The parser checks syntax and verifies relations.
A relational algebra expression may have many equivalent expressions.

E.g.

σbalance<2500(Πbalance(account)) is equivalent to

Πbalance(σbalance<2500(account))
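The equivalence above can be sketched over an in-memory account relation (a list of dicts; the sample rows are illustrative). Since the predicate uses only balance, selecting before or after the projection yields the same result.

```python
account = [
    {"account_number": "A-101", "branch_name": "Downtown",   "balance": 500},
    {"account_number": "A-102", "branch_name": "Perryridge", "balance": 400},
    {"account_number": "A-201", "branch_name": "Brighton",   "balance": 7000},
]

# σ balance<2500 ( Π balance (account) ): project first, then select
plan1 = [b for b in (row["balance"] for row in account) if b < 2500]

# Π balance ( σ balance<2500 (account) ): select first, then project
plan2 = [row["balance"] for row in account if row["balance"] < 2500]
```

An optimizer exploits such equivalences: the second form discards rows before projecting, which matters once real disk-resident relations are involved.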

Three Major Steps of Processing

Parsing and Translation (cont.)
Evaluation primitive: a relational algebra operation annotated with instructions specifying how to evaluate that operation.
Evaluation plan: an annotated expression specifying a detailed evaluation strategy; "a sequence of primitive operations that can be used to evaluate a query is a query execution plan or query evaluation plan".
E.g., use an index on balance to find accounts with balance < 2500, or perform a complete relation scan and discard accounts with balance ≥ 2500:

Πbalance
  |
σbalance<2500 ; use index 1 (e.g. index 1 is an index on balance)

Optimization
The process of selecting the most efficient query evaluation plan for a query.
To choose among different query evaluation plans, the optimizer has to estimate the cost of each plan. Cost is estimated using statistical information from the database catalog, e.g. the number of tuples in each relation, the size of tuples, etc.

Evaluation
Once the query plan is chosen, the query is evaluated with that plan, and the result of the query is output.

Measures of Query Cost

Cost is generally measured as total elapsed time for answering a query. Factors contributing to time cost include:

Disk accesses (how does the index/hashing approach impact this?)
CPU time
Network communication

Disk cost is measured by taking into account:

Number of seeks * average seek cost
Number of blocks read * average block-read cost
Number of blocks written * average block-write cost

The cost to write a block is greater than the cost to read a block, because data is read back after being written to ensure that the write was successful.

For simplicity, we use just the number of block transfers from disk and the number of seeks as the cost measures:
tT – time to transfer one block
tS – time for one seek
Cost for b block transfers plus S seeks: b * tT + S * tS

For simplicity we ignore:
CPU costs (real systems do take CPU cost into account)
Costs of writing to disk (taken into account separately where necessary)
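The cost measure b * tT + S * tS can be sketched as below. The timing values are assumptions chosen only for illustration (0.1 ms per block transfer, 4 ms per seek), not measurements from the slides.

```python
def query_cost(b, S, tT=0.0001, tS=0.004):
    """Estimated cost in seconds for b block transfers plus S seeks."""
    return b * tT + S * tS

# A plan that scans 1000 blocks sequentially (one initial seek) versus an
# index plan that reads 4 blocks, each preceded by a seek:
scan_cost = query_cost(b=1000, S=1)
index_cost = query_cost(b=4, S=4)
```

Under these assumed timings the index plan is much cheaper, which is exactly the kind of comparison the optimizer makes when choosing between a relation scan and an index lookup.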

Indexing and Hashing

Index Evaluation Metrics

Access types
Access time
Insertion time
Deletion time
Space overhead

Basic Concepts

Indexing mechanisms are used to speed up access to desired data. E.g., the author catalog in a library.
Search key – an attribute or set of attributes used to look up records in a file.
An index file consists of records (called index entries) of the form (search-key, pointer).
Index files are typically much smaller than the original file.
Two basic kinds of indices:
Ordered indices: search keys are stored in sorted order.
Hash indices: search keys are distributed uniformly across "buckets" using a "hash function".

Ordered Indices

In an ordered index, index entries are stored sorted on the search-key value.
Primary index: in a sequentially ordered file, the index whose search key specifies the sequential order of the file. Also called a clustering index. The search key of a primary index is usually, but not necessarily, the primary key.
Secondary index: an index whose search key specifies an order different from the sequential order of the file. Also called a non-clustering index.
Ordered Indices…Primary Index

Index-sequential file: an ordered sequential file with a primary index.
(Figure: sequential file for account records, sorted by branch-name.)

Two types of ordered indices can be used: 1) Dense Index 2) Sparse Index

Dense index – an index record (or index entry) appears for every search-key value in the file.

Sparse index – contains index records for only some search-key values; applicable when records are sequentially ordered on the search key.
To locate a record with search-key value K:
Find the index record with the largest search-key value ≤ K.
Search the file sequentially starting at the record to which that index record points.

Compared to dense indices, sparse indices offer:
Less space and less maintenance overhead for insertions and deletions.
But they are generally slower than a dense index for locating records.
Good tradeoff: a sparse index with an index entry for every block in the file, corresponding to the least search-key value in the block.
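The block-per-entry sparse index just described can be sketched as below; the branch names and record pointers are illustrative sample data.

```python
import bisect

# One index entry per block, holding the least search-key value in that block.
index = ["Brighton", "Mianus", "Redwood"]
blocks = [
    [("Brighton", "A-217"), ("Downtown", "A-101")],
    [("Mianus", "A-215"), ("Perryridge", "A-102")],
    [("Redwood", "A-222"), ("Round Hill", "A-305")],
]

def lookup(key):
    # Find the index record with the largest search-key value <= key.
    i = bisect.bisect_right(index, key) - 1
    if i < 0:
        return None                    # key sorts before every indexed block
    for k, rec in blocks[i]:           # sequential scan within that block
        if k == key:
            return rec
    return None
```

The index holds 3 entries instead of 6, at the cost of the short sequential scan inside the chosen block.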

Primary Index…Sparse Index Files (cont.)

Even with a sparse index, the index size may still grow too large. For 100,000 records at 10 per block, with one index record per block, that is 10,000 index records; even if we can fit 100 index records per block, the index occupies 100 blocks.

If the index is too large to be kept in main memory, a search results in several disk reads. Binary search can be used: for an index of b blocks, about ⌈log2(b)⌉ blocks must be read.

Primary Index…Multilevel Indices

If the primary index does not fit in memory, access becomes expensive.
Solution: treat the primary index kept on disk as a sequential file and construct a sparse index on it:
outer index – a sparse index of the primary index
inner index – the primary index file
If even the outer index is too large to fit in main memory, yet another level of index can be created, and so on.
Indices at all levels must be updated on insertion into or deletion from the file.

Primary Index…Index Update: Deletion

If the deleted record was the only record in the file with its particular search-key value, the search key is deleted from the index also.
Single-level index deletion:
Dense indices – deletion of the search key is similar to file record deletion.
Sparse indices –
If an entry for the search key exists in the index, it is deleted by replacing the entry with the next search-key value in the file (in search-key order).
If the next search-key value already has an index entry, the entry is deleted instead of being replaced.

Primary Index…final words

In a primary index, sequential search works because the record file is sorted according to the search key.
So we can also use the sparse form of index for a primary index.

Primary Index…Index Update: Insertion

Single-level index insertion:
Perform a lookup using the search-key value appearing in the record to be inserted.
Dense indices – if the search-key value does not appear in the index, insert it.
Sparse indices – if the index stores an entry for each block of the file, no change needs to be made to the index unless a new block is created.
If a new block is created, the first search-key value appearing in the new block is inserted into the index.
Multilevel insertion (as well as deletion) algorithms are simple extensions of the single-level algorithms.

Ordered Indices…Secondary Indices

Can we perform sequential search if we use a dense index as a secondary index?
Can we use a sparse index for a secondary index?

Ordered Indices…Secondary Indices

It is not enough to point to just the first record with each search-key value, because the remaining records with the same search-key value could be anywhere in the file. Therefore, a secondary index must contain pointers to all such records.

Use an extra level of indirection to implement secondary indices on search keys that are not candidate keys: a pointer does not point directly into the file but to a bucket that contains pointers into the file.

Primary and Secondary Indices

Indices offer substantial benefits when searching for records.
BUT: updating indices imposes overhead on database modification; when a file is modified, every index on the file must be updated.
A sequential scan using the primary index is efficient, but a sequential scan using a secondary index is expensive:
Each record access may fetch a new block from disk.
A block fetch requires about 5 to 10 milliseconds, versus about 100 nanoseconds for a memory access.
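The extra level of indirection can be sketched as below: a secondary index on balance maps each balance value to a bucket of pointers (here, record numbers) to all records with that value, since equal balances may be scattered anywhere in the file. The sample rows are illustrative.

```python
file_records = [
    ("A-101", "Downtown", 500),
    ("A-215", "Mianus", 700),
    ("A-102", "Perryridge", 400),
    ("A-305", "Round Hill", 700),   # same balance as A-215, far away in the file
]

# Build the secondary index: balance -> bucket of pointers into the file.
secondary = {}
for rec_no, (_, _, balance) in enumerate(file_records):
    secondary.setdefault(balance, []).append(rec_no)

def find_by_balance(balance):
    # Follow each pointer in the bucket to the actual record.
    return [file_records[p] for p in secondary.get(balance, [])]
```

Note that the index is dense: every balance value present in the file has an entry, because the file itself is not ordered on balance.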

Ordered Indices…Secondary Indices (cont.)

(Figure: secondary index on the balance field of account.)
Each index record points to a bucket that contains pointers to all the actual records with that particular search-key value.
Secondary indices have to be dense.

Hashing

Static Hashing

A bucket is a unit of storage containing one or more records (a bucket is typically a disk block).
In a hash file organization we obtain the bucket of a record directly from its search-key value, using a hash function.
A hash function h is a function from the set of all search-key values K to the set of all bucket addresses B: h: K → B.
The hash function is used to locate records for access, insertion, and deletion.
Records with different search-key values may be mapped to the same bucket; thus the entire bucket has to be searched sequentially to locate a record.

Hash Functions

The worst hash function maps all search-key values to the same bucket; search time is then proportional to the number of search-key values.
An ideal hash function has two properties:
Uniform: each bucket is assigned the same number of search-key values from the set of all possible values.
Random: each bucket has approximately the same number of records assigned to it, whatever the actual distribution of search-key values.
Typical hash functions perform computation on the internal binary representation of the search key. For example, for a string search key, the binary representations of all the characters in the string could be added and the sum modulo the number of buckets returned.

Example of Hash File Organization

(Figure: hash file organization of the account file, using branch_name as the key.)

Handling of Bucket Overflows

Bucket overflow can occur because of:
Insufficient buckets: the number of buckets nB must satisfy nB > nr / fr, where nr is the total number of records and fr is the number of records that fit in a bucket; otherwise overflow occurs.
Skew in the distribution of records (some buckets overflow while others remain nearly empty). This can occur for two reasons:
multiple records have the same search-key value;
the chosen hash function produces a non-uniform distribution of key values.
Although the probability of bucket overflow can be reduced, it cannot be eliminated; it is handled by using overflow buckets.

Handling of Bucket Overflows (Cont.)

Overflow chaining – the overflow buckets of a given bucket are chained together in a linked list.
The above scheme is called closed hashing.
An alternative, called open hashing, which does not use overflow buckets, is not suitable for database applications.
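Overflow chaining can be sketched as follows; the bucket size, bucket count, and hash function are illustrative assumptions.

```python
BUCKET_SIZE = 2
NUM_BUCKETS = 4

def hash_fn(key):
    return sum(str(key).encode()) % NUM_BUCKETS

class Bucket:
    def __init__(self):
        self.entries = []
        self.overflow = None   # link to the next overflow bucket in the chain

table = [Bucket() for _ in range(NUM_BUCKETS)]

def insert(key):
    b = table[hash_fn(key)]
    while len(b.entries) >= BUCKET_SIZE:
        if b.overflow is None:
            b.overflow = Bucket()   # allocate an overflow bucket on demand
        b = b.overflow              # follow the chain to the first free bucket
    b.entries.append(key)

def lookup(key):
    b = table[hash_fn(key)]
    while b is not None:            # walk the overflow chain
        if key in b.entries:
            return True
        b = b.overflow
    return False
```

Inserting three keys that hash to the same bucket fills the primary bucket and pushes the third key into a chained overflow bucket, yet lookup still finds all of them.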
