Indexing in DBMS
Indexing in DBMS
and Indexing
By
Dr. P. Tirupathi
Assistant Professor
Computer Science and Engineering
Data on External Storage
• The storage system in a DBMS refers to the
hierarchical arrangement of storage devices
and media to store, manage, and retrieve data
efficiently.
• The system is designed to handle different
storage capacities, access speeds, volatility,
and costs.
Data on External Storage
Storage System Hierarchy in DBMS
• The storage hierarchy typically has multiple
levels, each with its specific characteristics.
• A typical hierarchy from fastest (and usually
most expensive per byte) to slowest (and
usually least expensive per byte).
Data on External Storage
Storage System Hierarchy in DBMS
1. Registers
• Located within the CPU.
• Smallest and fastest type of storage.
• Used to hold data currently being processed.
2. Cache Memory
• On or very close to the CPU.
• Extremely fast but small in size.
• Acts as a buffer for frequently used data.
Data on External Storage
Storage System Hierarchy in DBMS
3. Main Memory (RAM)
• Data that's actively being processed is loaded here.
• Faster than secondary storage.
• Volatile in nature (data is lost when power is turned off
).
4. Flash Storage (Solid State Drives - SSD)
• No moving parts.
• Faster than traditional hard drives.
• More durable and reliable
Data on External Storage
Storage System Hierarchy in DBMS
5. Magnetic Disks (Hard Disk Drives - HDD)
• Non-volatile.
• Data is stored in tracks, sectors, and cylinders.
• Slower than main memory but offers a large storage
capacity at a lower cost.
6. Optical Disks (CD, DVD, Blu-Ray)
• Data is read using lasers.
• Slower than magnetic disks and usually have less
storage capacity.
• commonly used for media and software distribution.
File Organization and Indexing
Types of File Organization
There are three types of organizing the file:
1. Sequential File Organization
2. Direct File Organization
3. Indexed sequential File Organization
File Organization and Indexing
Sequential file organization
• Records are stored in sequence, one after the
other, based on a key field.
• This key field is a unique identifier for records,
ensuring that they have some order.
• The records are inserted at the end of the file,
ensuring the sequence is maintained
File Organization and Indexing
Sequential file organization: Advantages
• Ordered Records
• Continuous Memory Allocation:
• No Direct Access
• Simple
• Less Overhead
File Organization and Indexing
Sequential file organization: Disadvantages
• Inefficient for Random Access
• Expensive Insertion and Deletion
File Organization and Indexing
Example:
Id | Name
1 | Madhu
2 | Naveen
4 | Shivaji
5 | Durga
File Organization and Indexing
Direct (or Hashed) File Organization
• a hash function is used to compute the address
of a block (or bucket) where the record is
stored.
• The value returned by the hash function using
a record's key value is its address in the
database.
File Organization and Indexing
Direct (or Hashed) File Organization
Hash Function: A hash function converts a
record's key value into an address.
Buckets: A bucket typically stores one or more
records. A hash function might map multiple
keys to the same bucket.
No Ordering of Records: Records are not stored
in any specific logical order.
File Organization and Indexing
Direct File Organization: Advantages
Rapid Access: If the hash function is efficient
and there's minimal collision, the retrieval of a
record is very quick.
Uniform Distribution: A good hash function will
distribute records uniformly across all buckets.
Efficient Search: Searching becomes efficient as
only a specific bucket needs to be searched
rather than the entire file.
File Organization and Indexing
Direct File Organization: Disadvantage
Collisions: A collision occurs when two different
keys hash to the same bucket. Handling
collisions can be tricky and might affect access
time.
Dependency on Hash Function: The efficiency of
a hash file organization heavily depends on the
hash function used. A bad hash function can
lead to clustering and inefficient utilization of
space.
File Organization and Indexing
Direct File Organization
File Organization and Indexing
Indexed File Organization
• It is a method used to store and retrieve data in
databases.
• It is designed to provide quick random access
to records based on key values.
• In this organization, an index is created which
helps in achieving faster search and access
times.
File Organization and Indexing
Indexed File Organization
• Primary Data File: The actual database file where
records are stored.
• Index: An auxiliary file that contains key values
and pointers to the corresponding records in the
data file.
• Multi-level Index: Sometimes, if the index
becomes large, a secondary (or even tertiary)
index can be created on the primary index to
expedite searching further..
File Organization and Indexing
Indexed File Organization: Advantages
• Quick Random Access: Direct access to records is
possible using the index.
• Flexible Searches: Since an index provides a
mechanism to jump directly to records, different
types of search operations (like range queries) can
be efficiently supported.
• Ordered Access: If the primary file is ordered,
then indexed file organization can support
efficient sequential access too.
File Organization and Indexing
Indexed File Organization: Disadvantages
• Overhead of Maintaining Index: Every time a
record is added, deleted, or updated, the index
also needs to be updated. This can introduce
overhead.
• Space Overhead: Indexes consume additional
storage space.
• Complexity: Maintaining multiple levels of
indexes can introduce complexity in terms of
design and implementation..
Indexed Sequential Access Method
• It is a file organization technique that provides
efficient ways to both sequentially and
randomly access records.
• In ISAM, data is stored in sorted order based
on a key, and an index is maintained to
facilitate quick search operations.
• The primary advantage of ISAM is that it
combines the benefits of both sequential and
direct-access (or random-access) file systems.
Indexed Sequential Access Method
Components of ISAM
• Data File: The records in the data file are stored in
sorted order based on a key field.
• Index File: The index file contains index entries,
which are pointers to the data file. These index
entries are also sorted based on the key.
• Overflow Area: This is used to store new records
that are inserted but cannot fit into the primary
data area. The overflow area is also organized in a
sorted manner.
Indexed Sequential Access Method
Operations
• Searching: When searching for a record, the system
first looks in the index to find the disk block that
contains the record. Then, that specific block is fetched
to retrieve the record. This usually takes two disk I/O
operations.
• Insertion: If a new record needs to be inserted, it is
placed in its correct sorted position. If the block is full,
the record is placed in the overflow area.
• Deletion: Deleting a record involves marking it as
deleted. The space is then reclaimed during a
subsequent reorganization of the data file.
Indexed Sequential Access Method