
DBMS

Unit-5

Database Storage Structures


Database tables and indexes may be stored on disk in one of a number of forms, including
ordered/unordered flat files, ISAM, heap files, hash buckets, or B+ trees. Each form has its own
particular advantages and disadvantages. The most commonly used forms are B-trees and ISAM.
Such forms or structures are one aspect of the overall schema used by a database engine to store
information.

In database management systems (DBMS), data is stored using various storage structures that
organize and manage data efficiently. Here are some common storage structures used in DBMS:
Heap File Structure: In a heap file structure, records are stored in no particular order, often appended to the end of the file. This makes insertion fast, but retrieval can be slower since a full scan might be required to find specific records.
Sequential File Structure: Records are stored in a sorted order according to a key field. This structure speeds up retrieval operations, as binary search or other efficient search algorithms can be used to find records quickly. However, insertion and deletion operations can be slower, as maintaining the sorted order requires shifting records.
Hashed File Structure: In a hashed file structure, records are stored in locations determined by applying a hash function to a key field. This allows for very fast retrieval of records since the location can be directly calculated from the hash value. However, collisions (two records mapping to the same location) need to be handled.
Indexed File Structure: An indexed file structure includes one or more indexes that provide pointers to the actual records in the file. This allows for efficient retrieval based on the indexed fields. B-trees, B+ trees, and hash indexes are commonly used indexing techniques.
Clustered File Structure: In a clustered file structure, records with similar characteristics are stored together physically on disk. This can improve retrieval performance for queries that access multiple related records, since they are likely to be stored close to each other.
Partitioned File Structure: In a partitioned file structure, data is divided into partitions based on some criteria, such as a range of values or a hash function. This can improve performance by distributing the data across multiple disks or servers, allowing for parallel processing.
Multidimensional Storage Structures: These are used for storing data with multiple dimensions, such as in data warehouses or multidimensional databases. Examples include star schemas, snowflake schemas, and bitmap indexes, which are optimized for analytical queries and OLAP operations.

Each storage structure has its advantages and disadvantages, and the choice of structure depends on factors such as the type of application, the size and nature of the data, and the types of queries and operations expected to be performed on the data.

Unordered
Unordered storage typically stores the records in the order they are inserted. Such storage offers good insertion efficiency (O(1)), but inefficient retrieval times (O(n)). Typically these retrieval times are better, however, as most databases use indexes on the primary keys, resulting in retrieval times of O(log n) or O(1) for keys that are the same as the database row offsets within the storage system.

Ordered
Ordered storage typically stores the records in order and may have to rearrange or increase the
file size when a new record is inserted, resulting in lower insertion efficiency. However, ordered
storage provides more efficient retrieval as the records are pre-sorted.

Structured files
Heap files
Heap files are lists of unordered records of variable size. Although sharing a similar name, heap
files are widely different from in-memory heaps. In-memory heaps are ordered, as opposed to
heap files.
• Simplest and most basic method
o insert efficient, with new records added at the end of the file, providing chronological order
o retrieval efficient when the handle to the memory is the address of the memory
o search inefficient, as searching has to be linear
o deletion is accomplished by marking selected records as "deleted"
o requires periodic reorganization if the file is very volatile (changed frequently)
• Advantages
o efficient for bulk loading data
o efficient for relatively small relations as indexing overheads are avoided
o efficient when retrievals involve a large proportion of stored records
• Disadvantages
o not efficient for selective retrieval using key values, especially if large
o sorting may be time-consuming
o not suitable for volatile tables (a small sketch of these trade-offs follows below)
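To make these trade-offs concrete, here is a minimal Python sketch of a heap file (the class and field names are invented for illustration): appending is cheap, a search is a linear scan, and deletion merely marks records, which is why periodic reorganization is needed.

```python
# Minimal heap-file sketch: appends are cheap, searches are linear scans,
# and deletes only mark records (hence the need for periodic reorganization).
class HeapFile:
    def __init__(self):
        self.records = []              # records kept in insertion (chronological) order

    def insert(self, record):
        self.records.append(record)    # O(1): always append at the end of the file
        return len(self.records) - 1   # handle = position within the file

    def fetch(self, handle):
        return self.records[handle]    # O(1) when the handle (address) is known

    def search(self, key, value):
        # O(n): no ordering or index, so every record must be examined
        return [r for r in self.records if r is not None and r.get(key) == value]

    def delete(self, handle):
        self.records[handle] = None    # mark as "deleted"; no records are shifted

    def reorganize(self):
        # periodic reorganization: physically drop the marked records
        self.records = [r for r in self.records if r is not None]

hf = HeapFile()
hf.insert({"id": 1, "name": "Alice"})
hf.insert({"id": 2, "name": "Bob"})
print(hf.search("id", 2))   # linear scan -> [{'id': 2, 'name': 'Bob'}]
```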

Hash buckets
• Hash functions calculate the address of the page in which the record is to be stored, based on one or more fields in the record
o hashing functions are chosen to ensure that addresses are spread evenly across the address space
o 'occupancy' is generally 40% to 60% of the total file size
o a unique address is not guaranteed, so collision detection and collision resolution mechanisms are required
• Open addressing
• Chained/unchained overflow
• Pros and cons
o efficient for exact matches on the key field
o not suitable for range retrieval, which requires sequential storage
o calculates where the record is stored based on fields in the record
o hash functions ensure even spread of data
o collisions are possible, so collision detection and resolution are required (a small sketch follows below)
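As a rough illustration of bucket addressing with chained overflow, here is a small Python sketch (the hash function and names are placeholders, not any particular DBMS's implementation):

```python
# Hash-file sketch: the page (bucket) address is computed from the key field,
# and collisions are handled by chaining records inside the bucket.
NUM_BUCKETS = 8

def bucket_address(key):
    # the hash function spreads keys across the address space
    return hash(key) % NUM_BUCKETS

buckets = [[] for _ in range(NUM_BUCKETS)]   # each bucket is an overflow chain

def insert(key, record):
    buckets[bucket_address(key)].append((key, record))

def lookup(key):
    # exact-match retrieval: go straight to the computed bucket, then scan
    # only its (short) chain -- no full-file scan is needed
    for k, rec in buckets[bucket_address(key)]:
        if k == key:
            return rec
    return None

insert(101, {"name": "Alice"})
insert(109, {"name": "Bob"})     # may land in the same bucket: chained overflow
print(lookup(109))               # -> {'name': 'Bob'}
# Note: a range query such as "keys 100..110" cannot use the hash at all;
# it would have to examine every bucket, which is why hashing suits exact matches.
```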
B+ trees
These are the most commonly used in practice.
• Time taken to access any record is the same because the same number of nodes is searched
• Index is a full index so the data file does not have to be ordered
• Pros and cons
o versatile data structure – sequential as well as random access
o access is fast
o supports exact, range, part key and pattern matches efficiently
o volatile files are handled efficiently because the index is dynamic – it expands and contracts as the table grows and shrinks
o less well suited to relatively stable files – in this case, ISAM is more efficient

Data Orientation
Most conventional relational databases use "row-oriented" storage, meaning that all data associated with
a given row is stored together. By contrast, column-oriented DBMS store all data from a given column
together in order to more quickly serve data warehouse-style queries. Correlation databases are similar
to row-based databases, but apply a layer of indirection to map multiple instances of the same value to
the same numerical identifier.

Storage Types in DBMS


The records in databases are stored in file formats. Physically, the data is stored in electromagnetic format
on a device. The electromagnetic devices used in database systems for data storage are classified as
follows:
1. Primary Memory
2. Secondary Memory
3. Tertiary Memory

Types of Memory
1. Primary Memory
The primary memory of a server is the type of data storage that is directly accessible by the central processing unit, meaning that it doesn't require any other devices to read from it. In general, primary memory functions flawlessly only with equal contributions from the electric power supply, the hardware backup system, the supporting devices, the coolant that moderates the system temperature, etc.
• The size of these devices is considerably smaller and they are volatile.
• In terms of performance and speed, the primary memory devices are the fastest devices, and this feature is in direct correlation with their capacity.
• These primary memory devices are usually more expensive due to their increased speed and performance.
The cache is one of the types of Primary Memory.
• Cache Memory: Cache Memory is a special very high-speed memory. It is used to speed up and synchronize with a high-speed CPU. Cache memory is costlier than main memory or disk memory but more economical than CPU registers. Cache memory is an extremely fast memory type that acts as a buffer between RAM and the CPU.

2. Secondary Memory
Data storage devices known as secondary storage, as the name suggests, are devices that can be accessed
for storing data that will be needed at a later point in time for various purposes or database actions.
Therefore, these types of storage systems are sometimes called backup units as well. Devices that are
plugged or connected externally fall under this memory category, unlike primary memory, which is part
of the CPU. The size of this group of devices is noticeably larger than the primary devices and smaller than
the tertiary devices.
• It is also regarded as a temporary storage system since it can hold data when needed and delete it when the user is done with it. Compared to primary storage devices as well as tertiary devices, these secondary storage devices are slower in action and pace.
• It usually has a higher capacity than primary storage systems, but this changes with the technological world, which is expanding every day.
Some commonly used Secondary Memory types that are present in almost every system are:
• Flash Memory: Flash memory, also known as flash storage, is a type of nonvolatile memory that erases data in units called blocks and rewrites data at the byte level. Flash memory is widely used for storage and data transfer in consumer devices, enterprise systems, and industrial applications. Unlike traditional hard drives, flash memories are able to retain data even after the power has been turned off.
• Magnetic Disk Storage: A Magnetic Disk is a type of secondary memory consisting of a flat disc covered with a magnetic coating that holds information. It is used to store various programs and files. Information polarized in one direction is represented by 1, and polarization in the opposite direction is represented by 0.

3. Tertiary Memory
For data storage, Tertiary Memory refers to devices that can hold a large amount of data without being constantly connected to the server or the peripherals. A device of this type is connected externally, either to a server or to a device where the database is stored.
• Tertiary storage provides more space than other types of device memory but performs most slowly; as a result, the cost of tertiary storage is lower than that of primary and secondary storage. This type of storage is commonly used for making backup copies of servers and databases.
• As with secondary devices, the contents of tertiary devices can be deleted and rewritten.
Some commonly used Tertiary Memory types that are present in almost every system are:
• Optical Storage: It is a type of storage where reading and writing are performed with the help of a laser. Data written on CDs and DVDs are typical examples of Optical Storage.

• Tape Storage: Tape Storage is a type of data storage where we use magnetic tape to store data. It is used to store data for a long time and also helps in the backup of data in case of data loss.

Memory Hierarchy
A computer system has a hierarchy of memory. A CPU has direct access to its main memory and inbuilt registers. Accessing main memory is slow compared with the speed of the CPU, so cache memory is introduced to minimize this difference in speed. Data that is most frequently accessed by the CPU resides in cache memory, which provides the fastest access time to data. The fastest memory is also the most expensive. Although large storage devices are slower and less expensive than CPU registers and cache memory, they can store a greater amount of data.

1. Magnetic Disks
Present-day computer systems use hard disk drives as secondary storage devices. Magnetic disks store information using the concept of magnetism. Metal disks are coated with magnetizable material to create hard disks, and the disks are mounted on a spindle. As the read/write head moves over a disk, it magnetizes or de-magnetizes the spots under it; each spot represents either 0 (zero) or 1 (one). Formatted hard disks store data efficiently in a defined order: each disk plate is divided into many concentric circles, called tracks, and each track contains a number of sectors. Data on a hard disk is typically stored in sectors of 512 bytes. (A small capacity calculation follows below.)
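As a quick worked example of this geometry (the numbers below are assumed for illustration; only the 512-byte sector size comes from the text), total capacity is tracks × sectors per track × bytes per sector × surfaces:

```python
# Illustrative disk-capacity arithmetic using the geometry described above.
tracks_per_surface = 10_000       # concentric circles per platter surface (assumed)
sectors_per_track  = 500          # assumed
bytes_per_sector   = 512          # as stated above
surfaces           = 8            # e.g., 4 platters, 2 sides each (assumed)

capacity = tracks_per_surface * sectors_per_track * bytes_per_sector * surfaces
print(f"{capacity / 10**9:.1f} GB")   # -> 20.5 GB for these assumed numbers
```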

2. Redundant Array of Independent Disks(RAID)


In the Redundant Array of Independent Disks technology, two or more secondary storage devices are
connected so that the devices operate as one storage medium. A RAID array consists of several disks linked
together for a variety of purposes. Disk arrays are categorized by their RAID levels.
• RAID 0: At this level, disks are organized in a striped array. Data is broken into blocks, and the blocks are distributed across the disks. Writing and reading of data occur in parallel on each disk. This improves performance and speed. Level 0 does not support parity or backup.

Fig: Raid-0
• RAID 1: Mirroring is used in RAID 1. A RAID controller copies data across all disks in an array when data is sent to it. In case of failure, RAID level 1 provides 100% redundancy.

Fig: Raid-1

• RAID 2: The data in RAID 2 is striped across different disks, and an Error Correction Code (ECC) is recorded using Hamming distance. As at level 0, each bit within a word is stored on a separate disk, and the ECC codes for the data words are saved on a separate set of disks. As a result of its complex structure and high cost, RAID 2 cannot be commercially deployed.

Fig: Raid-2

• RAID 3: Data is striped across multiple disks in RAID 3. The data words are used to generate a parity bit, which is stored on a dedicated disk. Thus, single-disk failures can be recovered from.

Fig: Raid-3
• RAID 4: This level involves writing an entire block of data onto the data disks, and then generating the parity and storing it somewhere else. At level 3, bytes are striped, while at level 4, blocks are striped. Both levels 3 and 4 require a minimum of three disks.

Fig: Raid-4

• RAID 5: The data blocks in RAID 5 are written to different disks, but the parity bits are spread out across all the data disks rather than being stored on a separate disk.

Fig: Raid-5

• RAID 6: The RAID 6 level extends the level 5 concept: two independent parities are generated and stored on multiple disks at this level. Ideally, you need four disk drives for this level.

Fig: Raid-6
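The parity used in RAID 3-6 is a bitwise XOR of the data blocks, which is what makes single-disk recovery possible. A minimal Python sketch (the block contents are invented):

```python
# RAID parity sketch: parity = XOR of the data blocks, so any one lost
# block can be reconstructed by XOR-ing the surviving blocks with parity.
from functools import reduce

def xor_blocks(blocks):
    # bitwise XOR of equal-sized blocks
    return bytes(reduce(lambda x, y: x ^ y, col) for col in zip(*blocks))

d1, d2, d3 = b"AAAA", b"BBBB", b"CCCC"   # three data blocks on three disks
parity = xor_blocks([d1, d2, d3])        # stored on the parity disk

# Disk 2 fails: rebuild its block from the survivors plus parity.
rebuilt_d2 = xor_blocks([d1, d3, parity])
assert rebuilt_d2 == d2
print(rebuilt_d2)                        # -> b'BBBB'
```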

Storage Hierarchy
Besides the storage devices mentioned above, there are also other devices that are used in day-to-day life. These are shown below, ordered from faster speed at the top to lower speed at the bottom.

Fig: Storage Hierarchy

What is Transaction in DBMS?
Transactions in Database Management Systems (DBMS) are sets of operations performed to
modify data, including insertion, updates, or deletions. These transactions have various states
indicating their progress and required actions. They ensure data consistency even during system
failures, demonstrating a key advantage of DBMS. Transactions, executed repeatedly, like ATM
withdrawals, generate multiple instances, each maintaining database integrity through specific
properties.

Operations in Transaction
A transaction carries out a certain set of operations in order to perform some logical unit of work. For example, when we go to withdraw money from an ATM, we encounter the following set of operations:
1. Transaction initiated
2. You have to insert an ATM card
3. Select your choice of language
4. Select whether savings or current account
5. Enter the amount to withdraw
6. Enter your ATM PIN
7. Transaction processes
8. You collect the cash
9. You press finish to end the transaction
The above-mentioned are the operations done by you. But in the case of a transaction in DBMS, there are three major operations that are used for a transaction to get executed in an efficient manner. These are:
1. Read/Access Data
2. Write/Change Data
3. Commit
Let's understand the above three sets of operations in a transaction with a real-life example of
transferring money from Account1 to Account2.
Initial balance in both accounts before the start of the transaction:
Account1 – ₹ 5000
Account2 – ₹ 2000
This data, before the start of the transaction, is stored in the secondary memory (hard disk); once the transaction is initiated it is brought into the primary memory (RAM) of the system for faster and better access.
Now for a transfer of ₹ 500 from Account1 to Account2 to occur, the following set of operations will take place:
Read(Account1) --> 5000
Account1 = Account1 - 500
Write(Account1) --> 4500
Read(Account2) --> 2000
Account2 = Account2 + 500
Write(Account2) --> 2500
Commit

The COMMIT statement permanently saves the changes made by the current transaction.
When a transaction is successful, COMMIT is applied. If the system fails before a COMMIT is
applied, the transaction reaches its previous state after ROLLBACK.

After the commit operation, the transaction ends and the updated values are Account1 = ₹ 4500 and Account2 = ₹ 2500. Every single operation that occurs before the commit is said to be in a partially committed state and is stored in the primary memory (RAM). After the transaction is committed, the updated data is accepted and written to the secondary memory (hard disk).
If the transaction fails anywhere before committing, it is aborted and has to start from the beginning, as it cannot be continued from the state at the point of failure. This is known as rollback. A minimal sketch of this flow follows.
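Here is a minimal Python sketch of the flow just described (the buffer and variable names are invented): updates live in a RAM buffer while the transaction is partially committed, COMMIT copies them to disk, and a failure before COMMIT simply discards the buffer, which is the rollback.

```python
# Sketch of the transfer above: updates happen in a RAM buffer (partially
# committed state); COMMIT copies them to disk; a failure triggers rollback.
disk = {"Account1": 5000, "Account2": 2000}      # secondary memory (hard disk)

def transfer(amount):
    ram = dict(disk)                  # bring the data into primary memory (RAM)
    try:
        ram["Account1"] -= amount     # Read/Write Account1 -> 4500
        ram["Account2"] += amount     # Read/Write Account2 -> 2500
        disk.update(ram)              # COMMIT: changes become permanent on disk
    except Exception:
        pass                          # ROLLBACK: buffer discarded, disk untouched

transfer(500)
print(disk)   # -> {'Account1': 4500, 'Account2': 2500}
```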

Transaction States in DBMS


During the lifetime of a transaction, it passes through a number of states. These states inform the system about the current progress of the transaction and also tell the user how to plan further processing of the transaction. These states determine the rules which decide the fate of a transaction, whether it will commit or abort.

The ROLLBACK statement undoes the changes made by the current transaction. A transaction cannot undo changes after COMMIT has been executed.

Following are the different types of transaction States :


• Active State: When the operations of a transaction are running, the transaction is said to be in the active state. If all the read and write operations are performed without any error then it progresses to the partially committed state; if any operation fails, it goes to a state known as the failed state.
• Partially Committed: After all the read and write operations are completed, the changes which were previously made in the main memory are made permanent in the database, after which the state will progress to the committed state; in case of a failure it will go to the failed state.
• Failed State: If any operation during the transaction fails due to some software or hardware issue, then the transaction goes to the failed state. A failure during a transaction must not leave a permanent change to data in the database; the changes made to the local memory data are rolled back to the previous consistent state.

• Aborted State: If the transaction fails during its execution, it goes from the failed state to the aborted state, and because in the previous states all the changes were only made in the main memory, these uncommitted changes are either deleted or rolled back. The transaction at this point can restart afresh from the active state.
• Committed State: If the transaction completes all sets of operations successfully, all the changes made during the partially committed state are permanently stored, the transaction is said to be completed, and it can progress to finally get terminated in the terminated state.
• Terminated State: If the transaction gets aborted after roll-back, or the transaction comes from the committed state, then the database comes to a consistent state and is ready for further new transactions, since the previous transaction is now terminated.

Properties of Transaction in DBMS


There are four major properties that are vital for a transaction to be successful. These are used
to maintain state consistency in the database, both before and after the transaction. These are
called ACID properties.
1. Atomicity: This property means that either the transaction takes place completely at once or doesn't happen at all. There is no middle option, i.e., transactions do not occur partially. Each transaction is considered as one single step which either runs completely or is not executed at all.
2. Consistency: This property means that the integrity constraints of a database are
maintained so that the database is consistent before and after the transaction. It refers
to the correctness of a database.
3. Isolation: This property means that multiple transactions can occur concurrently without
causing any inconsistency to the database state. These transactions occur independently
without any external interference. Changes that occur in a particular transaction are not
visible/ accessible to any other transaction until that particular change in that transaction
has been committed.
4. Durability: This property ensures that once the transaction has completed execution, the
updates and modifications to the database are stored in and written to disk and they
remain intact even if a system failure occurs. These updates become permanent and are
stored in the non-volatile memory.

ACID Properties in DBMS


ACID properties refer to a set of fundamental guarantees provided to ensure the reliability and
consistency of data transactions. ACID stands for Atomicity, Consistency, Isolation,
and Durability.

Atomicity ensures that a transaction is treated as a single indivisible unit, either executing all its
operations or none at all.
Consistency ensures that the database remains in a valid state before and after a transaction.
Isolation ensures that concurrent transactions do not interfere with each other, maintaining data
integrity.
Durability guarantees that once a transaction is committed, its effects are permanent and survive
any system failures. Together, these properties ensure reliability and maintain data integrity in
DBMS operations.

What is Database Transaction?


A transaction is a logical unit of work that accesses and updates the contents of a database. Read
and write operations are used by transactions to access data. A transaction has several states:

State                Description
Active               Transaction is in progress.
Partially Committed  Operations completed, but data not yet saved.
Failed               Transaction fails database recovery system checks.
Committed            Successful completion, changes permanently saved.
Aborted              Transaction fails tests, rolled back or aborted.
Terminated           Transaction terminated, system ready for new transactions.

Transaction control
In database management systems (DBMS), transaction control refers to the mechanisms used to ensure the ACID properties (Atomicity, Consistency, Isolation, Durability) of database transactions.

What are ACID Properties in DBMS?


ACID properties are a set of properties that guarantee reliable processing of transactions in a
database management system (DBMS). Transactions are a sequence of database operations that
are executed as a single unit of work, and the ACID properties ensure that transactions are
processed reliably and consistently in a DBMS. Here's a breakdown of transaction control
mechanisms:
 The Atomicity property ensures that a transac on is either executed completely or not at all.
 The Consistency property ensures that the database remains in a consistent state before and a er
a transac on.
 The Isola on property ensures that mul ple transac ons can run concurrently without interfering
with each other.
 The Durability property ensures that the results of a commi ed transac on are permanent and
cannot be lost due to system failure.
Together, these properties ensure that transactions are processed reliably and consistently in a
DBMS, which is essential for the integrity and accuracy of data in a database.

1. Atomicity in DBMS
The term atomicity is the ACID Property in DBMS that refers to the fact that the data is kept
atomic. It means that if any operation on the data is conducted, it should either be executed
completely or not at all. It also implies that the operation should not be interrupted or just half
completed. When performing operations on a transaction, the operation should be completed
totally rather than partially. If any of the operations aren’t completed fully, the transaction gets
aborted.

Example: Sometimes an operation is running and an operation with a higher priority enters; the current operation is discontinued and aborted. In the given scenario, if two users simultaneously try to book the only available seat on a train, both transactions cannot complete. According to atomicity, the first user who successfully clicks the booking button will reserve the seat and receive a notification, while the second user's transaction will be rolled back, and they will be notified that no more seats are available.

In a simpler example, if a person tries to book a ticket, selects a seat, and proceeds to the payment gateway but encounters a failure due to bank server issues, their booked seat will not be reserved for them. A complete transaction involves reserving the seat and completing the payment. If any step fails, the operation is aborted, and the user is brought back to the initial state without their seat being reserved.
Atomicity in DBMS is often referred to as the 'all or nothing' rule.
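The 'all or nothing' behavior can be demonstrated with Python's built-in sqlite3 module (the table and the simulated failure are invented for the example): if any step fails before COMMIT, rollback() returns the database to the state before the transaction began.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A', 5000), ('B', 2000)")
conn.commit()

try:
    # The debit and the credit form one transaction: all or nothing.
    conn.execute("UPDATE account SET balance = balance - 500 WHERE name = 'A'")
    raise RuntimeError("bank server failure")   # simulated failure before the credit
    conn.execute("UPDATE account SET balance = balance + 500 WHERE name = 'B'")
    conn.commit()
except Exception:
    conn.rollback()   # atomicity: the partial debit of 'A' is undone

print(conn.execute("SELECT name, balance FROM account ORDER BY name").fetchall())
# -> [('A', 5000), ('B', 2000)]  -- unchanged: the transaction never happened
```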

2. Consistency in DBMS
This ACID property verifies that (the total number of seats left in the train) + (the number of seats booked by users) = (the total number of seats present in the train). After each transaction, consistency is checked to ensure nothing has gone wrong.
Example: Let us consider an example where one person is trying to book a ticket. They are able to reserve their seat but their payment hasn't gone through due to bank issues. In this case, their transaction is rolled back. But just doing that isn't sufficient: the number of available seats must also be updated. Otherwise, there will be an inconsistency where the seat given up by the person is not accounted for, and the sum of seats left in the train plus the seats booked by users would no longer equal the total number of seats present in the train.

3. Isolation in DBMS
Isolation is defined as a state of separation. Isolation is an ACID property in DBMS under which data from one transaction should not impact another, even though many transactions can take place at the same time. In other words, an operation on one state of the database should finish before a dependent process on that state begins. It indicates that if two actions are conducted on two different data items, the value of one may not be affected by the value of the other. When two or more transactions occur at the same time, consistency should be maintained: any modifications made in one transaction are not visible to other transactions until the change in that transaction has been committed.

Example Suppose two people try to book the same seat simultaneously. Transactions are
serialized to maintain data consistency. The first person's transaction succeeds, and they receive
a ticket. The second person's transaction fails as the seat is already booked. They receive an error
message indicating no available seats.

4. Durability in DBMS
The ACID Property durability in DBMS refers to the fact that if an operation is completed
successfully, the database remains permanent in the disk. The database’s durability should be
such that even if the system fails or crashes, the database will survive. However, if the database
is lost, the recovery manager is responsible for guaranteeing the database’s long-term viability.
Every time we make a change, we must use the COMMIT command to commit the values.

Example: Suppose a system failure in the railway management system results in the loss of all booked train details. Millions of users who had paid for their seats are now unable to board the train, causing significant financial losses and eroding trust in the company. The situation is particularly critical as these trains are needed for important reasons, causing widespread panic and inconvenience. Durability is what prevents exactly this scenario.

Advantages of ACID properties in DBMS


• Data Integrity: ACID properties ensure data remains consistent and free from corruption.
• Reliability: ACID properties provide consistent and reliable execution of transactions.
• Concurrency Control: ACID properties enable simultaneous access to data without conflicts.
• Fault Tolerance: ACID properties ensure data durability, surviving system failures.
• Transaction Management: ACID properties offer structured transaction handling.
• Compliance and Auditability: ACID properties facilitate regulatory compliance and auditing.

Disadvantages of ACID properties in DBMS


• Performance Overhead: ACID properties can impact system performance and throughput due to additional processing and resource utilization.
• Complexity: Implementing ACID properties adds complexity to database systems, increasing design and maintenance challenges.
• Scalability Challenges: ACID properties can pose difficulties in highly distributed or scalable systems, limiting scalability.
• Potential for Deadlocks: ACID transactions using locking mechanisms can lead to deadlocks and system halts.
• Limited Concurrency: ACID properties may restrict concurrency, impacting overall system throughput.
• Recovery Complexity: ACID properties introduce complexities in recovery and backup strategies.
• Trade-off with Availability: Strict adherence to ACID properties may affect system availability in certain situations.

Transaction control in DBMS involves the following key concepts and operations (a short sketch follows below):
Transaction Begin: The start of a transaction is initiated using a BEGIN TRANSACTION or START TRANSACTION statement. This marks the beginning of a logical unit of work that will be treated atomically.
Transaction Commit: When all operations within a transaction complete successfully, the changes made by the transaction are permanently saved to the database using a COMMIT statement. After a commit, the changes become visible to other transactions.
Transaction Rollback: If an error occurs during the execution of a transaction, or if the transaction is explicitly aborted, the changes made by the transaction are undone using a ROLLBACK statement. This restores the database to its state before the transaction began.
Savepoints: Savepoints allow transactions to be divided into smaller, nested units. They provide a way to roll back only part of a transaction without affecting the entire transaction.
Effective transaction control ensures data integrity, concurrency, and recovery in database systems, enabling reliable and consistent database operations.
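A short sketch of these four operations using Python's built-in sqlite3 module, which understands BEGIN, COMMIT, ROLLBACK, and SAVEPOINT (the table is invented for the example): the savepoint lets us undo only part of the transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # manage transactions by hand
conn.execute("CREATE TABLE t (x INTEGER)")

conn.execute("BEGIN TRANSACTION")          # transaction begin
conn.execute("INSERT INTO t VALUES (1)")
conn.execute("SAVEPOINT sp1")              # savepoint: a nested undo point
conn.execute("INSERT INTO t VALUES (2)")
conn.execute("ROLLBACK TO sp1")            # undo only the work done after sp1
conn.execute("COMMIT")                     # make the surviving changes permanent

print(conn.execute("SELECT x FROM t").fetchall())   # -> [(1,)]
```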

Deadlock in DBMS
Deadlock in a database management system (DBMS) is a problematic scenario where two or
more transactions are stuck in an indefinite wait for each other to finish, yet neither relinquishes
the necessary CPU and memory resources. This impasse halts the system, as tasks remain
uncompleted and perpetually waiting.

To illustrate, consider a student database: Transaction 1 locks certain Student table records,
needing to update corresponding Grade table entries. Concurrently, Transaction 2 locks different
Grade table records, requiring updates in the Student table, already locked by Transaction 1.
The deadlock dilemma emerges as Transaction 1 waits for Transaction 2 to unlock resources and
vice versa, leading to a system-wide standstill. This deadlock persists until the DBMS intervenes,
typically by aborting one of the transactions to resolve the gridlock.

Coffman Conditions
Regarding deadlock in DBMS, four conditions were stated by Coffman. A deadlock might occur if all four of these Coffman conditions hold true at any point in time.
1. Mutual Exclusion: There should be at least one resource that cannot be utilized by more than one transaction at a time.
2. Hold and wait condition: A transaction holding at least one resource requests additional resources which are already being held by other transactions in the system.
3. No preemption condition: Access to a particular resource can never be forcibly taken from a running transaction. Only the running transaction itself can release a resource that it holds.
4. Circular wait condition: A transaction is kept waiting for a resource that is at the same time held by another transaction, which is in turn waiting for a third transaction, and so on, with the last transaction waiting for the very first one, giving rise to a circular chain of waiting transactions.

Deadlock Handling
Ostrich Algorithm
Ostrich Algorithm is an approach of handling deadlock that involves ignoring the deadlock and
pretending that it would never occur. This approach derives its name from the behavior of an
Ostrich which is “to stick one’s head in the sand and pretend there is no problem”. Windows and
UNIX-based systems use this approach of handling a deadlock.

But now you might question, Why ignore the deadlock?


This is because deadlock is a very rare case and the cost of handling a deadlock is very high. You might have encountered a situation when your system hung and a restart was needed to fix it. In this case, the operating system ignores the deadlock, as the time required to handle the deadlock is higher than the rebooting time of Windows. Rebooting is the preferred choice, considering the rarity of deadlocks in Windows.

Deadlock Avoidance
• Deadlock avoidance is a technique of detecting any deadlock in advance. Methods like the Wait-For graph can be used in smaller databases to detect deadlocks, but in the case of larger databases, deadlock prevention measures have to be used.
• When a database could get stuck in a state of deadlock, it is preferable to avoid the deadlock in advance rather than aborting or rebooting the database server, as that wastes both time and resources.

Deadlock Detection
During a database transaction, if a task waits indefinitely to obtain the resources it needs, the DBMS has to check whether that task is in a state of deadlock or not. To detect a deadlock, a resource scheduler is used. A resource scheduler can detect a deadlock by keeping track of resources allocated to a specific transaction and requested by another transaction.
• Wait-For Graph: This method of detecting deadlocks involves creating a graph based on the transactions and their acquired locks (a lock is a way of preventing multiple transactions from accessing any resource simultaneously). If the created graph contains a cycle/loop, a deadlock exists.
The DBMS creates a wait-for graph for every transaction/task that is in a waiting state and keeps checking whether there exists a cycle in any of the graphs.

The above is a wait-for graph representation for two transactions T1 and T2 in a deadlock situation; a small cycle-detection sketch follows.
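Checking a wait-for graph is ordinary cycle detection on a directed graph. A minimal Python sketch (the edge-list encoding is invented): an edge Ti -> Tj means Ti is waiting for a resource held by Tj.

```python
# Deadlock detection sketch: build a wait-for graph (edge Ti -> Tj means
# "Ti is waiting for a lock held by Tj") and look for a cycle with DFS.
def has_deadlock(wait_for):
    WHITE, GREY, BLACK = 0, 1, 2
    color = {t: WHITE for t in wait_for}

    def dfs(t):
        color[t] = GREY                              # t is on the current DFS path
        for holder in wait_for.get(t, []):
            if color.get(holder, WHITE) == GREY:     # back edge -> cycle -> deadlock
                return True
            if color.get(holder, WHITE) == WHITE and dfs(holder):
                return True
        color[t] = BLACK
        return False

    return any(color[t] == WHITE and dfs(t) for t in list(wait_for))

# T1 waits for T2 and T2 waits for T1 (the situation described above).
print(has_deadlock({"T1": ["T2"], "T2": ["T1"]}))   # -> True  (deadlock)
print(has_deadlock({"T1": ["T2"], "T2": []}))       # -> False (no cycle)
```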

Deadlock Prevention
Avoiding one or more of the above-stated Coffman conditions can lead to the prevention of a deadlock. For larger databases, preventing deadlocks is much more feasible than detecting and handling them.

The DBMS is made to efficiently analyze every database transaction to determine whether it can cause a deadlock; if a transaction can lead to a deadlock, that transaction is never executed.
1. Wait-Die Scheme: When a transaction requests a resource that is already locked by another transaction, the DBMS checks the timestamps of both transactions. If the requesting transaction is the older of the two, it is made to wait until the resource becomes available; if it is the younger, it is aborted ("dies") and restarted later with its original timestamp.
2. Wound-Wait Scheme: When an older transaction demands a resource that is already locked by a younger transaction (a transaction that was initiated later), the younger transaction is forced to stop its processing and release the locked resource for the older transaction's execution. The younger transaction is then restarted after a short delay, but its timestamp remains the same. If a younger transaction requests a resource held by an older one, the younger transaction is made to wait until the older one releases the resource. A sketch of both decision rules follows.
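Both schemes reduce to a timestamp comparison at the moment of the lock request; a smaller timestamp means an older transaction. A compact Python sketch (the function names are invented):

```python
# Deadlock-prevention sketch: decide the fate of a lock request by comparing
# transaction timestamps (smaller timestamp = older transaction).
def wait_die(requester_ts, holder_ts):
    # Wait-Die: an older requester waits; a younger requester dies
    # (it is aborted and restarted later with its original timestamp).
    return "WAIT" if requester_ts < holder_ts else "DIE"

def wound_wait(requester_ts, holder_ts):
    # Wound-Wait: an older requester wounds (aborts) the younger holder;
    # a younger requester waits for the older holder to finish.
    return "WOUND HOLDER" if requester_ts < holder_ts else "WAIT"

print(wait_die(5, 9))     # older asks younger  -> WAIT
print(wait_die(9, 5))     # younger asks older  -> DIE (restart, same timestamp)
print(wound_wait(5, 9))   # older asks younger  -> WOUND HOLDER
print(wound_wait(9, 5))   # younger asks older  -> WAIT
```

In both schemes, waiting is only ever permitted in one timestamp direction, so a circular chain of waiting transactions can never form.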

Deadlock Detection in Distributed Systems


Distributed systems in DBMS have multiple components spread across different computing
devices that coordinate and communicate actions to appear as a single system to the user. These
are also known as distributed computing.

Distributed systems are very vast, which is why deadlocks in them can neither be prevented nor avoided. Thus deadlock detection is the only possible option. Following are the requirements for deadlock detection in distributed systems.
1. Progress: The method should be able to detect all the deadlocks in the system.
2. Safety: The method should not detect false or phantom deadlocks.

Approaches to detect deadlock in the distributed system


1. Centralized Approach: This is the simplest and easiest way of deadlock detection, as only a single node is responsible for detecting the deadlock. But it also has its own disadvantages, such as excessive load on that single node and a single point of failure, which makes the system less reliable.
2. Distributed Approach: Unlike the centralized approach, multiple nodes are responsible for detecting deadlock. Because multiple nodes share the work, the load is properly balanced and there is no single point of failure, which also helps to further increase the speed of detecting deadlocks.
3. Hierarchical Approach: This Approach integrates both centralized and distributed
approaches for deadlock detection. In this, a single node is made to handle a particular
selected set of nodes responsible for detecting deadlock.

Serializability in DBMS
Q. What is a serializable schedule, and what is it used for?
If a non-serial schedule can be transformed into an equivalent serial schedule, it is said to be serializable. Simply said, a non-serial schedule is referred to as a serializable schedule if it yields the same results as a serial schedule.

Non-serial Schedule

A non-serial schedule is one in which the transactions overlap or switch places. Since schedules are used to carry out actual database operations, multiple transactions run at once, and it is possible that these transactions work on the same data set. Therefore, it is crucial that non-serial schedules can be serialized, in order for our database to be consistent both before and after the transactions are executed.

Example:
Transaction-1        Transaction-2
R(a)
W(a)
                     R(b)
                     W(b)
R(b)
                     R(a)
W(b)
                     W(a)

We can observe that Transaction-2 begins its execution before Transaction-1 is finished, and they are both working on the same data, i.e., "a" and "b", interchangeably (where "R" = Read and "W" = Write).

Serializability testing
We can utilize the Serialization Graph, or Precedence Graph, to examine a schedule's serializability. A serialization graph is a Directed Graph into which the complete set of a schedule's transactions is organized.

Fig: Precedence Graph

It can be described as a graph G(V, E) with vertices V = {V1, V2, V3, …, Vn} and directed edges E = {E1, E2, E3, …, En}. Each edge arises from a pair of conflicting operations, READ or WRITE, performed by two transactions: an edge Ti -> Tj means that transaction Ti performs a conflicting read or write before transaction Tj. A small sketch of this test follows.
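Here is a Python sketch that builds a precedence graph from a schedule and tests it for a cycle (the (transaction, operation, item) encoding is invented): an edge Ti -> Tj is added whenever an operation of Ti conflicts with a later operation of Tj, i.e. same item, different transactions, at least one write. It is applied to the three-transaction schedule "S" from the example that follows.

```python
# Conflict-serializability test: build the precedence graph from a schedule
# and check it for cycles. A schedule is a list of (txn, op, item) triples.
from itertools import combinations

def precedence_edges(schedule):
    edges = set()
    # combinations preserves schedule order: op1 happens before op2
    for (t1, op1, x1), (t2, op2, x2) in combinations(schedule, 2):
        if t1 != t2 and x1 == x2 and "W" in (op1, op2):
            edges.add((t1, t2))        # earlier conflicting op wins: edge t1 -> t2
    return edges

def is_conflict_serializable(schedule):
    edges = precedence_edges(schedule)
    txns = {t for t, _, _ in schedule}
    # Kahn's algorithm: serializable iff the precedence graph is acyclic.
    indeg = {t: 0 for t in txns}
    for _, v in edges:
        indeg[v] += 1
    queue = [t for t in txns if indeg[t] == 0]
    order = []
    while queue:
        u = queue.pop()
        order.append(u)
        for a, b in edges:
            if a == u:
                indeg[b] -= 1
                if indeg[b] == 0:
                    queue.append(b)
    return len(order) == len(txns), order   # (acyclic?, equivalent serial order)

# The three-transaction schedule "S" from the example below:
S = [("t1", "R", "a"), ("t2", "R", "b"), ("t3", "R", "b"), ("t2", "W", "b"),
     ("t1", "W", "a"), ("t3", "W", "a"), ("t2", "R", "a"), ("t2", "W", "a")]
print(is_conflict_serializable(S))   # -> (True, ['t1', 't3', 't2'])
```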

Types of Serializability
There are two ways to check whether any non-serial schedule is serializable.

Fig: Types of Serializability – Conflict & View

1. Conflict serializability

Conflict serializability refers to a subset of serializability that focuses on maintaining the consistency of a database by fixing the relative order of conflicting operations on identical data items. In a DBMS, this ordering ensures that no two conflicting operations can be reordered or occur simultaneously.

For example, let's consider an order table and a customer table as two instances. Each order is associated with one customer, even though a single customer may place many orders. However, there are conditions that two operations must satisfy in order to conflict, and hence to constrain conflict serializability in the database. Here are a few of them.

1. The two operations should belong to different transactions.

2. The identical data item should be accessed by both operations.

3. Between the two operations, there should be at least one write operation.

Example

Three transactions—t1, t2, and t3—are active on a schedule “S” at once. Let’s create a graph of
precedence.

Transaction – 1 (t1)    Transaction – 2 (t2)    Transaction – 3 (t3)
R(a)
                        R(b)
                                                R(b)
                        W(b)
W(a)
                                                W(a)
                        R(a)
                        W(a)

It is a conflict serializable schedule, equivalent to a serial schedule, because the precedence graph (a DAG) has no loops. Because it is serializable, we can also determine the order of the transactions.

Fig: DAG of transactions

As there is no incoming edge on Transaction 1, Transaction 1 will be executed first. T3 will run
second because it only depends on T1. Due to its dependence on both T1 and T3, t2 will finally be
executed.

Therefore, the serial schedule’s equivalent order is: t1 –> t3 –> t2

Note: A schedule is unquestionably consistent if it is conflict serializable. A schedule that is not conflict serializable, on the other hand, might or might not be serializable. We employ the idea of View Serializability to further examine its serial behavior.

2. View Serializability

View serializability is a kind of serializability in which each transaction should produce the same results as some properly sequential execution of the operations on the data items. In contrast to conflict serializability, view serializability is concerned only with avoiding database inconsistency, and it therefore admits schedules that conflict serializability would reject.

To further understand view serializability in DBMS, we need to understand the schedules S1 and S2. The two transactions T1 and T2 should be used to establish these two schedules. Each schedule must satisfy three conditions in order for the two schedules to be view equivalent. These conditions are listed below.

1. The first prerequisite is that the same set of transactions appears in both schedules. The group of transactions cannot differ between schedules S1 and S2: if one schedule commits a transaction that has no matching transaction in the other schedule, the schedules are not equal to one another.

2. The second requirement concerns the read and write operations. Two schedules are not similar if, say, schedule S1 has two write operations on a data item whereas schedule S2 has only one. The number of write operations must be the same in both schedules; however, there is no issue if the number of read operations is different.

3. The last requirement is that the execution order for a single data item must not conflict between the two schedules. Assume, for instance, that in schedule S1 transaction T1 writes data item A last, while in schedule S2 transaction T2 writes A last; the schedules are not equal in this instance. The schedules are regarded as equivalent to one another when the writes on each data item occur in an equivalent order, ending in the same final write.

What is view equivalency?


Schedules S1 and S2 must satisfy these requirements in order to be viewed as equivalent:

1. The same piece of data must be read first by the same transaction. For instance, if transaction t1 is the first to read "A" from the database in schedule S1, then t1 must also read A first in schedule S2.

2. The final write on each piece of data must be made by the same transaction. As an illustration, if transaction t1 updated A last in S1, it should also perform the final write of A in S2.

3. The intermediate sequence must follow suit. As an illustration, if in S1 t1 reads A after t2 updates it, then in S2 t1 should likewise read A after t2 updates it.

View Serializability refers to the process of determining whether a schedule’s views are
equivalent.

Example

We have a schedule “S” with two concurrently running transactions, “t1” and “t2.”

Schedule – S:

Transaction-1 (t1)    Transaction-2 (t2)
R(a)
W(a)
                      R(a)
                      W(a)
R(b)
W(b)
                      R(b)
                      W(b)

By switching between both transactions’ mid-read-write operations, let’s create its view
equivalent schedule (S’).

Schedule – S’:

Transaction-1 (t1)    Transaction-2 (t2)
R(a)
W(a)
R(b)
W(b)
                      R(a)
                      W(a)
                      R(b)
                      W(b)

It is a view serializable schedule, since a view equivalent serial schedule exists.

Note: A conflict serializable schedule is always view serializable, but the converse is not always true.

Advantages of Serializability
1. Execution is predictable: Under serializability, the DBMS's concurrently executing threads behave as if performed one at a time, so the DBMS springs no surprises: no data loss or corruption occurs and all variables are updated as intended.

2. Easier to reason about: Because the DBMS effectively executes each thread independently, it is much simpler to understand and troubleshoot each database thread. This can greatly simplify the debugging process, since the concurrent interleaving is no longer a concern for us.

3. Lower Costs: The cost of the hardware required for the efficient operation of the database can be decreased with the aid of the serializable property. It may also lower the cost of developing the software.

4. Increased Performance: Since serializable executions provide developers the opportunity to optimize their code for performance, they occasionally outperform non-serializable equivalents.

For a DBMS transaction to be regarded as serializable, it must adhere to the ACID properties. In
DBMS, serializability comes in a variety of forms, each having advantages and disadvantages of
its own. Most of the time, choosing the best sort of serializability involves making a choice
between performance and correctness.

Making the incorrect choice for serializability might result in database issues that are challenging
to track down and resolve. You should now have a better knowledge of how serializability in
DBMS functions and the different types that are available thanks to this guide.

FAQs on Serializability in DBMS


Q.1: How does a DBMS achieve serializability?

Answer:

DBMSs accomplish serializability through concurrency control techniques like locking, timestamp ordering, and optimistic concurrency control. These methods make sure that transactions are carried out in a serializable order while still permitting simultaneous access to the database.

Q.2: How is View Serializability different from Conflict Serializability?

Answer:

View serializability is a weaker type of serializability than conflict serializability. View serializability just needs transactions to yield the same final result as a serial schedule, whereas conflict serializability demands that transactions not reorder conflicting accesses to the same data item. As a result, some schedules that are view serializable are not conflict serializable.

Q.3: What distinguishes strong serializability from weak serializability?

Answer:

Strong serializability is a stronger type of serializability than weak serializability. In strong serializability, a schedule must be comparable to a serial schedule in which transactions are carried out in the same sequence as they were in the original schedule. In weak serializability, a schedule just needs to be conflict equivalent to some serial schedule.

Q.4: What does serializability testing’s precedence graph entail?

Answer:

A precedence graph is an instrument for determining whether a schedule is conflict serializable. Each transaction is represented as a node in a precedence graph, and if an operation in the second transaction depends on (conflicts with and follows) an operation in the first transaction, a directed edge is drawn from the first node to the second. The schedule can be serialized, resolving its conflicts, if the graph is acyclic.

Concurrency Control in DBMS
When several transactions execute concurrently without any rules and protocols, various
problems arise that may harm the data integrity of several databases. These problems are known
as concurrency control problems. Therefore, several rules are designed, to maintain consistency
in the transactions while they are executing concurrently which are known as concurrency
control protocols.

A transaction is a single logical unit of work that can retrieve or change the data of a database. Executing each transaction individually increases the waiting time for the other transactions, and the overall execution also gets delayed. Hence, to increase throughput and to reduce waiting time, transactions are executed concurrently.

Example: Suppose, between two railway stations, A and B, 5 trains have to travel, if all the trains
are set in a row and only one train is allowed to move from station A to B and others have to wait
for the first train to reach its destination then it will take a lot of time for all the trains to travel
from station A to B. To reduce time all the trains should be allowed to move concurrently from
station A to B ensuring no risk of collision between them.
When several transactions execute simultaneously, then there is a risk of violation of the data
integrity of several databases. Concurrency Control in DBMS is a procedure of managing
simultaneous transactions ensuring their atomicity, isolation, consistency and serializability.

Concurrent Execution in DBMS


• In a multi-user system, multiple users can access and use the same database at one time, which is known as concurrent execution of the database. It means that the same database is executed simultaneously on a multi-user system by different users.
• While working on database transactions, there occurs the requirement of using the database by multiple users for performing different operations, and in that case, concurrent execution of the database is performed.
• The simultaneous execution should be done in an interleaved manner, and no operation should affect the other executing operations, thus maintaining the consistency of the database. Thus, in making the concurrent execution of the transaction operations, there occur several challenging problems that need to be solved.

Concurrency Control Problems


Several problems that arise when numerous transactions execute simultaneously in a random
manner are referred to as Concurrency Control Problems.

Dirty Read Problem


The dirty read problem in DBMS occurs when a transaction reads the data that has been updated
by another transaction that is still uncommitted. It arises due to multiple uncommitted
transactions executing simultaneously.

Example: Consider two transactions A and B performing read/write operations on a data DT in
the database DB. The current value of DT is 1000: The following table shows the read/write
operations in A and B transactions.

Time A B
T1 READ(DT) ------
T2 DT=DT+500 ------
T3 WRITE(DT) ------
T4 ------ READ(DT)
T5 ------ COMMIT
T6 ROLLBACK ------

Transaction A reads the value of data DT as 1000 and modifies it to 1500, which gets stored in the temporary buffer. Transaction B reads the data DT as 1500 and commits it, and the value of DT permanently gets changed to 1500 in the database DB. Then some server error occurs in transaction A and it rolls back to its initial value, i.e., 1000; transaction B has now read an uncommitted (dirty) value, and the dirty read problem occurs.

Unrepeatable Read Problem


The unrepeatable read problem occurs when two or more different values of the same data are
read during the read operations in the same transaction.

Example: Consider two transactions A and B performing read/write operations on a data DT in the database DB. The current value of DT is 1000. The following table shows the read/write operations in A and B transactions.

Time A B
T1 READ(DT) ------
T2 ------ READ(DT)
T3 DT=DT+500 ------
T4 WRITE(DT) ------
T5 ------ READ(DT)

Transaction A and B initially read the value of DT as 1000. Transaction A modifies the value of DT
from 1000 to 1500 and then again transaction B reads the value and finds it to be 1500.
Transaction B finds two different values of DT in its two different read operations.

Phantom Read Problem


In the phantom read problem, data is read through two different read operations in the same
transaction. In the first read operation, a value of the data is obtained but in the second
operation, an error is obtained saying the data does not exist.

Example: Consider two transactions A and B performing read/write operations on a data DT in
the database DB. The current value of DT is 1000: The following table shows the read/write
operations in A and B transactions.

Time A B
T1 READ(DT) ------
T2 ------ READ(DT)
T3 DELETE(DT) ------
T4 ------ READ(DT)

Transaction B initially reads the value of DT as 1000. Transaction A deletes the data DT from the
database DB and then again transaction B reads the value and finds an error saying the data DT
does not exist in the database DB.

Lost Update Problem


The lost update problem arises when an update to a data item by one transaction is overwritten by an update from another transaction, causing the first update to be lost.

Example: Consider two transactions A and B performing read/write operations on a data DT in the database DB. The current value of DT is 1000. The following table shows the read/write operations in A and B transactions.
Time A B
T1 READ(DT) ------
T2 DT=DT+500 ------
T3 WRITE(DT) ------
T4 ------ DT=DT+300
T5 ------ WRITE(DT)
T6 READ(DT) ------

Transaction A initially reads the value of DT as 1000. Transaction A modifies the value of DT from 1000 to 1500, and then transaction B writes over it, modifying the value to 1800. Transaction A again reads DT and finds 1800 rather than the 1500 it wrote, and therefore the update done by transaction A has been lost.
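The classic form of this anomaly can be reproduced in a few lines (a sketch; here B is taken to have read DT before A's write, which a real DBMS would prevent with locking):

```python
# Lost-update sketch: two read-modify-write transactions interleave on DT.
DT = 1000

a_local = DT          # T1 (A): READ(DT)  -> 1000
b_local = DT          # T2 (B): READ(DT)  -> 1000 (before A writes)

a_local += 500        # T1: DT = DT + 500
DT = a_local          # T1: WRITE(DT) -> 1500

b_local += 300        # T2: DT = DT + 300, computed from the stale read of 1000
DT = b_local          # T2: WRITE(DT) -> 1300, silently overwriting A's write

print(DT)             # -> 1300 instead of 1800: transaction A's update is lost
```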

Incorrect Summary Problem


The incorrect summary problem occurs when the sum over two data items is computed incorrectly. This happens when a transaction tries to sum two data items using an aggregate function while the value of one of the items is changed by another transaction.

Example: Consider two transactions A and B performing read/write operations on two data DT1
and DT2 in the database DB. The current value of DT1 is 1000 and DT2 is 2000: The following
table shows the read/write operations in A and B transactions.
Time A B
T1 READ(DT1) ------
T2 add=0 ------
T3 add=add+DT1 ------
T4 ------ READ(DT2)
T5 ------ DT2=DT2+500
T6 READ(DT2) ------
T7 add=add+DT2 ------

Transaction A reads the value of DT1 as 1000. It uses an aggregate function SUM, which calculates the sum of the two data items DT1 and DT2 in the variable add, but in between, the value of DT2 gets changed from 2000 to 2500 by transaction B. The variable add uses the modified value of DT2 and gives the resultant sum as 3500 instead of 3000.

Concurrency Control Protocols


To avoid concurrency control problems and to maintain consistency and serializability during the
execution of concurrent transactions, certain rules are followed. These rules are known as
Concurrency Control Protocols.

Lock-Based Protocols
To attain consistency, isolation between transactions is the most important tool. Isolation is
achieved by preventing a transaction from performing conflicting read/write operations on data
that another transaction is using; this is known as locking. Through lock-based protocols,
desired operations are freely allowed to execute while conflicting operations are blocked by locks.

There are two kinds of locks used in Lock-based protocols:


Shared Lock(S): A lock that allows read operations but disables write operations on a data item
in a transaction is known as a shared lock. Shared locks are also known as read-only locks and
are represented by 'S'.

Exclusive Lock(X): A lock that allows both read and write operations on a data item in a
transaction is known as an exclusive lock. Only one transaction at a time can hold an exclusive
lock on a given data item. Exclusive locks are represented by 'X'.

There are four kinds of lock-based protocols:


Simplistic Lock Protocol: Under this protocol, a transaction locks a data item before updating it,
blocking all other operations on that item. The transaction unlocks the data item
after the write operation completes.

Pre-claiming Lock Protocol: Under the pre-claiming lock protocol, the operations that are going
to be performed are first assessed, and a list is prepared of the data items on which locks will
be needed. The transaction requests all these locks from the system before starting the execution
of its operations. If all the locks are granted, the operations
in the transaction run smoothly and the locks are returned to the system on completion. If any
lock is not granted, the transaction rolls back.

Two-phase Locking Protocol: This protocol divides a transaction's execution into phases. The
transaction starts in a growing phase, in which it requests and acquires locks but releases none.
Once all requested locks are granted, the transaction holds all its locks (the lock point). When
the transaction releases its first lock, the shrinking phase begins, during which locks are
released after the execution of each operation but no new locks may be acquired.

Strict Two-Phase Locking Protocol: Strict 2PL is almost identical to 2PL. The only difference is
that strict 2PL does not release locks just after the execution of the operations;
instead, it holds all the locks and releases them only when the commit is triggered.
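
To make these rules concrete, here is a minimal, single-threaded Python sketch of a strict 2PL lock manager. The names (LockManager, acquire, release_all) are illustrative assumptions rather than a real DBMS API, and blocking/deadlock handling is left out:

class LockManager:
    def __init__(self):
        self.locks = {}  # item -> (mode, set of holder transaction ids)

    def acquire(self, txn, item, mode):
        """Grant a lock if compatible; return False if the caller must wait."""
        held = self.locks.get(item)
        if held is None:
            self.locks[item] = (mode, {txn})
            return True
        held_mode, holders = held
        if mode == "S" and held_mode == "S":
            holders.add(txn)          # shared locks are compatible
            return True
        if holders == {txn}:          # sole holder may upgrade S -> X
            self.locks[item] = ("X" if mode == "X" else held_mode, holders)
            return True
        return False                  # conflict: caller must block or abort

    def release_all(self, txn):
        """Strict 2PL: every lock is released only at commit/abort."""
        for item in list(self.locks):
            mode, holders = self.locks[item]
            holders.discard(txn)
            if not holders:
                del self.locks[item]

lm = LockManager()
assert lm.acquire("T1", "DT", "S")      # T1 reads DT
assert lm.acquire("T2", "DT", "S")      # shared with T1: allowed
assert not lm.acquire("T2", "DT", "X")  # upgrade blocked while T1 holds S
lm.release_all("T1")                    # T1 commits
assert lm.acquire("T2", "DT", "X")      # now T2 can write DT

Under plain 2PL, release_all could be invoked earlier, at the start of the shrinking phase; strict 2PL simply delays it until commit or abort, which is what prevents cascading aborts.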

Time-based Protocols
According to this protocol, every transaction has a timestamp attached to it, based on the time
at which the transaction entered the system. In addition, a read timestamp and a write timestamp
are associated with every data item, recording the time at which the latest read and write
operations on that item were performed, respectively.

Timestamp Ordering Protocol:


The timestamp ordering protocol uses the timestamp values of transactions to resolve
conflicting pairs of operations, thus ensuring serializability among transactions. Following are
the denotations of the terms used to define the protocol for transaction A on the data item DT:

Terms Denotations


Timestamp of transaction A TS(A)
Read timestamp of data item DT R-timestamp(DT)
Write timestamp of data item DT W-timestamp(DT)

Following are the rules on which the timestamp-ordering protocol works:


1. When transaction A is going to perform a read operation on data item DT:
o TS(A) < W-timestamp(DT): The transaction will roll back. If the timestamp at which
transaction A entered the system is less than the write timestamp of DT (the latest
time at which DT was updated), the transaction rolls back.
o TS(A) >= W-timestamp(DT): The transaction will be executed. If the timestamp at which
transaction A entered the system is greater than or equal to the write timestamp of DT,
the read operation is executed.
o All data-item timestamps are updated.
2. When transaction A is going to perform a write operation on data item DT:
o TS(A) < R-timestamp(DT): The transaction will roll back. If the timestamp at which
transaction A entered the system is less than the read timestamp of DT (the latest
time at which DT was read), the transaction rolls back.
o TS(A) < W-timestamp(DT): The transaction will roll back. If the timestamp at which
transaction A entered the system is less than the write timestamp of DT (the latest
time at which DT was updated), the transaction rolls back.
o All operations other than these will be executed.

Thomas' Write Rule: This rule alters the timestamp-ordering protocol to make schedules view
serializable. For the case TS(A) < W-timestamp(DT), the timestamp-ordering protocol rolls the
transaction back, but under Thomas' Write Rule the obsolete write operation is simply ignored.
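
The following is a small Python sketch of the read/write rules above, with Thomas' Write Rule as an optional flag; the dictionary-based data items and function names are illustrative assumptions:

def can_read(ts, item):
    if ts < item["W_ts"]:          # item was overwritten by a younger txn
        return "ROLLBACK"
    item["R_ts"] = max(item["R_ts"], ts)   # update the read timestamp
    return "EXECUTE"

def can_write(ts, item, thomas_write_rule=False):
    if ts < item["R_ts"]:          # a younger txn has already read the item
        return "ROLLBACK"
    if ts < item["W_ts"]:          # a younger txn has already written it
        return "IGNORE" if thomas_write_rule else "ROLLBACK"
    item["W_ts"] = ts
    return "EXECUTE"

DT = {"R_ts": 0, "W_ts": 10}
print(can_write(5, DT))                          # ROLLBACK under plain TO
print(can_write(5, DT, thomas_write_rule=True))  # IGNORE under Thomas' rule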

Validation Based Protocol


This protocol executes a transaction through the following three phases:
Read phase: In this phase, the transaction reads data items and stores the results of all its
operations in a local buffer. No modification is made
to the database.

Validation phase: In this phase, validation tests are performed to check whether the values of
data held in the local buffer can replace the original values in the database without causing
any harm to serializability.

Validation Test: Validation tests are performed on a transaction A executing concurrently with a
transaction B such that TS(A) < TS(B). The transactions must satisfy one of the following conditions:
1. Finish(A) < Start(B): Transaction A finishes all its operations before transaction B
starts, so the serializability order is maintained.
2. Start(B) < Finish(A) < Validate(B): The set of data items written by transaction A during its write
phase must not intersect with the set of data items read by transaction B.

Write phase: If the transaction passes the tests of the validation phase, the values from its
local buffer are copied to the database; otherwise, the transaction rolls back.

Example: Consider two transactions A and B performing read/write operations on two data items DT1
and DT2 in the database DB. The current value of DT1 is 1000 and DT2 is 2000. The following
table shows the read/write operations in the A and B transactions.

Time A B
T1 READ(DT1) ------
T2 ------ READ(DT1)
T3 ------ DT1=DT1-100
T4 ------ READ(DT2)
T5 ------ DT2=DT2+500
T6 READ(DT2) ------
T7 <VALIDATE> ------
T8 PRINT(DT2-DT1) ------
T9 ------ <VALIDATE>
T10 ------ WRITE(DT1)
T11 ------ WRITE(DT2)

The schedule passes the validation test: transaction A validates before transaction B, so
TS(A) < TS(B), and the set of data items written by transaction A (which writes nothing) does not
intersect with the set read by transaction B. It should be observed that the write operations are
applied only after the validation of both transactions; all the operations before the final
writes are performed in the local buffer.
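
As a rough illustration, the validation test for a pair of transactions with TS(A) < TS(B) can be sketched in Python as follows; the function name and timestamp values are illustrative assumptions:

def validates(finish_A, start_B, validate_B, write_set_A, read_set_B):
    if finish_A < start_B:                         # condition 1: serial order
        return True
    if start_B < finish_A < validate_B:            # condition 2: overlap
        return write_set_A.isdisjoint(read_set_B)  # writes of A must not hit reads of B
    return False

# In the example schedule, A writes nothing, so validation succeeds
# even though the two transactions overlap in time.
print(validates(finish_A=8, start_B=2, validate_B=9,
                write_set_A=set(), read_set_B={"DT1", "DT2"}))  # True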

Concurrency control algorithms
Concurrency control algorithms are crucial in database management systems (DBMS) to ensure
that multiple transactions can execute concurrently without interfering with each other, while
still maintaining the ACID properties (Atomicity, Consistency, Isolation, Durability). Here are some
common concurrency control algorithms used in DBMS:

Lock-Based Concurrency Control:

Two-Phase Locking (2PL): Transactions acquire locks on data items before accessing them
and release locks after completing their operations. 2PL ensures serializability by preventing
conflicts between transactions.
Strict Two-Phase Locking (Strict 2PL): Similar to 2PL, but locks are held until the transaction
commits or aborts. This prevents the possibility of cascading aborts.
Deadlock Detection and Prevention: Techniques such as deadlock detection algorithms (e.g.,
wait-for graph) and prevention strategies (e.g., timeouts, deadlock avoidance) are employed to
handle deadlocks that may occur due to lock-based concurrency control.
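
As a brief illustration of deadlock detection, the following Python sketch checks a wait-for graph for a cycle using depth-first search; the graph representation is an assumption made for illustration:

def has_cycle(wait_for):
    """wait_for maps each transaction to the transactions it is waiting on."""
    visited, on_stack = set(), set()

    def dfs(txn):
        visited.add(txn)
        on_stack.add(txn)
        for nxt in wait_for.get(txn, []):
            if nxt in on_stack or (nxt not in visited and dfs(nxt)):
                return True
        on_stack.discard(txn)
        return False

    return any(dfs(t) for t in wait_for if t not in visited)

# T1 waits for T2 and T2 waits for T1: a classic deadlock.
print(has_cycle({"T1": ["T2"], "T2": ["T1"]}))  # True
print(has_cycle({"T1": ["T2"], "T2": []}))      # False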

Timestamp-Based Concurrency Control:

Timestamp Ordering Protocol (TO): Transactions are assigned timestamps, and their
execution order is determined based on these timestamps. Conflicts are resolved by comparing
transaction timestamps, and older transactions are given priority.
Thomas' Write Rule: In timestamp-based concurrency control, Thomas' Write Rule relaxes the
protocol by ignoring obsolete writes: a transaction's write to a data item is skipped rather
than rolled back when a younger transaction has already written that item, yielding
view-serializable schedules.

Multiversion Concurrency Control (MVCC):

Snapshot Isolation: Each transaction sees a consistent snapshot of the database at the time
it starts. Reads are performed on this snapshot, ensuring that transactions do not see changes
made by other concurrent transactions.
Versioning: Instead of blocking transactions, MVCC maintains multiple versions of each data
item to allow concurrent reads and writes without conflicts. Transactions access the appropriate
version based on their timestamps.
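
A toy Python sketch of versioning is shown below: writers append timestamped versions, and readers pick the newest version that is not younger than their snapshot timestamp. The MVCCStore class is an illustrative assumption, not a real engine:

import bisect

class MVCCStore:
    def __init__(self):
        self.versions = {}  # item -> sorted list of (write_ts, value)

    def write(self, item, value, ts):
        bisect.insort(self.versions.setdefault(item, []), (ts, value))

    def read(self, item, snapshot_ts):
        """Return the newest version with write_ts <= snapshot_ts."""
        versions = self.versions.get(item, [])
        i = bisect.bisect_right(versions, (snapshot_ts, float("inf"))) - 1
        return versions[i][1] if i >= 0 else None

store = MVCCStore()
store.write("DT", 1000, ts=1)
store.write("DT", 1500, ts=5)
print(store.read("DT", snapshot_ts=3))  # 1000: old snapshot ignores the ts=5 write
print(store.read("DT", snapshot_ts=9))  # 1500: newer snapshot sees the update

The key design point is that readers never block writers: each reader simply consults the version list as it stood at its snapshot timestamp.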

Optimistic Concurrency Control:

Timestamp-Based OCC: Transactions execute optimistically without acquiring locks. Before
committing, the system checks for conflicts by comparing timestamps or version numbers. If
conflicts are detected, one or more transactions may need to be aborted and restarted.

Validation-Based OCC: Transactions execute optimistically, and at commit time, the system
validates the transaction to ensure that it does not conflict with other concurrent transactions. If
validation fails, the transaction is aborted and restarted.
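
As a minimal illustration of optimistic concurrency with version numbers, the following Python sketch commits an update only if the record's version is unchanged since it was read; the names and single-threaded setting are assumptions made for illustration:

record = {"value": 1000, "version": 7}

def optimistic_update(record, compute):
    snapshot_value, snapshot_version = record["value"], record["version"]
    new_value = compute(snapshot_value)           # do the work without locks
    if record["version"] != snapshot_version:     # someone else committed first
        return False                              # abort; caller may retry
    record["value"] = new_value                   # commit the new value
    record["version"] += 1
    return True

print(optimistic_update(record, lambda v: v + 500))  # True
print(record)  # {'value': 1500, 'version': 8}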
These concurrency control algorithms ensure that transactions execute correctly and
concurrently while maintaining the integrity and consistency of the database. The choice of
algorithm depends on factors such as the application requirements, system workload, and
performance considerations.

Issues in Concurrent execution


Concurrent execution in database management systems (DBMS) introduces several issues that
need to be addressed to ensure data consistency, integrity, and isolation between transactions.
Here are some key issues:
Lost Updates: Occur when two or more transactions attempt to update the same data
concurrently, leading to one transaction's update being lost. This can happen when transactions
do not lock the data they are updating, allowing other transactions to overwrite changes.
Dirty Reads: Occur when one transaction reads data that has been modified by another
transaction but not yet committed. If the modifying transaction is later rolled back, the reading
transaction has read invalid or "dirty" data.
Non-Repeatable Reads: Occur when a transaction reads the same data multiple times during
its execution, but the data changes between reads due to updates from other transactions. This
inconsistency can lead to unexpected results.
Phantom Reads: Occur when a transaction performs a query multiple times and receives
different sets of rows each time due to concurrent insertions or deletions by other transactions.
This violates the transaction's isolation level, as it sees changes made by other transactions.
Concurrency Control Overhead: Implementing concurrency control mechanisms such as
locking, timestamping, or multiversion concurrency control (MVCC) introduces overhead in terms
of system resources and performance. Lock contention, in particular, can lead to decreased
throughput and increased latency.
Deadlocks: Occur when two or more transactions are waiting for each other to release locks,
resulting in a cyclic dependency where no transaction can proceed. Deadlocks can lead to system-
wide stalls and must be detected and resolved by the DBMS.
Starvation: Occurs when a transaction is indefinitely delayed or denied access to resources due
to other transactions continually acquiring and holding locks. This can result in unfairness and
reduced throughput for certain transactions.
Concurrency Anomalies: Various anomalies can occur when multiple transactions execute
concurrently, such as write skew, read skew, and write-write conflicts. These anomalies can lead
to inconsistencies in the database and violate the ACID properties.

Addressing these issues requires careful design and implementation of concurrency control
mechanisms, such as locking, timestamp-based methods, and multiversion concurrency control.
Additionally, choosing an appropriate isolation level for transactions can help mitigate
concurrency-related problems while balancing performance and consistency requirements.

Failures and Recovery algorithms


In database management systems (DBMS), failures can occur due to hardware malfunctions,
software errors, power outages, or other unforeseen events. Recovery algorithms are essential
to ensure that the database remains consistent and durable despite these failures. Here are some
common types of failures and recovery algorithms used in DBMS:

Types of Failures:

Transaction Failures: Occur when a transaction encounters an error and cannot continue its
execution. This may result in a partial update to the database.
System Failures: Occur due to hardware or software errors, such as disk crashes, memory
corruption, or power outages, leading to potential data loss or corruption.
Media Failures: Occur when there is physical damage to storage devices, such as disk failures
or data corruption, requiring data recovery from backups or redundant copies.

Recovery Algorithms:

Undo Logging (Rollback): In this algorithm, before modifying any data item, the DBMS logs
the old value of the item. If a transaction fails, the system uses these logs to undo the changes
made by the failed transaction, ensuring atomicity and durability.
Redo Logging (Forward Recovery): In this algorithm, the DBMS logs the changes made by
each transaction before the changes are applied to the database. If a failure occurs, the system
replays the log records to reapply the changes, ensuring durability.
Checkpointing: Checkpointing is a technique used to reduce the time required for recovery
by periodically writing the current state of the database to stable storage. This reduces the
number of log records that need to be processed during recovery.
Shadow Paging: Shadow paging maintains multiple versions of the database, with each
version representing a consistent snapshot of the database. When a transaction modifies the
database, it creates a new version, and the old version remains unchanged until the transaction
commits. If a transaction fails, the system can simply discard the changes made by the failed
transaction by reverting to the previous version.
Write-Ahead Logging (WAL): In WAL, the DBMS ensures that log records are written to stable
storage before the corresponding data pages are modified. This ensures that the log contains a
record of all changes made to the database before the changes are applied, providing a
mechanism for recovery in the event of a failure (a short sketch follows this list).
Database Replication and Mirroring: Replicating or mirroring the database across multiple
physical locations ensures high availability and fault tolerance. If one copy of the database fails,
the system can fail over to a redundant copy without losing data.
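
As promised above, here is a toy Python illustration of write-ahead logging with undo information: the log record, holding the old value, is appended before the in-place update, so an uncommitted transaction can always be rolled back. The structures are illustrative assumptions, not a real recovery manager:

db, log = {"DT": 1000}, []

def wal_write(txn, item, new_value):
    """Log first (old and new values), then apply the in-place update."""
    log.append((txn, item, db.get(item), new_value))
    db[item] = new_value

def undo(txn):
    """Roll back an uncommitted transaction by replaying its log backwards."""
    for t, item, old, _ in reversed(log):
        if t == txn:
            db[item] = old

wal_write("T1", "DT", 1500)
undo("T1")              # crash before commit: restore the old value
print(db)               # {'DT': 1000}

The same log entries carry the new values as well, so a redo pass over committed transactions could reapply changes after a crash.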
These recovery algorithms ensure that the database can recover to a consistent and durable state
after a failure, minimizing data loss and maintaining data integrity. The choice of recovery
algorithm depends on factors such as the recovery time objective (RTO), recovery point objective
(RPO), and performance requirements of the system.

Case study: Demonstration of an entire project by applying all the
concepts learnt with minimum front-end requirements

Let's consider a case study for a simple inventory management system for a small retail store. The
system will track inventory items, manage orders, and handle customer information. We'll
demonstrate the implementation of this project, covering various concepts learned in database
management systems (DBMS), with minimal front-end requirements.

Project Overview:

Database: We'll use a relational database to store data, such as MySQL or SQLite.
Programming Language: Python for backend development.
Concepts to Apply: Entity-relationship modeling, database normalization, SQL queries,
transaction management, concurrency control, and recovery algorithms.

Functionalities:

Inventory Management:
Add, update, and delete inventory items.
Track item quantity, price, and other details.
Order Management:
Place orders for items.
Update order status (e.g., pending, delivered).
Customer Management:
Add, update, and delete customer information.

Implementation Steps:

Database Design:
Define the entity-relationship diagram (ERD) for the inventory system.
Design tables for items, orders, customers, and other relevant entities.
Normalize the database schema to reduce redundancy and improve efficiency.
Database Implementation:
Create the necessary tables in the chosen DBMS.
Establish relationships between tables using foreign keys.
Backend Development:
Use Python to develop the backend logic of the system.
Implement functions/methods to interact with the database:
Add, update, delete inventory items.
Place orders, update order status.
Manage customer information.
Transaction Management:
Use transaction control mechanisms to ensure data consistency during operations involving
multiple steps (see the sketch after these steps).
Apply techniques like two-phase locking or timestamp-based concurrency control.
Error Handling:
Implement error handling mechanisms to manage exceptions and errors gracefully.
Roll back transactions in case of errors to maintain data integrity.
Recovery:
Implement recovery algorithms such as undo logging or redo logging to recover from failures
and ensure database consistency.
Perform regular backups to prevent data loss.
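
The sketch below ties several of these steps together using Python's built-in sqlite3 module: schema creation, plus an order-placement function that decrements stock and records the order in a single transaction, rolling back on error. The table and column names are illustrative assumptions:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items(id INTEGER PRIMARY KEY, name TEXT, qty INTEGER, price REAL);
    CREATE TABLE orders(id INTEGER PRIMARY KEY AUTOINCREMENT,
                        item_id INTEGER REFERENCES items(id),
                        qty INTEGER, status TEXT);
    INSERT INTO items VALUES (1, 'Notebook', 100, 25.0);
""")

def place_order(item_id, qty):
    """Decrement stock and record the order as one atomic transaction."""
    try:
        with conn:  # commits on success, rolls back on any exception
            cur = conn.execute(
                "UPDATE items SET qty = qty - ? WHERE id = ? AND qty >= ?",
                (qty, item_id, qty))
            if cur.rowcount == 0:
                raise ValueError("insufficient stock")
            conn.execute(
                "INSERT INTO orders(item_id, qty, status) VALUES (?, ?, 'pending')",
                (item_id, qty))
        return True
    except ValueError:
        return False

print(place_order(1, 10))   # True: stock drops to 90 and the order is recorded
print(place_order(1, 500))  # False: the transaction is rolled back, nothing changes

Using the connection as a context manager keeps the two statements atomic: either both the stock update and the order insert are committed, or neither is.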

Minimal Front-End Requirements:

As this project focuses on demonstrating DBMS concepts, the front end can be minimal.
Use a command-line interface (CLI) or a simple text-based interface for user interaction.
Accept user inputs for performing CRUD (Create, Read, Update, Delete) operations.
Display relevant information and status messages to users.

Conclusion:

This case study demonstrates the application of various DBMS concepts, including database
design, implementation, transaction management, concurrency control, and recovery, in
developing an inventory management system. While the front-end requirements are minimal, the
focus is on showcasing the robustness and efficiency of the database system in handling real-
world scenarios and ensuring data integrity and consistency.

NoSQL Databases: Document Oriented, Key-Value Pairs, Column
Oriented, and Graph

NoSQL databases offer flexibility and scalability for managing large volumes of unstructured or
semi-structured data. Let's explore four common types of NoSQL databases:

1. Document-Oriented Databases:

Overview: Document-oriented databases store data in a semi-structured format, typically JSON
or BSON (binary JSON) documents. Each document can have its own structure, and collections of
documents are analogous to tables in relational databases.
Example: MongoDB is a popular document-oriented NoSQL database.
Use Cases: Content management systems, real-time analytics, and product catalogs benefit
from the flexibility to store varying document structures.
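
A hedged pymongo sketch is shown below, assuming a MongoDB server on localhost and the pymongo package; the database, collection, and field names are illustrative:

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
products = client["store"]["products"]

# Documents in the same collection may have different structures.
products.insert_one({"sku": "A1", "name": "Pen", "price": 1.5})
products.insert_one({"sku": "B2", "name": "Notebook", "price": 25.0,
                     "tags": ["paper", "a5"]})

print(products.find_one({"sku": "B2"})["tags"])  # ['paper', 'a5']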

2. Key-Value Pair Databases:

Overview: Key-value pair databases store data as a collection of key-value pairs. Each key is
unique and associated with a value, which can be a simple or complex data structure.
Example: Redis and Amazon DynamoDB are key-value pair databases.
Use Cases: Caching, session management, and real-time messaging systems where fast access
to data by key is critical.
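
A hedged redis-py sketch, assuming a Redis server on localhost and the redis package; the key name and expiry value are illustrative:

import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
r.set("session:42", "user=alice", ex=3600)  # value expires after one hour
print(r.get("session:42"))                  # 'user=alice'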

3. Column-Oriented Databases:

Overview: Column-oriented databases store data in columns rather than rows, optimizing for
read-heavy workloads and analytical queries. Each column family can contain multiple columns,
and rows are grouped together based on a primary key.
Example: Apache Cassandra and Apache HBase are column-oriented databases.
Use Cases: Time-series data analysis, log processing, and data warehousing where aggregating
data across columns is common.
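
A hedged cassandra-driver sketch, assuming a reachable Cassandra node, the cassandra-driver package, and an existing keyspace 'metrics' with a table readings(sensor_id text, ts timestamp, value double, PRIMARY KEY (sensor_id, ts)); all names are illustrative:

from cassandra.cluster import Cluster
from datetime import datetime

# Connect to the (assumed) keyspace and insert one time-series reading.
session = Cluster(["127.0.0.1"]).connect("metrics")
session.execute(
    "INSERT INTO readings (sensor_id, ts, value) VALUES (%s, %s, %s)",
    ("s1", datetime(2024, 1, 1), 21.5))

# All readings for one sensor are clustered together, so this range-style
# read is efficient for time-series analysis.
for row in session.execute(
        "SELECT ts, value FROM readings WHERE sensor_id = %s", ("s1",)):
    print(row.ts, row.value)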

4. Graph Databases:

Overview: Graph databases represent data as a network of nodes (entities) connected by edges
(relationships). Nodes, edges, and properties are fundamental concepts, and the database excels
at traversing relationships efficiently.
Example: Neo4j and Amazon Neptune are graph databases.
Use Cases: Social networks, recommendation systems, and network analysis where
understanding relationships between data points is crucial.
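
A hedged neo4j-driver sketch, assuming a Neo4j server reachable over Bolt and the neo4j package; the credentials, labels, and relationship type are illustrative:

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    # Create two people connected by a FOLLOWS relationship.
    session.run("CREATE (:Person {name: 'Alice'})-[:FOLLOWS]->(:Person {name: 'Bob'})")
    # Traverse the relationship with a Cypher pattern match.
    result = session.run(
        "MATCH (a:Person)-[:FOLLOWS]->(b:Person) RETURN a.name AS a, b.name AS b")
    for record in result:
        print(record["a"], "follows", record["b"])
driver.close()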

Comparison:

Schema Flexibility: Document-oriented and key-value pair databases offer schema flexibility,
allowing each document or key-value pair to have a different structure. Column-oriented
databases have a fixed schema for each column family, while graph databases have a flexible
schema but emphasize relationships.
Query Model: Document-oriented databases often use query languages similar to SQL but
optimized for JSON/BSON documents. Key-value pair databases primarily support simple CRUD
operations. Column-oriented databases use SQL-like query languages for analytical queries.
Graph databases use graph query languages like Cypher.
Performance Characteristics: Key-value pair databases excel in read and write throughput.
Column-oriented databases optimize for read-heavy analytical workloads. Document-oriented
databases offer a balance between read and write performance. Graph databases prioritize
relationship traversal efficiency.
Each type of NoSQL database has its strengths and is suitable for different use cases depending
on the nature of the data and the requirements of the application.
