Unit 5 DBMS
In database management systems (DBMS), data is stored using various storage structures that
organize and manage data efficiently. Here are some common storage structures used in DBMS:
Heap File Structure: In a heap file structure, records are stored in no particular order, often appended to the end of the file. This makes insertion fast, but retrieval can be slower, since a full scan might be required to find specific records.
Sequential File Structure: Records are stored in a sorted order according to a key field. This structure speeds up retrieval operations, as binary search or other efficient search algorithms can be used to find records quickly. However, insertion and deletion operations can be slower, as maintaining the sorted order requires shifting records.
Hashed File Structure: In a hashed file structure, records are stored in locations determined by applying a hash function to a key field. This allows very fast retrieval of records, since the location can be calculated directly from the hash value. However, collisions (two records mapping to the same location) need to be handled.
Indexed File Structure: An indexed file structure includes one or more indexes that provide pointers to the actual records in the file. This allows efficient retrieval based on the indexed fields. B-trees, B+ trees, and hash indexes are commonly used indexing techniques.
Clustered File Structure: In a clustered file structure, records with similar characteristics are stored together physically on disk. This can improve retrieval performance for queries that access multiple related records, since they are likely to be stored close to each other.
Partitioned File Structure: In a partitioned file structure, data is divided into partitions based on some criterion, such as a range of values or a hash function. This can improve performance by distributing the data across multiple disks or servers, allowing for parallel processing.
Multidimensional Storage Structures: These are used for storing data with multiple dimensions, such as in data warehouses or multidimensional databases. Examples include star schemas, snowflake schemas, and bitmap indexes, which are optimized for analytical queries and OLAP operations.
Each storage structure has its advantages and disadvantages, and the choice of structure depends on factors such as the type of application, the size and nature of the data, and the types of queries and operations expected to be performed on the data.
Unordered
Unordered storage typically stores the records in the order they are inserted. Such storage offers good insertion efficiency (O(1)), but inefficient retrieval times (O(n)). Typically these retrieval times are better in practice, however, as most databases use indexes on the primary keys, resulting in retrieval times of O(log n), or O(1) for keys that are the same as the database row offsets within the storage system.
Ordered
Ordered storage typically stores the records in order and may have to rearrange or increase the
file size when a new record is inserted, resulting in lower insertion efficiency. However, ordered
storage provides more efficient retrieval as the records are pre-sorted.
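To make the retrieval claim concrete, here is a minimal Python sketch (record values are illustrative) of an O(log n) binary-search lookup over records kept sorted by key, as in ordered storage:

```python
import bisect

# Records kept sorted by key, as in an ordered (sequential) file.
records = [(101, "Alice"), (205, "Bob"), (310, "Carol"), (412, "Dave")]
keys = [k for k, _ in records]

def find(key):
    """Binary search: O(log n) instead of a full O(n) scan."""
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return records[i]
    return None

print(find(310))  # (310, 'Carol')
```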
Structured files
Heap files
Heap files are lists of unordered records of variable size. Although sharing a similar name, heap
files are widely different from in-memory heaps. In-memory heaps are ordered, as opposed to
heap files.
Simplest and most basic method
o insert efficient, with new records added at the end of the file, providing chronological order
o retrieval efficient when the handle to the memory is the address of the memory
o search inefficient, as searching has to be linear
o deletion is accomplished by marking selected records as "deleted"
o requires periodic reorganization if the file is very volatile (changed frequently)
Advantages
o efficient for bulk loading data
o efficient for relatively small relations as indexing overheads are avoided
o efficient when retrievals involve a large proportion of stored records
Disadvantages
o not efficient for selective retrieval using key values, especially if large
o sorting may be time-consuming
o not suitable for volatile tables
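The behaviour described above can be sketched in a few lines of Python; the record fields are illustrative:

```python
# Minimal heap-file sketch: records are appended in arrival order,
# search is a linear scan, and deletion only marks the record.
heap = []

def insert(record):
    heap.append(record)            # O(1): always append at the end of the file

def search(key):
    for rec in heap:               # O(n): searching has to be linear
        if rec["id"] == key and not rec.get("deleted"):
            return rec
    return None

def delete(key):
    rec = search(key)
    if rec:
        rec["deleted"] = True      # physically removed only at reorganization

insert({"id": 1, "name": "Alice"})
insert({"id": 2, "name": "Bob"})
delete(1)
print(search(2))                   # {'id': 2, 'name': 'Bob'}
```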
Hash buckets
Hash functions calculate the address of the page in which the record is to be stored based on one or more fields in the record
o hashing functions are chosen to ensure that addresses are spread evenly across the address space
o 'occupancy' is generally 40% to 60% of the total file size
o a unique address is not guaranteed, so collision detection and collision resolution mechanisms are required
Open addressing
Chained/unchained overflow
Pros and cons
o efficient for exact matches on key field
o not suitable for range retrieval, which requires sequential storage
o calculates where the record is stored based on fields in the record
o hash functions ensure even spread of data
o collisions are possible, so collision detection and resolution are required
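A small Python sketch of hash buckets with chained overflow; the bucket count and records are illustrative:

```python
NUM_BUCKETS = 8
# Each bucket (page) holds a chain of records; two keys hashing to the
# same bucket (a collision) simply share the chain.
buckets = [[] for _ in range(NUM_BUCKETS)]

def bucket_of(key):
    return hash(key) % NUM_BUCKETS       # hash function spreads keys over pages

def insert(key, record):
    buckets[bucket_of(key)].append((key, record))

def lookup(key):
    for k, rec in buckets[bucket_of(key)]:   # scan only one bucket's chain
        if k == key:
            return rec
    return None

insert(42, "order-42")
insert(50, "order-50")                   # 50 % 8 == 2 == 42 % 8: a collision
print(lookup(42))                        # exact-match lookup stays fast
# A range query (e.g. all keys 40..60) would still require a full scan.
```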
B+ trees
These are the most commonly used in practice.
The time taken to access any record is the same, because the same number of nodes is searched. The index is a full index, so the data file does not have to be ordered.
Pros and cons
o versatile data structure – sequential as well as random access
o access is fast
o supports exact, range, part-key and pattern matches efficiently
o volatile files are handled efficiently because the index is dynamic – it expands and contracts as the table grows and shrinks
o less well suited to relatively stable files – in this case, ISAM is more efficient
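A full B+ tree implementation is beyond the scope of these notes; as a rough stand-in, the sketch below uses a sorted key list to illustrate the exact-match and range lookups a B+ tree index serves (in a real B+ tree the range scan would walk the linked leaf nodes):

```python
import bisect

index = []   # sorted (key, row_pointer) pairs, standing in for the leaf level

def insert(key, rowid):
    bisect.insort(index, (key, rowid))   # keeps keys ordered, as leaf splits do

def exact(key):
    i = bisect.bisect_left(index, (key,))
    return index[i][1] if i < len(index) and index[i][0] == key else None

def range_scan(lo, hi):
    """Sequential as well as random access: scan keys in [lo, hi]."""
    i = bisect.bisect_left(index, (lo,))
    while i < len(index) and index[i][0] <= hi:
        yield index[i]
        i += 1

for k, r in [(5, "r5"), (1, "r1"), (9, "r9"), (3, "r3")]:
    insert(k, r)
print(exact(3), list(range_scan(2, 8)))  # r3 [(3, 'r3'), (5, 'r5')]
```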
Data Orientation
Most conventional relational databases use "row-oriented" storage, meaning that all data associated with
a given row is stored together. By contrast, column-oriented DBMS store all data from a given column
together in order to more quickly serve data warehouse-style queries. Correlation databases are similar
to row-based databases, but apply a layer of indirection to map multiple instances of the same value to
the same numerical identifier.
Types of Memory
1. Primary Memory
The primary memory of a server is the type of data storage that is directly accessible by the central processing unit, meaning that it doesn't require any other devices to read from it. In general, primary memory depends on its supporting infrastructure working flawlessly: the electric power supply, the hardware backup system, the supporting devices, the coolant that moderates the system temperature, and so on.
These devices are considerably smaller in size and they are volatile.
In terms of performance and speed, primary memory devices are the fastest devices, and this feature is in direct correlation with their capacity.
These primary memory devices are usually more expensive due to their increased speed and performance.
The cache is one of the types of Primary Memory.
Cache Memory: Cache Memory is a special very high-speed memory. It is used to speed up and
synchronize with a high-speed CPU. Cache memory is costlier than main memory or disk memory
but more economical than CPU registers. Cache memory is an extremely fast memory type that acts
as a buffer between RAM and the CPU.
2. Secondary Memory
Data storage devices known as secondary storage, as the name suggests, are devices that can be accessed
for storing data that will be needed at a later point in time for various purposes or database actions.
Therefore, these types of storage systems are sometimes called backup units as well. Devices that are
plugged or connected externally fall under this memory category, unlike primary memory, which is part
of the CPU. The size of this group of devices is noticeably larger than the primary devices and smaller than
the tertiary devices.
It is also regarded as a temporary storage system, since it can hold data when needed and delete it when the user is done with it. Compared to primary storage devices as well as tertiary devices, these secondary storage devices are slower in their actions and pace.
They usually have a higher capacity than primary storage systems, but this changes with the technological world, which is expanding every day.
Some commonly used Secondary Memory types that are present in almost every system are:
Flash Memory: Flash memory, also known as flash storage, is a type of nonvolatile memory that erases data in units called blocks and rewrites data at the byte level. Flash memory is widely used for storage and data transfer in consumer devices, enterprise systems, and industrial applications. Unlike traditional hard drives, flash memories are able to retain data even after the power has been turned off.
Magnetic Disk Storage: A magnetic disk is a type of secondary memory consisting of a flat disc covered with a magnetic coating that holds information. It is used to store various programs and files. Information polarized in one direction is represented by 1, and information polarized in the opposite direction by 0.
3. Tertiary Memory
Tertiary memory refers to data storage devices that can hold a large amount of data without being constantly connected to the server or the peripherals. A device of this type is connected externally, either to a server or to the device where the database is stored.
Because tertiary storage provides more space than other types of device memory but performs the slowest, the cost of tertiary storage is lower than that of primary and secondary storage. This type of storage is commonly used for making backup copies of servers and databases.
As with secondary devices, the contents of tertiary devices can be used and deleted in a similar way.
Some commonly used Tertiary Memory types that are present in almost every system are:
Optical Storage: A type of storage where reading and writing are performed with the help of a laser. Data written on CDs and DVDs are typical examples of optical storage.
Tape Storage: Tape storage is a type of data storage where we use magnetic tape to store data. It is used to store data for a long time and also helps in the backup of data in case of data loss.
Memory Hierarchy
A computer system has a hierarchy of memory. A CPU has direct access to its main memory as well as its inbuilt registers. Accessing main memory is slow compared with the speed at which the CPU runs, so cache memory is introduced to minimize this difference in speed. Data that is most frequently accessed by the CPU resides in cache memory, which provides the fastest access time. The fastest-accessing memory is also the most expensive. Although large storage devices are slower and less expensive than CPU registers and cache memory, they can store a greater amount of data.
1. Magnetic Disks
Present-day computer systems use hard disk drives as secondary storage devices. Magnetic disks store information using the concept of magnetism. Metal disks are coated with magnetizable material to create hard disks, and these disks are mounted vertically on a spindle. As the read/write head moves between the disks, it de-magnetizes or magnetizes the spots under it. A spot magnetized in one state represents 1 (one) and in the other 0 (zero).
Formatted hard disks store data efficiently by storing it in a defined order. The hard disk platter is divided into many concentric circles, called tracks, and each track contains a number of sectors. Data on a hard disk is typically stored in sectors of 512 bytes.
2. RAID
RAID (Redundant Array of Independent Disks) combines multiple disks into a single logical unit to improve performance, redundancy, or both.
RAID 0: Data is striped across the disks in the array: it is broken into blocks and the blocks are distributed among the disks. This improves throughput but provides no redundancy.
Fig: Raid-0
RAID 1: Mirroring is used in RAID 1. A RAID controller copies data across all disks in an array when
data is sent to it. In case of failure, RAID level 1 provides 100% redundancy.
Fig.: Raid-1
RAID 2: The data in RAID 2 is striped across different disks, and an Error Correction Code is recorded using Hamming distance. Similarly to level 0, each bit within a word is stored on a separate disk, and ECC codes for the data words are saved on a separate set of disks. As a result of its complex structure and high cost, RAID 2 is not commercially deployed.
Fig: Raid-2
RAID 3: Data is striped across multiple disks in RAID 3. The data words are used to generate a parity bit, which is stored on a dedicated disk, so a single-disk failure can be survived.
Fig: Raid-3
RAID 4: This level involves writing an entire block of data onto the data disks, then generating the parity and storing it somewhere else. At level 3, bytes are striped, while at level 4, blocks are striped. Both levels 3 and 4 require a minimum of three disks.
Fig: Raid-4
RAID 5: The data blocks in RAID 5 are written to different disks, but the parity blocks are spread across all the data disks rather than being stored on a separate dedicated disk.
Fig: Raid-5
RAID 6: The RAID 6 level extends the level 5 concept: two independent parities are generated and stored on multiple disks. Ideally, you need four disk drives for this level.
Fig: Raid-6
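To make the parity idea behind RAID 3-6 concrete, here is a small Python sketch, not tied to any particular RAID implementation, that rebuilds a lost block by XORing the surviving blocks with the parity block:

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equal-sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

data = [b"\x10\x20", b"\x0f\x0f", b"\xaa\x55"]   # three data blocks
parity = xor_blocks(data)                        # stored on the parity disk/stripe

# Suppose the disk holding data[1] fails: rebuild it from the survivors.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
print(rebuilt.hex())                             # 0f0f
```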
Storage Hierarchy
Besides the storage devices mentioned above, other devices are also used in day-to-day life. Arranged as a hierarchy, they range from the fastest at the top to the slowest at the bottom.
What is Transaction in DBMS?
Transactions in Database Management Systems (DBMS) are sets of operations performed to
modify data, including insertion, updates, or deletions. These transactions have various states
indicating their progress and required actions. They ensure data consistency even during system
failures, demonstrating a key advantage of DBMS. Transactions, executed repeatedly, like ATM
withdrawals, generate multiple instances, each maintaining database integrity through specific
properties.
Operations in Transaction
When a transaction is carried out, a certain set of operations takes place to perform a logical unit of work. For example, when we go to withdraw money from an ATM, we encounter the following set of operations:
1. Transaction initiated
2. You have to insert an ATM card
3. Select your choice of language
4. Select whether savings or current account
5. Enter the amount to withdraw
6. Enter your ATM PIN
7. Transaction processes
8. You collect the cash
9. You press finish to end the transaction
The above mentioned are the set of operations done by you. But in the case of a transaction in
DBMS there are three major operations that are used for a transaction to get executed in an
efficient manner. These are:
1. Read/Access Data
2. Write/Change Data
3. Commit
Let's understand the above three sets of operations in a transaction with a real-life example of
transferring money from Account1 to Account2.
Initial balance in both accounts before the start of the transaction:
Account1 - ₹5000, Account2 - ₹2000
This data, before the start of the transaction, is stored in secondary memory (hard disk). Once the transaction is initiated, it is brought into the primary memory (RAM) of the system for faster and better access. Now, for a transfer of ₹500 from Account1 to Account2, the following set of operations will take place:
Read(Account1) --> 5000
Account1 = Account1 - 500
Write(Account1) --> 4500
Read(Account2) --> 2000
Account2 = Account2 + 500
Write(Account2) --> 2500
Commit
The COMMIT statement permanently saves the changes made by the current transaction.
When a transaction is successful, COMMIT is applied. If the system fails before a COMMIT is
applied, the transaction reaches its previous state after ROLLBACK.
After the commit operation the transaction ends, with updated values Account1 = ₹4500 and Account2 = ₹2500. Every operation that occurs before the commit is said to be in a partially committed state and is stored in the primary memory (RAM). After the transaction is committed, the updated data is accepted and written to the secondary memory (hard disk).
If the transaction fails anywhere before committing, it is aborted and has to start from the beginning, as it cannot be continued from the point of failure. This is known as rollback.
The ROLLBACK statement undoes the changes made by the current transaction. A transaction cannot undo changes after COMMIT has been executed.
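The transfer above maps directly onto these statements; a minimal sqlite3 sketch, with table and account names taken from the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("Account1", 5000), ("Account2", 2000)])
conn.commit()

try:
    # Read, modify, write: the changes stay in a partially committed state ...
    conn.execute("UPDATE account SET balance = balance - 500 WHERE name = 'Account1'")
    conn.execute("UPDATE account SET balance = balance + 500 WHERE name = 'Account2'")
    conn.commit()        # ... until COMMIT makes them permanent
except sqlite3.Error:
    conn.rollback()      # on failure, restore the state before the transaction

print(conn.execute("SELECT * FROM account").fetchall())
# [('Account1', 4500), ('Account2', 2500)]
```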
Aborted State: If the transaction fails during its execution, it goes from failed
state to aborted state and because in the previous states all the changes were only made
in the main memory, these uncommitted changes are either deleted or rolled back. The
transaction at this point can restart and start afresh from the active state.
Committed State: If the transaction completes all of its operations successfully, all the changes made during the partially committed state are permanently stored, and the transaction is said to be complete; it can then progress to the terminated state.
Terminated State: If the transaction gets aborted after roll-back or the transaction comes
from the committed state, then the database comes to a consistent state and is ready for
further new transactions since the previous transaction is now terminated.
Atomicity ensures that a transaction is treated as a single indivisible unit, either executing all its
operations or none at all.
Consistency ensures that the database remains in a valid state before and after a transaction.
Isolation ensures that concurrent transactions do not interfere with each other, maintaining data
integrity.
Durability guarantees that once a transaction is committed, its effects are permanent and survive
any system failures. Together, these properties ensure reliability and maintain data integrity in
DBMS operations.
State: Description
Active: Transaction is in progress.
Partially Committed: Operations completed, but data not yet saved.
Failed: Transaction fails database recovery system checks.
Committed: Successful completion; changes permanently saved.
Aborted: Transaction fails tests; rolled back or aborted.
Terminated: Transaction terminated; system ready for new transactions.
Transaction control
In database management systems (DBMS), transaction control refers to the mechanisms used to ensure the ACID properties (Atomicity, Consistency, Isolation, Durability) of database transactions.
1. Atomicity in DBMS
Atomicity is the ACID property in DBMS which requires that the data is kept atomic: if any operation is performed on the data, it should either be executed completely or not at all, never interrupted or left half completed. When performing operations in a transaction, each operation should be completed totally rather than partially. If any of the operations is not completed fully, the transaction gets aborted.
Example: Sometimes an operation will be running and then an operation with a higher priority enters. This interrupts the current operation, and the current operation will be aborted.
In the given scenario, if two users simultaneously try to book the only available seat on a train,
the transaction is considered incomplete. According to atomicity, the first user who successfully
clicks the booking button will reserve the seat and receive a notification, while the second user's
transaction will be rolled back, and they will be notified that no more seats are available.
In a simpler example, if a person tries to book a ticket, selects a seat, and proceeds to the payment gateway but encounters a failure due to bank server issues, their booked seat will not be reserved for them. A complete transaction involves reserving the seat and completing the payment. If any step fails, the operation is aborted, and the user is brought back to the initial state without their seat being reserved.
Atomicity in DBMS is often referred to as the ‘all or nothing’ rule.
2. Consistency in DBMS
This ACID property verifies that the sum of seats left in the train plus the sum of seats booked by users equals the total number of seats present in the train. After each transaction, consistency is checked to ensure nothing has gone wrong.
Example: Let us consider an example where one person is trying to book a ticket. They are able
to reserve their seat but their payment hasn’t gone through due to bank issues. In this case, their
transaction is rolled back. But just doing that isn’t sufficient. The number of available seats must
also be updated. Otherwise, if it isn’t updated, there will be an inconsistency where the seat given
up by the person is not accounted for. Hence, the total sum of seats left in the train + the sum of
seats booked by users would not be equal to the total number of seats present in the train if not
for consistency.
3. Isolation in DBMS
Isolation is defined as a state of separation. Isolation is an ACID Property in DBMS where no data
from one database should impact the other and where many transactions can take place at the
same time. In other words, when the operation on the first state of the database is finished, the
process on the second state of the database should begin. It indicates that if two actions are
conducted on two different databases, the value of one database may not be affected by the
value of the other. When two or more transactions occur at the same time in the case of
transactions, consistency should be maintained. Any modifications made in one transaction will
not be visible to other transactions until the change is committed to the memory.
Example: Suppose two people try to book the same seat simultaneously. Transactions are
serialized to maintain data consistency. The first person's transaction succeeds, and they receive
a ticket. The second person's transaction fails as the seat is already booked. They receive an error
message indicating no available seats.
4. Durability in DBMS
The ACID property durability in DBMS refers to the fact that if an operation is completed successfully, its changes remain permanently in the database on disk. The database's durability should be such that even if the system fails or crashes, the database will survive. However, if the database is lost, the recovery manager is responsible for guaranteeing the database's long-term viability. Every time we make a change, we must use the COMMIT command to commit the values.
Example: Suppose a system failure in the railway management system results in the loss of all booked train details. Millions of users who had paid for their seats would be unable to board the train, causing significant financial losses and eroding trust in the company. The situation would be particularly critical as these trains are needed for important reasons, causing widespread panic and inconvenience.
Transaction control in DBMS involves the following key concepts and operations:
Transaction Begin: The start of a transaction is initiated using a BEGIN TRANSACTION or START TRANSACTION statement. This marks the beginning of a logical unit of work that will be treated atomically.
Transaction Commit: When all operations within a transaction complete successfully, the changes made by the transaction are permanently saved to the database using a COMMIT statement. After a commit, the changes become visible to other transactions.
Transaction Rollback: If an error occurs during the execution of a transaction, or if the transaction is explicitly aborted, the changes made by the transaction are undone using a ROLLBACK statement. This restores the database to its state before the transaction began.
Savepoints: Savepoints allow transactions to be divided into smaller, nested units. They provide a way to roll back only part of a transaction without affecting the entire transaction.
Effective transaction control ensures data integrity, concurrency, and recovery in database systems, enabling reliable and consistent database operations.
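A short sqlite3 sketch of these operations, including a savepoint that rolls back only part of a transaction; the table and savepoint names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None              # manage transactions explicitly
cur = conn.cursor()
cur.execute("CREATE TABLE item (name TEXT, qty INTEGER)")

cur.execute("BEGIN")                                 # transaction begin
cur.execute("INSERT INTO item VALUES ('bolt', 100)")
cur.execute("SAVEPOINT before_risky")                # nested unit of work
cur.execute("INSERT INTO item VALUES ('nut', -5)")   # suppose this is invalid
cur.execute("ROLLBACK TO before_risky")              # undo only this part
cur.execute("COMMIT")                                # the bolt row is saved

print(cur.execute("SELECT * FROM item").fetchall())  # [('bolt', 100)]
```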
Deadlock in DBMS
Deadlock in a database management system (DBMS) is a problematic scenario where two or
more transactions are stuck in an indefinite wait for each other to finish, yet neither relinquishes
the necessary CPU and memory resources. This impasse halts the system, as tasks remain
uncompleted and perpetually waiting.
To illustrate, consider a student database: Transaction 1 locks certain Student table records,
needing to update corresponding Grade table entries. Concurrently, Transaction 2 locks different
Grade table records, requiring updates in the Student table, already locked by Transaction 1.
The deadlock dilemma emerges as Transaction 1 waits for Transaction 2 to unlock resources and
vice versa, leading to a system-wide standstill. This deadlock persists until the DBMS intervenes,
typically by aborting one of the transactions to resolve the gridlock.
Coffman Conditions
Regarding deadlock in DBMS, four conditions were stated by Coffman. A deadlock might occur if all four of these Coffman conditions hold true at any point in time.
1. Mutual Exclusion: There should be at least one resource that cannot be utilized by more than one transaction at a time.
2. Hold and wait condition: A transaction holding a resource requests additional resources that are already being held by other transactions in the system.
3. No preemption condition: Access to a particular resource can never be forcibly taken from a running transaction. Only that running transaction can release a resource it holds.
4. Circular wait condition: A transaction is kept waiting for a resource that is at the same time held by some other transaction, which is in turn waiting for a third transaction, and so on, with the last transaction waiting for the very first one, giving rise to a circular chain of waiting transactions.
Deadlock Handling
Ostrich Algorithm
Ostrich Algorithm is an approach of handling deadlock that involves ignoring the deadlock and
pretending that it would never occur. This approach derives its name from the behavior of an
Ostrich which is “to stick one’s head in the sand and pretend there is no problem”. Windows and
UNIX-based systems use this approach of handling a deadlock.
Deadlock Avoidance
Deadlock avoidance is a technique of detecting a possible deadlock in advance. Methods like the wait-for graph can be used in smaller databases to detect deadlocks, but for larger databases deadlock prevention measures have to be used.
When a database can get stuck in a state of deadlock, it is preferable to avoid the deadlock in advance rather than aborting or rebooting the database server, which wastes both time and resources.
Deadlock Detection
During a database transaction, if a task waits indefinitely to obtain CPU resources, the DBMS has to check whether that task is in a state of deadlock or not. To detect a deadlock, a resource scheduler is used. A resource scheduler can detect deadlock by keeping track of resources allocated to a specific transaction and requested by another transaction.
Wait-For Graph: This method of detecting deadlocks involves creating a graph based on transactions and their acquired locks (a lock is a way of preventing multiple transactions from accessing any resource simultaneously). If the created graph contains a cycle/loop, a deadlock exists.
The DBMS creates a wait-for graph for every transaction/task that is in a waiting state and keeps checking whether a cycle exists in any of the graphs.
The figure above shows a wait-for graph for two transactions T1 and T2 in a deadlock situation.
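A minimal Python sketch of detecting such a cycle with depth-first search; the graph below encodes the T1/T2 situation just described:

```python
# Wait-for graph: edges[t] = the set of transactions t is waiting for.
edges = {"T1": {"T2"}, "T2": {"T1"}}    # T1 waits for T2 and vice versa

def has_deadlock(edges):
    """Depth-first search for a cycle in the wait-for graph."""
    visiting, done = set(), set()

    def dfs(node):
        visiting.add(node)
        for nxt in edges.get(node, ()):
            if nxt in visiting:          # back edge: a cycle, hence deadlock
                return True
            if nxt not in done and dfs(nxt):
                return True
        visiting.discard(node)
        done.add(node)
        return False

    return any(dfs(t) for t in list(edges) if t not in done)

print(has_deadlock(edges))   # True: T1 -> T2 -> T1 is a cycle
```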
Deadlock Prevention
Avoiding one or more of the above-stated Coffman conditions can lead to the prevention of a deadlock. For larger databases, preventing deadlocks is much more feasible than handling them after they occur.
The DBMS is made to efficiently analyze every database transaction to determine whether it can cause a deadlock; if a transaction can lead to a deadlock, it is never executed.
1. Wait-Die Scheme: When a transaction requests a resource that is already locked by another transaction, the DBMS checks the timestamps of both transactions. If the requesting transaction is the older one, it is made to wait until the resource becomes available; if it is the younger one, it is killed ("dies") and restarted later with the same timestamp.
2. Wound-Wait Scheme: When an older transaction demands a resource that is already locked by a younger transaction (a transaction that was initiated later), the younger transaction is forced to stop its processing and release the locked resource for the older transaction's execution. The younger transaction is restarted after a delay, but its timestamp remains the same. If a younger transaction requests a resource held by an older one, the younger transaction is made to wait until the older one releases the resource.
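A small Python sketch contrasting the two schemes; a smaller timestamp means an older transaction:

```python
def wait_die(requester_ts, holder_ts):
    """Older requesters wait; younger requesters die (abort, retry later)."""
    return "wait" if requester_ts < holder_ts else "die"

def wound_wait(requester_ts, holder_ts):
    """Older requesters wound (abort) the holder; younger requesters wait."""
    return "wound holder" if requester_ts < holder_ts else "wait"

print(wait_die(1, 2))     # older transaction requests: it waits
print(wait_die(2, 1))     # younger transaction requests: it dies
print(wound_wait(1, 2))   # older transaction requests: holder is wounded
print(wound_wait(2, 1))   # younger transaction requests: it waits
```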
Distributed systems are very vast, because of which deadlocks can neither be prevented nor avoided; deadlock detection is then the only possible option. The following are the requirements for deadlock detection in distributed systems.
1. Progress: The method should be able to detect all the deadlocks in the system.
2. Safety: The method should not detect false or phantom deadlocks.
Serializability in DBMS
Q. What is a serializable schedule, and what is it used for?
If a non-serial schedule can be transformed into its corresponding serial schedule, it is said to be serializable. Simply put, a non-serial schedule is referred to as a serializable schedule if it yields the same results as a serial schedule.
Non-serial Schedule
A non-serial schedule is a schedule where the transactions overlap or switch places. Since schedules are used to carry out actual database operations, multiple transactions may be running at once, possibly working on the same data set. Therefore, it is crucial that non-serial schedules can be serialized, so that our database is consistent both before and after the transactions are executed.
Example:
Transaction-1: R(a), W(a), R(b), W(b)
Transaction-2: R(b), R(a), W(b), W(a)
(the operations of the two transactions are interleaved in the schedule)
We can observe that Transaction-2 begins its execution before Transaction-1 is finished, and they are both working on the same data, i.e., "a" and "b", interchangeably. Here "R" denotes Read and "W" denotes Write.
Serializability testing
We can use the serialization graph, or precedence graph, to examine a schedule's serializability. A serialization graph is a directed graph built from the complete set of transactions of a schedule.
Fig: Precedence Graph
It can be described as a graph G(V, E) with vertices V = {V1, V2, V3, ..., Vn} and directed edges E = {E1, E2, E3, ..., En}. Each edge arises from a pair of conflicting READ or WRITE operations: an edge Ti -> Tj means that transaction Ti performs a conflicting read or write before transaction Tj.
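A minimal Python sketch that builds the edge set of a precedence graph from a schedule (the schedule is illustrative); a cycle in the result, found for example with the depth-first search shown earlier for wait-for graphs, means the schedule is not conflict serializable:

```python
def precedence_graph(schedule):
    """schedule: list of (transaction, op, item) in execution order.
    Adds edge Ti -> Tj for each conflicting pair where Ti acts first:
    different transactions, same item, at least one write."""
    edges = set()
    for i, (ti, op_i, x) in enumerate(schedule):
        for tj, op_j, y in schedule[i + 1:]:
            if ti != tj and x == y and "W" in (op_i, op_j):
                edges.add((ti, tj))
    return edges

S = [("T1", "R", "a"), ("T2", "W", "a"), ("T1", "W", "a")]
print(precedence_graph(S))
# {('T1', 'T2'), ('T2', 'T1')}: a cycle, so S is not conflict serializable
```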
Types of Serializability
There are two ways to check whether any non-serial schedule is serializable.
1. Conflict serializability
For example, let's consider an order table and a customer table as two instances. Each order is associated with one customer, even though a single customer may place many orders. However, there are conditions for achieving conflict serializability in the database. Here are a few of them:
1. The two operations should belong to different transactions.
2. The identical data item should be accessed by both operations.
3. At least one of the two operations should be a write operation.
Example
Three transactions, t1, t2, and t3, are active on a schedule "S" at once. Let's create a precedence graph.
Schedule S (operations in execution order): R(b), R(b), W(b), W(a), W(a), R(a), W(a)
It is a conflict serializable schedule as well as a serial schedule, because the precedence graph (a DAG) has no loops. We can also determine the order of transactions because it is a serial schedule.
As there is no incoming edge on Transaction 1, T1 will be executed first. T3 will run second because it depends only on T1. T2 will be executed last, due to its dependence on both T1 and T3.
2. View Serializability
View serializability is a form of serializability in which each transaction should produce some result, and these results are the output of properly, sequentially executing the operations on the data items. View serializability, in contrast to conflict serializability, is concerned with avoiding database inconsistency even when transactions see the database in conflicting ways.
To further understand view serializability in DBMS, consider two schedules S1 and S2 established over the same transactions T1 and T2. The schedules must satisfy three conditions in order to be view equivalent. These conditions are listed below.
1. The first prerequisite is that the same set of transactions appears in both schedules. The schedules are not equivalent if one schedule commits a transaction that does not match a transaction of the other schedule.
2. The second requirement is that the read and write operations should not differ between the schedules. For example, two schedules are not similar if schedule S1 has two write operations whereas schedule S2 has only one. The number of write operations must be the same in both schedules; there is no issue, however, if the number of read operations differs.
3. The last requirement is that there should not be a conflict between the two schedules in the execution order for a single data item. Assume, for instance, that data item A is written by transaction T1 in schedule S1 and by transaction T2 in schedule S2. In that case the schedules are not equivalent. However, the schedules are equivalent to one another if they have the same final write operations on each data item.
1. The same piece of data must be read first in both schedules. For instance, if transaction t1 reads "A" from the database in schedule S1, then t1 must also read A in schedule S2.
2. The same transaction must perform the final write. As an illustration, if transaction t1 updated A last in S1, it should also perform the final write in S2.
3. The middle sequence should follow suit. As an illustration, if in S1 t1 reads A and then t2 updates A, then in S2 t1 should read A before t2 updates A.
View Serializability refers to the process of determining whether a schedule’s views are
equivalent.
Example
We have a schedule "S" with two concurrently running transactions, "t1" and "t2".
Schedule S (operations in execution order):
W(a), R(a), W(a), R(b), W(b), R(b), W(b)
By switching both transactions' mid read-write operations, let's create its view-equivalent schedule (S'):
Schedule S' (operations in execution order):
W(a), R(b), W(b), R(a), W(a), R(b), W(b)
Advantages of Serializability
1. Execution is predictable: Under serializability, the DBMS's concurrently running threads behave as if executed one at a time, so the DBMS holds no surprises: no data loss or corruption occurs, and all variables are updated as intended.
2. Easier debugging: The DBMS can be reasoned about as if it executed each thread independently, making it much simpler to understand and troubleshoot each database thread. This can greatly simplify the debugging process; the concurrent processing is therefore not a concern for us.
3. Lower costs: The serializability property can help decrease the cost of the hardware required for the efficient operation of the database. It may also lower the cost of developing the software.
4. Increased performance: Since serializable executions give developers the opportunity to optimize their code for performance, they occasionally outperform non-serializable equivalents.
For a DBMS transaction to be regarded as serializable, it must adhere to the ACID properties. In DBMS, serializability comes in a variety of forms, each having advantages and disadvantages of its own. Most of the time, choosing the best sort of serializability involves a trade-off between performance and correctness.
Making the incorrect choice of serializability might result in database issues that are challenging to track down and resolve. You should now have a better understanding of how serializability in DBMS functions and the different types that are available.
Q. How does a DBMS achieve serializability?
Answer:
Through concurrency control techniques like locking, timestamp ordering, and optimistic concurrency control, DBMSs accomplish serializability. Simultaneous access to the database is still permitted while these methods make sure that transactions are carried out in a serializable order.
Q. What is the difference between strong and weak serializability?
Answer:
Strong serializability is a stronger form of serializability than weak serializability. In strong serializability, a schedule must be equivalent to a serial schedule in which the transactions are carried out in the same sequence as in the original schedule. In weak serializability, a schedule just needs to be conflict-equivalent to some serial schedule.
Concurrency Control in DBMS
When several transactions execute concurrently without any rules and protocols, various problems arise that may harm the data integrity of the database. These problems are known as concurrency control problems. Therefore, several rules are designed to maintain consistency in transactions while they execute concurrently; these are known as concurrency control protocols.
A transaction is a single logical unit of work that can retrieve or change the data of a database. Executing each transaction individually increases the waiting time for the other transactions, and the overall execution also gets delayed. Hence, to increase throughput and to reduce waiting time, transactions are executed concurrently.
Example: Suppose, between two railway stations, A and B, 5 trains have to travel, if all the trains
are set in a row and only one train is allowed to move from station A to B and others have to wait
for the first train to reach its destination then it will take a lot of time for all the trains to travel
from station A to B. To reduce time all the trains should be allowed to move concurrently from
station A to B ensuring no risk of collision between them.
When several transactions execute simultaneously, then there is a risk of violation of the data
integrity of several databases. Concurrency Control in DBMS is a procedure of managing
simultaneous transactions ensuring their atomicity, isolation, consistency and serializability.
Dirty Read Problem
Example: Consider two transactions A and B performing read/write operations on a data item DT in the database DB. The current value of DT is 1000. The following table shows the read/write operations in the A and B transactions.
Time A B
T1 READ(DT) ------
T2 DT=DT+500 ------
T3 WRITE(DT) ------
T4 ------ READ(DT)
T5 ------ COMMIT
T6 ROLLBACK ------
Transaction A reads the value of DT as 1000 and modifies it to 1500, which gets stored in the temporary buffer. Transaction B reads DT as 1500 and commits it, and the value of DT is permanently changed to 1500 in the database DB. Then some server error occurs in transaction A, and it wants to roll back to its initial value, i.e., 1000; this is when the dirty read problem occurs, as transaction B has read data that was never committed.
Unrepeatable Read Problem
Time A B
T1 READ(DT) ------
T2 ------ READ(DT)
T3 DT=DT+500 ------
T4 WRITE(DT) ------
T5 ------ READ(DT)
Transaction A and B initially read the value of DT as 1000. Transaction A modifies the value of DT
from 1000 to 1500 and then again transaction B reads the value and finds it to be 1500.
Transaction B finds two different values of DT in its two different read operations.
Phantom Read Problem
Example: Consider two transactions A and B performing read/write operations on a data item DT in the database DB. The current value of DT is 1000. The following table shows the read/write operations in the A and B transactions.
Time A B
T1 READ(DT) ------
T2 ------ READ(DT)
T3 DELETE(DT) ------
T4 ------ READ(DT)
Transaction B initially reads the value of DT as 1000. Transaction A deletes the data DT from the
database DB and then again transaction B reads the value and finds an error saying the data DT
does not exist in the database DB.
Lost Update Problem
Transaction A initially reads the value of DT as 1000. Transaction A modifies the value of DT from 1000 to 1500, and then transaction B modifies the value to 1800. Transaction A again reads DT and finds 1800; the update done by transaction A has therefore been lost.
Incorrect Summary Problem
Example: Consider two transactions A and B performing read/write operations on two data items DT1 and DT2 in the database DB. The current value of DT1 is 1000 and that of DT2 is 2000. The following table shows the read/write operations in the A and B transactions.
Time A B
T1 READ(DT1) ------
T2 add=0 ------
T3 add=add+DT1 ------
T4 ------ READ(DT2)
T5 ------ DT2=DT2+500
T6 READ(DT2) ------
T7 add=add+DT2 ------
Transaction A reads the value of DT1 as 1000. It uses an aggregate function SUM which calculates
the sum of two data DT1 and DT2 in variable add but in between the value of DT2 get changed
from 2000 to 2500 by transaction B. Variable add uses the modified value of DT2 and gives the
resultant sum as 3500 instead of 3000.
Lock-Based Protocols
To attain consistency, isolation between transactions is the most important tool. Isolation is achieved by preventing a transaction from performing read/write operations on locked data; this is known as locking an operation in a transaction. Through lock-based protocols, desired operations are freely allowed to proceed while undesired operations are blocked.
Shared Lock (S): A shared lock allows a transaction only to read a data item; several transactions may hold a shared lock on the same item at the same time.
Exclusive Lock (X): An exclusive lock allows both the read and the write operation on a data item in a transaction. It cannot be held on the same data item by more than one transaction at a time. Exclusive locks are represented by 'X'.
Pre-claiming Lock Protocol: According to the pre-claiming lock protocol, an assessment of the operations that are going to be performed is made first. Then a list is prepared containing the data items on which locks will be imposed. The transaction requests all these locks from the system before starting the execution of its operations. If all the locks are granted, the operations in the transaction run smoothly, and the locks are returned to the system on completion. The transaction rolls back if not all of the locks are granted.
Two-Phase Locking Protocol: This protocol works in two phases. The transaction starts its execution with the growing phase, in which it acquires the locks it needs and releases none. From the moment the transaction releases its first lock, the shrinking phase begins, in which locks are only released, after the execution of each operation, and no new locks are acquired.
Strict Two-Phase Locking Protocol: Strict 2PL is almost similar to 2PL. The only difference is that strict 2PL does not release locks just after the execution of the operations; it carries all the locks and releases them only when the transaction commits.
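A minimal Python sketch of the two-phase rule itself; lock conflicts between transactions are omitted for brevity:

```python
# Growing phase: only acquire locks. After the first release (the lock
# point has passed), the shrinking phase allows releases only.
class TwoPhaseTxn:
    def __init__(self, name):
        self.name, self.held, self.shrinking = name, set(), False

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock requested after an unlock")
        self.held.add(item)

    def unlock(self, item):
        self.shrinking = True            # growing phase is over
        self.held.discard(item)

t = TwoPhaseTxn("T1")
t.lock("a"); t.lock("b")                 # growing phase
t.unlock("a")                            # shrinking phase begins
try:
    t.lock("c")                          # illegal under 2PL
except RuntimeError as e:
    print(e)
```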
Time-based Protocols
According to this protocol, every transaction has a timestamp attached to it. The timestamp is based on the time at which the transaction entered the system. Read and write timestamps are also maintained for every data item, recording the times at which the latest read and write operations on it were performed, respectively.
o TS(A) < W-timestamp(DT): The transaction will roll back. If the timestamp of transaction A, at which it entered the system, is less than the write timestamp of DT, that is, the latest time at which DT has been updated, then the transaction rolls back.
o All operations other than this will be executed.
Thomas' Write Rule: The rule alters the timestamp-ordering protocol to make the schedule view serializable. For the case TS(A) < W-timestamp(DT), the timestamp-ordering protocol would roll the transaction back, but according to Thomas' write rule, when such a write operation comes up it is simply ignored.
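A small Python sketch of the write rule under basic timestamp ordering and under Thomas' write rule; the timestamps are illustrative:

```python
W_ts = {"DT": 10}   # latest write timestamp of each data item
R_ts = {"DT": 8}    # latest read timestamp of each data item

def write(item, ts, thomas=False):
    if ts < R_ts.get(item, 0):
        return "rollback"            # a younger transaction already read it
    if ts < W_ts.get(item, 0):
        # Basic TO: rollback. Thomas' rule: the write is obsolete, skip it.
        return "ignore write" if thomas else "rollback"
    W_ts[item] = ts
    return "write applied"

print(write("DT", 9))                # rollback: TS(A) < W-timestamp(DT)
print(write("DT", 9, thomas=True))   # ignored under Thomas' write rule
print(write("DT", 12))               # applied; W-timestamp(DT) becomes 12
```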
Validation phase: In this phase, validation tests are performed to check whether the values of data present in the local buffer can replace the original values in the database without harming serializability.
Validation Test: Validation tests are performed on a transaction A executing concurrently with a transaction B such that TS(A) < TS(B). The transactions must satisfy one of the following conditions:
1. Finish(A) < Start(B): The operations in transaction A finish executing before transaction B starts, so serializability order is maintained.
2. Start(B) < Finish(A) < Validate(B): The set of data items written by transaction A during its write phase must not intersect with the set read by transaction B.
Write phase: If the transaction passes the tests of the validation phase, the values are copied to the database; otherwise the transaction rolls back.
Example: Consider two transactions A and B performing read/write operations on two data items DT1 and DT2 in the database DB. The current value of DT1 is 1000 and that of DT2 is 2000. The following table shows the read/write operations in the A and B transactions.
Time A B
T1 READ(DT1) ------
T2 ------ READ(DT1)
T3 ------ DT1=DT1-100
T4 ------ READ(DT2)
T5 ------ DT2=DT2+500
T6 ------ READ(DT2)
T7 ------
T8 PRINT(DT2-DT1) ------
T9 ------
T10 ------ WRITE(DT1)
T11 ------ WRITE(DT2)
The schedule passes the validation test of the validation phase because the timestamp of transaction B is less than that of transaction A. Note that the write operations are applied only after the validation of both transactions; all the operations before the final write are performed in the local buffer.
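A minimal Python sketch of the validation test described above, applied to illustrative read/write sets and timestamps:

```python
def validates(finish_a, start_b, validate_b, write_set_a, read_set_b):
    """Validation test for transactions A, B with TS(A) < TS(B)."""
    if finish_a < start_b:
        return True                  # condition 1: A finished before B started
    if start_b < finish_a < validate_b:
        # condition 2: A's writes must not intersect B's reads
        return not (write_set_a & read_set_b)
    return False

# A only read, so its write set is empty: the schedule validates.
print(validates(finish_a=8, start_b=2, validate_b=9,
                write_set_a=set(), read_set_b={"DT1", "DT2"}))   # True
```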
Concurrency control algorithms
Concurrency control algorithms are crucial in database management systems (DBMS) to ensure that multiple transactions can execute concurrently without interfering with each other, while still maintaining the ACID properties (Atomicity, Consistency, Isolation, Durability). Here are some common concurrency control algorithms used in DBMS:
Two-Phase Locking (2PL): Transactions acquire locks on data items before accessing them and release locks after completing their operations. 2PL ensures serializability by preventing conflicts between transactions.
Strict Two-Phase Locking (Strict 2PL): Similar to 2PL, but locks are held until the transaction commits or aborts. This prevents the possibility of cascading aborts.
Deadlock Detection and Prevention: Techniques such as deadlock detection algorithms (e.g., the wait-for graph) and prevention strategies (e.g., timeouts, deadlock avoidance) are employed to handle deadlocks that may occur under lock-based concurrency control.
Timestamp Ordering Protocol (TO): Transactions are assigned timestamps, and their execution order is determined based on these timestamps. Conflicts are resolved by comparing transaction timestamps, and older transactions are given priority.
Thomas' Write Rule: In timestamp-based concurrency control, Thomas' write rule allows obsolete writes to be skipped: a transaction's write to a data item is applied only if its timestamp is the highest among all transactions that have previously read or written that item.
Snapshot Isolation: Each transaction sees a consistent snapshot of the database at the time it starts. Reads are performed on this snapshot, ensuring that transactions do not see changes made by other concurrent transactions.
Versioning (Multiversion Concurrency Control, MVCC): Instead of blocking transactions, MVCC maintains multiple versions of each data item to allow concurrent reads and writes without conflicts. Transactions access the appropriate version based on their timestamps.
Timestamp-Based OCC: Transactions execute optimistically without acquiring locks. Before committing, the system checks for conflicts by comparing timestamps or version numbers. If conflicts are detected, one or more transactions may need to be aborted and restarted.
Validation-Based OCC: Transactions execute optimistically, and at commit time the system validates each transaction to ensure that it does not conflict with other concurrent transactions. If validation fails, the transaction is aborted and restarted.
These concurrency control algorithms ensure that transactions execute correctly and concurrently while maintaining the integrity and consistency of the database. The choice of algorithm depends on factors such as the application requirements, system workload, and performance considerations.
Addressing these issues requires careful design and implementation of concurrency control mechanisms, such as locking, timestamp-based methods, and multiversion concurrency control. Additionally, choosing an appropriate isolation level for transactions can help mitigate concurrency-related problems while balancing performance and consistency requirements.
Types of Failures:
Transaction Failures: Occur when a transaction encounters an error and cannot continue its execution. This may result in a partial update to the database.
System Failures: Occur due to hardware or software errors, such as disk crashes, memory corruption, or power outages, leading to potential data loss or corruption.
Media Failures: Occur when there is physical damage to storage devices, such as disk failures or data corruption, requiring data recovery from backups or redundant copies.
Recovery Algorithms:
Undo Logging (Rollback): In this algorithm, before modifying any data item, the DBMS logs the old value of the item. If a transaction fails, the system uses these logs to undo the changes made by the failed transaction, ensuring atomicity and durability.
Redo Logging (Forward Recovery): In this algorithm, the DBMS logs the changes made by each transaction before the changes are applied to the database. If a failure occurs, the system replays the log records to reapply the changes, ensuring durability.
Checkpointing: Checkpointing is a technique used to reduce the time required for recovery by periodically writing the current state of the database to stable storage. This reduces the number of log records that need to be processed during recovery.
Shadow Paging: Shadow paging maintains multiple versions of the database, with each version representing a consistent snapshot of the database. When a transaction modifies the database, it creates a new version, and the old version remains unchanged until the transaction commits. If a transaction fails, the system can simply discard the changes made by the failed transaction by reverting to the previous version.
Write-Ahead Logging (WAL): In WAL, the DBMS ensures that log records are written to stable storage before the corresponding data pages are modified. This ensures that the log contains a record of all changes made to the database before the changes are applied, providing a mechanism for recovery in the event of a failure.
Database Replication and Mirroring: Replicating or mirroring the database across multiple physical locations ensures high availability and fault tolerance. If one copy of the database fails, the system can fail over to a redundant copy without losing data.
These recovery algorithms ensure that the database can recover to a consistent and durable state after a failure, minimizing data loss and maintaining data integrity. The choice of recovery algorithm depends on factors such as the recovery time objective (RTO), recovery point objective (RPO), and performance requirements of the system.
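A minimal Python sketch of undo logging: the old value is logged before each update, so an uncommitted transaction can be rolled back. The data items follow the earlier examples:

```python
db = {"DT1": 1000, "DT2": 2000}
undo_log = []

def update(item, new_value):
    undo_log.append((item, db[item]))   # log the old value before modifying
    db[item] = new_value

def rollback():
    while undo_log:                     # undo the changes in reverse order
        item, old = undo_log.pop()
        db[item] = old

update("DT1", 900)
update("DT2", 2100)
rollback()                              # simulate a failure before commit
print(db)                               # {'DT1': 1000, 'DT2': 2000}
```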
Project Overview:
Database: We'll use a relational database to store data, such as MySQL or SQLite.
Programming Language: Python for backend development.
Concepts to Apply: Entity-relationship modeling, database normalization, SQL queries, transaction management, concurrency control, and recovery algorithms.
Functionalities:
Inventory Management:
Add, update, and delete inventory items.
Track item quantity, price, and other details.
Order Management:
Place orders for items.
Update order status (e.g., pending, delivered).
Customer Management:
Add, update, and delete customer information.
Implementation Steps:
Database Design:
Define the entity-relationship diagram (ERD) for the inventory system.
Design tables for items, orders, customers, and other relevant entities.
Normalize the database schema to reduce redundancy and improve efficiency.
Database Implementation:
Create the necessary tables in the chosen DBMS.
Establish relationships between tables using foreign keys.
Backend Development:
Use Python to develop the backend logic of the system.
Implement functions/methods to interact with the database:
Add, update, delete inventory items.
Place orders, update order status.
Manage customer information.
Transaction Management:
Use transaction control mechanisms to ensure data consistency during operations involving multiple steps.
Apply techniques like two-phase locking or timestamp-based concurrency control.
Error Handling:
Implement error handling mechanisms to manage exceptions and errors gracefully.
Roll back transactions in case of errors to maintain data integrity.
Recovery:
Implement recovery algorithms such as undo logging or redo logging to recover from failures and ensure database consistency.
Perform regular backups to prevent data loss.
Front End:
As this project focuses on demonstrating DBMS concepts, the front end can be minimal.
Use a command-line interface (CLI) or a simple text-based interface for user interaction.
Accept user inputs for performing CRUD (Create, Read, Update, Delete) operations.
Display relevant information and status messages to users.
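A minimal Python/sqlite3 sketch of the core of such a system; the table and column names are illustrative assumptions, not a prescribed schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE item     (id INTEGER PRIMARY KEY, name TEXT, qty INTEGER, price REAL);
CREATE TABLE orders   (id INTEGER PRIMARY KEY,
                       customer_id INTEGER REFERENCES customer(id),
                       item_id INTEGER REFERENCES item(id),
                       status TEXT DEFAULT 'pending');
""")
conn.execute("INSERT INTO customer VALUES (1, 'Alice')")
conn.execute("INSERT INTO item VALUES (1, 'bolt', 10, 0.5)")

def place_order(customer_id, item_id):
    """Stock decrement and order insert succeed or fail as one transaction."""
    try:
        with conn:   # commits on success, rolls back automatically on error
            conn.execute("UPDATE item SET qty = qty - 1 "
                         "WHERE id = ? AND qty > 0", (item_id,))
            conn.execute("INSERT INTO orders (customer_id, item_id) "
                         "VALUES (?, ?)", (customer_id, item_id))
    except sqlite3.Error as exc:
        print("order failed:", exc)   # error handling keeps data consistent

place_order(1, 1)
print(conn.execute("SELECT id, status FROM orders").fetchall())  # [(1, 'pending')]
```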
Conclusion:
This case study demonstrates the application of various DBMS concepts, including database design, implementation, transaction management, concurrency control, and recovery, in developing an inventory management system. While the front-end requirements are minimal, the focus is on showcasing the robustness and efficiency of the database system in handling real-world scenarios and ensuring data integrity and consistency.
1. Document-Oriented Databases:
Overview: Document-oriented databases store data as documents, typically in JSON or BSON format. Each document can have its own structure, and related documents are grouped into collections.
Example: MongoDB and CouchDB are document-oriented databases.
Use Cases: Content management, product catalogs, and user profiles where flexible, semi-structured data is common.
2. Key-Value Pair Databases:
Overview: Key-value pair databases store data as a collection of key-value pairs. Each key is unique and associated with a value, which can be a simple or a complex data structure.
Example: Redis and Amazon DynamoDB are key-value pair databases.
Use Cases: Caching, session management, and real-time messaging systems where fast access to data by key is critical.
3. Column-Oriented Databases:
Overview: Column-oriented databases store data in columns rather than rows, optimizing for read-heavy workloads and analytical queries. Each column family can contain multiple columns, and rows are grouped together based on a primary key.
Example: Apache Cassandra and Apache HBase are column-oriented databases.
Use Cases: Time-series data analysis, log processing, and data warehousing where aggregating data across columns is common.
4. Graph Databases:
Overview: Graph databases represent data as a network of nodes (entities) connected by edges (relationships). Nodes, edges, and properties are fundamental concepts, and the database excels at traversing relationships efficiently.
Example: Neo4j and Amazon Neptune are graph databases.
Use Cases: Social networks, recommendation systems, and network analysis where understanding relationships between data points is crucial.
Comparison:
Schema Flexibility: Document-oriented and key-value pair databases offer schema flexibility, allowing each document or key-value pair to have a different structure. Column-oriented databases have a fixed schema for each column family, while graph databases have a flexible schema but emphasize relationships.
Query Model: Document-oriented databases often use query languages similar to SQL but optimized for JSON/BSON documents. Key-value pair databases primarily support simple CRUD operations. Column-oriented databases use SQL-like query languages for analytical queries. Graph databases use graph query languages like Cypher.
Performance Characteristics: Key-value pair databases excel in read and write throughput. Column-oriented databases are optimized for read-heavy analytical workloads. Document-oriented databases offer a balance between read and write performance. Graph databases prioritize relationship-traversal efficiency.
Each type of NoSQL database has its strengths and is suitable for different use cases depending on the nature of the data and the requirements of the application.