DBMS-UNIT-4
Example: An ATM cash withdrawal can be viewed as a single transaction made up of the following steps:
1. Transaction Start.
2. Insert your ATM card.
3. Select a language for your transaction.
4. Select the Savings Account option.
5. Enter the amount you want to withdraw.
6. Enter your secret pin.
7. Wait for some time for processing.
8. Collect your Cash.
9. Transaction Completed.
A transaction can include the following basic database access operations:
● Read/Access data (R): Reads a data item from the disk (where the database is stored) into a main-memory variable.
● Write/Change data (W): Writes the data item from the memory variable back to the disk.
● Commit: Commit is a Transaction Control Language (TCL) command used to permanently save the changes made by a transaction.
Example: Transfer of ₹50 from Account A to Account B. Initially A = ₹500 and B = ₹800. This data is brought into RAM from the hard disk.
All instructions before the commit belong to the partially committed state, and their effects are held in RAM. When the commit is reached, the data is fully accepted and written to the hard disk.
If the transaction fails anywhere before committing, we have to go back and start from the beginning; we cannot continue from the same state. This is known as Rollback.
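To make the commit/rollback idea concrete, here is a minimal Python sketch (illustrative only, not part of the original notes); the dictionary `disk` stands in for the hard disk and the local copy stands in for RAM.

```python
# Illustrative sketch: commit/rollback semantics for a funds transfer.
# 'disk' models durable storage; 'buffer' models changes held in RAM.
disk = {"A": 500, "B": 800}          # committed state (hard disk)

def transfer(amount):
    buffer = dict(disk)              # work on a copy in "RAM"
    try:
        buffer["A"] -= amount        # debit A
        if buffer["A"] < 0:
            raise ValueError("insufficient funds")
        buffer["B"] += amount        # credit B
        disk.update(buffer)          # COMMIT: changes become permanent
        print("committed:", disk)
    except Exception as err:
        # ROLLBACK: the buffer is discarded, the disk is untouched
        print("rolled back:", err, disk)

transfer(50)     # committed: {'A': 450, 'B': 850}
transfer(1000)   # rolled back: insufficient funds ... (disk unchanged)
```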
****************************************************************************
Transaction States
Transactions can be implemented using SQL queries and servers. The transaction state diagram shows how a transaction moves between its states.
Following are the different transaction states:
Active State:
When the operations of a transaction are running, the transaction is said to be in the active state. If all the read and write operations are performed without any error, it progresses to the partially committed state; if any operation fails, it goes to the failed state.
Partially Committed:
After all the read and write operations are completed, the changes exist only in main memory (the local buffer). If these changes are made permanent in the database, the state progresses to the committed state; in case of a failure, it goes to the failed state.
Failed State:
If any operation during the transaction fails due to a software or hardware issue, the transaction goes to the failed state. A failure during a transaction does not make any permanent change to the data in the database; the changes made to the data in local memory are rolled back to the previous consistent state.
Aborted State:
If the transaction fails during its execution, it goes from the failed state to the aborted state. Because all the changes in the previous states were made only in main memory, these uncommitted changes are either deleted or rolled back. The transaction can then restart afresh from the active state.
Committed State:
If the transaction completes all of its operations successfully, all the changes made during the partially committed state are permanently stored and the transaction is said to be completed; it can then progress to the terminated state.
Terminated State:
If the transaction gets aborted after roll-back or the transaction comes from the committed state,
then the database comes to a consistent state and is ready for further new transactions since the
previous transaction is now terminated.
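The states and the legal moves between them can be summarised in a small Python sketch (an illustrative model assumed here, not an implementation from the notes):

```python
# Illustrative model of the transaction state diagram described above.
ALLOWED = {
    "active":              {"partially_committed", "failed"},
    "partially_committed": {"committed", "failed"},
    "failed":              {"aborted"},
    "aborted":             {"terminated", "active"},   # a restart is possible
    "committed":           {"terminated"},
    "terminated":          set(),
}

def next_state(current, new):
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current} -> {new}")
    return new

state = "active"
for step in ["partially_committed", "committed", "terminated"]:
    state = next_state(state, step)   # the path of a successful transaction
print(state)                          # terminated
```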
******************************************************************************
ACID Properties
Atomicity:
By this, we mean that either the entire transaction takes place at once or doesn’t happen at all.
There is no midway i.e. transactions do not occur partially. Each transaction is considered as one
unit and either runs to completion or is not executed at all. It involves the following two
operations.
—Abort: If a transaction aborts, changes made to the database are not visible.
—Commit: If a transaction commits, changes made are visible.
Atomicity is also known as the ‘All or nothing rule’.
Consider the following transaction T consisting of T1 and T2: Transfer of 100 from account X to
account Y.
If the transaction fails after the completion of T1 but before the completion of T2 (say, after write(X) but before write(Y)), then the amount has been deducted from X but not added to Y. This results in an inconsistent database state. Therefore, the transaction must be executed in its entirety in order to ensure the correctness of the database state.
Consistency:
This means that integrity constraints must be maintained so that the database is consistent before
and after the transaction. It refers to the correctness of a database. Referring to the example
above,
The total amount before and after the transaction must be maintained.
Total before T occurs = 500 + 200 = 700.
Total after T occurs = 400 + 300 = 700.
Therefore, the database is consistent. Inconsistency occurs in case T1 completes but T2 fails. As
a result, T is incomplete.
Isolation:
This property ensures that multiple transactions can occur concurrently without leading to the
inconsistency of the database state. Transactions occur independently without interference.
Changes occurring in a particular transaction will not be visible to any other transaction until that
particular change in that transaction is written to memory or has been committed. This property
ensures that the execution of transactions concurrently will result in a state that is equivalent to a
state achieved if these were executed serially in some order.
Let X = 500 and Y = 500, and suppose transaction T transfers 50 from X to Y.
Consider two transactions T and T’’.
Suppose T has been executed till Read(Y) (it has already written the new value of X but has not yet updated Y) and then T’’ starts. As a result, interleaving of operations takes place, due to which T’’ reads the updated value of X but the old value of Y, and the sum computed by
T’’: (X + Y = 450 + 500 = 950)
is thus not consistent with the sum at the end of the transaction:
T: (X + Y = 450 + 550 = 1,000).
This results in database inconsistency: 50 units appear to have been lost. Hence, transactions must take place in isolation, and changes should be visible to other transactions only after they have been written to main memory or committed.
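A short Python sketch (illustrative; it simply replays the numbers of the example above) showing how the interleaved read by T’’ observes an inconsistent sum:

```python
# Interleaving anomaly from the isolation example above (illustrative).
db = {"X": 500, "Y": 500}

# Transaction T: transfer 50 from X to Y, executed step by step.
x = db["X"]                        # T: read(X)
x = x - 50
db["X"] = x                        # T: write(X)
y = db["Y"]                        # T: read(Y)  -- T'' starts here

# Transaction T'' interleaves and sums the current database values.
observed = db["X"] + db["Y"]       # 450 + 500 = 950 (inconsistent)

# T finishes.
y = y + 50
db["Y"] = y                        # T: write(Y)
actual = db["X"] + db["Y"]         # 450 + 550 = 1000 (consistent)

print(observed, actual)            # 950 1000 -> 50 units "lost" by T''
```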
Durability:
This property ensures that once the transaction has completed execution, the updates and
modifications to the database are stored in and written to disk and they persist even if a system
failure occurs. These updates now become permanent and are stored in non-volatile memory.
The effects of the transaction, thus, are never lost.
Some important points:
The ACID properties, in totality, provide a mechanism to ensure the correctness and consistency
of a database in a way such that each transaction is a group of operations that acts as a single
unit, produces consistent results, acts in isolation from other operations, and updates that it
makes are durably stored.
Advantages of ACID Properties:
1. Data Consistency: ACID properties ensure that the data remains consistent and
accurate after any transaction execution.
2. Data Integrity: ACID properties maintain the integrity of the data by ensuring that
any changes to the database are permanent and cannot be lost.
3. Concurrency Control: ACID properties help to manage multiple transactions
occurring concurrently by preventing interference between them.
4. Recovery: ACID properties ensure that in case of any failure or crash, the system can
recover the data up to the point of failure or crash.
Disadvantages of ACID Properties:
1. Performance: The ACID properties can cause a performance overhead in the system,
as they require additional processing to ensure data consistency and integrity.
2. Scalability: The ACID properties may cause scalability issues in large distributed
systems where multiple transactions occur concurrently.
3. Complexity: Implementing the ACID properties can increase the complexity of the
system and require significant expertise and resources.
Overall, the advantages of ACID properties in DBMS outweigh the disadvantages. They provide a reliable and consistent approach to data management, ensuring data integrity, accuracy, and reliability. However, in some cases, the overhead of implementing ACID properties can cause performance and scalability issues. Therefore, it is important to balance the benefits of ACID properties against the specific needs and requirements of the system.
******************************************************************************
Atomicity and durability are two important concepts in database management systems (DBMS)
that ensure the consistency and reliability of data.
Atomicity:
Importance:
Even in the case of mistakes, failures, or crashes, atomicity ensures that the database maintains
consistency. The following are some of the reasons why atomicity is essential in DBMS:
○ Consistency: Atomicity ensures that the database remains in a consistent state at all
times. All changes made by a transaction are rolled back if it is interrupted or fails for any
other reason, returning the database to its initial state. By doing this, the database's
consistency and data integrity are maintained.
○ Recovery: Atomicity guarantees that, in the event of a system failure or crash, the
database can be restored to a consistent state. All changes made by a transaction are
undone if it is interrupted or fails, and the database is then reset to its initial state using
the undo log. This guarantees that, even in the event of failure, the database may be
restored to a consistent condition.
○ Reliability: Even in the face of mistakes or failures, atomicity makes the guarantee that
the database is trustworthy. By ensuring that transactions are atomic, the database
remains consistent and reliable, even in the event of system failures, crashes, or errors.
Implementation of Atomicity:
A number of strategies are used to establish atomicity in DBMS to guarantee that either all
operations inside a transaction are correctly done or none of them are executed at all.
○ Undo Log: An undo log is a mechanism used to keep track of the changes made by a
transaction before it is committed to the database. If a transaction fails, the undo log is
used to undo the changes made by the transaction, effectively rolling back the
transaction. By doing this, the database is guaranteed to remain in a consistent condition.
○ Redo Log: A redo log is a mechanism used to keep track of the changes made by a
transaction after it is committed to the database. If a system failure occurs after a
transaction is committed but before its changes are written to disk, the redo log can be
used to redo the changes and ensure that the database is consistent.
○ Two-Phase Commit: Two-phase commit is a protocol used to ensure that all nodes in a
distributed system commit or abort a transaction together. This ensures that the
transaction is executed atomically across all nodes and that the database remains
consistent across the entire system.
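As a rough illustration of the undo-log mechanism described above, here is a minimal Python sketch (assumed for demonstration; a real DBMS writes its log to stable storage, not to a Python list):

```python
# Illustrative undo-log mechanism: record old values before overwriting them.
db = {"A": 500, "B": 800}
undo_log = []                          # list of (key, old_value) pairs

def write(key, new_value):
    undo_log.append((key, db[key]))    # log the before-image first
    db[key] = new_value                # then apply the change

def rollback():
    # undo changes in reverse order, restoring the before-images
    while undo_log:
        key, old_value = undo_log.pop()
        db[key] = old_value

def commit():
    undo_log.clear()                   # changes are now permanent

write("A", db["A"] - 50)
write("B", db["B"] + 50)
rollback()                             # transaction failed before commit
print(db)                              # {'A': 500, 'B': 800} -- initial state restored
```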
Durability:
Importance:
❖ Data Integrity: Durability ensures that the data in the database remains consistent and
accurate, even in the event of a system failure or crash. It guarantees that committed
transactions are durable and will be recovered without data loss or corruption.
❖ Recovery: Durability guarantees that, in the event of a system failure or crash, the
database can be restored to a consistent state. The database can be restored to a consistent
state if a committed transaction is lost due to a system failure or crash since it can be
recovered from the redo log or other backup storage.
❖ Availability: Durability ensures that the data in the database is always available for
access by users, even in the event of a system failure or crash. It ensures that committed
transactions are always retained in the database and are not lost in the event of a system
crash.
The implementation of durability in DBMS involves several techniques to ensure that committed
changes are durable and can be recovered in the event of failure.
❖ Transactions: Transactions are used to group related operations that need to be executed
atomically. They are either committed, in which case all their changes become
permanent, or rolled back, in which case none of their changes are made permanent.
❖ Logging: Logging is a technique that involves recording all changes made to the
database in a separate file called a log. The log is used to recover the database in case of a
failure. Write-ahead logging is a common technique that guarantees that data is written to
the log before it is written to the database.
❖ Shadow copy:
Shadow copy is a technique that involves making a copy of the database before any
changes are made. The copy is used to provide a consistent view of the database in case
of failure. The modifications are made to the original database after a transaction has
been committed.
❖ In the shadow-copy scheme, a transaction that wants to update the database first creates a
complete copy of the database. All updates are done on the new database copy, leaving
the original copy, the shadow copy, untouched. If at any point the transaction has to be
aborted, the system merely deletes the new copy. The old copy of the database has not
been affected.
❖ This scheme is based on making copies of the database, called shadow copies, and
assumes that only one transaction is active at a time. The scheme also assumes that the
database is simply a file on disk. A pointer called db-pointer is maintained on disk; it
points to the current copy of the database.
❖ First, the operating system is asked to make sure that all pages of the new copy of the
database have been written out to disk. (Unix systems use the fsync system call for this
purpose.)
❖ After the operating system has written all the pages to disk, the database system updates
the pointer db-pointer to point to the new copy of the database; the new copy then
becomes the current copy of the database. The old copy of the database is then deleted.
Figure below depicts the scheme, showing the database state before and after the update.
The transaction is said to have been committed at the point where the updated db pointer
is written to disk.
❖ If the transaction fails at any time before db-pointer is updated, the old contents of the
database are not affected.
❖ We can abort the transaction by just deleting the new copy of the database.
❖ Once the transaction has been committed, all the updates that it performed are in the
database pointed to by db pointer.
❖ Thus, either all updates of the transaction are reflected, or none of the effects are
reflected, regardless of transaction failure.
❖ Next, suppose that the system fails after db-pointer has been updated on disk. Before the
pointer is updated, all updated pages of the new copy of the database were written to disk.
❖ Again, we assume that, once a file is written to disk, its contents will not be damaged
even if there is a system failure. Therefore, when the system restarts, it will read db-
pointer and will thus see the contents of the database after all the updates performed by
the transaction.
● Backup and Recovery: In order to guarantee that the database can be recovered to a
consistent state in the event of a failure, backup and recovery procedures are used. This
involves making regular backups of the database and keeping track of changes made to
the database since the last backup.
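The shadow-copy scheme described above can be sketched in Python as follows (an in-memory model assumed purely for illustration; a real implementation keeps the copies and db-pointer on disk):

```python
# Illustrative shadow-copy scheme: updates go to a fresh copy of the database;
# the commit point is the atomic switch of db_pointer to the new copy.
import copy

database_copies = {"v1": {"A": 500, "B": 800}}
db_pointer = "v1"                                # points to the current copy

def run_transaction(update_fn):
    global db_pointer
    shadow = db_pointer
    new_version = "v" + str(len(database_copies) + 1)
    database_copies[new_version] = copy.deepcopy(database_copies[shadow])
    try:
        update_fn(database_copies[new_version])  # all updates hit the new copy
        # (a real system would flush the new copy to disk here)
        db_pointer = new_version                 # COMMIT: atomically switch the pointer
        del database_copies[shadow]              # the old copy can now be deleted
    except Exception:
        del database_copies[new_version]         # ABORT: just drop the new copy

run_transaction(lambda db: db.update(A=db["A"] - 50, B=db["B"] + 50))
print(db_pointer, database_copies[db_pointer])   # v2 {'A': 450, 'B': 850}
```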
*****************************************************************************
Serializability in DBMS
What is Serializability?
Serializability of schedules ensures that a non-serial schedule is equivalent to some serial schedule. It allows transactions to execute concurrently, with their operations interleaved, while still leaving the database as if they had executed one after another. In simple words, serializability is a way to check whether the execution of two or more transactions maintains the consistency of the database.
A schedule in DBMS is a sequence in which the operations of one or more transactions are executed. R(X) means reading the value of X, and W(X) means writing the value of X.
1. Serial Schedule - A schedule in which only one transaction is executed at a time, i.e.,
one transaction is executed completely before starting another transaction.
Example:
Transaction-1    Transaction-2
R(a)             ------
W(a)             ------
R(b)             ------
W(b)             ------
------           R(b)
------           W(b)
------           R(a)
------           W(a)
Here, we can see that Transaction-2 starts its execution after the completion of Transaction-1.
NOTE:
Serial schedules are always serializable because the transactions only work one after the other. Also, for n transactions, there are n! possible serial schedules (where n is the number of transactions).
2. Non-Serial Schedule - A schedule in which the operations of multiple transactions are interleaved, i.e., a transaction can start before another transaction has finished.
Example:
Transaction-1    Transaction-2
R(a)             ------
W(a)             ------
------           R(b)
------           W(b)
R(b)             ------
------           R(a)
W(b)             ------
------           W(a)
We can see that Transaction-2 starts its execution before the completion of Transaction-1, and
they are interchangeably working on the same data, i.e., "a" and "b".
Testing of Serializability
To test the serializability of a schedule, we can use a Serialization Graph or Precedence Graph. A serialization graph is simply a directed graph over the transactions of a schedule.
It can be defined as a graph G(V, E) with a set of vertices V = {V1, V2, V3, ..., Vn}, one per transaction, and a set of directed edges E = {E1, E2, E3, ..., En}. An edge is drawn for every pair of conflicting READ/WRITE operations performed by different transactions:
Ti -> Tj means transaction Ti performs its conflicting read or write before transaction Tj performs its conflicting operation.
NOTE:
If there is a cycle in the serialization graph, then the schedule is non-serializable, because the cycle indicates that one transaction depends on the other and vice versa. It also means that there are one or more conflicting pairs of operations in the transactions. On the other hand, if there is no cycle, the non-serial schedule is serializable.
To conclude, let’s take two operations on data: "a". The conflicting pairs are:
1. READ(a) - WRITE(a)
2. WRITE(a) - WRITE(a)
3. WRITE(a) - READ(a)
Note:
Let's take an example of schedule "S" having three transactions t1, t2, and t3 working
simultaneously
t1 t2 t3
R(x)
R(z)
W(z)
R(y)
R(y)
W(y)
W(x)
W(z)
W(x)
This is a non-serializable schedule. R(x) of T1 conflicts with W(x) of T3, so there is a directed edge from T1 to T3. R(y) of T1 conflicts with W(y) of T2, so there is a directed edge from T1 to T2. W(x) of T3 conflicts with W(x) of T1, so there is a directed edge from T3 to T1. Similarly, we make edges for every conflicting pair. Since a cycle is formed (T1 → T3 → T1), the transactions cannot be serializable.
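The precedence-graph test can be sketched in Python as follows (illustrative; the schedule below contains only the conflicting x and y operations mentioned above, and the encoding of a schedule as (transaction, operation, item) triples is an assumption):

```python
# Build a precedence graph from a schedule and test conflict serializability.
from itertools import combinations

def precedence_graph(schedule):
    edges = set()
    for (t1, op1, x1), (t2, op2, x2) in combinations(schedule, 2):
        # conflicting pair: different transactions, same item, at least one write
        if t1 != t2 and x1 == x2 and "W" in (op1, op2):
            edges.add((t1, t2))              # earlier operation -> later operation
    return edges

def has_cycle(edges):
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
    def reachable(src, dst, seen=()):
        return any(n == dst or (n not in seen and reachable(n, dst, seen + (n,)))
                   for n in graph.get(src, ()))
    return any(reachable(t, t) for t in {u for u, _ in edges})

S = [("T1", "R", "x"), ("T1", "R", "y"), ("T2", "W", "y"),
     ("T3", "W", "x"), ("T1", "W", "x")]
edges = precedence_graph(S)
print(edges)                                  # includes ('T1','T3') and ('T3','T1')
print("serializable:", not has_cycle(edges))  # serializable: False
```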
Types of Serializability
Serializability of any non-serial schedule can be verified using two types mainly:
1. Conflict Serializability
2. View Serializability.
Conflicting Operations
A pair of operations is said to be conflicting if it satisfies all of the following conditions:
1. The operations belong to different transactions.
2. Both operations work on the same data item.
3. At least one of the operations is a write operation.
Let's consider two different transactions, Ti and Tj. Considering the above conditions, the following schedule illustrates conflicting and non-conflicting operations:
Transaction 1    Transaction 2
R1(A)            ------
------           W2(B)
W1(A)            ------
------           R2(A)
R1(B)            ------
------           W2(A)
In the above schedule, we can notice that:
W1(A) and R2(A) are conflicting operations as they satisfy all the above conditions.
Similarly, W1(A) and W2(A) are conflicting operations as they are part of different transactions
working on the same data item, and one of them is the write operation.
W1(A) and W2(B) are non-conflicting operations as they work on different data items and thus
do not satisfy all the given conditions.
R1(A) and R2(A) are non-conflicting operations as none of them is a write operation and thus
does not satisfy the third condition.
W1(A) and R1(A) are non-conflicting as they belong to the same transaction and thus do not
satisfy the first condition.
Conflict Equivalent
If a schedule gets converted into another schedule by swapping the non-conflicting operations,
they are said to be conflict equivalent schedules.
A non-serial schedule is checked for conflict serializability using the following steps:
1. List all the conflicting operations in the schedule.
2. For every conflicting pair in which an operation of Ti occurs before an operation of Tj, draw a directed edge Ti → Tj in the precedence graph.
3. If the precedence graph contains no cycle, the schedule is conflict serializable.
Example:
We have a schedule "S" having three transactions t1, t2, and t3 working simultaneously. Let's
form a precedence graph.
t1 t2 t3
R(x)
R(y)
R(y)
W(y)
W(x)
W(x)
R(x)
W(x)
As there is no loop in the precedence graph, it is a conflict serializable schedule, and it is equivalent to some serial schedule. From the graph, we can also determine the order of the transactions in that equivalent serial schedule.
The order of the Transactions: t1 will execute first as there is no incoming edge on T1. t3 will
execute second as it depends on T1 only. t2 will execute at last as it depends on both T1 and T3.
NOTE:
If a schedule is conflict serializable, then it is surely a consistent schedule. On the other hand, a schedule that is not conflict serializable may or may not be serializable. To check it further, we use the concept of View Serializability.
View Serializability in DBMS
When a system handles multiple transactions running concurrently, there is a possibility that a few of them leave the database in an inconsistent state and thus cause the system to fail. To be aware of such schedules and save the system from failures, we check whether the schedules are serializable. If the given schedule is serializable, it will never leave the database in an inconsistent state.
Serial schedules are the schedules where no two transactions are executed concurrently. Thus,
serial schedules never leave the database in an inconsistent state. If the given schedule is view
serializable, then we are sure that it is a consistent schedule or serial schedule.
View Serializability is a process used to check whether the given schedule is view serializable or
not. To do so we check if the given schedule is View Equivalent to its serial schedule.
The three conditions needed for two schedules (S1 and S2) to be view equivalent are:
1. Initial Read: the first read of each data item must be made by the same transaction in both schedules.
2. Updated Read: if a transaction reads a value written by another transaction in one schedule, it must read the value written by that same transaction in the other schedule.
3. Final Write: the final write on each data item must be made by the same transaction in both schedules.
Example
T1          T2
R(X)        ------
X=X+10      ------
W(X)        ------
------      R(X)
------      X=X+10
------      W(X)
R(Y)        ------
Y=Y+20      ------
W(Y)        ------
------      R(Y)
------      Y=Y*1.1
------      W(Y)
Checking conflict serializability: we draw the precedence graph of S1.
In the above example, if we swap a few non-conflicting operations of the transactions, we can format the table as follows:
T1          T2
R(X)        ------
X=X+10      ------
W(X)        ------
R(Y)        ------
Y=Y+20      ------
W(Y)        ------
------      R(X)
------      X=X+10
------      W(X)
------      R(Y)
------      Y=Y*1.1
------      W(Y)
Now the precedence graph is as follows:
The precedence graph doesn't contain any cycle or loop which implies the given schedule is
conflict serializable and the result will be the same as that of the first table.
But if the given schedule is not conflict serializable, then we need to check for View Serializability. If it is view serializable, then it is consistent; otherwise it is inconsistent. Thus, to address the limitations of conflict serializability, the concept of view-serializability was introduced.
If the given schedule is view-equivalent to the serial schedule itself then we can say that the
schedule is view-serializable.
To generate the serial schedule of the given schedule, we follow the following steps:
● Take the transaction say ‘Ti’ that performs any operation first in the schedule. Write all
the operations performed by that transaction one after the other in the given schedule.
● Similarly, take another transaction that is performing any operation right after Ti in the
given schedule and write all the operations performed by it below transaction Ti on any
data item.
● Follow the above steps for all the transactions. The order is very important.
T1          T2
R(X)        ------
X=X+10      ------
W(X)        ------
------      R(X)
------      X=X+10
------      W(X)
R(Y)        ------
Y=Y+20      ------
W(Y)        ------
------      R(Y)
------      Y=Y*1.1
------      W(Y)
Also, for the above table the serial schedule is as follows:
T1          T2
R(X)        ------
X=X+10      ------
W(X)        ------
R(Y)        ------
Y=Y+20      ------
W(Y)        ------
------      R(X)
------      X=X+10
------      W(X)
------      R(Y)
------      Y=Y*1.1
------      W(Y)
As transaction T1 performed the first operation we wrote all of its operations on data item X as
well as Y first and then all the operations of T2 were written next.
Two schedules are said to be view equivalent if they satisfy the following three conditions:
Condition 1: Initial Read
In the example above, the initial read of both data items X and Y is made by transaction T1 in schedule S1, and in S2 the initial read is also made by transaction T1 for both. Thus, the initial read condition is satisfied in S1 and S2 for data items X and Y.
Condition 2: Updated Read
The values of X and Y are updated by transaction T1 in schedule S1 and then read by transaction T2. In schedule S2 the data items X and Y are also updated by T1 and read by T2. Thus, the updated read condition is satisfied in both schedules.
Condition 3: Final Write
If transaction T1 updates data item Y at last in S1, then in S2 also, T1 should perform the final
write operation on Y.
The final writes on data items X and Y are made by transaction T2 in schedule S1. The final writes are also made by T2 in the serial schedule S2. Thus, the condition for the final write is satisfied as well.
Result
As all the three conditions are satisfied for our schedule S1 and its Serial Schedule S2, they are
view equivalent. Thus, we can say that the given schedule S1 is the view-serializable schedule
and is consistent.
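The three view-equivalence conditions can also be checked mechanically; below is an illustrative Python sketch (the schedule encoding is an assumption, and S1/S2 correspond to the given and serial schedules above with the X=X+10 style computations omitted):

```python
# Check view equivalence of two schedules.
# A schedule is a list of (transaction, op, item) triples; op is "R" or "W".
def view_info(schedule):
    initial_read, reads_from, final_write, last_writer = {}, set(), {}, {}
    for txn, op, item in schedule:
        if op == "R":
            if item not in initial_read and item not in last_writer:
                initial_read[item] = txn                          # condition 1
            elif item in last_writer:
                reads_from.add((txn, last_writer[item], item))    # condition 2
        else:  # "W"
            last_writer[item] = txn
            final_write[item] = txn                               # condition 3
    return initial_read, reads_from, final_write

def view_equivalent(s1, s2):
    return view_info(s1) == view_info(s2)

S1 = [("T1","R","X"), ("T1","W","X"), ("T2","R","X"), ("T2","W","X"),
      ("T1","R","Y"), ("T1","W","Y"), ("T2","R","Y"), ("T2","W","Y")]
S2 = [("T1","R","X"), ("T1","W","X"), ("T1","R","Y"), ("T1","W","Y"),
      ("T2","R","X"), ("T2","W","X"), ("T2","R","Y"), ("T2","W","Y")]
print(view_equivalent(S1, S2))   # True -> S1 is view-serializable
```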
There's another method to check the view-serializability of a schedule. The first step of this
method is the same as the previous method i.e. checking the conflict serializability of the
schedule by creating the precedence graph.
Let us take an example of the schedule as follows to check the View Serializability
T1        T2        T3
R(X)      ------    ------
------    W(X)      ------
------    ------    R(X)
W(X)      ------    ------
------    ------    W(X)
Writing all the conflicting operations:
● R1(X), W2(X) (T1 → T2)
● R1(X), W3(X) (T1 → T3)
● W2(X), R3(X) (T2 → T3)
● W2(X), W1(X) (T2 → T1)
● W2(X), W3(X) (T2 → T3)
● R3(X), W1(X) (T3 → T1)
● W1(X), W3(X) (T1 → T3)
If the precedence graph doesn't contain a loop/cycle, then it is conflict serializable, and thus
concludes that the given schedule is consistent. We are not required to perform the View
Serializability test as a conflict serializable schedule is view-serializable as well.
As the above graph contains a loop/cycle, the schedule is not conflict serializable. Thus we need to
perform the following steps -
Next, we will check for blind writes. If the blind writes don't exist, then the schedule is non-view
Serializable. We can simply conclude that the given schedule is inconsistent.
If there exist any blind writes in the schedule, then it may or may not be view serializable.
Blind writes: If a write action is performed on a data item by a Transaction (updation), without
performing the reading operation then it is known as blind write.
In the above example, transaction T2 contains a blind write, thus we are not sure if the given
schedule is view-serializable or not.
Lastly, we will draw a dependency graph for the schedule. Note: The dependency graph is
different from the precedence graph.
● Firstly, T1 reads X, and T2 first updates X. So, T1 must execute before the T2 operation.
(T1 → T2)
● T3 performs the final updation on X thus, T3 must execute after T1 and T2. (T1 → T3
and T2 → T3)
● T2 updates X before T3, so we get the dependency as (T2 → T3)
As no cycles exist in the dependency graph for the example we can say that the schedule is view-
serializable.
If any cycle/loop exists, then the schedule is not view-serializable and is inconsistent. If no cycle/loop exists, then it is view-serializable.
*****************************************************************************
Recoverability in DBMS
Recoverability refers to the ability of a system to restore its state to a point where the integrity of
its data is not compromised, especially after a failure or an error.
When multiple transactions are executing concurrently, issues may arise that affect the system's
recoverability. The interaction between transactions, if not managed correctly, can result in
scenarios where a transaction's effects cannot be undone, which would violate the system's
integrity.
Importance of Recoverability:
The need for recoverability arises because databases are designed to ensure data reliability and
consistency.
1. Recoverable Schedules
For any two transactions Ti and Tj, if Tj reads a data item previously written by Ti, then Ti must commit before Tj commits. If a transaction fails for any reason and needs to be rolled back, the system can then recover without having to roll back other transactions that have read or used data written by the failed transaction.
2. Non-Recoverable Schedules
There exist transactions Ti and Tj such that Tj reads a data item previously written by Ti, but Ti has not committed yet and Tj commits before Ti. If Ti fails and needs to be rolled back after Tj has committed, there is no straightforward way to roll back the effects of Tj, leading to potential data inconsistency.
In this schedule, T2 reads a value written by T1 and commits before T1 does. If T1 encounters a
failure and has to be rolled back after T2 has committed, we're left in a problematic situation
since we cannot easily roll back T2, making the schedule non-recoverable.
3. Cascading Rollback
A cascading rollback occurs when the rollback of a single transaction causes one or more
dependent transactions to be rolled back. This situation can arise when one transaction reads
uncommitted changes of another transaction, and then the latter transaction fails and needs to be
rolled back. Consequently, any transaction that has read the uncommitted changes of the failed
transaction also needs to be rolled back, leading to a cascade effect.
Here, T2 reads an uncommitted value of A written by T1. When T1 fails and is rolled back, T2
also has to be rolled back, leading to a cascading rollback. This is undesirable because it wastes
computational effort and can complicate recovery procedures.
4. Cascadeless Schedules
A schedule is considered cascadeless if transactions only read committed values. This means, in
such a schedule, a transaction can read a value written by another transaction only after the latter
has committed. Cascadeless schedules prevent cascading rollbacks.
If a transaction reads the value of an operation from an uncommitted transaction and commits before the transaction from which it read that value, then such a schedule is called a non-recoverable schedule. With a non-recoverable schedule, when there is a system failure, we may not be able to recover to a consistent database state. If Tj has read a value written by Ti and the commit operation of Ti does not occur before the commit operation of Tj, the schedule is non-recoverable.
Consider the following schedule involving two transactions T1 and T2. T2 reads the value of A written by T1 and commits. T1 might later abort or commit; if T1 aborts, the value read by T2 becomes invalid, but since T2 has already committed, this schedule is non-recoverable.
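A small Python sketch (illustrative; the encoding of the schedule, including explicit commit records, is an assumption) that applies the recoverability rule above:

```python
# Classify a schedule as recoverable or non-recoverable.
# Each operation is a (transaction, action, item) triple in time order;
# action is "R", "W" or "C" (commit), with item=None for commits.
def is_recoverable(schedule):
    commit_pos = {t: i for i, (t, op, _) in enumerate(schedule) if op == "C"}
    last_writer, reads_from = {}, set()
    for txn, op, item in schedule:
        if op == "W":
            last_writer[item] = txn
        elif op == "R" and last_writer.get(item) not in (None, txn):
            reads_from.add((txn, last_writer[item]))   # txn reads from last_writer
    # Recoverable: whenever a reader commits, the writer it read from
    # must have committed earlier.
    return all(writer in commit_pos and commit_pos[writer] < commit_pos[reader]
               for reader, writer in reads_from if reader in commit_pos)

# T2 reads A written by T1 and commits before T1 -> non-recoverable
S = [("T1", "W", "A"), ("T2", "R", "A"), ("T2", "C", None), ("T1", "C", None)]
print(is_recoverable(S))    # False
```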
****************************************************************************
Isolation is one of the core ACID properties of a database transaction, ensuring that the
operations of one transaction remain hidden from other transactions until completion. It means
that no two transactions should interfere with each other and affect the other's intermediate state.
Isolation Levels
Isolation levels define the degree to which a transaction must be isolated from the data modifications made by any other transaction in the database system. There are four levels of transaction isolation defined by SQL:
1. Serializable
● This is the highest isolation level; a serializable execution is guaranteed to be equivalent to some serial execution of the transactions.
2. Repeatable Read
● Since other transactions cannot read, update, or delete these rows, this level avoids non-repeatable reads.
3. Read Committed
● Thus it does not allow dirty reads (i.e., one transaction reading data immediately after it is written by another, uncommitted transaction).
● The transaction holds a read or write lock on the current row, and thus prevents other transactions from reading, updating or deleting it.
4. Read Uncommitted
● At this level, one transaction may read changes made by another transaction that are not yet committed (dirty reads).
The proper isolation level or concurrency control mechanism to use depends on the specific
requirements of a system and its workload. Some systems may prioritize high throughput and
can tolerate lower isolation levels, while others might require strict consistency and higher
isolation.
Isolation Level     Dirty Read    Non-Repeatable Read
Serializable        No            No
Repeatable Read     No            No
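For illustration, here is a sketch of how an application might request an isolation level through a Python DB-API style connection; `connect_to_database` and the `accounts` table are placeholder assumptions, while the SET TRANSACTION statement is the standard SQL way of choosing a level:

```python
# Illustrative only: requesting an isolation level via a DB-API style connection.
def read_balance_twice(connect_to_database):
    conn = connect_to_database()          # assumed to return a DB-API connection
    cur = conn.cursor()
    # Ask for REPEATABLE READ so the same row read twice returns the same value.
    cur.execute("SET TRANSACTION ISOLATION LEVEL REPEATABLE READ")
    cur.execute("SELECT balance FROM accounts WHERE id = 1")
    first = cur.fetchone()
    cur.execute("SELECT balance FROM accounts WHERE id = 1")
    second = cur.fetchone()
    conn.commit()
    return first == second                # True: non-repeatable reads are prevented
```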
***************************************************************************
Concurrent Execution in DBMS
● In a multi-user system, multiple users can access and use the same database at one time,
which is known as the concurrent execution of the database. It means that the same
database is executed simultaneously on a multi-user system by different users.
● While working on the database transactions, there occurs the requirement of using the
database by multiple users for performing different operations, and in that case,
concurrent execution of the database is performed.
● The simultaneous execution should be performed in an interleaved manner, and no operation should affect the other executing operations, so that the consistency of the database is maintained. However, when transaction operations are executed concurrently, several challenging problems arise that need to be solved.
Several problems that arise when numerous transactions execute simultaneously in a random
manner are referred to as Concurrency Control Problems.
Dirty Read Problem
The dirty read problem in DBMS occurs when a transaction reads data that has been updated by another transaction that is still uncommitted. It arises when multiple uncommitted transactions execute simultaneously.
Time A B
T1 READ(DT) ------
T2 DT=DT+500 ------
T3 WRITE(DT) ------
T4 ------ READ(DT)
T5 ------ COMMIT
T6 ROLLBACK ------
Transaction A reads the value of data item DT as 1000 and modifies it to 1500, which is stored in the temporary buffer. Transaction B then reads DT as 1500 and commits, treating 1500 as the correct value in the database DB. Later, a server error occurs in transaction A and it has to roll DT back to its initial value, i.e., 1000; the value that B read and committed was dirty, and this is the dirty read problem.
Unrepeatable Read Problem
The unrepeatable read problem occurs when two read operations of the same data item, performed within the same transaction, return two different values.
Time A B
T1 READ(DT) ------
T2 ------ READ(DT)
T3 DT=DT+500 ------
T4 WRITE(DT) ------
T5 ------ READ(DT)
Transaction A and B initially read the value of DT as 1000. Transaction A modifies the value of
DT from 1000 to 1500 and then again transaction B reads the value and finds it to be 1500.
Transaction B finds two different values of DT in its two different read operations.
Phantom Read Problem
In the phantom read problem, data is read through two different read operations in the same
transaction. In the first read operation, a value of the data is obtained but in the second operation,
an error is obtained saying the data does not exist.
Time A B
T1 READ(DT) ------
T2 ------ READ(DT)
T3 DELETE(DT) ------
T4 ------ READ(DT)
Transaction B initially reads the value of DT as 1000. Transaction A deletes the data DT from
the database DB and then again transaction B reads the value and finds an error saying the data
DT does not exist in the database DB.
Lost Update Problem
The lost update problem arises when an update to a data item is made on top of an earlier update, but by two different transactions, so the effect of one of the updates is lost.
Time A B
T1 READ(DT) ------
T2 DT=DT+500 ------
T3 WRITE(DT) ------
T4 ------ DT=DT+300
T5 ------ WRITE(DT)
T6 READ(DT) ------
Transaction A initially reads the value of DT as 1000. Transaction A modifies the value of DT
from 1000 to 1500 and then again transaction B modifies the value to 1800. Transaction A again
reads DT and finds 1800 in DT and therefore the update done by transaction A has been lost.
Incorrect Summary Problem
The incorrect summary problem occurs when the computed sum (summary) of data items is incorrect. This happens when a transaction computes an aggregate over several data items while the value of one of those items is changed by another transaction.
Example: Consider two transactions A and B performing read/write operations on two data DT1
and DT2 in the database DB. The current value of DT1 is 1000 and DT2 is 2000: The following
table shows the read/write operations in A and B transactions.
Time A B
T1 READ(DT1) ------
T2 add=0 ------
T3 add=add+DT1 ------
T4 ------ READ(DT2)
T5 ------ DT2=DT2+500
T6 READ(DT2) ------
T7 add=add+DT2 ------
Transaction A reads the value of DT1 as 1000. It uses an aggregate function SUM which
calculates the sum of two data DT1 and DT2 in variable add but in between the value of DT2 get
changed from 2000 to 2500 by transaction B. Variable add uses the modified value of DT2 and
gives the resultant sum as 3500 instead of 3000.
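A short Python sketch (illustrative; it uses the same values as the example above) reproducing the incorrect summary:

```python
# Incorrect summary problem: B updates DT2 while A is summing DT1 and DT2.
db = {"DT1": 1000, "DT2": 2000}

# Transaction A starts computing SUM(DT1, DT2).
add = 0
add += db["DT1"]              # A reads DT1 -> 1000

# Transaction B interleaves and updates DT2.
db["DT2"] = db["DT2"] + 500   # DT2 becomes 2500

add += db["DT2"]              # A reads the modified DT2 -> 2500
print(add)                    # 3500 instead of the consistent total 3000
```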
***************************************************************************
To avoid concurrency control problems and to maintain consistency and serializability during the
execution of concurrent transactions some rules are made. These rules are known as Concurrency
Control Protocols.
1. Lock-Based Protocols
2. Timestamp-Based Protocols
3. Validation-Based Protocols
Lock-Based Protocol
Locks are important in a DBMS for the following reasons:
1. Consistency: Without locks, multiple transactions could modify the same data item
simultaneously, resulting in an inconsistent state.
2. Isolation: Locks ensure that the operations of one transaction are isolated from other
transactions, i.e., they are invisible to other transactions until the transaction is
committed.
3. Concurrency: While ensuring consistency and isolation, locks also allow multiple
transactions to be processed simultaneously by the system, optimizing system throughput
and overall performance.
4. Avoiding Conflicts: Locks help in avoiding data conflicts that might arise due to
simultaneous read and write operations by different transactions on the same data item.
5. Preventing Dirty Reads: With the help of locks, a transaction is prevented from reading
data that hasn't yet been committed by another transaction.
In this type of protocol, any transaction cannot read or write data until it acquires an appropriate
lock on it. There are two types of lock:
Shared lock:
● It is also known as a Read-only lock. In a shared lock, the data item can only be read by
the transaction.
● It can be shared between transactions because a transaction holding a shared lock can only read the data item and cannot update it.
Exclusive lock:
● In the exclusive lock, the data item can be both read as well as written by the transaction.
● This lock is exclusive: it can be held by only one transaction at a time, so multiple transactions cannot modify the same data item simultaneously.
A vital point to remember when using lock-based protocols in a Database Management System is that a shared lock can be held by any number of transactions, whereas an exclusive lock can be held by only one transaction. This is because a shared lock only allows reading the data and performs no other activity, whereas an exclusive lock allows both reading and writing.
The figure given below demonstrates that when two transactions both seek to read a specific data item, the requests are granted and no conflict occurs; but when one transaction intends to write the data item and another transaction attempts to read or write it at the same time, the request is rejected.
Lock conversions: the two methods outlined below can be used to convert between the locks:
● Upgrading: converting a shared (read) lock into an exclusive (write) lock.
● Downgrading: converting an exclusive lock back into a shared lock.
Simplistic Lock Protocol:
It is the simplest way of locking the data during a transaction. Simplistic lock-based protocols allow every transaction to obtain a lock on the data before performing an insert, delete, or update on it. The data item is unlocked after the transaction completes.
Pre-claiming Lock Protocol:
● Pre-claiming lock protocols evaluate the transaction to list all the data items on which it needs locks.
● Before initiating an execution of the transaction, it requests DBMS for all the locks on all
those data items.
● If all the locks are granted then this protocol allows the transaction to begin. When the
transaction is completed then it releases all the locks.
● If all the locks are not granted then this protocol allows the transaction to roll back and
waits until all the locks are granted.
Two-Phase Locking (2PL):
○ The two-phase locking protocol divides the execution phase of the transaction into three
parts.
○ In the first part, when the execution of the transaction starts, it seeks permission for the
lock it requires.
○ In the second part, the transaction acquires all the locks. The third phase is started as soon
as the transaction releases its first lock.
○ In the third phase, the transaction cannot demand any new locks. It only releases the
acquired locks.
Growing phase: In the growing phase, a new lock on the data item may be acquired by the
transaction, but none can be released.
Shrinking phase: In the shrinking phase, existing locks held by the transaction may be released,
but no new locks can be acquired.
In the below example, if lock conversion is allowed then the following phase can happen:
The following way shows how unlocking and locking work with 2-PL.
Transaction T1:
○ Growing phase: from step 1-3
○ Lock point: at 3
Transaction T2:
○ Lock point: at 6
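A simplified Python sketch of the two-phase rule (an illustrative model, not a full lock manager): once a transaction releases any lock, it may not acquire a new one.

```python
# Two-phase locking rule: all lock acquisitions must precede the first release.
class TwoPhaseTransaction:
    def __init__(self, name):
        self.name = name
        self.locks = set()
        self.shrinking = False   # becomes True after the first unlock (lock point passed)

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError(f"{self.name}: cannot acquire a lock in the shrinking phase")
        self.locks.add(item)     # growing phase

    def unlock(self, item):
        self.shrinking = True    # shrinking phase begins at the first release
        self.locks.discard(item)

t1 = TwoPhaseTransaction("T1")
t1.lock("A")
t1.lock("B")        # growing phase: lock point reached after the last lock
t1.unlock("A")      # shrinking phase begins
try:
    t1.lock("C")    # violates 2PL
except RuntimeError as err:
    print(err)      # T1: cannot acquire a lock in the shrinking phase
```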
● Concurrency: By allowing multiple transactions to acquire locks and release them, 2PL
increases the concurrency level, leading to better system throughput and overall
performance.
● Deadlocks: The main disadvantage of 2PL is that it can lead to deadlocks, where two or
more transactions wait indefinitely for a resource locked by the other.
● Reduced Concurrency (in certain cases): Locking can block transactions, which can
reduce concurrency. For example, if one transaction holds a lock for a long time, other
transactions needing that lock will be blocked.
● Overhead: Maintaining locks, especially in systems with a large number of items and
transactions, requires overhead. There's a time cost associated with acquiring and
releasing locks, and memory overhead for maintaining the lock table.
● Starvation: It's possible for some transactions to get repeatedly delayed if other
transactions are continually requesting and acquiring locks.
Starvation
Starvation is the situation when a transaction needs to wait for an indefinite period to acquire a
lock.
Deadlock
Deadlock refers to a specific situation where two or more processes are waiting for each other to
release a resource or more than two processes are waiting for the resource in a circular chain.
Strict Two-Phase Locking (Strict-2PL):
○ The first phase of Strict-2PL is similar to 2PL. In the first phase, after acquiring all the
locks, the transaction continues to execute normally.
○ The only difference between 2PL and strict 2PL is that Strict-2PL does not release a lock
after using it.
○ Strict-2PL waits until the whole transaction commits, and then it releases all the locks at
once.
Conservative (Static) Two-Phase Locking:
● A transaction can release a lock after using it, but it cannot commit until all locks have
been acquired.
● A transaction must request all the locks it will ever need before it begins execution. If any
of the requested locks are unavailable, the transaction is delayed until they are all
available.
● This approach can avoid deadlocks since transactions only start when all their required
locks are available.
**************************************************************************
2. Timestamp-Based Protocols
Timestamp based Protocol in DBMS is an algorithm which uses the System Time or Logical
Counter as a timestamp to serialize the execution of concurrent transactions.
The Timestamp-based protocol ensures that every conflicting read and write operations are
executed in a timestamp order.
The older transaction is always given priority in this method. It uses system time to determine
the time stamp of the transaction. This is the most commonly used concurrency protocol.
Lock-based protocols help us to manage the order between the conflicting transactions when
they will execute. Timestamp-based protocols manage conflicts as soon as an operation is
created.
Example: Suppose transactions T1, T2, and T3 enter the system in that order, so that TS(T1) < TS(T2) < TS(T3). Priority will be given to transaction T1, then transaction T2, and lastly transaction T3.
Whenever a transaction T issues a W_item(X) operation, check the following:
● If the read timestamp of data item X is greater than the timestamp of the transaction T, i.e. R_TS(X) > TS(T), or if the write timestamp of data item X is greater than the timestamp of the transaction, i.e. W_TS(X) > TS(T), then abort and roll back T and reject the operation; otherwise, execute the W_item(X) operation of T and set W_TS(X) to TS(T).
Whenever a transaction T issues an R_item(X) operation, check the following:
● If the write timestamp of data item X is greater than the timestamp of the transaction T, i.e. W_TS(X) > TS(T), then abort and roll back T and reject the operation; else
● If the write timestamp of data item X is less than or equal to the timestamp of the transaction T, i.e. W_TS(X) <= TS(T), then execute the R_item(X) operation of T and set R_TS(X) to the larger of TS(T) and the current R_TS(X).
Whenever there is a conflicting operation that violates the timestamp ordering protocol then the
later operation is rejected and the transaction is aborted. Schedules created by the Basic
timestamp ordering protocol are conflict serializable and deadlock-free.
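The basic timestamp-ordering checks can be condensed into a short Python sketch (illustrative; timestamps are modeled as plain integers assigned when transactions start):

```python
# Basic timestamp ordering: accept or reject reads/writes using R_TS and W_TS.
R_TS, W_TS = {}, {}            # per-item read and write timestamps

def read_item(ts, item):
    if W_TS.get(item, 0) > ts:             # a younger transaction already wrote item
        return f"T{ts}: rejected read({item}) -> rollback"
    R_TS[item] = max(R_TS.get(item, 0), ts)
    return f"T{ts}: read({item}) ok"

def write_item(ts, item):
    if R_TS.get(item, 0) > ts or W_TS.get(item, 0) > ts:
        return f"T{ts}: rejected write({item}) -> rollback"
    W_TS[item] = ts
    return f"T{ts}: write({item}) ok"

print(write_item(2, "X"))      # T2: write(X) ok
print(read_item(1, "X"))       # T1: rejected read(X) -> rollback (W_TS(X)=2 > 1)
print(read_item(3, "X"))       # T3: read(X) ok
print(write_item(2, "X"))      # T2: rejected write(X) -> rollback (R_TS(X)=3 > 2)
```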
One of the drawbacks of the basic timestamp ordering protocol is that cascading rollback is still possible. For example, suppose transaction T2 uses a value written by T1. If T1 aborts and rolls back before committing, then T2 must also be aborted and rolled back. So cascading rollback problems may occur in the basic timestamp ordering protocol.
Strict timestamp ordering is a variation of basic timestamp ordering. Strict timestamp ordering ensures that the schedule is both strict and conflict serializable. In strict timestamp ordering, a transaction T that issues a Read_item(X) or Write_item(X) such that TS(T) > W_TS(X) has its read or write operation delayed until the transaction T' that wrote the value of X has committed or aborted.
Thomas Write Rule provides a guarantee of serializability order for the protocol. It improves the Basic Timestamp Ordering Algorithm. When a transaction T issues a W_item(X) operation:
● If TS(T) < R_TS(X), then abort and roll back T and reject the operation.
● If TS(T) < W_TS(X), then don't execute the W_item(X) operation of the transaction and continue processing (the outdated write is simply ignored).
● If neither condition 1 nor condition 2 occurs, then execute the W_item(X) operation of transaction T and set W_TS(X) to TS(T).
Advantages
● No older transaction waits for a longer period of time, so the protocol is free from
deadlock.
● The timestamp-based protocol ensures that conflicting operations are executed in timestamp order, so the resulting schedules are serializable.
Disadvantages
● Cascading rollback is possible, and a transaction that is repeatedly aborted and restarted may suffer starvation.
*****************************************************************************
Validation-Based Protocol is also called the Optimistic Concurrency Control technique. This protocol is used in a DBMS (Database Management System) to control concurrency among transactions. It is called optimistic because of the assumption it makes: very little interference occurs between transactions, and therefore there is no need for checking while a transaction is executed.
In this technique, no checking is done while the transaction is being executed. Until the end of the transaction is reached, updates are not applied directly to the database; all updates are applied to local copies of the data items kept for the transaction. At the end of transaction execution, a validation phase checks whether any of the transaction's updates violate serializability. If there is no violation of serializability, the transaction is committed and the database is updated; otherwise, the transaction is rolled back and then restarted.
Optimistic Concurrency Control is a three-phase protocol. The three phases for validation based
protocol:
1. Read phase: In this phase, the transaction T is read and executed. It is used to read the
value of various data items and stores them in temporary local variables. It can perform
all the write operations on temporary variables without an update to the actual database.
2. Validation phase: In this phase, the temporary variable value will be validated against
the actual data to see if it violates the serializability.
3. Write phase: If the transaction passes validation, the temporary results are written to the actual database or system; otherwise, the transaction is rolled back.
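An illustrative Python sketch of the three phases (a simplified model; real validation compares the read and write sets of overlapping transactions):

```python
# Optimistic (validation-based) concurrency control, simplified:
# read phase -> local copies, validation phase -> conflict check, write phase -> apply.
db = {"A": 100, "B": 200}
db_version = {"A": 0, "B": 0}         # bumped on every committed write

def run_optimistic(txn_items, update_fn):
    # Read phase: copy items and remember the versions that were read.
    local = {k: db[k] for k in txn_items}
    seen = {k: db_version[k] for k in txn_items}
    update_fn(local)                  # all writes go to the local copies only

    # Validation phase: fail if any item read was changed by another transaction.
    if any(db_version[k] != seen[k] for k in txn_items):
        return "rollback and restart"

    # Write phase: apply the local copies to the database.
    for k, v in local.items():
        db[k] = v
        db_version[k] += 1
    return "committed"

print(run_optimistic(["A", "B"], lambda t: t.update(A=t["A"] + 50)))   # committed
```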
Validation (Ti): It contains the time when Ti finishes its read phase and starts its validation
phase.
● This protocol is used to determine the time stamp for the transaction for serialization
using the timestamp of the validation phase, as it is the actual phase which determines if
the transaction will commit or rollback.
● While executing the transaction, this protocol provides a greater degree of concurrency and results in a smaller number of conflicts.
******************************************************************
Multiple Granularity
Multiple Granularity:
● It can be defined as hierarchically breaking up the database into blocks which can be
locked.
● The Multiple Granularity protocol enhances concurrency and reduces lock overhead.
● It makes it easy to decide whether to lock or unlock a data item. This type of hierarchy can be represented graphically as a tree.
● Consider a tree with four levels of nodes. The first, or highest, level represents the entire database.
● The second level represents nodes of type area. The database consists of exactly these areas.
● The area consists of children nodes which are known as files. No file can be present in
more than one area.
● Finally, each file contains child nodes known as records. The file consists of exactly those records that are its child nodes. No record is present in more than one file.
● Hence, the levels of the tree starting from the top level are as follows:
○ Database
○ Area
○ File
○ Record
In this example, the highest level shows the entire database. The levels below are file, record,
and fields.
Intention-shared (IS): It contains explicit locking at a lower level of the tree but only with
shared locks.
Intention-Exclusive (IX): It contains explicit locking at a lower level with exclusive or shared
locks.
Shared & Intention-Exclusive (SIX): In this lock, the node is locked in shared mode, and some
node is locked in exclusive mode by the same transaction.
Compatibility Matrix with Intention Lock Modes: The below table describes the
compatibility matrix for these lock modes:
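Since the matrix itself is not reproduced here, the standard textbook compatibility matrix for these modes is given below as a small Python dictionary for reference (True means the two lock modes are compatible; this is general DBMS theory, not taken from the original table):

```python
# Standard compatibility matrix for IS, IX, S, SIX, X lock modes (True = compatible).
compatible = {
    ("IS",  "IS"): True,  ("IS",  "IX"): True,  ("IS",  "S"): True,  ("IS",  "SIX"): True,  ("IS",  "X"): False,
    ("IX",  "IS"): True,  ("IX",  "IX"): True,  ("IX",  "S"): False, ("IX",  "SIX"): False, ("IX",  "X"): False,
    ("S",   "IS"): True,  ("S",   "IX"): False, ("S",   "S"): True,  ("S",   "SIX"): False, ("S",   "X"): False,
    ("SIX", "IS"): True,  ("SIX", "IX"): False, ("SIX", "S"): False, ("SIX", "SIX"): False, ("SIX", "X"): False,
    ("X",   "IS"): False, ("X",   "IX"): False, ("X",   "S"): False, ("X",   "SIX"): False, ("X",   "X"): False,
}
# Example: a request for a lock on a node is granted only if the requested mode is
# compatible with every lock already held on that node by other transactions.
print(compatible[("IS", "X")])   # False
```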
It uses the intention lock modes to ensure serializability. It requires that if a transaction attempts
to lock a node, then that node must follow these protocols:
● Transaction T1 firstly locks the root of the tree. It can lock it in any mode.
● If T1 currently has the parent of the node locked in either IX or IS mode, then the
transaction T1 will lock a node in S or IS mode only.
● If T1 currently has the parent of the node locked in either IX or SIX modes, then the
transaction T1 will lock a node in X, SIX, or IX mode only.
● Transaction T1 can lock a node only if it has not previously unlocked any node (i.e., T1 is two-phase).
● Transaction T1 can unlock a node only if none of the children of that node are currently locked by T1.
Observe that in multiple-granularity, the locks are acquired in top-down order, and locks must be
released in bottom-up order.
● If transaction T1 reads record Ra9 in file Fa, then transaction T1 needs to lock the database, area A1 and file Fa in IS mode (and in that order). Finally, it needs to lock Ra9 in S mode.
● If transaction T2 modifies record Ra9 in file Fa, then it can do so after locking the database, area A1 and file Fa in IX mode. Finally, it needs to lock Ra9 in X mode.
● If transaction T3 reads all the records in file Fa, then transaction T3 needs to lock the database and area A1 in IS mode. At last, it needs to lock Fa in S mode.
● If transaction T4 reads the entire database, then T4 needs to lock the database in S mode.
***************************************************************************
Introduction
Data may be monitored, stored, and changed rapidly and effectively using a DBMS (Database Management System). A database possesses atomicity, consistency, isolation, and durability qualities. The ability of a system to preserve data and the changes made to it defines its durability.
A database could fail for any of the following reasons:
○ Transaction failures arise when a certain process dealing with data updates cannot be
completed.
○ Disk crashes may occur as a result of the system's failure to read the disc.
The data in the database must be recoverable to the state it was in prior to the system failure, even if the database system fails. In such situations, database recovery procedures in DBMS are employed to retrieve the data.
The recovery procedures in DBMS ensure the database's atomicity and durability. If a system
crashes in the middle of a transaction and all of its data is lost, it is not regarded as durable. If
just a portion of the data is updated during the transaction, it is not considered atomic. Data
recovery procedures in DBMS make sure that the data is always recoverable to protect the
durability property and that its state is retained to protect the atomic property. The procedures
listed below are used to recover data from a DBMS,
The atomicity attribute of DBMS safeguards the data state. If a data modification is performed,
the operation must be completed entirely, or the data's state must be maintained as if the
manipulation never occurred. This characteristic may be impacted by DBMS failure brought on
by transactions, but DBMS recovery methods will protect it.
***********************************************************************
Log-Based Recovery
● The log is a sequence of records. Log of each transaction is maintained in some stable
storage so that if any failure occurs, then it can be recovered from there.
● If any operation is performed on the database, then it will be recorded in the log.
● But the process of storing the logs should be done before the actual transaction is applied
in the database.
There are two approaches to modifying the database:
1. Deferred database modification: all log records for a transaction are written first, and the actual database is modified only after the transaction reaches its commit point.
2. Immediate database modification: the database may be modified while the transaction is still active, immediately after each change is recorded in the log.
When the system is crashed, then the system consults the log to find which transactions need to
be undone and which need to be redone.
1. If the log contains both the record <Ti, Start> and the record <Ti, Commit>, then the
Transaction Ti needs to be redone.
2. If the log contains the record <Ti, Start> but does not contain either the record <Ti, Commit> or <Ti, Abort>, then the Transaction Ti needs to be undone.
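These two rules can be sketched in Python as follows (illustrative; the log is modeled as a simple list of records):

```python
# Decide redo/undo sets from a write-ahead log after a crash.
# Each log record is a tuple such as ("T1", "START"), ("T1", "COMMIT"), ("T1", "ABORT").
def recovery_sets(log):
    started, finished = set(), set()
    for txn, rec in log:
        if rec == "START":
            started.add(txn)
        elif rec in ("COMMIT", "ABORT"):
            finished.add(txn)
    redo = {t for t in started if (t, "COMMIT") in log}   # rule 1: redo committed txns
    undo = started - finished                             # rule 2: undo unfinished txns
    return redo, undo

log = [("T1", "START"), ("T1", "COMMIT"), ("T2", "START")]   # crash happens here
print(recovery_sets(log))    # ({'T1'}, {'T2'}) -> redo T1, undo T2
```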
***************************************************************************
Recovery with Concurrent Transactions
Concurrency control means that multiple transactions can be executed at the same time, so their log records are interleaved. If the order of execution were changed, the transaction results could change as well, so the order of execution of those transactions must be maintained during recovery.
During recovery, it would be very difficult for the recovery system to backtrack all the logs and
then start recovering.
Recovery with concurrent transactions can be done in the following four ways.
Interaction with concurrency control:
● In this scheme, the recovery scheme depends greatly on the concurrency control scheme that is used. So, to roll back a failed transaction, we must undo the updates performed by the transaction.
Transaction rollback :
● In this scheme, we rollback a failed transaction by using the log.
● The system scans the log backward; for every log record of the failed transaction found in the log, the system restores the data item to the old value recorded in that log record.
Checkpoints :
● In this scheme, we used checkpoints to reduce the number of log records that the
system must scan when it recovers from a crash.
● In a concurrent transaction processing system, we require that the checkpoint log
record be of the form <checkpoint L>, where ‘L’ is a list of transactions active at the
time of the checkpoint.
● A fuzzy checkpoint is a checkpoint where transactions are allowed to perform updates
even while buffer blocks are being written out.
Restart recovery :
***********************************************************************