
Chapter 8

File and File Structure


Introduction
A file is a named collection of data or information; the name is called the filename. Almost all information stored in a computer must be in a file. There are many different types of files: data files, text files, program files, directory files, and so on.
The physical or internal level of organization of a database system is concerned with the efficient storage of information on secondary storage devices. The basic problem in physical database representation is to select a suitable file structure to store the desired information. A file consists of records, and records may consist of several fields.
The basic operations that can be performed on a file are
1. Creation of a file
Before creating a file, we collect data and process it to give it a record-like structure. Then we choose a name for the file, open the file on a secondary storage device, and store the collected data in it.
2. Reading a file
After a file is created, we may need to read it at any time, either a particular record or the whole file.
3. Updating a file
Updating a file means performing changes to the contents of the file.
4. Insertion in a file
Sometimes a record or a set of records must be inserted at a specific location.
5. Deletion
When a record is no longer required in the file, we delete that record to free up space.

Overview of Storage Devices


A typical memory hierarchy is illustrated below.

As we go down the hierarchy, the following occurs:


1. Decreasing cost per bit.
2. Increasing capacity.
3. Increasing access time.
4. Decreasing frequency of access of the memory by the processor.
Thus, smaller, more expensive and faster memories are supplemented by larger, cheaper and slower memories.

Computer memory can be classified into the following hierarchy:


1) Internal register:
Internal registers in a CPU are used for holding variables and temporary results. Internal registers have very small storage capacity; however, they can be accessed instantly. Accessing data from an internal register is the fastest way to access memory.
2) Cache:
Cache is used by the CPU for data that is accessed over and over again. Instead of fetching it from main memory every time, the data is kept in cache for fast access. Cache is also a small memory, though larger than the internal registers.
Cache is further classified to L1, L2 and L3:
a. L1 cache: It is accessed without any delay.
b. L2 cache: It takes more clock cycles to access than L1 cache.
c. L3 cache: It takes more clock cycles to access than L2 cache.

3) Main memory or RAM (Random Access Memory):

RAM is a hardware component of the computer's memory. Its capacity can be increased provided the operating system can handle it. It is slower to access than cache. It is volatile, i.e. the contents of memory are usually lost in case of a power failure or crash.

4) Hard disk:
A hard disk is a hardware component of a computer. Data is kept permanently in this memory. The hard disk is not directly accessed by the CPU, hence it is slower. Compared with RAM, a hard disk is cheaper per bit. Typically, the entire database is stored on disk, and data must be moved from disk to main memory in order to be processed.

5) Magnetic tape:
Magnetic tape is usually used for backing up large volumes of data. When the system needs to access a tape, the tape is first mounted; when the data has been accessed, it is un-mounted. Memory access is slower on magnetic tape, and it usually takes a few minutes to access a tape.

Organization of Records into Blocks


A file is organized logically as a sequence of records. Records are mapped onto disk blocks. Files are provided as a basic construct in operating systems, so we assume the existence of an underlying file system.
Blocks are of a fixed size determined by the operating system, while record sizes vary. In a relational database, tuples of distinct relations may be of different sizes.
There are two approaches to organizing records in blocks:
1. Fixed length records
Consider a file of deposit records of the form:

If we assume that each character occupies one byte, an integer occupies 4 bytes, and a real 8 bytes, our
deposit record is 52 bytes long. The simplest approach is to use the first 52 bytes for the first record, the
next 52 bytes for the second, and so on.
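A minimal sketch of this arithmetic (the 52-byte record size is taken from the example above; the function names and the file handle are illustrative):

def record_offset(i, record_size=52):
    # Byte offset of record number i (0-based) in a fixed-length record file.
    return i * record_size

def read_record(f, i, record_size=52):
    # Seek directly to record i and read it; no scanning is needed.
    f.seek(record_offset(i, record_size))
    return f.read(record_size)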
However, there are two problems with this approach.



• It is difficult to delete a record from this type of structure. The space occupied by the deleted record must be filled with some other record of the file, or, when a record is deleted, we could move all successive records up one, which may require moving a lot of records.
• Some records will cross block boundaries. It would then require two block accesses to read or write such a record.

Fig: Fixed length record


The use of pointers requires careful programming: if a record pointed to is moved or deleted and the pointer is not corrected, the pointer becomes a dangling pointer.

2. Variable length records.


Variable-length records arise in a database in several ways:
• Storage of multiple record types in a file.
• Record types that allow variable field sizes.
• Record types that allow repeating fields.
One example with a variable-length record:

Account-information is an array with an arbitrary number of elements, i.e. the type definition does not limit the number of elements in the array, so there is no limit on how large a record can be. The space available for records must therefore be managed carefully; to manage free space, a pointer indicates the start of the free-space area.
There are two approaches for representing variable-length records:
1. Byte-string representation
2. Fixed-length representation

1. Byte-String Representation
Byte-string representation uses the technique of attaching a special end-of-record symbol (e.g. ⊥) to the end of each record. Each record is then stored as a string of successive bytes.



Byte-string representation has several disadvantages:
• It is not easy to re-use the space left by a deleted record.
• There is no room for a record to grow longer; if a record becomes longer, it must be moved.
• A large number of small fragments of disk storage are wasted.
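A minimal sketch of the byte-string idea, assuming a single reserved byte (here 0x00) stands in for the end-of-record symbol; the record contents are made up for illustration:

END_OF_RECORD = b"\x00"  # assumed delimiter; any reserved byte works

def pack_records(records):
    # Store each record as a string of bytes followed by the end-of-record symbol.
    return b"".join(rec + END_OF_RECORD for rec in records)

def unpack_records(data):
    # Split the byte string back into records (the trailing empty piece is dropped).
    return data.split(END_OF_RECORD)[:-1]

blob = pack_records([b"Perryridge|A-102|400", b"Round Hill|A-305|350"])
print(unpack_records(blob))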

2. Fixed-Length Representation

Fixed-length representation uses one or more fixed-length records to represent one variable-length record.
There are two techniques:
a. Reserved space - uses fixed-length records large enough to accommodate the largest variable-length record. If a record is shorter, the unused space is filled with a null or end-of-record symbol. The main problem with this technique is that space is wasted.

b. Pointers - the variable-length record is represented by a list of fixed-length records, chained together via pointers. This method can be used even if the maximum record length is not known. The problem is that space is wasted in the successive records of a chain, since room for the non-repeating fields must still be reserved even though those fields appear only in the first record.

Buffer Management
We need to use disk storage for the database and to transfer blocks of data between main memory (MM) and disk. We also want to minimize the number of such transfers, as they are time-consuming. One way is to keep as many blocks as possible in MM. Usually we cannot keep all blocks in MM, so we need to manage the allocation of the available MM space. The buffer is the part of MM available for storing copies of disk blocks; the subsystem responsible for the allocation of buffer space is called the buffer manager.
The buffer manager handles all requests for blocks of the database. If a requested block is already in MM, its MM address is given to the requestor. If not, the buffer manager must read the block in from disk (possibly displacing some other block if the buffer is full) and then pass the MM address to the requestor. The buffer manager must use some sophisticated techniques in order to provide good service:



• Replacement Strategy - When there is no room left in the buffer, some block must be removed to make way for the new one. Typical operating system memory-management schemes use a "least recently used" (LRU) method.
• Pinned Blocks - For the database to be able to recover from crashes, we need to restrict the times at which a block may be written back to disk. A block that is not allowed to be written is said to be pinned. Many operating systems do not provide support for pinned blocks, yet such a feature is essential if a database is to be "crash resistant".
• Forced Output of Blocks - Sometimes it is necessary to write a block back to disk even though its buffer space is not needed (called the forced output of a block). This is because MM contents (and thus the buffer) are lost in a crash, while disk data usually survives.
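A minimal sketch of such a buffer manager with LRU replacement and pinned blocks, assuming a read_from_disk callback for the actual I/O (all names here are illustrative, not from any specific DBMS):

from collections import OrderedDict

class BufferManager:
    # Toy buffer manager: LRU replacement; pinned blocks are never evicted.
    def __init__(self, capacity, read_from_disk):
        self.capacity = capacity
        self.read_from_disk = read_from_disk   # callback: block_id -> bytes
        self.buffer = OrderedDict()            # block_id -> bytes, in LRU order
        self.pinned = set()

    def pin(self, block_id):
        self.pinned.add(block_id)

    def unpin(self, block_id):
        self.pinned.discard(block_id)

    def get_block(self, block_id):
        if block_id in self.buffer:            # hit: just refresh the LRU position
            self.buffer.move_to_end(block_id)
            return self.buffer[block_id]
        if len(self.buffer) >= self.capacity:  # miss on a full buffer: evict
            for victim in self.buffer:         # oldest first, skipping pinned blocks
                if victim not in self.pinned:
                    del self.buffer[victim]
                    break
            else:
                raise RuntimeError("all blocks pinned, cannot evict")
        self.buffer[block_id] = self.read_from_disk(block_id)
        return self.buffer[block_id]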

Data Dictionary Storage


The database also needs to store information about the relations it contains. This metadata is called the data dictionary or system catalog. Some of the types of information the system must store are given below.
• Names of relations.
o Names of the attributes of each relation.
o Domains of attributes.
o Names and definitions of views.
o Integrity constraints (e.g. key constraints).
• Data on the system users:
o Names of authorized users, including passwords or other information used to authenticate users.
o Accounting information about users.
• Statistical and descriptive data:
o Number of tuples in each relation.
• Physical file organization:
o How the relation is stored (sequential / hash / heap).
o Physical location of the relation.
o Information about indices (name of the index, type of index, attributes on which the index is defined).

File Organization
A file is a sequence of records stored in binary format. A disk drive is formatted into several blocks that can store records, and file records are mapped onto those disk blocks. File organization defines how file records are mapped onto disk blocks. There are four types of file organization for organizing file records:
1. Heap (Pile) file organization
2. Sequential file organization
3. Indexed Sequential file organization
4. Hash file organization

1. Heap file organization


In a heap or pile file, records are collected in the order they arrive. The blocks used in a heap file are linked by pointers. A record can be placed anywhere in the file where there is space for it; there is no ordering or sorting of the records. When a new record is to be inserted, it is placed in the last block if there is space; if the last block cannot accommodate the record, a new block is allocated and the record is placed there. Locating a record in a heap file is time-consuming if the file is spread over more than a few blocks, so heap file organization is generally used for small files.



Fig: A heap file structure (a chain of blocks holding Record 1 through Record N)


Advantage:
• A very good method of file organization for bulk insertion.
• It is suited to very small files, as fetching records from them is fast.

Disadvantage:
• This method is inefficient for larger databases, as searching for or modifying a record takes time.
• Proper memory management is required to sustain performance; otherwise many unused blocks accumulate and the file size simply keeps growing.

2. Sequential file organization


A sequential file is designed for efficient processing of records in sorted order based on some search key. In a sequential file organization, records are placed sequentially on the storage medium.
A search key is an attribute or set of attributes; it need not be the primary key. The records in the file are ordered according to the values of the search-key attributes. To permit fast retrieval of records in search-key order, we chain the records together by pointers; the pointer in each record points to the next record in search-key order. A typical example stores all the records of an account file in search-key order, using branch name as the search key.

Advantage:
• The design is very simple compared to other file organizations.
• Searching and deletion are faster than in a heap file organization.
• This method is good for report generation and statistical calculations.
• These files can be stored on magnetic tape, which is comparatively cheap.
Disadvantage:
• This method always involves the effort of keeping the records sorted: each time an insert/update/delete transaction is performed, the file must be re-sorted.

3. Indexing
An index is any data structure that takes as input a property of records, typically the value of one or more fields, and finds the records with that property quickly.



The field on which the index is built is called the search key. An index is a collection of entries, where each entry corresponds to a data record. An index takes a search-key value as input and returns the address of the records holding that value.

Search Key      Pointer

Fig: Structure of an index entry

The search-key values stored in the index are sorted. There are two basic kinds of indices:
i. Ordered Indices
ii. Hash indices
a. Ordered indices
Ordered indices are based on a sorted ordering of the search-key values. There are two types of ordered indices:
a. Primary index
b. Secondary index

Primary index
We assume all files are ordered sequentially on some search key; such files are called index-sequential files. A primary index can be either dense or sparse.

Dense Index
An index record appears for every search-key value in the file. Each index record contains the search-key value and a pointer to the actual record. A dense index requires more memory space, but records are accessed quickly.

Fig: Dense index


Sparse Index
Index records are created only for some of the records. To locate a record, we find the index entry with the largest search-key value that is less than or equal to the search-key value we are looking for. We start at the record pointed to by that index entry and proceed along the pointers in the file (i.e. sequentially) until we find the desired record.



Fig: Sparse Index
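A minimal sketch of that lookup rule, assuming the index is a sorted list of (search-key, block-number) pairs and blocks is a list of blocks, each holding sorted (search-key, record) pairs; both encodings are assumptions of this sketch:

import bisect

def sparse_lookup(index, blocks, key):
    # Find the index entry with the largest key <= the key we are looking for.
    keys = [k for k, _ in index]
    pos = bisect.bisect_right(keys, key) - 1
    if pos < 0:
        return None                    # smaller than every indexed key
    # Proceed sequentially through the file from the block pointed to.
    for block_no in range(index[pos][1], len(blocks)):
        for k, record in blocks[block_no]:
            if k == key:
                return record
            if k > key:
                return None            # records are sorted: we have passed it
    return None

index = [("Brighton", 0), ("Mianus", 1)]
blocks = [[("Brighton", "A-217"), ("Downtown", "A-101")],
          [("Mianus", "A-215"), ("Perryridge", "A-102")]]
print(sparse_lookup(index, blocks, "Downtown"))   # found via the Brighton entry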

Secondary index
We can use an extra level of indirection to implement secondary indices on search keys that are not candidate keys. The pointers in such a secondary index do not point directly to the file but to buckets that contain pointers to the file. A secondary index must be dense, with an index entry for every search-key value and a pointer to every record in the file. Secondary indices improve the performance of queries that use keys other than the search key of the primary index.

Fig: Secondary Index on balance field of account


B+ Tree index files
The main disadvantage of the index-sequential file organization is that performance degrades as the file grows. Although this degradation can be remedied by reorganizing the file, periodic reorganization of the entire file is expensive.
The B+ tree file structure maintains its efficiency by reorganizing itself automatically after every insertion and deletion.
A B+ tree is a (key, value) storage method with a tree-like structure. It has one root, some number of levels of intermediate nodes, and leaf nodes. All the leaf nodes hold the actual records (or pointers to them); intermediate nodes hold only keys and pointers to child nodes, never data records.



Properties:
• It is called a balanced tree because all the leaf nodes are at the same level.
• In a B+ tree of order n, each node except the root must have at least ⌈n/2⌉ children and at most n children.
• Every node except the root holds at least ⌈(n-1)/2⌉ keys and at most n-1 keys.
• Every non-leaf node has one fewer key than its number of children.

Searching a record in B+ Tree

Suppose we want to search for 65 in a B+ tree. First we examine the intermediate node to find which leaf node can contain the record for 65: since 65 lies between the keys 50 and 75, we follow the branch between those two keys. This directs us to the third leaf node, where the DBMS performs a sequential search to find 65. Suppose that instead of 65 we search for 60: we reach the same leaf node but do not find the key there, so the search fails. No insertion, update, or delete is allowed during a search in a B+ tree.
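A minimal sketch of that descent, with nodes encoded as small dicts (sorted "keys"; "children" for interior nodes, "records" for leaves); the encoding and the sample tree are purely illustrative:

import bisect

def bplus_search(node, key):
    # Descend from the root, choosing the child whose key range can contain key.
    while not node["leaf"]:
        i = bisect.bisect_right(node["keys"], key)
        node = node["children"][i]
    # Search inside the leaf for the key itself.
    i = bisect.bisect_left(node["keys"], key)
    if i < len(node["keys"]) and node["keys"][i] == key:
        return node["records"][i]
    return None    # e.g. searching for 60 in the example above

leaf0 = {"leaf": True, "keys": [11, 30], "records": ["r11", "r30"]}
leaf1 = {"leaf": True, "keys": [50, 65], "records": ["r50", "r65"]}
root = {"leaf": False, "keys": [50], "children": [leaf0, leaf1]}
print(bplus_search(root, 65))   # -> r65
print(bplus_search(root, 60))   # -> None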

Insertion in a B+ Tree
• Traverse the B+ tree until we find the leaf node suitable for insertion.
• There are two cases for inserting a key:
i. The node is not full
- The key is inserted into the node in increasing key order.
ii. The node is full
a. If the full node is a leaf, classify the keys as L (lowest), M (middle) and H (highest) and split the node as below.

L M H

b. If the full node is a non-leaf node, classify the keys as L (lowest), M (middle) and H (highest) and split the node as below.



M

L H
The median value (M) is pushed one level up into the parent node. If the parent node is not full, it accommodates the median value; otherwise the same procedure as in case ii is repeated.

Example:
Construct a B+ tree for the following set of key values: (2, 3, 5, 7, 11, 17, 19, 23, 29, 31).
Assume that the tree is initially empty, that values are added in ascending order, and that the pointer number is four.
Solution:

Each node has N pointers and N - 1 search-key values. In our example the pointer number is 4, so each node holds 4 - 1 = 3 search-key values.
The first leaf node is filled with 2, 3, 5. When we want to put 7 into this node, we need to split it. We then insert 11 and 17, splitting the leaf node again. Then we put 19 and 23 in the leaf node and split the leaf node once more. Finally we put 29 and 31 and split the leaf node; this time the non-leaf node also overflows, so we must split the non-leaf node as well, following the same rules.


b. Hash Indices
A hash index organizes the search keys, with their associated pointers, into a hash file structure. We apply a hash function to a search key to identify a bucket, and store the key and its associated pointers in that bucket (or in overflow buckets).
A hash function is any function that can be used to map data of arbitrary size to data of a fixed size.

A hash function H(K) maps the set of all search keys K to the buckets where the keys and their associated pointers are placed.

Formally, let K denote the set of all search-key values, B the set of all bucket addresses, and H the hash function. To insert a record with search key Ki, we compute H(Ki), which gives the address of the bucket for that record; the record is then stored in that bucket.
Deletion is equally simple: if the search-key value of the record to be deleted is Ki, we compute H(Ki) to find the corresponding bucket and delete the record from it.

Handling bucket overflow

If a bucket does not have enough space to store the currently hashed key value, a bucket overflow is said to occur. Bucket overflow can occur for the following reasons:
i. Insufficient buckets.
ii. Skew: some buckets are assigned more records than others, so a bucket may overflow even while other buckets still have space. This situation is called bucket skew.
The worst possible hash function maps all search-key values to the same bucket; such a hash function is undesirable because all the records must then be kept in a single bucket. An ideal hash function distributes the stored keys uniformly across the buckets, so that every bucket holds the same number of records. We handle bucket overflow by using overflow buckets: if a record must be inserted into bucket b and b is already full, the system provides an overflow bucket for b and inserts the record there. If the overflow bucket is also full, the system provides yet another overflow bucket, and so on. All the overflow buckets of a given bucket are chained together in a linked list; this kind of overflow handling is called overflow chaining.
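A minimal sketch of overflow chaining, with Python's built-in hash standing in for H(K) and a deliberately tiny bucket capacity (both are assumptions of this sketch):

BUCKET_CAPACITY = 2   # assumed small, so overflow is easy to trigger

class Bucket:
    def __init__(self):
        self.entries = []     # (key, record_pointer) pairs
        self.overflow = None  # next overflow bucket in the chain

class HashIndex:
    def __init__(self, n_buckets):
        self.buckets = [Bucket() for _ in range(n_buckets)]

    def _home(self, key):
        return self.buckets[hash(key) % len(self.buckets)]  # H(K)

    def insert(self, key, ptr):
        b = self._home(key)
        while len(b.entries) >= BUCKET_CAPACITY:   # bucket full: walk the chain
            if b.overflow is None:
                b.overflow = Bucket()              # provide an overflow bucket
            b = b.overflow
        b.entries.append((key, ptr))

    def lookup(self, key):
        b = self._home(key)
        while b is not None:                       # home bucket, then the chain
            for k, ptr in b.entries:
                if k == key:
                    return ptr
            b = b.overflow
        return None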

Assignment

Q. Explain the B-tree. How can you insert data into it?



Chapter-9
Recovery System

Recovery System
• A computer system is subject to failure from a variety of causes, such as disk crash, power failure, software crash, fire, or network outage.
• Whatever the cause, information may be lost.
• The database must take action in advance to ensure that the atomicity and durability properties of transactions are preserved.
• The recovery system is an integral part of a database system; it is responsible for restoring the database to the consistent state that existed prior to the failure.
Failure Classification
• Designing a reliable system that can recover from failure requires identifying the types of failure the system has to deal with.
• The major types of failure are:
1. Transaction failure: occurs when a transaction is aborted. There are two types of error that can make a transaction fail:
• Logical errors: the transaction cannot complete due to some internal error condition, such as bad input data or exceeded resource limits.
• System errors: the database system must terminate an active transaction due to an error condition (e.g. deadlock). The transaction can, however, be re-executed at a later time.
2. System crash: a power failure or other hardware or software failure causes the system to crash. This generally occurs when hardware malfunctions or there is a bug in the database software or OS, resulting in the loss of the contents of volatile storage.
3. Disk failure: a head crash or similar disk failure destroys all or part of disk storage. Copies of the data on another disk, or on tertiary media such as tapes, are used to recover from the failure.
4. Natural disaster: fire, flood, earthquake, etc. may cause physical loss of all information on all media.

Storage Structure
The various data items in a database may be stored on, and accessed from, a number of different storage media. To understand how data can be recovered from any kind of failure, we must know the types of storage media and their access methods. Storage media can be classified as:
1. Volatile storage:
• Information does not survive system crashes.
• Examples: main memory, cache memory.
2. Nonvolatile storage:
• Information normally survives system crashes.
• Examples: disk, tape, flash memory.
• It may still fail, however, e.g. in a head crash.
3. Stable storage:
• A "mythical" form of storage that survives all failures, i.e. a system designed never to lose data.
• Approximated by maintaining multiple copies on distinct nonvolatile media.

Stable Storage Implementation


• To implement stable storage, we maintain multiple copies of each block on separate disks.
• Copies can be kept at remote sites to protect against disasters such as fire or flooding.
• RAID systems guarantee that the failure of a single disk will not result in the loss of data.



• Updates must be done in a controlled manner, to ensure that a failure during data transfer does not damage the needed information.
• A failure during data transfer can still result in inconsistent copies. A block transfer between memory and disk can result in:
o Successful completion: the transferred information arrived safely at its destination.
o Partial failure: a failure occurred in the midst of the transfer, and the destination block has incorrect information.
o Total failure: the failure occurred sufficiently early during the transfer that the destination block was never updated.
• If a data-transfer failure occurs, the system must detect it and invoke a recovery procedure to restore the block to a consistent state.
• For this, the system maintains two physical blocks for each logical database block.
• A transfer of a block of data proceeds as follows:
o Write the information onto the first physical block.
o When the first write successfully completes, write the same information onto the second physical block.
o The output is complete only after the second write successfully completes.
• During recovery after a failure, each pair of physical blocks is examined. If both are the same and no detectable error exists, no further action is necessary.
• If one block contains a detectable error (bad checksum), we replace its content with the content of the other block.
• If the blocks contain different data with no detectable error in either, the content of the second block is written to the first block.
• This recovery mechanism ensures that a write to stable storage either succeeds completely or results in no change.
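A minimal sketch of the two-copy protocol, using a CRC32 checksum to stand in for the disk's error detection; the two list slots stand in for the two physical blocks (all of this is illustrative):

import zlib

def _checksummed(data):
    return data + zlib.crc32(data).to_bytes(4, "big")

def _valid(block):
    return block is not None and zlib.crc32(block[:-4]).to_bytes(4, "big") == block[-4:]

def write_stable(copies, data):
    payload = _checksummed(data)
    copies[0] = payload   # write the first physical block ...
    copies[1] = payload   # ... and only then the second

def recover(copies):
    # Apply the recovery rules so the pair ends up consistent.
    if _valid(copies[0]) and _valid(copies[1]):
        if copies[0] != copies[1]:
            copies[0] = copies[1]   # both valid but different: copy the second onto the first
    elif _valid(copies[0]):
        copies[1] = copies[0]       # second copy has a bad checksum: repair it
    elif _valid(copies[1]):
        copies[0] = copies[1]       # first copy has a bad checksum: repair it

copies = [None, None]
write_stable(copies, b"block contents")
copies[0] = b"garbage"              # simulate a partial failure on the first copy
recover(copies)
print(copies[0] == copies[1])       # True: the pair is consistent again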

Transaction Failure and Recovery


• In a database system, the recovery manager (RM) is responsible for maintaining database consistency in the case of failure.
• While selecting a strategy, the recovery manager must ensure that any update performed by a transaction already committed at the time of system failure is retained, and that updates performed by aborted transactions are cancelled.

Log Based Recovery

• The most widely used structure for recording database modifications is the log.
• The log is kept on stable storage so that it remains useful for recovery from system and disk failures.
• The log is a sequence of log records, and maintains a record of the update activities on the database.
• When transaction Ti starts, it registers itself by writing a
<Ti start> log record.
• Before Ti executes write(X), a log record
<Ti, X, V1, V2>
is written, where V1 is the value of X before the write (the old value) and V2 is the value to be written to X (the new value).
• When Ti finishes its last statement, the log record <Ti commit> is written.
• Whenever a transaction performs a write, it is essential that the log record for that write be created before the database is modified.
• Once a log record exists, we have the ability to undo a modification that has already been output to the database, by using the old-value field in the log record.
• A problem with this mechanism is that the volume of data stored in the log may become unreasonably large.
• The main techniques based on the log are:
1. Deferred Database Modification
2. Immediate Database Modification
3. Checkpoints

1. Deferred Database Modification

The deferred database modification scheme does not physically update the database on disk until the transaction reaches its commit point. Before the commit point, all of the transaction's updates are recorded in its local workspace (buffer or main memory). At commit, the updates are first recorded in the log and then written to the database. If the transaction fails before reaching its commit point, it has made no changes to the database, so no UNDO operation is necessary. It may be necessary to REDO the operations of committed transactions from the log, because their effects may not yet have been recorded in the database. This technique is therefore also called the NO-UNDO/REDO algorithm.
A transaction starts by writing a <Ti start> record to the log. A write(X) operation results in a log record <Ti, X, V>, where V is the new value of X; no old value is needed in this scheme. The write operation is not performed on X immediately, but is deferred. When Ti partially commits, <Ti commit> is written to the log. Finally, the log records are read and used to actually execute the previously deferred writes.
In case of failure, no UNDO is necessary; however, any transaction with both <Ti start> and <Ti commit> records in the log must be redone (REDO), setting all data items updated by Ti to the new values found in the log.
Example:

Log records              Database writes
<T0 start>
<T0, A, 45>
<T0, B, 56>
<T0 commit>
                         A = 45
                         B = 56

2. Immediate Database Modification (Uncommitted Modification)

In this technique, the database may be updated by some operations of a transaction before the transaction reaches its commit point. However, these operations are recorded in the log on disk before they are applied to the database, which keeps recovery possible. If the transaction fails after recording some changes in the database but before reaching its commit point, the effect of its operations must be undone, i.e. the transaction must be rolled back. So immediate update requires both UNDO and REDO during recovery; it is therefore also known as the UNDO/REDO algorithm.
A transaction starts by writing a <Ti start> record to the log. A write(X) operation results in a log record <Ti, X, U, V>, where U is the old value of X and V is the new value. Since undoing may be needed, update log records must contain both the old and the new value. The write operation on X is recorded in the log on disk and output to stable storage, regardless of whether the transaction commits or not.
In case of failure, the recovery scheme uses two operations. UNDO restores the values of all data items updated by Ti to their old values, going backward from the last log record of Ti. REDO sets the values of all data items updated by Ti to the new values, going forward from the first log record of Ti. A transaction Ti must be undone if the log contains the record <Ti start> but does not contain <Ti commit>, and Ti must be redone if the log contains both <Ti start> and <Ti commit>.
Example:
Consider transaction T0, which transfers 50 from A to B (old values A = 1000, B = 2000), followed by transaction T1, which updates C from 700 to 600. The crash may occur (a) before <T0 commit> is written to the log, (b) after <T0 commit> but before <T1 commit>, or (c) after <T1 commit>.
• Recovery actions in each case above are:

a. undo(T0): B is restored to 2000 and A to 1000.
b. undo(T1) and redo(T0): C is restored to 700, then A and B are set to 950 and 2050 respectively.
c. redo(T0) and redo(T1): A and B are set to 950 and 2050 respectively, then C is set to 600.
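A minimal sketch of these recovery actions over a toy log; the tuple shapes ("start", T), ("write", T, X, old, new) and ("commit", T) are assumptions of this sketch, and the sample data matches case (b) above:

def recover(log, db):
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    # UNDO: scan backward, restoring old values written by uncommitted transactions.
    for rec in reversed(log):
        if rec[0] == "write" and rec[1] not in committed:
            _, _, x, old, _ = rec
            db[x] = old
    # REDO: scan forward, re-applying new values written by committed transactions.
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, _, x, _, new = rec
            db[x] = new

log = [("start", "T0"), ("write", "T0", "A", 1000, 950),
       ("write", "T0", "B", 2000, 2050), ("commit", "T0"),
       ("start", "T1"), ("write", "T1", "C", 700, 600)]
db = {"A": 950, "B": 2050, "C": 600}   # state on disk at the time of the crash
recover(log, db)
print(db)   # {'A': 950, 'B': 2050, 'C': 700}: T1 undone, T0 redone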

3. Checkpoints
The main problems with the recovery procedures discussed above are:
1. Searching the entire log to determine which transactions need to be redone and which need to be undone is time-consuming.
2. We might unnecessarily redo transactions that have already output their updates to the database.
A scheme called checkpointing is used to limit the volume of log information that has to be handled and processed in the event of a system failure. A checkpoint periodically copies log information onto stable storage. A checkpoint requires the following sequence of actions:
1. Output all log records currently residing in main memory onto stable storage.
2. Output all modified buffer blocks to disk.
3. Write a log record <checkpoint> onto stable storage.
During recovery, we need to consider only the most recent transaction Ti that started before the checkpoint, and the transactions that started after it. For this, the following steps are performed:
1. Scan backwards from the end of the log to the most recent <checkpoint> record.
2. Continue scanning backward until a record <Ti start> is found.
3. Only the log records following that <Ti start> need be considered; the earlier part of the log can be ignored during recovery.
The UNDO and REDO operations on each transaction Tj that started execution after Ti are as follows:
• If the log contains no <Tj commit> record and Tj performed immediate modification, execute UNDO(Tj).
• If the log contains a <Tj commit> record, execute REDO(Tj).

Shadow Paging
Shadow paging is an alternative to log-based recovery that is useful when transactions execute serially. Here, when a page of storage is modified by a transaction, a new page is allocated for the modified data and the old page remains as a shadow copy of the data.
In this technique, the database is partitioned into a number of fixed-length blocks called pages. The key idea behind shadow paging is to maintain two page tables during the lifetime of a transaction, called the current page table and the shadow page table. Each entry in a page table contains a pointer to a page on disk. When the transaction starts, both tables are identical. The shadow page table is never changed during the life of the transaction; the current page table is updated with each write operation.

Whenever a page is to be written for the first time:

• A copy of the page is made in unused space.
• The current page table is made to point to the copy.
• The update is performed on the copy.
The transaction commits by performing the following steps:
• Flush all modified pages in main memory to disk.
• Output the current page table to disk.
• Make the current page table the new shadow page table, by updating the pointer on disk that identifies the shadow page table.
To recover from a failure during transaction execution, it is sufficient to free the modified database pages and discard the current page table. The state of the database before the transaction is available through the shadow page table, and is recovered by reinstating the shadow page table. The database is thus returned to its state prior to the transaction that was executing when the crash occurred, and any modified pages are discarded. Committing a transaction corresponds to discarding the previous shadow page table. Since recovery involves neither undoing nor redoing data items, this technique is categorized as a NO-UNDO/NO-REDO recovery technique.
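A minimal sketch of the two page tables, with a Python list standing in for the disk (the class, its method names, and the list-based "disk" are all illustrative assumptions):

class ShadowPagedDB:
    def __init__(self, pages):
        self.disk = list(pages)                  # "disk": page slot -> data
        self.shadow = list(range(len(pages)))    # shadow page table (never changes)
        self.current = list(self.shadow)         # current page table (per transaction)

    def write(self, page_no, data):
        if self.current[page_no] == self.shadow[page_no]:      # first write to this page
            self.disk.append(self.disk[self.shadow[page_no]])  # copy onto unused space
            self.current[page_no] = len(self.disk) - 1         # current table points at copy
        self.disk[self.current[page_no]] = data                # update the copy

    def read(self, page_no):
        return self.disk[self.current[page_no]]

    def commit(self):
        self.shadow = list(self.current)   # current table becomes the new shadow table

    def abort(self):
        self.current = list(self.shadow)   # just discard the current page table

db = ShadowPagedDB(["p0", "p1"])
db.write(1, "p1-modified")
db.abort()
print(db.read(1))   # 'p1': the state before the transaction is intact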
Advantage
1. No overhead of writing log records.
2. Recovery is simple.
3. No-undo/no-redo algorithm.

Disadvantage
1. Copying the entire page table is very expensive.
2. Commit overhead: the commit of a single transaction requires multiple blocks to be output: the current page table and the actual updated data pages. Log-based schemes need to output only the log records.
3. Data fragmentation: database pages change location, so related pages are no longer contiguous.
4. Garbage collection: each time a transaction commits, the database pages containing the old versions of the data changed by the transaction become inaccessible. Such pages hold no usable information and are considered garbage. Periodically, all garbage pages must be found and added to the list of free pages; this process, called garbage collection, imposes additional overhead and complexity on the system.
5. It is hard to extend the algorithm to allow transactions to run concurrently.



Remote Backup System

• Traditional transaction-processing systems are centralized or client-server systems.
• Such systems are vulnerable to environmental disasters such as fire, flooding, or earthquakes.
• Increasingly, there is a need for transaction-processing systems that can function in spite of system failures or environmental disasters.
• Such systems must provide high availability; that is, the time for which the system is unusable must be extremely small.
• We can achieve high availability by performing transaction processing at one site, called the primary site, and having a remote backup site where all the data from the primary site are replicated.
• The remote backup site is sometimes also called the secondary site.
• The remote site must be kept synchronized with the primary site as updates are performed at the primary.
• We achieve synchronization by sending all log records from the primary site to the remote backup site.
• The remote backup site must be physically separated from the primary (for example, located in a different state) so that a disaster at the primary does not damage the remote backup site.

Fig: Architecture of a remote backup system

• When the primary site fails, the remote backup site takes over processing.
• First, however, it performs recovery, using its (perhaps outdated) copy of the data from the primary and the log records received from the primary.
• In effect, the remote backup site performs the recovery actions that the primary site itself would have performed upon recovering.
• Once recovery has been performed, the remote backup site starts processing transactions.
• When the backup site takes over processing, it becomes the new primary, and the old primary site becomes the new backup site.



Chapter-10
Transaction Processing and Concurrency Control

Transaction
A collection of operations (insertions, deletions, modifications, etc.) that forms a single logical unit of work is called a transaction. A transaction is a unit of program execution that accesses and possibly updates various data items. A transaction's operations must either be performed entirely (commit) or aborted entirely (rollback); no intermediate state is acceptable. During transaction execution the database may be temporarily inconsistent, but when the transaction completes successfully, i.e. commits, the database must be consistent. After a transaction commits, the changes it has made to the database persist, even if there is a system failure.
Transaction processing systems are systems with large databases that allow many users to use the same database concurrently, for example an airline reservation system.
Example:
Example:
Transaction to transfer Rs 50 from account A to account B
1. Read(A)
2. A := A – 50
3. Write (A)
4. Read(B)
5. B := B + 50
6. Write(B)
State of a Transaction
After a transaction starts execution, it changes state. The state of a transaction is defined as the current activity of that transaction. Each transaction may be in one of the states explained below.

Fig: State transition diagram of a transaction (Active → Partially Committed → Committed, with failures leading to Failed → Aborted)

1. Active: It is the initial state; the transaction stays in this state while it is executing.
2. Partially Committed: After the final statement has been executed.
3. Failed: After the discovery that normal execution can no longer proceed.
4. Aborted: After the transaction has been rolled back and the database has been restored to its state prior to the start of the transaction.
5. Committed: A transaction that completes its execution successfully is said to be committed.

ACID properties of Transaction


To ensure data integrity, the database system maintains the following properties of transactions.
1. Atomicity:
A transaction is an atomic unit of processing: it is either performed completely or not performed at all. Only at the end of the transaction do the updates made by the transaction become accessible to other transactions. Atomicity must be maintained in the presence of deadlocks, software failures, disk failures, etc.
2. Consistency:



The consistency property of a transaction implies that if the database was in a consistent state before the initiation of the transaction, then at the end of the transaction the database will again be in a consistent state. This means that a transaction cannot break the rules, i.e. the integrity constraints, of the database.
3. Isolation:
The isolation property of a transaction indicates that the operations performed by a transaction should be isolated from other transactions, i.e. the execution of a transaction should not be interfered with by any other transaction executing simultaneously. Intermediate results of a transaction must be hidden from other concurrently executing transactions.
4. Durability:
The durability property of a transaction indicates that the changes made to the database by a committed transaction must persist in the database. These changes must not be lost through any kind of failure. Durability also refers to the ability of the system to recover committed transaction updates if the system or storage medium fails.

Schedules
When several transactions execute concurrently, the order of execution of their instructions is known as a schedule. There are four types of schedule:
1. Serial schedule
2. Non-serial schedule
3. Equivalent schedule
4. Conflict schedule

• A schedule S is called a serial schedule if, for any two transactions Ti and Tj participating in S, either all operations of Ti precede all operations of Tj or vice versa. Hence in a serial schedule only one transaction is active at a time: the commit or abort of the active transaction initiates execution of the next transaction. No interleaving occurs in a serial schedule.
• A schedule S is called a non-serial schedule if, for any two transactions Ti and Tj participating in S, the operations of each transaction are executed non-consecutively, interleaved with operations of the other transaction.
• Two schedules A and B are said to be equivalent if the effect of executing the first schedule is identical to the effect of executing the second.
• A schedule S is called a conflict schedule if there are transactions Ti and Tj with an action ti of Ti and an action tj of Tj accessing the same object, where at least one of the two actions is a write.

Schedule1 (serial: T1 followed by T2)
T1                  T2
Read (A)
A := A - 50
Write (A)
Read (B)
B := B + 50
Write (B)
                    Read (A)
                    Temp := A * 0.1
                    A := A - Temp
                    Write (A)
                    Read (B)
                    B := B + Temp
                    Write (B)

Schedule2 (serial: T2 followed by T1)
T1                  T2
                    Read (A)
                    Temp := A * 0.1
                    A := A - Temp
                    Write (A)
                    Read (B)
                    B := B + Temp
                    Write (B)
Read (A)
A := A - 50
Write (A)
Read (B)
B := B + 50
Write (B)



Schedule3 (interleaved)
T1                  T2
Read (A)
A := A - 50
Write (A)
                    Read (A)
                    Temp := A * 0.1
                    A := A - Temp
                    Write (A)
Read (B)
B := B + 50
Write (B)
                    Read (B)
                    B := B + Temp
                    Write (B)

Schedule4 (interleaved)
T1                  T2
Read (A)
A := A - 50
                    Read (A)
                    Temp := A * 0.1
                    A := A - Temp
                    Write (A)
                    Read (B)
Write (A)
Read (B)
B := B + 50
Write (B)
                    B := B + Temp
                    Write (B)

In the example above, Schedule1 and Schedule2 are serial schedules: in Schedule1 transaction T1 is followed by transaction T2, and in Schedule2 transaction T2 is followed by transaction T1. Schedule3 is a non-serial schedule, because the operations of transactions T1 and T2 are interleaved. However, Schedule3 is equivalent to Schedule1 and Schedule2, since the sum (A + B) is preserved. Schedule4 is a conflict schedule because it does not preserve the value of (A + B).

Serializability
Serial schedules are generally considered unacceptable in practice because, in a serial schedule, CPU time is wasted whenever the running transaction waits for an I/O operation to complete while other transactions are ready to execute. Interleaving, i.e. a non-serial schedule, can therefore improve CPU utilization. But some non-serial schedules produce erroneous results while others produce correct results; this motivates the concept of serializability.
A concurrent execution of N transactions is called serializable if the execution is computationally equivalent to a serial execution. When more than one transaction executes concurrently, we must have serializability in order to have the same effect on the database as some serial execution.
There are two types of serializability.
1. Conflict serializability
2. View serializability

1. Conflict serializability
Two operations in a schedule are said to conflict if they satisfy all three of these conditions:
i. They belong to different transactions.
ii. They access the same item.
iii. At least one of the operations is a write operation.

Instructions li and lj of transactions Ti and Tj respectively conflict if and only if there exists some item Q accessed by both li and lj, and at least one of these instructions wrote Q:
1. li = read(Q), lj = read(Q): li and lj do not conflict.
2. li = read(Q), lj = write(Q): they conflict.
3. li = write(Q), lj = read(Q): they conflict.
4. li = write(Q), lj = write(Q): they conflict.
If a schedule S can be transformed into a schedule S' by a series of swaps of non-conflicting instructions, we say that S and S' are conflict equivalent.
A schedule S is conflict serializable if it is conflict equivalent to a serial schedule.
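The conflict test itself is mechanical; a tiny sketch, with each operation encoded as an assumed (transaction, action, item) tuple:

def conflicts(op1, op2):
    # op = (transaction, action, item); action is "read" or "write".
    t1, a1, q1 = op1
    t2, a2, q2 = op2
    return t1 != t2 and q1 == q2 and "write" in (a1, a2)

print(conflicts(("T1", "read", "Q"), ("T2", "write", "Q")))   # True
print(conflicts(("T1", "read", "Q"), ("T2", "read", "Q")))    # False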



Schedule1
T1              T2
Read (A)
Write (A)
                Read (A)
                Write (A)

Schedule2
T1              T2
Read (A)
                Write (A)
Write (A)

Schedule1 is conflict serializable, since its instructions can be swapped to obtain the equivalent serial schedules (T1, T2) or (T2, T1). Schedule2 is not conflict serializable: every pair of its instructions conflicts on A, so no swaps can transform it into either the serial schedule (T1, T2) or (T2, T1).
2. View serializability
Two schedules S1 and S2 consisting of the same set of instructions are said to be view equivalent if the following conditions are met:
1. If transaction Ti in S1 reads the initial value of an object X, then Ti in S2 also reads the initial value of X.
2. If transaction Ti in S1 reads a value of X written by transaction Tj in S1, then Ti in S2 also reads the value of X written by Tj in S2.
3. If transaction Ti in S1 performs the final write of an object X, then Ti in S2 also performs the final write of X.
A schedule is view serializable if it is view equivalent to a serial schedule. Schedule3 above is both view serializable and conflict serializable. Every conflict-serializable schedule is also view serializable, but there exist view-serializable schedules that are not conflict serializable; such schedules contain blind writes.

Testing of Serializability
A simple and efficient method for testing serializability is to construct a directed graph, called the precedence graph, for schedule S:
G = (V, E)
The set of vertices V consists of all the transactions participating in the schedule. The set of edges E consists of all edges Ti → Tj for which one of three conditions holds:
a. Ti executes Write (Q) before Tj executes Read (Q).
b. Ti executes Read (Q) before Tj executes Write (Q).
c. Ti executes Write (Q) before Tj executes Write (Q).
If the graph does not contain a cycle, the schedule is conflict serializable; otherwise it is non-serializable. That is, a schedule is conflict serializable if and only if its precedence graph is acyclic.
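A minimal sketch of the test, building the edge set from the three rules above and checking for a cycle with a depth-first search; the (transaction, action, item) encoding is an assumption of this sketch, and the sample schedule is the second example below:

def precedence_graph(schedule):
    # Edge Ti -> Tj whenever an earlier operation of Ti conflicts with a later one of Tj.
    edges = set()
    for i, (ti, ai, qi) in enumerate(schedule):
        for tj, aj, qj in schedule[i + 1:]:
            if ti != tj and qi == qj and "write" in (ai, aj):
                edges.add((ti, tj))
    return edges

def has_cycle(edges):
    graph = {}
    for u, v in edges:
        graph.setdefault(u, []).append(v)
    visiting, done = set(), set()
    def dfs(u):
        visiting.add(u)
        for v in graph.get(u, []):
            if v in visiting or (v not in done and dfs(v)):
                return True
        visiting.discard(u)
        done.add(u)
        return False
    return any(dfs(u) for u in graph if u not in done)

s = [("T1", "read", "X"), ("T2", "read", "X"), ("T2", "write", "X"),
     ("T3", "read", "X"), ("T3", "write", "X")]
edges = precedence_graph(s)      # {('T1','T2'), ('T1','T3'), ('T2','T3')}
print(has_cycle(edges))          # False: the schedule is conflict serializable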

Example:

1. Let us consider the schedule given below.

T1              T2              T3
Read (X)
                Read (X)
Write (X)
                                Read (X)
                Write (X)



The precedence graph has the edges T1 → T2, T2 → T1, T1 → T3, and T3 → T2 (all arising from conflicts on X).
Since the graph contains a cycle, the schedule is not conflict serializable.

2. Let us consider the schedule given below.

T1              T2              T3
Read (X)
                Read (X)
                Write (X)
                                Read (X)
                                Write (X)

The precedence graph has the edges T1 → T2, T1 → T3, and T2 → T3 (all arising from conflicts on X).
Since the graph contains no cycle, the schedule is conflict serializable.

3. Test the serializability of the following schedule.

T1              T2              T3
                                Read (X)
Read (Z)
Read (X)
                Read (Z)
                Read (Y)
                                Read (Y)
Write (X)
                Write (Z)
                                Write (Y)
                Write (Y)



The precedence graph has the edges T1 → T2 (on Z), T2 → T3 (on Y), T3 → T2 (on Y), and T3 → T1 (on X).
Since the graph contains a cycle (e.g. T2 → T3 → T2), the schedule is not conflict serializable.

Concurrency Control Techniques

Concurrency control is the process of managing simultaneous operations (inserts, updates, deletes) on the database without having them interfere with one another; in short, it is the management of concurrent transaction execution. The DBMS implements concurrency control techniques to ensure the serializability and isolation of transactions, in order to guard the consistency and integrity of the database. Several concurrency control techniques are explained below.

1. Lock based protocols

One way to ensure serializability is to require that data items be accessed in a mutually exclusive manner, i.e. while one transaction is accessing a data item, no other transaction can modify that data item. In this scheme, each data item has a lock associated with it. When a transaction T wants to access a data item, it first examines the associated lock. If no other transaction holds the lock, the lock scheduler locks the data item for transaction T. If any other transaction Ti then wants to access the same data item, Ti has to wait until T releases the lock. So every transaction must obtain a read lock or a write lock on a data item before it can perform a read or write operation.
Modes of locking
1. Shared Lock (S) or Read Lock
If a transaction T holds a shared-mode lock on data item Q, then:
• T can read Q but not update it.
• Other transactions can also obtain a read lock on Q, but not a write lock.
2. Exclusive Lock (X) or Write Lock
If a transaction T holds an exclusive-mode lock on data item Q, then:
• T can both read and update Q.
• No other transaction can obtain either a read or a write lock on Q.
Based on these locking modes, we can define a compatibility matrix:

        S       X
S       True    False
X       False   False

Let A and B be two arbitrary locking modes, and suppose transaction T1 holds an A-mode lock on data item X while transaction T2 requests a B-mode lock on the same item. If T2 can be granted the lock immediately, we say that mode B is compatible with mode A. Shared mode is compatible with shared mode but not with exclusive mode. At any time, several shared locks can be held simultaneously on a particular data item; a subsequent exclusive-mode lock request has to wait until all currently held shared locks are released.
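The matrix translates directly into a grant test; a tiny sketch:

COMPATIBLE = {("S", "S"): True, ("S", "X"): False,
              ("X", "S"): False, ("X", "X"): False}

def can_grant(requested, held):
    # A lock is granted only if it is compatible with every lock currently held.
    return all(COMPATIBLE[(h, requested)] for h in held)

print(can_grant("S", ["S", "S"]))   # True: several shared locks may coexist
print(can_grant("X", ["S"]))        # False: must wait for the readers to release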
For example, consider the two transactions below.

T1: Read (Y)            T2: Read (X)
    Read (X)                Read (Y)
    X = X + Y               X = X - Y
    Write (X)               Write (X)

Using locking, these transactions can be written as:

T1: Readlock (Y)        T2: Writelock (X)
    Read (Y)                Read (X)
    Unlock (Y)              Readlock (Y)
    Writelock (X)           Read (Y)
    Read (X)                Unlock (Y)
    X = X + Y               X = X - Y
    Write (X)               Write (X)
    Unlock (X)              Unlock (X)

2. Two Phase Locking Protocol (2PL)

If transactions running concurrently are allowed to acquire and release locks as data items are read and updated, there is a danger that incorrect results will be produced. The two-phase locking scheme is used to overcome this problem. In this scheme, transactions make lock and unlock requests in two phases:
1. Growing phase
2. Shrinking phase
In the growing phase, a transaction may obtain locks but may not release any lock; the number of locks it holds increases from zero to its maximum.
In the shrinking phase, a transaction may release locks but may not obtain any new locks; the number of locks it holds decreases from its maximum to zero.

Two-phase locking guarantees serializability, but it does not prevent deadlock.


Unserializable schedule using locks (S1)
T1                      T2
Writelock (A)
Read (A)
A = A - 50
                        Readlock (B)
Write (A)
                        Read (B)
Unlock (A)
                        Unlock (B)
                        Readlock (A)
Writelock (B)
                        Read (A)
Read (B)
                        Unlock (A)
B = B + 50
                        Display (A+B)
Write (B)
Unlock (B)

Schedule using 2PL (S2)
T1                      T2
Writelock (A)
Read (A)
A = A - 50
                        Readlock (B)
Write (A)
                        Read (B)
Writelock (B)   [T1 waits: T2 holds a read lock on B]
                        Readlock (A)   [T2 waits: T1 holds a write lock on A]

The schedule S2 cannot produce an incorrect result, but both transactions become blocked waiting for each other, resulting in a deadlock. Deadlock occurs because of a circular wait condition involving two or more transactions, as above: T2 holds B and requests A, which is held by T1, while T1 holds A and requests B, which is held by T2. So with 2PL we need a deadlock detection mechanism and a technique to resolve the deadlock once it occurs.
Fig: Wait-for graph: T1 → T2 (T1 waits for B, held by T2) and T2 → T1 (T2 waits for A, held by T1)
Types

A. Conservative or Static 2PL:

In this scheme, a transaction locks all the data items it needs before beginning its operations and unlocks them only after executing its last operation, i.e. it pre-declares all its locks before execution. This often degrades system performance by unnecessarily delaying other transactions that want to access some of the data items locked by the transaction. The protocol is deadlock free but difficult to use in practice.

Example:
Readlock (X)
Readlock(Y)
Writelock(Z)
Read (X)
Read (Y)
Z=X+Y
Write (Z)
Unlock (X)
Unlock (Y)
Unlock (Z)
B. Dynamic 2PL:
In this scheme, a transaction locks a data item immediately before the item is accessed. All locks are released immediately after the last step of the transaction.
Example:
Readlock (X)
Read (X)
Readlock (Y)
Read (Y)
Writelock (Z)
Z=X+Y
Write (Z)
Unlock (X)
Unlock (Y)
Unlock (Z)

C. Strict 2PL:
In this protocol, a transaction does not release any of its write locks until it commits or aborts.
D. Rigorous 2PL:
In this protocol, a transaction does not release any of its write locks or read locks until it commits or aborts.

Deadlock
A deadlock is a set of blocked transactions, each holding a resource and waiting to acquire a resource held by another transaction in the set. The simplest example of a deadlock is given below.
1. Transaction T1 requests resource A and receives it.
2. Transaction T2 requests resource B and receives it.
3. Transaction T1 requests resource B and is queued up, waiting for T2 to release B.
4. Transaction T2 requests resource A and is queued up, waiting for T1 to release A.

Conditions for deadlock
A deadlock has occurred if these four conditions hold simultaneously:
1. Mutual exclusion: only one transaction at a time can use a resource.

2. Hold and wait: a transaction holding at least one resource is waiting to acquire additional resources held by other transactions.
3. No preemption: a resource can be released only voluntarily by the transaction holding it, after that transaction has completed its task.
4. Circular wait: there exists a set {T1, T2, …, TN} of waiting transactions such that T1 is waiting for a resource held by T2, T2 is waiting for a resource held by T3, and finally TN is waiting for a resource held by T1.

Deadlock Prevention
We can prevent deadlock by:
1. Preventing mutual exclusion by making all sharable data items sharable. This creates no problem for read operations but may lead to serious problems for write operations.
2. Ensuring that no cyclic wait can occur.
3. Allowing resources to be preempted.
4. Forbidding a transaction from requesting a resource while it is holding another resource.
5. Performing transaction rollback instead of waiting for a lock.

Deadlock Detection
Whenever a transaction is blocked by a lock, the DBMS builds and analyses a waits-for graph. If there is a cycle in the waits-for graph, a deadlock exists in the system. Once a deadlock has been detected, it must be resolved. The most common resolution technique is to select one transaction in the waits-for graph as a victim, roll it back and restart it, while the other transactions continue.

Deadlock Recovery
When a deadlock is detected, the system must recover from it. Some approaches for breaking a deadlock are listed below.
1. Simply roll back one or more transactions (victims) in order to break the circular wait.
Two methods can be applied:
a) Roll back all deadlocked transactions
This breaks the deadlock cycle, but at the cost of much lost computation, since the transactions may have computed for a long period of time and the results of their partial computations must be discarded and recomputed later.
b) Roll back one transaction at a time
This rolls back one transaction at a time until the deadlock cycle is eliminated. After each rollback, the deadlock detection algorithm must be re-invoked to determine whether any transactions are still deadlocked.
2. Resource preemption
Here we preempt some resources from transactions and allocate them to other transactions until the deadlock cycle is broken. This involves the following issues:
a) We must determine which resources and which transactions are to be preempted. A transaction that has completed 95% of its execution should not be our choice for preemption.
b) The transaction from which a resource is preempted must be rolled back and restarted.
c) Starvation must be avoided. Starvation happens if the same transaction is always chosen as the victim, i.e. resources are always preempted from the same transaction, so that it never completes its task.

Time Stamp Based Ordering Scheme

In this scheme, we associate with each transaction Ti in the system a unique, fixed timestamp, denoted TS(Ti). This timestamp is assigned by the DBMS before transaction Ti starts execution. If Ti has been assigned timestamp TS(Ti) and a new transaction Tj enters the system, then TS(Ti) < TS(Tj). There are two simple methods for implementing this scheme:
1. Use the value of the system clock as the timestamp, i.e. a transaction's timestamp equals the value of the clock when the transaction enters the system.
2. Use a logical counter that is incremented each time a new timestamp is assigned, i.e. a transaction's timestamp equals the value of the counter when the transaction enters the system.
The protocol manages concurrent execution so that the timestamps determine the serializability order. To ensure this behaviour, the protocol maintains two timestamp values for each data item Q:
- W-timestamp(Q) is the largest timestamp of any transaction that executed write(Q) successfully.
- R-timestamp(Q) is the largest timestamp of any transaction that executed read(Q) successfully.
The timestamp-ordering protocol ensures that any conflicting read and write operations are executed in timestamp order.
• Suppose a transaction Ti issues a read(Q):
1. If TS(Ti) < W-timestamp(Q), then Ti needs to read a value of Q that was already overwritten. Hence, the
read operation is rejected, and Ti is rolled back.
2. If TS(Ti) >= W-timestamp(Q), then the read operation is executed, and R-timestamp(Q) is set to max(R-
timestamp(Q), TS(Ti)).
• Suppose that transaction Ti issues write(Q).
1. If TS(Ti) < R-timestamp(Q), then the value of Q that Ti is producing was needed previously, and the system
assumed that that value would never be produced. Hence, the write operation is rejected, and Ti is rolled
back.
2. If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete value of Q. Hence, this write
operation is rejected, and Ti is rolled back.
3. Otherwise, the write operation is executed, and W-timestamp(Q) is set to TS(Ti).
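A minimal sketch of these rules, with rollback modeled as an exception; the restart machinery that would re-run the transaction with a new timestamp is omitted:

class Rollback(Exception):
    pass

R, W = {}, {}    # R-timestamp(Q) and W-timestamp(Q), defaulting to 0

def read(ts, q):
    if ts < W.get(q, 0):
        raise Rollback("read(%s) by TS=%d rejected" % (q, ts))   # value already overwritten
    R[q] = max(R.get(q, 0), ts)

def write(ts, q):
    if ts < R.get(q, 0) or ts < W.get(q, 0):
        raise Rollback("write(%s) by TS=%d rejected" % (q, ts))  # a younger transaction already used Q
    W[q] = ts

write(1, "Q")
read(2, "Q")
try:
    write(1, "Q")    # rejected: R-timestamp(Q) is now 2 > 1
except Rollback as e:
    print(e)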

Multiple Granularity
Granularity is the size of the data item being locked. It is a major factor in concurrency control because it affects the performance of both concurrency and recovery: the larger the granularity of a data item, the lower the locking overhead, but the lower the degree of concurrency. For example, if we choose the whole database as the data item, then a transaction T accessing even a small portion of the database will lock the whole database, and no concurrent access by other transactions is possible.
Multiple granularity allows data items of various sizes and defines a hierarchy of data granularities in which the smaller granularities are nested within the larger ones. It can be represented graphically as a tree.

The highest level represents the entire database. Below it are nodes of type area; each area in turn has nodes of type file as its children; finally, each file has nodes of type record. In addition to the shared and exclusive lock modes, there are three additional lock modes:
1. Intention-shared (IS): locking a node in IS mode indicates explicit locking at a lower level of the tree, but only with shared locks.
2. Intention-exclusive (IX): indicates explicit locking at a lower level with exclusive or shared locks.
3. Shared and intention-exclusive (SIX): the subtree rooted at the node is locked explicitly in shared mode, and explicit locking is being done at a lower level with exclusive-mode locks.
Intention locks allow a higher-level node to be locked in S or X mode without having to check all descendant nodes.



Fig: Compatibility Matrix for multiple granularities.

Multiple Granularities Locking Scheme


Transaction Ti can lock a node Q, using the following rules:
1. The lock compatibility matrix must be observed.
2. The root of the tree must be locked first, and may be locked in any mode.
3. A node Q can be locked by Ti in S or IS mode only if the parent of Q is currently locked by Ti in either IX or
IS mode.
4. A node Q can be locked by Ti in X, SIX, or IX mode only if the parent of Q is currently locked by Ti in either
IX or SIX mode.
5. Ti can lock a node only if it has not previously unlocked any node (that is, Ti is two-phase).
6. Ti can unlock a node Q only if none of the children of Q are currently locked by Ti.

Locks are acquired in root-to-leaf order whereas they are released in leaf-to-root order.



Chapter-11
Advanced Database Concept

Assignment

a. Define OODBMS and the object-oriented model. (Include object structure, object classes, inheritance, multiple inheritance, and object containership in your answer.)
b. Define the object-relational model.
c. What is a distributed database? Compare it with a centralized database.
d. Explain the concept of data warehousing.
