DBMS UNIT 4 - Class
DBMS UNIT 4 - Class
MODULE 4
INFORMAL DESIGN GUIDELINES FOR RELATIONAL SCHEMAS
There are 4 informal guidelines that may be used as a measure to
determine the quality of relational schema. They are
• Making sure that the semantics of the attribute is clear in the schema
• Reducing the redundant information in tuples
• Reducing the NULL values in tuples
• Disallowing the possibility of generating spurious tuples
Making sure that the semantics of the attribute is clear in the schema
Whenever we form the relational schema, there should contain some
meaning among the attributes. That is, the attributes belonging to one
relation must have certain real world meaning and a proper interpretation
associated with them. This meaning is called semantics. Semantics relates
one attribute to another
For example:
Employee
eid ename Join_date esal dno
Here all attributes define the employee relation. That is semantics of the
attribute relates to the relation name
Reducing the redundant information in tuples
One goal of schema design is to minimize the storage space used by the
relation. If the information is stored redundantly, it leaves to the wastage of
spaces. This problem is called update anomalies. There are 3 types of
update anomalies
2
1. Insertion Anomalies
Consider the following tables
Student
Sid Sname Sem Deptno Dname
1 Anu 1 1 Computer
2 Veena 1 1 Computer
3 Arjun 1 2 English
4 Binil 1 2 English
5 Bindhu 1 1 Computer
Problem
Whenever we insert tuples, if there are ‘n’ students in one department for
each semester, then sem,deptno and dname values are repeated ‘n’ times.
This situation leads to data redundancy and is called insertion anomalies
2. Deletion Anomalies
Deletion anomalies arises the situation where , if we delete the students
from the table, at the time we delete the last student from the table , then
the whole information about that department will also be deleted
3. Modification Anomalies
If we change one of the attribute of a table, then we must update all the
student details belonging to that attribute
Reducing the NULL values in tuples
If the situation that, many of the attribute do not apply to all tuples in the
relation, so that we must put NULL’s in those tuples. It leads to the wastage
of storage space and may also cause some update anomalies
NULL values can be applied to a tuple at the following situations
• The attribute does not apply to a tuple
• The attribute value for tuple is unknown
• The value is known but absent
3
Functional Dependency
A functional dependency (FD) is a constraint between two set of attributes
from the database. Functional dependency is a relationship that exists
when one attribute uniquely determines another attribute.
Student
sid sname sdob dno dname dloc
It is represented in table as
sid sname sdob dno dname dloc
FD1 FD2
4
Normalization
The normalization process first proposed by codd in 1972. Codd proposed
3 normal forms first, second, third then next a strong definition of third
called BCNF
Normalization is a process of analyzing the given relational schema based
on their functional dependencies and primary keys to achieve the desirable
properties of
1. Minimizing redundancy
2. Minimizing insertion, deletion and update anomalies
Normal form: The normal form of a relation refers to the highest normal
form condition that a relation meets
Normalization is done through decomposition. It guarantee 2
properties
1. The non-additive join or lossless join property
This property ensures that no spurious tuples are created at the time
of normalization through decomposition
2. The dependency preservation property
This property ensures that after decomposing the relation, the
resulting schema also contain all the functional dependencies that
exists in the actual relational schema
De-normalization: it is the process of storing the higher normal form
relation to a base form relation (lower normal form).
Here the above table contains only single atomic attributes. So that this
relation is in 1 NF
FD1
FD2
FD3
Above relation contains 3 functional dependencies
FD1= eid,pno -> hours
FD2=eid -> fname,lname
FD3= pno -> pname,ploc
If we remove eid, then FD3 exists
If we remove pno, then FD2 exists
So this relation contains partial functional dependency. So that it is
not in 2NF. To normalize it into 2 NF, decompose the relation based on
functional dependency. It contains 3 FD’s that will become 3 new
relations.
EMP-PRJT1
eid pno hours
…. …. ….
EMP-PRJT2
eid fname lname
…. …. ….
EMP-PRJT3
pno pname ploc
…. …. ….
7
FD1 FD2
This relation contains 2 functional dependencies
FD1= eid -> fname,lname,esal,emob,dno
FD2= dno -> dname,dloc
Consider the following. There is a functional dependency between
eid ->dno , dno ->dname eid -> dname
also
eid ->dno , dno ->dloc eid -> dloc
These are transitive dependencies. So that this relation is not in 3NF. To
normalize it into 3 NF, decompose the relation based on functional
dependency. The result will be
8
EMP-DEPT 1
eid fname lname esal emob Dno
EMP-DEPT 2
dno dname dloc
It does not contain any transitive dependencies. So these relations are in
3NF
General definitions of 2NF and 3NF
FD1 FD2
This relation contains 2 functional dependencies
FD1= eid -> fname,lname,esal,emob,dno
FD2= dno -> dname,dloc
Consider the following. There is a functional dependency between
eid ->dno , dno ->dname eid -> dname
also
9
EMP-DEPT 2
dno dname dloc
EMP-DEPT 3
eid dno
We can represent it as
< k(i) , p(i) >
K(i) – primary key of ith record
P(i) – pointer to ith record
The total number of entries in the index file is equal to the number of disk
block in the data file.
The first record in each block of data file is called the anchor record or
block anchor
Index is classified as dense or sparse. A dense index has index entry
for every record in the data file. Sparse index has index entry for some
of the records in the data file
11
Clustering index also have 2 fields. The first field is the value of
clustering field and the second field is the pointer to that field.
Pointer is always point to the first record in the data block which the
record contains.
13
Secondary index
Secondary index provides secondary means of accessing the data file if it
has some primary access exists.secondary index is based on secondary key
or alternative key of the record which has unique value in every record
Secondary index file contains two fields. The first field is the seondary key
value and the second foeld is the pointer to that field.
Secondary index has pointer to all records in the file. It is an example of
dense index