Unit-3 Normalization in data base
Unit-3 Normalization in data base
Number of Tables
Redundancy
◦ Third Normal Form (3NF)
Complexity
◦ Boyce-Codd Normal Form (BCNF)
◦ Fourth Normal Form (4NF)
◦ Fifth Normal Form (5NF)
◦ Domain Key Normal Form (DKNF)
Most
Mostdatabases
databasesshould
shouldbe
be3NF
3NFororBCNF
BCNFininorder
ordertotoavoid
avoidthe
thedatabase
database
anomalies.
anomalies.
Levels of Normalization
1NF
2NF
3NF
4NF
5NF
DKNF
Each
Eachhigher
higherlevel
levelisisaasubset
subsetofofthe
thelower
lowerlevel
level
Different anomalies in
designing a database
Ifa table is not properly normalized
and have data redundancy then it
will not only eat up extra memory
space but will also make it difficult
to handle and update the database,
without facing data loss.
Insertion, Updation and Deletion
Anomalies are very frequent if
database is not normalized.
Student table
rollno name branch hod office_tel
42 abc CO A4
43 pqr IT A3the
44 xyz CO A4
valid functional
dependencies:
roll_no → { name, dept_name,
dept_building },→
◦ Here, roll_no can determine values of fields
name, dept_name and dept_building,
hence a valid Functional dependency
roll_no → dept_name ,
◦ Since, roll_no can determine whole set of
{name, dept_name, dept_building}, it can
determine its subset dept_name also.
dept_name → dept_building ,
◦ Dept_name can identify the dept_building
accurately, since departments with
different dept_name will also have a
different dept_building
invalid functional
dependencies:
name → dept_name
◦ Students with the same name can
have different dept_name, hence this
is not a valid functional dependency.
dept_building → dept_name
◦ There can be multiple departments
in the same building, For example, in
the above table departments ME and
EC are in the same building B2,
◦ hence dept_building → dept_name is
an invalid functional dependency.
Armstrong’s
axioms/properties of
functional dependencies:
Reflexivity: If Y is a subset of X, then X→Y holds by
reflexivity rule
◦ For example, {roll_no, name} → name is valid.
D → F G } ( Using CF → G )
Thus,
CF → G
A+ = { A , B , C , D ,
E,F,G}
A → BC D+ ={D}
BC → DE = { D , F }
D → F ( Using D → F )
CF → G
A → BC { B , C }+= { B , C }
BC → DE = { B , C , D , E }
D → F ( Using BC → DE )
CF → G = { B , C , D , E , F } (
Using D → F )
= { B , C , D , E , F , G
} ( Using CF → G )
Thus,
{ B , C } + = { B , C ,
D,E,F,G}
Given relational schema R( P Q R S T U
V) having following attribute P Q R S T U and V,
also there is a set of functional dependency
denoted by FD = { P->Q, QR->ST, PTV-
>V }. Determine Closure of (QR)+ and (PR)+
QR+ = QR FD QR→ST
=QRST
PR + = PR → P → Q
=PRQ →ST
=PRQST
Given relational schema R( P Q R S T) having
following attributes P Q R S and T, also there is
a set of functional dependency denoted by FD
= { P->QR, RS->T, Q->S, T-> P }. Determine
Closure of ( T )+
T+=T → P → QR → S
=TPQRS
Different kinds of keys
candidate key
A candidate key may be defined
as-
◦ A set of minimal attribute(s) that can
identify each tuple uniquely in the
given relation is called as a
candidate key.
OR
◦ A minimal super key is called as a
candidate key.
Consider the following Student schema-
Student ( roll , name , sex , age , address ,
class , section )
Given below are the examples of candidate keys-
( class , section , roll )
( name , address )
These are candidate keys because each set
consists of minimal attributes required to identify
each student uniquely in the Student table.
Let R = (A, B, C, Determine all
D, E, F) be a essential attributes
of the given
relation scheme
relation.
with the Essential attributes
following of the relation are-
dependencies- C and E.
C → F So, attributes C and
E → A E will definitely be a
part of every
EC → D
candidate key.
A → B
So, we have-
{ CE }+
C →F = { C , E }
E → A = { C , E , F } ( Using C → F )
EC → D = { A , C , E , F } ( Using E →
A → B A)
= { A , C , D , E , F } ( Using
EC → D )
= { A , B , C , D , E , F }
( Using A → B )
We conclude that CE can
determine all the attributes
of the given relation.
So, CE is the only possible
candidate key of the
relation.
Let R = (A, B,
C, D, E) be a
relation
scheme with
the following
dependencies
-
AB → C
C → D
B → E
Determine the
total number
of candidate
keys
Closures of a set of functional
dependencies
A Closure is a set of FDs is a set
of all possible FDs that can be
derived from a given set of FDs.
It is also referred as
a Complete set of FDs.
If F is used to donate the set of
FDs for relation R, then a closure
of a set of FDs implied by F is
denoted by F+.
Find F+
Closure Algorithm
Closure=X
Repeat
{
For each FD u v in F
Such that u C closure
Then set closure = closure U v
}
Until there is no change to closure
Ssn+
a) ssn+
result=ssn
repeat
{
pno →(pname,ploc) (u v)
Pno C ssn
}
result=ssn
repeat
{
ssn →ename (u v)
ssn C ssn
Then result ssn U ename
Result=(ssn,ename)
}
Normalization Rule
101 Akon OS
101 Akon CN
102 Bkon C
1 Java
2 C++
3 Php
Now we have a Student table
with student information and
another table Subject for storing
subject information.
Let's create another table Score,
to store the marks obtained by
students in the respective
subjects.
We will also be saving name of
the teacher who teaches that
subject along with marks.
Score
score_id student_i subject_id marks teacher
d
1 10 1 70 Java
Teacher
2 10 2 75 C++
Teacher
3 11 1 80 Java
Teacher
In the score table we are saving the student_id to
know which student's marks are these
and subject_id to know for which subject the marks
are for.
Together, student_id + subject_id forms
a Candidate Key for this table, which can be
the Primary key
Now if you look at the Score table, we have a
column names teacher which is only dependent on
the subject, for Java it's Java Teacher and for C++ it's
C++ Teacher & so on.
primary key for this table is a composition of two
columns which is student_id & subject_id but the
teacher's name only depends on subject, hence
the subject_id, has nothing to do with student_id.
This is Partial Dependency, where an attribute in a
table depends on only a part of the primary key and
How to remove Partial
Dependency
There can be many different
solutions for this, but out objective
is to remove teacher's name from
Score table.
The simplest solution is to remove
columns teacher from Score table
and add it to the Subject table.
Hence, the Subject table will
become:
The simplest solution is to remove
columns teacher from Score table and
add it to the Subject table. Hence, the
Subject table will become:
id subject_name teacher
1 Java Java Teacher
2 C++ C++ Teacher
3 Php Php Teacher
And our Score table is now in the second
normal form, with no partial dependency
201010 UP Noida
02228 US Boston
60007 US Chicago
Boyce Codd normal form
(BCNF)
BCNF is the advance version of
3NF. It is stricter than 3NF.
◦ A table is in BCNF if every functional
dependency X → Y, X is the super
key of the table.
◦ For BCNF, the table should be in 3NF,
and for every FD, LHS is super key.
EMPLOYEE table
EMP_ID EMP_COU EMP_DEPT DEPT_TYP EMP_DEPT
NTRY E _NO
264 India
264 India
EMP_DEPT table:
EMP_DEPT DEPT_TYPE EMP_DEPT_NO
D394 283
D394 300
D283 232
D283 549
Candidate keys:
◦ For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID,
EMP_DEPT}
Functional dependencies:
◦ EMP_ID → EMP_COUNTRY
◦ EMP_DEPT → {DEPT_TYPE, EMP_DEPT_
NO}
Now, this is in BCNF because left side
part of both the functional
dependencies is a key.
Fourth Normal Form (4NF)
21 Computer Dancing
21 Math Singing
34 Chemistry Dancing
The given STUDENT table is in 3NF, but
the COURSE and HOBBY are two
independent entity. Hence, there is no
relationship between COURSE and
HOBBY.
In the STUDENT relation, a student with
STU_ID, 21 contains two
courses, Computer and Math and two
hobbies, Dancing and Singing. So
there is a Multi-valued dependency on
STU_ID, which leads to unnecessary
repetition of data.
So to make the above table into 4NF, we
can decompose it into two tables:
STUDENT_COURSE
STU_ID COURSE
21 Computer
21 Math
34 Chemistry
STUDENT_HOBBY
STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
Relational Decomposition
When a relation in the relational model is
not in appropriate normal form then the
decomposition of a relation is required.
In a database, it breaks the table into
multiple tables.
If the relation has no proper
decomposition, then it may lead to
problems like loss of information.
Decomposition is used to eliminate some
of the problems of bad design like
anomalies, inconsistencies, and
redundancy.
Types of Decomposition
Lossless Decomposition
22 Denim 28 Mumbai
33 Alina 25 Delhi
46 Stephan 30 Bangalore
52 Katherine 36 Mumbai
60 Jack 40 Noida
DEPARTMENT table
DEPT_ID EMP_ID DEPT_NAME
827 22 Sales
438 33 Marketing
869 46 Finance
575 52 Production
678 60 Testing
Employee ⋈
Department
EMP_ID EMP_NA EMP_AG EMP_CI DEPT_ID DEPT_N
ME E TY AME