What Is Normalization
Normalization divides larger tables into smaller tables and links them using relationships.
Edgar Codd, the inventor of the relational model, proposed the theory of
normalization with the introduction of First Normal Form, and he continued
to extend the theory with Second and Third Normal Form. Later he joined
Raymond F. Boyce to develop the theory of Boyce-Codd Normal Form.
Functional Dependency
A functional dependency (FD) is a relationship that exists between two attributes. It typically exists between the primary key and a non-key
attribute within a table.
X → Y
The left side of the FD is known as the determinant; the right side is known as the dependent.
For example:
Assume we have an employee table with attributes: Emp_Id, Emp_Name, Emp_Address.
Here the Emp_Id attribute can uniquely identify the Emp_Name attribute of the employee table because if we know the Emp_Id, we can tell the
employee name associated with it.
Emp_Id → Emp_Name
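The check described above can be sketched in Python. This is a minimal illustration, not a definitive implementation: the function name fd_holds and the sample rows are hypothetical, and a check on sample data can only show that an FD is not violated by those rows, not that it holds by design.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The first value seen for a key is remembered; any later
        # mismatch for the same key violates lhs -> rhs.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample rows for the employee table.
employees = [
    {"Emp_Id": 1, "Emp_Name": "Asha", "Emp_Address": "Pune"},
    {"Emp_Id": 2, "Emp_Name": "Ravi", "Emp_Address": "Delhi"},
    {"Emp_Id": 3, "Emp_Name": "Asha", "Emp_Address": "Agra"},
]

print(fd_holds(employees, ["Emp_Id"], ["Emp_Name"]))  # True
print(fd_holds(employees, ["Emp_Name"], ["Emp_Id"]))  # False
```

In the sample data two employees share a name, so Emp_Name does not determine Emp_Id, while Emp_Id still determines Emp_Name.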
Example (trivial functional dependency):
Consider a table with two columns Employee_Id and Employee_Name.
{Employee_Id, Employee_Name} → Employee_Id is a trivial functional dependency, as
Employee_Id is a subset of {Employee_Id, Employee_Name}.
Also, Employee_Id → Employee_Id and Employee_Name → Employee_Name are trivial dependencies.
Example:
ID → Name,
Name → DOB
Table 1
Example 1 –
ID   Name   Courses
------------------
1    A      c1, c2
2    E      c3
3    M      c2, c3
In the above table, Courses is a multi-valued attribute, so the table is not in 1NF.
The table below is in 1NF, as it has no multi-valued attribute.
ID Name Course
------------------
1 A c1
1 A c2
2 E c3
3 M c2
3 M c3
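The conversion to 1NF shown above can be sketched in Python. This is only an illustration: the function name to_first_normal_form and the list-of-dicts representation are assumptions, not part of the original notes.

```python
def to_first_normal_form(rows, multi_valued):
    """Emit one row per value of the multi-valued attribute."""
    flat = []
    for row in rows:
        for value in row[multi_valued]:
            new_row = dict(row)          # copy the single-valued columns
            new_row[multi_valued] = value
            flat.append(new_row)
    return flat


# The non-1NF table from the example, with Course held as a list.
students = [
    {"ID": 1, "Name": "A", "Course": ["c1", "c2"]},
    {"ID": 2, "Name": "E", "Course": ["c3"]},
    {"ID": 3, "Name": "M", "Course": ["c2", "c3"]},
]

for row in to_first_normal_form(students, "Course"):
    print(row["ID"], row["Name"], row["Course"])
```

Each output row holds exactly one course, which matches the five rows of the 1NF table.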
To be in second normal form (2NF), a relation must be in first normal form
and must not contain any partial dependency. That is, no non-prime
attribute (an attribute that is not part of any candidate key) may be
dependent on any proper subset of any candidate key of the table.
Example 1 – Consider table 3 below.
STUD_NO   COURSE_NO   COURSE_FEE
1         C1          1000
2         C2          1500
1         C4          2000
4         C3          1000
4         C1          1000
2         C5          2000
Partial Dependency – If a proper subset of a candidate key determines a
non-prime attribute, it is called a partial dependency.
Note that many courses have the same course fee.
Here,
COURSE_FEE alone cannot decide the value of COURSE_NO or STUD_NO;
COURSE_FEE together with STUD_NO cannot decide the value of
COURSE_NO;
COURSE_FEE together with COURSE_NO cannot decide the value of
STUD_NO.
Hence,
COURSE_FEE is a non-prime attribute, as it does not belong to the
only candidate key {STUD_NO, COURSE_NO}.
But COURSE_NO -> COURSE_FEE, i.e., COURSE_FEE is dependent on
COURSE_NO, which is a proper subset of the candidate key. The non-prime
attribute COURSE_FEE is dependent on a proper subset of the candidate
key, which is a partial dependency, and so this relation is not in 2NF.
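The search for a partial dependency in this example can be sketched in Python. The helper names determines and partial_dependencies are hypothetical, and the check runs on the sample rows only, so it shows which dependencies the data does not contradict rather than proving them.

```python
from itertools import combinations


def determines(rows, lhs, attr):
    """Return True if rows never map one lhs value to two attr values."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        if seen.setdefault(key, row[attr]) != row[attr]:
            return False
    return True


def partial_dependencies(rows, candidate_key, non_prime):
    """List (subset, attr) pairs where a proper subset of the key determines attr."""
    found = []
    for size in range(1, len(candidate_key)):
        for subset in combinations(candidate_key, size):
            for attr in non_prime:
                if determines(rows, subset, attr):
                    found.append((subset, attr))
    return found


columns = ("STUD_NO", "COURSE_NO", "COURSE_FEE")
data = [(1, "C1", 1000), (2, "C2", 1500), (1, "C4", 2000),
        (4, "C3", 1000), (4, "C1", 1000), (2, "C5", 2000)]
rows = [dict(zip(columns, r)) for r in data]

print(partial_dependencies(rows, ("STUD_NO", "COURSE_NO"), ("COURSE_FEE",)))
# [(('COURSE_NO',), 'COURSE_FEE')]
```

Only COURSE_NO -> COURSE_FEE is reported; STUD_NO does not determine COURSE_FEE because student 1 pays both 1000 and 2000.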
Table 1                  Table 2
STUD_NO   COURSE_NO      COURSE_NO   COURSE_FEE
1         C1             C1          1000
2         C2             C2          1500
1         C4             C3          1000
4         C3             C4          2000
4         C1             C5          2000
2         C5
NOTE: 2NF tries to reduce the redundant data getting stored in memory.
For instance, if there are 100 students taking course C1, we do not need to
store its fee of 1000 in all 100 records; instead, we can store it once in
the second table, which records that the course fee for C1 is 1000.
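The split into the two 2NF tables can be sketched in Python using plain tuples. The variable names are illustrative assumptions; the data is the example relation from these notes.

```python
# Original relation: (STUD_NO, COURSE_NO, COURSE_FEE)
enrollments = [
    (1, "C1", 1000), (2, "C2", 1500), (1, "C4", 2000),
    (4, "C3", 1000), (4, "C1", 1000), (2, "C5", 2000),
]

# Table 1 keeps the candidate key; Table 2 stores each course fee once.
student_course = [(stud, course) for stud, course, _ in enrollments]
course_fee = sorted({(course, fee) for _, course, fee in enrollments})

# Joining the two tables on COURSE_NO rebuilds the original rows.
fee_by_course = dict(course_fee)
rejoined = [(stud, course, fee_by_course[course]) for stud, course in student_course]

print(course_fee)
print(rejoined == enrollments)  # True
```

The fee for C1 appears once in course_fee even though two students are enrolled in C1, and the join recovers every original row.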
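3NF Example
A relation is in 3NF if, for every non-trivial functional dependency X → Y, at least one of the following conditions holds:
X is a super key.
Y is a prime attribute (each element of Y is part of some candidate key).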
Consider table 1. Changing the non-key column Full Name may change
Salutation, so one non-key column depends on another non-key column.
Key Points –
1. BCNF is free from redundancy caused by functional dependencies.
2. If a relation is in BCNF, then 3NF is also satisfied.
3. If all attributes of a relation are prime attributes, then the relation is always in 3NF.
4. A relation in a relational database is always at least in 1NF.
5. Every binary relation (a relation with only 2 attributes) is always in BCNF.
6. If a relation has only singleton candidate keys (i.e., every candidate key consists of
only 1 attribute), then the relation is always in 2NF (because no partial functional
dependency is possible).
7. Sometimes decomposing into BCNF may not preserve functional dependencies. In that
case, go for BCNF only if the lost FD(s) are not required; otherwise normalize up to 3NF only.
8. There are more normal forms beyond BCNF, such as 4NF. But
in real-world database systems it is generally not required to go beyond BCNF.
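A BCNF check can be sketched in Python with an attribute-closure helper. This is a sketch under assumptions: the names closure and is_bcnf are hypothetical, FDs are given as (left side, right side) pairs of attribute sets, and only the listed FDs that apply to the relation are tested.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_bcnf(relation, fds):
    """Return True if every non-trivial listed FD has a super key on its left side."""
    relation = set(relation)
    for lhs, rhs in fds:
        if not (set(lhs) | set(rhs)) <= relation:
            continue  # FD does not apply to this relation
        if set(rhs) <= set(lhs):
            continue  # trivial FD
        if not relation <= closure(lhs, fds):
            return False  # left side is not a super key
    return True


fds = [({"COURSE_NO"}, {"COURSE_FEE"})]
print(is_bcnf({"STUD_NO", "COURSE_NO", "COURSE_FEE"}, fds))  # False
print(is_bcnf({"COURSE_NO", "COURSE_FEE"}, fds))             # True
```

The three-attribute relation fails because COURSE_NO is not a super key there, while the two-attribute table passes, in line with key point 5.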
Decomposition is lossy if R1 ⋈ R2 ⊃ R
Decomposition is lossless if R1 ⋈ R2 = R
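To check for lossless join decomposition using an FD set, the following conditions must hold:
1. The union of the attributes of R1 and R2 must equal the attributes of R. Each attribute of R must
be either in R1 or in R2.
Att(R1) U Att(R2) = Att(R)
2. The intersection of the attributes of R1 and R2 must not be empty.
Att(R1) ∩ Att(R2) ≠ Φ
3. The common attribute must be a key for at least one relation (R1 or R2).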
For example, a relation R(A, B, C, D) with FD set {A->BC} is decomposed into R1(ABC) and
R2(AD), which is a lossless join decomposition because:
1. The first condition holds true, as Att(R1) U Att(R2) = (ABC) U (AD) = (ABCD) = Att(R).
2. The second condition holds true, as Att(R1) ∩ Att(R2) = (ABC) ∩ (AD) = A ≠ Φ.
3. The third condition holds true, as Att(R1) ∩ Att(R2) = A is a key of R1(ABC) because A->BC is
given.
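The three conditions can be sketched as a Python check for a two-way decomposition. The function names are hypothetical, and attributes are written as single letters so that a string such as "ABC" stands for the set {A, B, C}; that encoding is an assumption made for brevity.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless(r, r1, r2, fds):
    """Apply the three lossless-join conditions to R split into R1 and R2."""
    r, r1, r2 = set(r), set(r1), set(r2)
    if r1 | r2 != r:          # condition 1: union covers R
        return False
    common = r1 & r2
    if not common:            # condition 2: intersection is not empty
        return False
    reach = closure(common, fds)
    # condition 3: the common attributes are a key of R1 or of R2
    return r1 <= reach or r2 <= reach


fds = [("A", "BC")]  # A -> BC
print(is_lossless("ABCD", "ABC", "AD", fds))  # True
print(is_lossless("ABCD", "ABC", "CD", fds))  # False
```

The second call fails the third condition: the common attribute C determines neither all of R1 nor all of R2.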
If we decompose a relation R into relations R1 and R2, all dependencies of R must either be
a part of R1 or R2 or be derivable from the combination of the FDs of R1 and R2.
For example, a relation R(A, B, C, D) with FD set {A->BC} is decomposed into R1(ABC) and
R2(AD), which is dependency preserving because the FD A->BC is a part of R1(ABC).
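Dependency preservation can be tested without computing every projected FD, using the standard textbook iteration that grows the left side of each FD through one decomposed relation at a time. The sketch below is one way to write it; the function names are hypothetical and attributes are single letters, as an assumption for brevity.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserves_dependencies(fds, parts):
    """Return True if every FD can be enforced using the decomposed relations alone."""
    for lhs, rhs in fds:
        reach = set(lhs)
        changed = True
        while changed:
            changed = False
            for part in parts:
                part = set(part)
                # Attributes derivable while staying inside this relation.
                gained = closure(reach & part, fds) & part
                if not gained <= reach:
                    reach |= gained
                    changed = True
        if not set(rhs) <= reach:
            return False
    return True


print(preserves_dependencies([("A", "BC")], ["ABC", "AD"]))            # True
print(preserves_dependencies([("AB", "C"), ("C", "B")], ["AC", "BC"]))  # False
```

The second call uses a separate illustrative FD set, {AB->C, C->B}, which is not from the example above: splitting R(A, B, C) into (AC) and (BC) loses AB->C.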
Summary
Database design is critical to the successful implementation of a
database management system that meets the data requirements of
an enterprise system.
Normalization helps produce database systems that are cost-effective
and have better security models.
Functional dependencies are a very important component of the
data normalization process.
Most database systems are normalized up to the third
normal form.
A primary key uniquely identifies a record in a table and cannot be
null.
A foreign key helps connect tables and references a primary key.