Dbms Theory Notes Unit IV
Dbms Theory Notes Unit IV
DBMS
(Database Management System)
CLASS:- BCA III SEM SEC “All”
UNIT - IV
Normalization: the purpose of normalization, how
normalization supports database design, data
redundancy and update anomalies, functional
dependencies, characteristics of functional
dependencies, identifying functional dependencies,
identifying the primary key for a relation
using functional dependencies, the process of
normalization, first normal form (1NF), second
normal form (2NF), third normal form (3NF), general
definitions of 2NF, 3NF, and BCNF.
Normalization
Normalization
Normalization is a
process of organizing
the data in database to
avoid
data redundancy,
insertion anomaly,
update anomaly &
deletion anomaly.
Normalization
Normal Form Description
1NF A relation is in 1NF if it contains an atomic value.
2NF A relation will be in 2NF if it is in 1NF and all non-key
attributes are fully functional dependent on the primary key.
3NF A relation will be in 3NF if it is in 2NF and no transition
dependency exists.
BCNF A stronger definition of 3NF is known as Boyce Codd's normal
form.
4NF A relation will be in 4NF if it is in Boyce Codd's normal form
and has no multi-valued dependency.
5NF A relation is in 5NF. If it is in 4NF and does not contain any join
dependency, joining should be lossless.
Normalization
Normalization is a process to organize the data into database tables. To make a
good database design, you have to follow Normalization practices. Without
normalization, a database system might be slow, inefficient, and might not produce
the expected result. Normalization reduces data redundancy and inconsistent data
dependency.
Normalization
Advantages of Normalization
Normalization helps to minimize data redundancy.
Disadvantages of Normalization
You cannot start building the database before knowing what the user needs.
The performance degrades when normalizing the relations to higher normal forms,
i.e., 4NF, 5NF.
It is very time-consuming and difficult to normalize relations of a higher degree.
Data consistency is essential for all teams in the company to stay current. Data
standardization ensures consistency across development, research, and sales teams.
Consistent data also improves workflows between departments and standardizes
their information assets.
Reducing Redundancy
TEACHER_ID SUBJECT
25 Chemistry
25 Biology
47 English
83 Math
83 Computer
3 NF Normalization
A relation will be in 3NF if it is in 2NF and not contain any transitive partial dependency.
3NF is used to reduce the data duplication. It is also used to achieve the data integrity.
If there is no transitive dependency for non-prime attributes, then the relation must be in third normal form.
A relation is in third normal form if it holds atleast one of the following conditions for every non-trivial function
dependency X → Y.
X is a super key.
Y is a prime attribute, i.e., each element of Y is part of some candidate key.
Example:
EMPLOYEE_DETAIL table:
EMP_ID EMP_NAME EMP_ZIP EMP_STATE EMP_CITY
222 Harry 201010 UP Noida
333 Stephan 02228 US Boston
444 Lan 60007 US Chicago
555 Katharine 06389 UK Norwich
666 John 462007 MP Bhopal
3 NF Normalization
Super key in the table above:
{EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}....so on
Candidate key: {EMP_ID}
Here, EMP_STATE & EMP_CITY dependent on EMP_ZIP and EMP_ZIP dependent on EMP_ID.
The non-prime attributes (EMP_STATE, EMP_CITY) transitively dependent on super
key(EMP_ID). It violates the rule of third normal form.
That's why we need to move the EMP_CITY and EMP_STATE to the new <EMPLOYEE_ZIP>
table, with EMP_ZIP as a Primary key.
EMP_ZIP EMP_STATE EMP_CITY
EMP_ID EMP_NAME EMP_ZIP
201010 UP Noida
222 Harry 201010
02228 US Boston
333 Stephan 02228
444 Lan 60007 60007 US Chicago
In the above table, we have stored student name John twice as he registered for two
different courses and course name English twice as two students registered for it. This
is called data redundancy. Data redundancy causes many problems in databases.
In the Retail Outlets table, the Item_Code and Retail_Outlet_ID are key
attributes. The item description is partially dependent on Item_Code only.
Outlet_Location depends on Retail_Outlet_ID. These are partial dependencies.
To achieve normalization, we need to eliminate these dependencies by decomposing
the relations.
Normalization
Normalization
Advantage 2 NF : 2 NF attempts to reduce the amount of redundant data in a table by
extracting it, placing it in a new table(s), and creating relationships between those
tables.
Limitation: There are still some anomalies, as there might be some indirect
dependencies between Non-Key attributes, leading to redundant data.
Normalization
3 NF: Third Normal Form
A relation R is said to be in 3 NF (Third Normal Form) if and only if:
R is already in 2 NF
There is no transitive dependency that exists between key attributes and non-key
attributes through other non-key attributes.
A transitive dependency exists when another non-key attribute determines a non-key
attribute. In other words, If A determines B and B determines C, then automatically, A
determines C.
Normalization
Normalization
Boyce-Codd Normal Form
It is an upgraded version of the 3rd Normal form. It is also called as 3.5 Normal Form.
A relation R is said to be in 3 NF (Third Normal Form) if and only if:
R is already in 3 NF
For any dependency A –> B, then A should be the Super key.
Converting the tables from higher normal form to lower normal form is called
“Denormalization”.
Normalization
Normalization
Advantages of Normalization
It reduces data redundancy: Normalization assists in removing redundant data
from tables, using less storage space, and increasing database effectiveness.
It improves data consistency: Normalization guarantees that the data stays
organized and consistent, lowering the possibility of data errors and inconsistencies.
It makes database design simple: Normalization offers rules for arranging tables
Functional Dependency
If the information stored in a table can uniquely determine another information in the same table, then it is called
Functional Dependency. Consider it as an association between two attributes of the same relation.
P -> Q
Normalization
<Employee>
EmpID EmpName EmpAge
E01 Amit 28
E02 Rohit 31
EmpName is functionally dependent on EmpID because EmpName can take only one value for the given value of
EmpID:
EmpID -> EmpName
Fully-functionally Dependency
An attribute is fully functional dependent on another attribute, if it is Functionally Dependent on that attribute and
not on any of its proper subset.
For example, an attribute Q is fully functional dependent on another attribute P, if it is Functionally Dependent on P
and not on any of the proper subset of P.
<ProjectCost>
ProjectID ProjectCost
001 1000
002 5000
Normalization
<EmployeeProject>
EmpID ProjectID Days (spent on the project)
E099 001 320
E056 002 190
Multivalued Dependency
When existence of one or more rows in a table implies one or more other rows in the same
table, then the Multi-valued dependencies occur.
The StudentName can be determined by StudentID that makes the relation Partial Dependent.
The ProjectName can be determined by ProjectID, which that the relation Partial Dependent.
Functional Dependency Explanation
Update anomalies
Insertion anomalies
Deletion anomalies
anomalies), then they have to lock more objects thus causing greater
contention and lower throughput
� addressaddress
� cidcname
� cidcname,sname
� cid,sidcname,sname
Count()
In a query, the HAVING clause is placed after the GROUP BY clause and before
FROM tableName
WHERE [conditions]
GROUP BY column1
HAVING [conditons]
ORDER BY column2
Having clause with Example
Example 1:
Consider the following table named "Employees".
EmpNo EName Job Salary DeptNo Age
1 John Clerk 17000 10 25
2 Harry Clerk 35000 20 27
3 David Manager 78020 50 26
4 Smith Engineer 77020 10 35
5 Clarke Salesman 98020 20 32
6 Musk Engineer 14000 50 45
Having clause with Example
Query 1: Let's say we want to display the Job types with total sum of salary greater than or
equal to 90000. Then the query will be:
Select Job, SUM(Salary)
FROM Employees
GROUP BY Job
HAVING
SUM(Salary)>=90000;
The output for the above query is given The output for the above query is given below
below. Job SUM(Salary)
Job SUM(Salary) Manager 78020
Engineer 91020 Engineer 91020
Salesman 98020
Difference between HAVING and WHERE CLAUSE
HAVING WHERE
The WHERE clause is used to filter individual
The HAVING clause is used to filter data from
content from table according to the specified
groups according to the specified condition.
condition.
HAVING clause is used after the groups are made WHERE clause is used before the groups are made
(Post-filter). (Pre-filter).
HAVING clause needs to be executed with the WHERE clause can be executed without the
GROUP BY clause. GROUP BY clause.
In SQL queries, the HAVING clause is used after In SQL queries, the WHERE clause is used before
the GROUP BY clause. the GROUP BY clause.
HAVING clause can only be used with the WHERE clause can be used with SELECT,
SELECT statement for filtering the data. UPLOAD and DELETE statements.
SQL aggregate functions can be used with the SQL aggregate functions can not be used with the
HAVING clause in a query. WHERE clause in a query.
HAVING clause is used in column operations. WHERE clause is used in row operations.
Another Example of Having Clause
In this example we will see how to use functions like COUNT and MAX with the
HAVING clause. Consider the table given below named "Students".
RollNo Name Subject Marks
15 Jack Mathematics 99
20 Henry English 89
23 Mark Physics 90
4 Steve Mathematics 69
17 John Physics 95
36 Mike Chemistry 50
33 Tom English 75
Another Example of Having Clause
Query 1: Let's say we want to show the subjects which are studied by more than one student.
We will use the COUNT function with the HAVING clause in this type of query.
SELECT Subject, COUNT(Subject)
FROM Students
GROUP BY Subject
HAVING COUNT(Subject)>1;
The output of the
above query is given below
Subject COUNT(Subject)
Mathematics 2
Physics 2
English 2
Another Example of Having Clause
Query 2: Let's take another query in which we want to print the subjects in which the
maximum marks obtained is greater than 90. For this we will use MAX function with
the HAVING clause.
SELECT Subject, MAX(Marks)
FROM Students
GROUP BY Subject
HAVING MAX(Marks)>90
The output of the above query
is given below.
Subject MAX(Marks)
Mathematics 99
Physics 95
Thank You & Best wishes to all of you