Dbms Theory Notes Unit IV

CODE:-CA3CO09

DBMS
(Database Management System)
CLASS:- BCA III SEM SEC “All”
UNIT - IV
Normalization: the purpose of normalization, how
normalization supports database design, data
redundancy and update anomalies, functional
dependencies, characteristics of functional
dependencies, identifying functional dependencies,
identifying the primary key for a relation
using functional dependencies, the process of
normalization, first normal form (1NF), second
normal form (2NF), third normal form (3NF), general
definitions of 2NF, 3NF, and BCNF.
Normalization
Normalization is the process of organizing the data in a database to avoid data redundancy and the insertion, update, and deletion anomalies.
Normal Form   Description
1NF           A relation is in 1NF if every attribute holds only atomic (single, indivisible) values.
2NF           A relation is in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the primary key.
3NF           A relation is in 3NF if it is in 2NF and no transitive dependency exists.
BCNF          A stronger definition of 3NF, known as Boyce-Codd normal form.
4NF           A relation is in 4NF if it is in Boyce-Codd normal form and has no multi-valued dependency.
5NF           A relation is in 5NF if it is in 4NF and contains no non-trivial join dependency other than those implied by its candidate keys, so that it cannot be decomposed further without loss.
Normalization is a process for organizing data into database tables. A good database design follows normalization practices. Without normalization, a database system may be slow and inefficient and may not produce the expected results. Normalization reduces data redundancy and inconsistent data dependencies.
Advantages of Normalization
➢ Normalization helps to minimize data redundancy.
➢ Greater overall database organization.
➢ Data consistency within the database.
➢ Much more flexible database design.
➢ Enforces the concept of relational integrity.

Disadvantages of Normalization
➢ You cannot start building the database before knowing what the user needs.
➢ Performance can degrade when relations are normalized to higher normal forms (e.g., 4NF, 5NF), because queries then need more joins.
➢ It is very time-consuming and difficult to normalize relations of a higher degree.
➢ Careless decomposition may lead to a bad database design, leading to serious problems.
Purpose of normalization
➢ Improved overall database organization
The database is structured and organized in a logical way for all departments in the company. With better organization, duplication and localization errors are minimized, and outdated versions of data can be updated more easily.
➢ Data Consistency
Data consistency is essential for all teams in the company to stay current. Data standardization ensures consistency across development, research, and sales teams. Consistent data also improves workflows between departments and standardizes their information assets.
➢ Reducing Redundancy
Redundancy is an often overlooked issue in data storage. Reducing redundancy helps reduce file size, which in turn reduces data analysis and processing time.
Cost Reduction
If file size is reduced, data storage and processors do not need to be as large. Additionally, the improved workflow that comes from consistency and organization lets all employees use database information as quickly as possible, saving time for other tasks.
Increased Security
Because standardization requires data to be located more accurately and organized consistently, security tends to improve.
1 NF Normalization
A relation is in 1NF if every attribute contains only atomic values.
An attribute of a table cannot hold multiple values; it must hold only a single value.
First normal form disallows multi-valued attributes, composite attributes, and their combinations.
Example: Relation EMPLOYEE is not in 1NF because of the multi-valued attribute EMP_PHONE.
EMPLOYEE table:
EMP_ID   EMP_NAME   EMP_PHONE                 EMP_STATE
14       John       7272826385, 9064738238    UP
20       Harry      8574783832                Bihar
12       Sam        7390372389, 8589830302    Punjab
The decomposition of the EMPLOYEE table into 1NF is shown below:
EMP_ID   EMP_NAME   EMP_PHONE    EMP_STATE
14       John       7272826385   UP
14       John       9064738238   UP
20       Harry      8574783832   Bihar
12       Sam        7390372389   Punjab
12       Sam        8589830302   Punjab

2 NF Normalization
➢ In 2NF, the relation must be in 1NF.
➢ In the second normal form, all non-key attributes are fully functionally dependent on the primary key.
➢ Example: Let's assume a school stores the data of teachers and the subjects they teach. In a school, a teacher can teach more than one subject.
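TEACHER table:
TEACHER_ID   SUBJECT     TEACHER_AGE
25           Chemistry   30
25           Biology     30
47           English     35
83           Math        38
83           Computer    38
In the TEACHER table, the non-prime attribute TEACHER_AGE depends on TEACHER_ID alone, which is a proper subset of the candidate key {TEACHER_ID, SUBJECT}. This partial dependency violates 2NF, so the table is decomposed into two tables:
TEACHER_DETAIL table:
TEACHER_ID   TEACHER_AGE
25           30
47           35
83           38
TEACHER_SUBJECT table:
TEACHER_ID   SUBJECT
25           Chemistry
25           Biology
47           English
83           Math
83           Computer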
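A decomposition is only useful if the original table can be rebuilt from its parts. The following minimal Python sketch checks this for the TEACHER example; the variable names are illustrative, and the rows are the ones listed in the TEACHER table.

```python
# Rows of the TEACHER table: (TEACHER_ID, SUBJECT, TEACHER_AGE)
teacher = [
    (25, "Chemistry", 30),
    (25, "Biology", 30),
    (47, "English", 35),
    (83, "Math", 38),
    (83, "Computer", 38),
]

# Project onto the two smaller tables; sets drop the duplicate rows.
teacher_detail = {(tid, age) for tid, _subject, age in teacher}
teacher_subject = {(tid, subject) for tid, subject, _age in teacher}

# Natural join of the two projections on TEACHER_ID.
rejoined = {
    (tid, subject, age)
    for tid, subject in teacher_subject
    for detail_tid, age in teacher_detail
    if tid == detail_tid
}

print(len(teacher_detail))       # 3: each teacher's age is now stored once
print(rejoined == set(teacher))  # True: the decomposition loses no information
```

The age of each teacher is stored once instead of once per subject, and joining the two tables on TEACHER_ID gives back exactly the original rows.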
3 NF Normalization
A relation is in 3NF if it is in 2NF and does not contain any transitive dependency.
3NF is used to reduce data duplication and to achieve data integrity.
If there is no transitive dependency for non-prime attributes, then the relation is in third normal form.
A relation is in third normal form if it satisfies at least one of the following conditions for every non-trivial functional dependency X → Y:
1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.
Example:
EMPLOYEE_DETAIL table:
EMP_ID EMP_NAME EMP_ZIP EMP_STATE EMP_CITY
222 Harry 201010 UP Noida
333 Stephan 02228 US Boston
444 Lan 60007 US Chicago
555 Katharine 06389 UK Norwich
666 John 462007 MP Bhopal
▪ Super keys in the table above:
  {EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}, and so on.
▪ Candidate key: {EMP_ID}
▪ Here, EMP_STATE and EMP_CITY depend on EMP_ZIP, and EMP_ZIP depends on EMP_ID. The non-prime attributes (EMP_STATE, EMP_CITY) are therefore transitively dependent on the super key (EMP_ID), which violates the rule of third normal form.
▪ That is why we need to move EMP_CITY and EMP_STATE to a new <EMPLOYEE_ZIP> table, with EMP_ZIP as its primary key.
EMPLOYEE table:
EMP_ID   EMP_NAME    EMP_ZIP
222      Harry       201010
333      Stephan     02228
444      Lan         60007
555      Katharine   06389
666      John        462007
EMPLOYEE_ZIP table:
EMP_ZIP   EMP_STATE   EMP_CITY
201010    UP          Noida
02228     US          Boston
60007     US          Chicago
06389     UK          Norwich
462007    MP          Bhopal
Boyce Codd normal form (BCNF)
BCNF is the advanced version of 3NF. It is stricter than 3NF.
A table is in BCNF if, for every non-trivial functional dependency X → Y, X is a super key of the table.
In other words, the table should be in 3NF, and the left-hand side (LHS) of every functional dependency must be a super key.
Example: Let's assume there is a company where employees work in more than one
department.
EMPLOYEE table:
EMP_ID EMP_COUNTRY EMP_DEPT DEPT_TYPE EMP_DEPT_NO
264 India Designing D394 283
264 India Testing D394 300
364 UK Stores D283 232
364 UK Developing D283 549
In the above table, the functional dependencies are as follows:
EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
The candidate key is {EMP_ID, EMP_DEPT}. The table is not in BCNF because neither EMP_ID nor EMP_DEPT alone is a super key. To convert it into BCNF, it is decomposed into three tables:
EMP_COUNTRY table:
EMP_ID   EMP_COUNTRY
264      India
364      UK
EMP_DEPT table:
EMP_DEPT     DEPT_TYPE   EMP_DEPT_NO
Designing    D394        283
Testing      D394        300
Stores       D283        232
Developing   D283        549
EMP_DEPT_MAPPING table:
EMP_ID   EMP_DEPT
264      Designing
264      Testing
364      Stores
364      Developing
4th Normal Form
A relation is in 4NF if it is in Boyce-Codd normal form and has no multi-valued dependency.
For a dependency A →→ B, if multiple values of B exist for a single value of A, independently of the other attributes, then the dependency is a multi-valued dependency.
Example
STUDENT

STU_ID COURSE HOBBY


21 Computer Dancing
21 Math Singing
34 Chemistry Dancing
74 Biology Cricket
59 Physics Hockey
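COURSE and HOBBY are two independent multi-valued facts about STU_ID, so the STUDENT table is decomposed into two tables:
STUDENT_COURSE
STU_ID   COURSE
21       Computer
21       Math
34       Chemistry
74       Biology
59       Physics
STUDENT_HOBBY
STU_ID   HOBBY
21       Dancing
21       Singing
34       Dancing
74       Cricket
59       Hockey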
Example of Normalization
For example, consider the following tables:
LibraryVisitors (StudentID, Student_Name, Student_Address, InTime, OutTime);
Students(StudentID, Student_Name, Student_Address, Department, RollNo,
CourseRegistered);
Student_Address is stored in both tables. For each student_id, the address must be
the same in those two tables. Both these relations must be considered to retrieve or
update the correct address. The issues mentioned arise due to poorly
designed/structured databases.
We can eliminate data inconsistency in databases by using constraints on the relations.
Data Redundancy is the condition where the same data is stored at different
locations leading to the wastage of storage space.
Examples:
Student Id Student Name Course ID Course Name
111 John C08 English
112 Alice C08 English
111 John C02 French

In the above table, we have stored student name John twice as he registered for two
different courses and course name English twice as two students registered for it. This
is called data redundancy. Data redundancy causes many problems in databases.

We can eliminate data redundancy in the databases by the normalization of relations.


Functional Dependency
Before diving into normalization, we need a clear understanding of functional dependencies.
An attribute B is functionally dependent on another attribute A if each value of A uniquely identifies a value of B.
It is denoted by A → B, meaning A determines B, and B depends on A.
Example: We can find the Student's name using the Student_ID, so Student_ID → Student_Name.
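The definition can be tested mechanically on sample rows. The sketch below is a minimal Python illustration; the helper name `fd_holds` and the dictionary keys are assumptions made for this example, and the rows are taken from the Student/Course table shown earlier. Note that sample data can only refute a functional dependency; whether one really holds is decided by the meaning of the data.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if every pair of rows that agrees on lhs also agrees on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it
        if seen.setdefault(key, value) != value:
            return False
    return True


students = [
    {"Student_ID": 111, "Student_Name": "John", "Course_ID": "C08"},
    {"Student_ID": 112, "Student_Name": "Alice", "Course_ID": "C08"},
    {"Student_ID": 111, "Student_Name": "John", "Course_ID": "C02"},
]

print(fd_holds(students, ["Student_ID"], ["Student_Name"]))  # True
print(fd_holds(students, ["Student_ID"], ["Course_ID"]))     # False: 111 has two courses
```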
What is an Anomaly?
An anomaly is an unexpected side effect of trying to insert, update, or delete a row. Essentially, more data must be provided to accomplish an operation than expected.

Retail_Outlet_ID   Outlet_Location                      Item_Code   Description            Qty_Available   Retail_Unit_Price
R1001              King Street, Hyderabad, 540001       I1001       Britannia Marie Gold   25              1600
R1002              Rajaji Nagar, Bangalore, 600341      I1106       Cookies                58              1289
R1003              MVP Colony, Visakhapatnam, 500021    I1200       Best Rice              22              2000
R1001              King street, Hyderabad               I1309       Dal                    20              1500
Types of Anomalies
1. Insertion anomalies: These occur when we cannot insert a new tuple into the table due to a lack of data.
   What happens if we try to insert (add) the details of a new retail outlet with no items in its stock?
   NULL values would be inserted into the item details columns, which is not preferable.
2. Deletion anomalies: These occur when the deletion of some data also deletes other required data (unintended data loss).
   What happens if we try to delete the item with item code I1106?
   The details of the retail outlet R1002 will also be deleted from the database.
3. Update anomalies: These occur when an update of a single fact requires updates in multiple records.
   How many rows will be updated if the location of retail outlet R1001 is changed from King Street to Victoria Street?
   2 rows will be updated.
4. Data redundancy: This occurs when new items are supplied to a retail outlet.
   What details do we need to insert?
   Apart from all the necessary details, retail_outlet_location will also be inserted, which is redundant.
We have seen insert, delete, and update anomalies and data redundancy in the example above. Undesirable functional dependencies, such as partial and transitive ones, may lead to anomalies. To minimize anomalies, we refine the relations using normalization.
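The update and deletion anomalies can be reproduced on the retail outlet table. The following is a rough sketch using Python's built-in `sqlite3` module with an in-memory database; the table name `retail_stock` is an assumption, and the outlet locations are shortened (pincodes omitted) for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE retail_stock (
           retail_outlet_id TEXT, outlet_location TEXT, item_code TEXT,
           description TEXT, qty_available INTEGER, retail_unit_price INTEGER)"""
)
conn.executemany(
    "INSERT INTO retail_stock VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("R1001", "King Street, Hyderabad", "I1001", "Britannia Marie Gold", 25, 1600),
        ("R1002", "Rajaji Nagar, Bangalore", "I1106", "Cookies", 58, 1289),
        ("R1003", "MVP Colony, Visakhapatnam", "I1200", "Best Rice", 22, 2000),
        ("R1001", "King Street, Hyderabad", "I1309", "Dal", 20, 1500),
    ],
)

# Update anomaly: one real-world change (a new address for R1001)
# has to be written to every row that repeats the old address.
cur = conn.execute(
    "UPDATE retail_stock SET outlet_location = ? WHERE retail_outlet_id = ?",
    ("Victoria Street, Hyderabad", "R1001"),
)
rows_updated = cur.rowcount
print(rows_updated)  # 2

# Deletion anomaly: removing item I1106 also removes all knowledge of outlet R1002.
conn.execute("DELETE FROM retail_stock WHERE item_code = 'I1106'")
remaining = conn.execute(
    "SELECT COUNT(*) FROM retail_stock WHERE retail_outlet_id = 'R1002'"
).fetchone()[0]
print(remaining)  # 0
```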
1 NF: First Normal Form
A relation R is said to be in 1 NF (First Normal Form) if and only if:
1. All the attributes of R are atomic.
2. It does not contain any multi-valued attributes.
• Advantage of 1NF: 1NF allows users to query the database effectively, because it removes ambiguity by removing the non-atomic and multi-valued attributes, which would otherwise create major issues when updating and extracting data from the database.
• Limitation: Data redundancy still exists even after the 1st Normal Form, so we need further normalization.
2 NF: Second Normal Form
A relation R is said to be in 2 NF (Second Normal Form) if and only if:
1. R is already in 1 NF.
2. There is no partial dependency in R between non-key attributes and key attributes.
Suppose we have a composite primary or candidate key in our table. A partial dependency occurs when a part of that key (a key attribute) determines a non-key attribute.
In the Retail Outlets table, Item_Code and Retail_Outlet_ID are the key attributes. The item Description depends on Item_Code only, and Outlet_Location depends on Retail_Outlet_ID only. These are partial dependencies.
To achieve 2 NF, we need to eliminate these dependencies by decomposing the relation.
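Finding partial dependencies amounts to looking for dependencies whose left side is a proper subset of the key and whose right side contains only non-key attributes. A small Python sketch of that test for the Retail Outlets table follows. The function name is hypothetical, and the third dependency (quantity and price depending on the whole key) is an assumption made for illustration; the notes only state the first two.

```python
key = frozenset({"Retail_Outlet_ID", "Item_Code"})

# Each functional dependency is a pair (left side, right side).
fds = [
    (frozenset({"Item_Code"}), frozenset({"Description"})),
    (frozenset({"Retail_Outlet_ID"}), frozenset({"Outlet_Location"})),
    # Assumed for illustration: stock level and price depend on the whole key.
    (key, frozenset({"Qty_Available", "Retail_Unit_Price"})),
]


def partial_dependencies(key, fds):
    """Return the FDs whose left side is a proper subset of the key
    and whose right side contains no key attribute."""
    return [(lhs, rhs) for lhs, rhs in fds if lhs < key and not (rhs & key)]


partial = partial_dependencies(key, fds)
for lhs, rhs in partial:
    print(sorted(lhs), "->", sorted(rhs))
```

Each dependency reported here suggests one new table: its left side becomes the key of that table, and its right side moves there with it.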
Advantage of 2 NF: 2 NF reduces the amount of redundant data in a table by extracting it, placing it in one or more new tables, and creating relationships between those tables.
Limitation: Some anomalies remain, as there might be indirect dependencies between non-key attributes, leading to redundant data.
3 NF: Third Normal Form
A relation R is said to be in 3 NF (Third Normal Form) if and only if:
1. R is already in 2 NF.
2. There is no transitive dependency between key attributes and non-key attributes through other non-key attributes.
A transitive dependency exists when a non-key attribute determines another non-key attribute. In other words, if A determines B and B determines C, then A determines C transitively.
Boyce-Codd Normal Form
It is a stricter version of the 3rd Normal Form and is also called 3.5 Normal Form.
A relation R is said to be in BCNF (Boyce-Codd Normal Form) if and only if:
1. R is already in 3 NF.
2. For every non-trivial dependency A → B, A is a super key.
In simple words, if A → B and B is a prime attribute, then A cannot be a non-prime attribute; that is, a non-prime attribute cannot determine a prime attribute.
There can be cases in which a non-prime attribute determines a prime attribute even though the relation is in the 3rd Normal Form. BCNF does not allow this kind of dependency.
Enrollment Table
Student_ID Course_Name Professor
101 JAVA Prof. Java
102 C++ Prof. CPP
101 Python Prof. Python
103 JAVA Prof. Java_2
104 Python Prof. Python_2
In the above relation:
One student can enroll in multiple courses.
Multiple professors can teach one course.
One professor can be assigned only one course.
▪ So (Student_ID, Course_Name) forms the primary key. These two attributes together determine all other attributes in the relation; in our case, that is only the professor.
▪ The relation is clearly in 1st Normal Form, as there are no multi-valued attributes and all attributes have atomic values.
▪ The relation is in 2nd Normal Form, as there are no partial dependencies:
  ▪ Student_ID cannot determine Course_Name, as one student can enroll in multiple courses.
  ▪ Course_Name cannot determine the professor, as multiple professors may teach the same course.
▪ The relation is in 3rd Normal Form, as there are no transitive dependencies.
▪ However, the "Professor" attribute, a non-prime attribute, determines Course_Name, as each professor teaches only one course. Course_Name is a prime attribute, and Professor is not a super key. That means a non-prime attribute determines a prime attribute, so the relation is in 3NF but not in BCNF.
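The BCNF test can be written down directly: compute the closure of the left side of each dependency and check whether it covers every attribute. The sketch below is a minimal Python illustration for the Enrollment relation; the function name `closure` and the way dependencies are encoded as pairs of sets are choices made for this example.

```python
def closure(attrs, fds):
    """Return all attributes determined by attrs under the dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole left side, we gain the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


attributes = {"Student_ID", "Course_Name", "Professor"}
fds = [
    (frozenset({"Student_ID", "Course_Name"}), frozenset({"Professor"})),
    (frozenset({"Professor"}), frozenset({"Course_Name"})),
]

# A non-trivial FD violates BCNF when its left side is not a super key.
violations = [
    (lhs, rhs)
    for lhs, rhs in fds
    if not rhs <= lhs and closure(lhs, fds) != attributes
]

for lhs, rhs in violations:
    print(sorted(lhs), "->", sorted(rhs))  # ['Professor'] -> ['Course_Name']
```

Only Professor → Course_Name is reported, because the closure of {Professor} is {Professor, Course_Name}, which does not include Student_ID.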
Guidelines for Using Normalization
➢ Depending on the business requirements, we can normalize the tables up to the 2nd normal form or the 3rd normal form.
➢ Prefer tables in 3 NF in applications with extensive data modifications.
➢ Prefer tables in 2 NF in applications with extensive data retrieval.
➢ Reason: retrieving data from multiple tables is a costly operation.
➢ Converting tables from a higher normal form to a lower normal form is called "Denormalization".
Advantages of Normalization
➢ It reduces data redundancy: Normalization assists in removing redundant data from tables, using less storage space, and increasing database effectiveness.
➢ It improves data consistency: Normalization helps the data stay organized and consistent, lowering the possibility of data errors and inconsistencies.
➢ It makes database design simple: Normalization offers rules for arranging tables and data linkages. This facilitates database design and maintenance.
➢ It can handle queries faster: Smaller, narrower tables are generally easier to search and update, although queries that must join many tables may be slower.
➢ It simplifies database maintenance: By dividing a database's complexity into smaller, more manageable tables, normalization makes it simpler to add, change, and delete data.
Example of Normalization
The First Normal Form – 1NF
➢ a single cell must not hold more than one value (atomicity)
➢ there must be a primary key for identification
➢ no duplicated rows or columns
➢ each column must have only one value for each row in the table
The Second Normal Form – 2NF
➢ A table is said to be in 2NF if it meets the following criteria:
➢ it is already in 1NF
➢ it has no partial dependency; that is, all non-key attributes are fully dependent on the primary key.
The Third Normal Form – 3NF
➢ A transitive dependency means that a non-prime attribute (an attribute that is not part of any candidate key) depends on another non-prime attribute. This is what the third normal form (3NF) eliminates.
➢ For a table to be in 3NF, it must:
➢ be in 2NF
➢ have no transitive dependency.
EMPLOYEE_ID NAME JOB_CODE JOB STATE_CODE HOME_STATE
E001 Alice J01 Chef 26 Michigan
E001 Alice J02 Waiter 26 Michigan
E002 Bob J02 Waiter 56 Wyoming
E002 Bob J03 Bartender 56 Wyoming
E003 Alice J01 Chef 56 Wyoming
EXAMPLE OF (2NF)
EMPLOYEES TABLE
EMPLOYEE_ID   NAME    STATE_CODE   HOME_STATE
E001          Alice   26           Michigan
E002          Bob     56           Wyoming
E003          Alice   56           Wyoming
EMPLOYEE_ROLES TABLE
EMPLOYEE_ID   JOB_CODE
E001          J01
E001          J02
E002          J02
E002          J03
E003          J01
JOBS TABLE
JOB_CODE   JOB
J01        Chef
J02        Waiter
J03        Bartender
Example of Third Normal Form (3NF)
HOME_STATE depends on STATE_CODE, which is itself a non-prime attribute, so it is moved to a separate states table:
employee_roles Table
EMPLOYEE_ID   JOB_CODE
E001          J01
E001          J02
E002          J02
E002          J03
E003          J01
employees Table
EMPLOYEE_ID   NAME    STATE_CODE
E001          Alice   26
E002          Bob     56
E003          Alice   56
jobs Table
JOB_CODE   JOB
J01        Chef
J02        Waiter
J03        Bartender
states Table
STATE_CODE   HOME_STATE
26           Michigan
56           Wyoming
Types of dependencies in DBMS
➢ Functional Dependency
➢ Fully-Functional Dependency
➢ Transitive Dependency
➢ Multivalued Dependency
➢ Partial Dependency

Functional Dependency
If the information stored in one attribute of a table uniquely determines the information in another attribute of the same table, this is called a Functional Dependency. Consider it as an association between two attributes of the same relation.

If P functionally determines Q, then

P -> Q
<Employee>
EmpID   EmpName   EmpAge
E01     Amit      28
E02     Rohit     31
EmpName is functionally dependent on EmpID because EmpName can take only one value for a given value of EmpID:
EmpID -> EmpName

Fully-Functional Dependency
An attribute is fully functionally dependent on a set of attributes if it is functionally dependent on that set and not on any of its proper subsets.
For example, an attribute Q is fully functionally dependent on P if it is functionally dependent on P and not on any proper subset of P.
<ProjectCost>
ProjectID   ProjectCost
001         1000
002         5000
<EmployeeProject>
EmpID   ProjectID   Days (spent on the project)
E099    001         320
E056    002         190

From the above relations, the following dependency holds:
EmpID, ProjectID, ProjectCost -> Days
However, Days is not fully functionally dependent on this set of attributes, because the proper subset {EmpID, ProjectID} alone determines the Days spent on the project by the employee.
This gives our fully functional dependency:
{EmpID, ProjectID} -> Days
Transitive Dependency
When an indirect relationship causes a functional dependency, it is called a Transitive Dependency.
If P -> Q and Q -> R are true, then P -> R is a transitive dependency.

Multivalued Dependency
A multivalued dependency occurs when the existence of one or more rows in a table implies the existence of one or more other rows in the same table.
If a table has attributes P, Q and R, and Q and R are independent multi-valued facts of P, then both depend on P through multivalued dependencies.
It is represented by a double arrow:
->->
For our example:
P ->-> Q
P ->-> R
Partial Dependency
A partial dependency occurs when a non-prime attribute is functionally dependent on part of a candidate key.
The 2nd Normal Form (2NF) eliminates partial dependencies. Let us see an example:
<StudentProject>
StudentID   ProjectNo   StudentName   ProjectName
S01         199         Katie         Geo Location
S02         120         Ollie         Cluster Exploration

The prime (key) attributes are StudentID and ProjectNo.
As stated, for a partial dependency the non-prime attributes, i.e., StudentName and ProjectName, must be functionally dependent on part of a candidate key.
StudentName can be determined by StudentID alone, which makes the relation partially dependent.
ProjectName can be determined by ProjectNo alone, which also makes the relation partially dependent.
Functional Dependency Explanation

sid sname address cid cname grade


124 Britney USA 206 Database A++
204 Victoria Essex 202 Semantics C
124 Britney USA 201 S/Eng I A+
206 Emma London 206 Database B-
124 Britney USA 202 Semantics B+
➢ Redundancy is the root of many problems associated with relational schemas:
  ▪ Redundant storage
  ▪ Update anomalies
  ▪ Insertion anomalies
  ▪ Deletion anomalies
  ▪ Low transaction throughput
➢ In general, with higher redundancy, if transactions are correct (no anomalies), then they have to lock more objects, thus causing greater contention and lower throughput.
➢ (Aside: Could having a dummy value, NULL, help?)
• We remove anomalies by replacing the schema
  Data(sid, sname, address, cid, cname, grade)
  with
  Student(sid, sname, address)
  Course(cid, cname)
  Enrolled(sid, cid, grade)
• We can say that sid determines address
• We'll write this
  sid → address
• This is called a functional dependency (FD)
• We'd expect the following functional dependencies to hold in our Student database:
  • sid → sname, address
  • cid → cname
  • sid, cid → grade
• A functional dependency X → Y is simply a pair of sets (of field names)
  • Note: the sloppy notation A,B → C,D rather than {A,B} → {C,D}
• Given a relation R = R(A1:τ1, …, An:τn) and X, Y ⊆ {A1, …, An}, an instance r of R satisfies X → Y if
  • for any two tuples t1, t2 in r, if t1.X = t2.X then t1.Y = t2.Y
• Note: This is a semantic assertion. We cannot look at an instance to determine which FDs hold (although we can tell if the instance does not satisfy an FD!)
Properties of FDs
• Assume that X → Y and Y → Z are known to hold in R. It's clear that X → Z holds too.
• We shall say that an FD set F logically implies X → Y, and write F ⊨ X → Y
  • e.g. {X → Y, Y → Z} ⊨ X → Z
• The closure of F is the set of all FDs logically implied by F, i.e.
  F+ = {X → Y | F ⊨ X → Y}
• The set F+ can be big, even if F is small
• Which of the following are in the closure of our Student FDs?
  • address → address
  • cid → cname
  • cid → cname, sname
  • cid, sid → cname, sname
• If R = R(A1:τ1, …, An:τn) with FDs F and X ⊆ {A1, …, An}, then X is a candidate key for R if
  • X → A1, …, An ∈ F+
  • For no proper subset Y ⊂ X is Y → A1, …, An ∈ F+
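Both questions above can be answered without enumerating F+, by computing the closure of a set of attributes instead. The following Python sketch shows one common way to do this; the function names `closure`, `implies`, and `is_candidate_key` are chosen for this example, and the dependencies are the Student FDs listed earlier.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, lhs, rhs):
    """X -> Y is in F+ exactly when Y is contained in the closure of X."""
    return set(rhs) <= closure(lhs, fds)


def is_candidate_key(attrs, all_attrs, fds):
    """attrs must determine everything, and no attribute may be removable."""
    attrs = set(attrs)
    if closure(attrs, fds) != set(all_attrs):
        return False
    return all(closure(attrs - {a}, fds) != set(all_attrs) for a in attrs)


ALL = {"sid", "sname", "address", "cid", "cname", "grade"}
FDS = [
    ({"sid"}, {"sname", "address"}),
    ({"cid"}, {"cname"}),
    ({"sid", "cid"}, {"grade"}),
]

print(implies(FDS, {"address"}, {"address"}))            # True (trivial)
print(implies(FDS, {"cid"}, {"cname"}))                  # True
print(implies(FDS, {"cid"}, {"cname", "sname"}))         # False
print(implies(FDS, {"cid", "sid"}, {"cname", "sname"}))  # True
print(is_candidate_key({"sid", "cid"}, ALL, FDS))        # True
print(is_candidate_key({"sid"}, ALL, FDS))               # False
```

Checking only the subsets obtained by dropping a single attribute is enough for minimality, because the closure of a smaller set is always contained in the closure of a larger one.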
Properties of FDs: Armstrong's axioms
• Reflexivity: If Y ⊆ X then F ⊢ X → Y
  • (This is called a trivial dependency)
  • Example: sname, address → address
• Augmentation: If F ⊢ X → Y then F ⊢ X,W → Y,W
  • Example: As cid → cname then cid, sid → cname, sid
• Transitivity: If F ⊢ X → Y and F ⊢ Y → Z then F ⊢ X → Z
  • Example: As sid, cid → cid and cid → cname, then sid, cid → cname
Consequences of Armstrong's axioms
• Union: If F ⊢ X → Y and F ⊢ X → Z then F ⊢ X → Y,Z
• Pseudo-transitivity: If F ⊢ X → Y and F ⊢ W,Y → Z then F ⊢ X,W → Z
• Decomposition: If F ⊢ X → Y and Z ⊆ Y then F ⊢ X → Z

Exercise: Prove that these are consequences of Armstrong's axioms

Proof of Union Rule
Suppose that F ⊢ X → Y and F ⊢ X → Z.
By augmentation (adding X to both sides of X → Y) we have
F ⊢ X → X,Y
since X ∪ X = X. Also by augmentation (adding Y to both sides of X → Z)
F ⊢ X,Y → Z,Y
Therefore, by transitivity we have
F ⊢ X → Z,Y
Benefits of functional dependency
➢ Prevent data redundancy. Functional dependency helps ensure the same data doesn't exist repetitively across a database or network of databases.
➢ Maintain the quality and integrity of data. Because the parameters of functional dependency often create an effective and less redundant system, the quality and integrity of your data are often higher. Establishing functional dependency often leads to accurate and reliable data.
➢ Reduce the risk of error. Keeping records, data and other transactions in a database with functional dependency often helps to reduce the risk of errors within documents and datasets by better sorting information and storing it concisely.
➢ Gain productivity and save costs. With properly configured files, documents and transactions, you can often retrieve and access data with more productivity, leading to cost savings within a company. You can rely on having accurate and centralized information rather than sorting through multiple files or data sets.
➢ Define meanings and constraints of databases. Functional dependency allows you to set parameters that restrict or control how the data behaves or gets stored and accessed.
➢ Identify poor designs. Functional dependency allows you to see where data spreads across tables or is missing in others. Poor design means updates to data require many changes across tables, and functional dependency often shows you data inconsistencies.
Some Remaining topics
Aggregate functions in SQL
Aggregate functions in DBMS take multiple rows from the table and return a value
according to the query.
Note:- All the aggregate functions are used in Select statement.
Syntax −
SELECT <FUNCTION NAME> (<PARAMETER>) FROM <TABLE NAME>
Various Aggregate Functions
1) Count()
2) Sum()
3) Avg()
4) Min()
5) Max()
Aggregate functions with an example:
Count()
Count(*): Returns the total number of records, i.e., 6.
Count(salary): Returns the number of non-NULL values in the column salary, i.e., 5.
Count(Distinct salary): Returns the number of distinct non-NULL values in the column salary, i.e., 4.
Sum()
Sum(salary): Sum of all non-NULL values of the column salary, i.e., 310.
Sum(Distinct salary): Sum of all distinct non-NULL values, i.e., 250.
Avg()
Avg(salary) = Sum(salary) / Count(salary) = 310/5 = 62
Avg(Distinct salary) = Sum(Distinct salary) / Count(Distinct salary) = 250/4 = 62.5
Min()
Min(salary): Minimum value in the salary column, ignoring NULL, i.e., 40.
Max()
Max(salary): Maximum value in the salary column, i.e., 80.
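The figures above can be reproduced with a small experiment. The sample table these notes refer to is not reproduced in the text, so the six rows below are an assumption chosen to be consistent with the stated results (six records, one NULL salary, one duplicated salary). The sketch uses Python's built-in `sqlite3` module; the table and column names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id TEXT, salary INTEGER)")
# Assumed sample data: six rows, one NULL salary, the value 60 appears twice.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("A", 80), ("B", 40), ("C", 60), ("D", 70), ("E", 60), ("F", None)],
)

row = conn.execute(
    """SELECT COUNT(*), COUNT(salary), COUNT(DISTINCT salary),
              SUM(salary), SUM(DISTINCT salary),
              AVG(salary), AVG(DISTINCT salary),
              MIN(salary), MAX(salary)
       FROM employee"""
).fetchone()

print(row)  # (6, 5, 4, 310, 250, 62.0, 62.5, 40, 80)
```

Only COUNT(*) counts the row with the NULL salary; every other aggregate ignores it.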
Some Remaining topics
GROUP BY Modifiers/clause with Example
The GROUP BY clause permits a WITH ROLLUP modifier that causes
summary output to include extra rows that represent higher-level (that is, super-
aggregate) summary operations. ROLLUP thus enables you to answer questions
at multiple levels of analysis with a single query. For example, ROLLUP can be
used to provide support for OLAP (Online Analytical Processing) operations.
Example
mysql> CREATE TABLE sales( year INT, country VARCHAR(20), product
VARCHAR(32), profit INT );
mysql> SELECT year, SUM(profit) AS profit FROM sales GROUP BY year;
mysql> SELECT year, SUM(profit) AS profit FROM sales GROUP BY year
WITH ROLLUP;
Super-Aggregate
mysql> SELECT year, country, product, SUM(profit) AS profit FROM sales GROUP
BY year, country, product;
mysql> SELECT year, country, product, SUM(profit) AS profit FROM sales GROUP
BY year, country, product WITH ROLLUP;
mysql> SELECT year, country, product, SUM(profit) AS profit, GROUPING(year)
AS grp_year, GROUPING(country) AS grp_country, GROUPING(product) AS
grp_product FROM sales GROUP BY year, country, product WITH ROLLUP;
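To see what the extra ROLLUP row contains, the effect can be imitated by hand. WITH ROLLUP is a MySQL modifier; SQLite, used here through Python's `sqlite3` module only so that the example runs without a database server, does not support it, so this sketch adds the super-aggregate row with UNION ALL instead. The profit figures are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (year INT, country VARCHAR(20), product VARCHAR(32), profit INT)"
)
# Made-up rows, for illustration only.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (2000, "Finland", "Computer", 1500),
        (2000, "India", "Calculator", 150),
        (2001, "USA", "Computer", 2700),
        (2001, "Finland", "Phone", 10),
    ],
)

# One row per year, plus one grand-total row whose year is NULL.
# This grand-total row is what WITH ROLLUP would append in MySQL.
result = conn.execute(
    """SELECT year, SUM(profit) AS profit FROM sales GROUP BY year
       UNION ALL
       SELECT NULL, SUM(profit) FROM sales"""
).fetchall()

for year, profit in result:
    print(year, profit)
```

The row with year NULL (None in Python) carries the total over all years, 4360 for this sample data.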
ORDER BY Modifiers/clause with Example
The ORDER BY keyword is used to sort the result-set in ascending or
descending order.
Syntax
SELECT column1, column2, ...
FROM table_name
ORDER BY column1, column2, ... ASC|DESC;
Example
SELECT * FROM Products
ORDER BY Price;
DESC
The ORDER BY keyword sorts the records in ascending order by default. To sort the records in descending order, use the DESC keyword.
Example
Sort the products from highest to lowest price:
SELECT * FROM Products
ORDER BY Price DESC;
Sort the customers by Country in ascending order and, within each country, by CustomerName in descending order:
SELECT * FROM Customers
ORDER BY Country ASC, CustomerName DESC;
Having clause with Example
The HAVING clause is generally used along with the GROUP BY clause.
The HAVING clause is used to filter the results obtained by the GROUP BY clause based on some specific conditions.
The HAVING clause is quite similar to the WHERE clause, as both are used to filter records in SQL queries. However, the WHERE clause cannot be used with aggregate functions (e.g., COUNT, MAX, SUM), which is why the HAVING clause is needed.
Some important points related to the HAVING clause are mentioned below:
➢ The HAVING clause can only be used with the SELECT statement.
➢ In a query, the HAVING clause is placed after the GROUP BY clause and before the ORDER BY clause.
➢ The GROUP BY clause is used to arrange the data into groups.
➢ The HAVING clause operates on groups, i.e., on aggregated column values rather than on individual rows.
Syntax
SELECT column1, column2, ..., columnN
FROM tableName
WHERE [conditions]
GROUP BY column1
HAVING [conditions]
ORDER BY column2
Example 1:
Consider the following table named "Employees".
EmpNo EName Job Salary DeptNo Age
1 John Clerk 17000 10 25
2 Harry Clerk 35000 20 27
3 David Manager 78020 50 26
4 Smith Engineer 77020 10 35
5 Clarke Salesman 98020 20 32
6 Musk Engineer 14000 50 45
Query 1: Let's say we want to display the Job types whose total salary is greater than or equal to 90000. Then the query will be:
SELECT Job, SUM(Salary)
FROM Employees
GROUP BY Job
HAVING SUM(Salary)>=90000;
The output for the above query is given below.
Job        SUM(Salary)
Engineer   91020
Salesman   98020
Query 2: Let's apply one more condition to the above query: the age of the employee should be greater than or equal to 35.
SELECT Job, SUM(Salary)
FROM Employees
WHERE Age>=35
GROUP BY Job
HAVING SUM(Salary)>=90000;
The output for the above query is given below.
Job        SUM(Salary)
Engineer   91020

Query 3: Let's take another example where we want to display the Job types whose total salary is greater than or equal to 60000. Here we use the ORDER BY clause along with the HAVING clause to sort the output according to the total salary.
SELECT Job, SUM(Salary)
FROM Employees
GROUP BY Job
HAVING SUM(Salary)>=60000
ORDER BY SUM(Salary);
The output for the above query is given below.
Job        SUM(Salary)
Manager    78020
Engineer   91020
Salesman   98020
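The three queries can be run against the Employees table to confirm the outputs. The sketch below loads the table into an in-memory SQLite database through Python's `sqlite3` module; the column types are assumptions, and an ORDER BY is added to Query 1 only to make the row order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE Employees (
           EmpNo INTEGER, EName TEXT, Job TEXT,
           Salary INTEGER, DeptNo INTEGER, Age INTEGER)"""
)
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "John", "Clerk", 17000, 10, 25),
        (2, "Harry", "Clerk", 35000, 20, 27),
        (3, "David", "Manager", 78020, 50, 26),
        (4, "Smith", "Engineer", 77020, 10, 35),
        (5, "Clarke", "Salesman", 98020, 20, 32),
        (6, "Musk", "Engineer", 14000, 50, 45),
    ],
)

# Query 1: groups are built from all rows, then filtered by HAVING.
q1 = conn.execute(
    """SELECT Job, SUM(Salary) FROM Employees
       GROUP BY Job HAVING SUM(Salary) >= 90000
       ORDER BY SUM(Salary)"""
).fetchall()

# Query 2: WHERE removes rows first, so only employees aged 35+ are grouped.
q2 = conn.execute(
    """SELECT Job, SUM(Salary) FROM Employees
       WHERE Age >= 35
       GROUP BY Job HAVING SUM(Salary) >= 90000"""
).fetchall()

# Query 3: HAVING followed by ORDER BY on the aggregate.
q3 = conn.execute(
    """SELECT Job, SUM(Salary) FROM Employees
       GROUP BY Job HAVING SUM(Salary) >= 60000
       ORDER BY SUM(Salary)"""
).fetchall()

print(q1)  # [('Engineer', 91020), ('Salesman', 98020)]
print(q2)  # [('Engineer', 91020)]
print(q3)  # [('Manager', 78020), ('Engineer', 91020), ('Salesman', 98020)]
```

Query 2 drops the Salesman group because Clarke is 32 and is removed by WHERE before any group is formed.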
Difference between HAVING and WHERE CLAUSE
1. HAVING: The HAVING clause is used to filter data from groups according to the specified condition.
   WHERE: The WHERE clause is used to filter individual rows from a table according to the specified condition.
2. HAVING: The HAVING clause is applied after the groups are made (post-filter).
   WHERE: The WHERE clause is applied before the groups are made (pre-filter).
3. HAVING: The HAVING clause is normally used together with the GROUP BY clause.
   WHERE: The WHERE clause can be used without the GROUP BY clause.
4. HAVING: In SQL queries, the HAVING clause is written after the GROUP BY clause.
   WHERE: In SQL queries, the WHERE clause is written before the GROUP BY clause.
5. HAVING: The HAVING clause can only be used with the SELECT statement for filtering the data.
   WHERE: The WHERE clause can be used with SELECT, UPDATE and DELETE statements.
6. HAVING: SQL aggregate functions can be used with the HAVING clause in a query.
   WHERE: SQL aggregate functions cannot be used with the WHERE clause in a query.
7. HAVING: The HAVING clause is used in column (group) operations.
   WHERE: The WHERE clause is used in row operations.
Another Example of Having Clause
In this example we will see how to use functions like COUNT and MAX with the
HAVING clause. Consider the table given below named "Students".
RollNo Name Subject Marks
15 Jack Mathematics 99
20 Henry English 89
23 Mark Physics 90
4 Steve Mathematics 69
17 John Physics 95
36 Mike Chemistry 50
33 Tom English 75
Query 1: Let's say we want to show the subjects which are studied by more than one student. We will use the COUNT function with the HAVING clause in this type of query.
SELECT Subject, COUNT(Subject)
FROM Students
GROUP BY Subject
HAVING COUNT(Subject)>1;
The output of the above query is given below.
Subject       COUNT(Subject)
Mathematics   2
Physics       2
English       2
Query 2: Let's take another query in which we want to print the subjects in which the maximum marks obtained are greater than 90. For this we will use the MAX function with the HAVING clause.
SELECT Subject, MAX(Marks)
FROM Students
GROUP BY Subject
HAVING MAX(Marks)>90;
The output of the above query is given below.
Subject       MAX(Marks)
Mathematics   99
Physics       95
Thank You & Best wishes to all of you

Dr. Abdul Razzak Khan Qureshi


Assistant Professor (CS)
Class Coordinator
Class:-BCA III Sem Sec “A”
