0% found this document useful (0 votes)
3 views20 pages

UNIT IV Dbms

The document discusses schema refinement through normalization, which aims to eliminate data redundancy and anomalies such as insertion, update, and deletion issues in databases. It covers various normal forms (1NF, 2NF, 3NF, BCNF, 4NF, 5NF), functional dependencies, and the advantages and disadvantages of normalization. Additionally, it explains concepts like candidate keys, super keys, and types of functional dependencies, providing examples to illustrate these principles.

Uploaded by

tmjasper2000
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
3 views20 pages

UNIT IV Dbms

The document discusses schema refinement through normalization, which aims to eliminate data redundancy and anomalies such as insertion, update, and deletion issues in databases. It covers various normal forms (1NF, 2NF, 3NF, BCNF, 4NF, 5NF), functional dependencies, and the advantages and disadvantages of normalization. Additionally, it explains concepts like candidate keys, super keys, and types of functional dependencies, providing examples to illustrate these principles.

Uploaded by

tmjasper2000
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 20

UNIT IV: Schema Refinement (Normalization):Purpose of Normalization or schema

refinement, concept of functional dependency, normal forms based on functional


dependencyLossless join and dependency preserving decomposition, (1NF, 2NF and 3 NF),
concept of surrogate key, Boyce-Coddnormal form(BCNF), MVD, Fourth normal
form(4NF), Fifth Normal Form (5NF).

The Schema Refinement refers to refine the schema by using some technique. The best technique of
schema refinement is decomposition. Normalization means “split the tables into small tables which
will contain less number of attributes in such a way that table design must not contain any problem
of inserting, deleting, updating anomalies and guarantees no redundancy”.

Normalization or Schema Refinement is a technique of organizing the data in the database. It is a


systematic approach of decomposing tables to eliminate data redundancy and undesirable
characteristics like Insertion, Update and Deletion Anomalies.

Redundancy: refers to repetition of same data or duplicate copies of same data stored in different
locations.

Anomalies: Anomalies refers to the problems occurred after poorly planned and normalized
databases where all the data is stored in one table which is sometimes called a flat file database.

Anomalies or problems facing without normalization (problems due to redundancy): Anomalies


refers to the problems occurred after poorly planned and unnormalized databases where all the
data is stored in one table which is sometimes called a flat file database. Let us consider such type of
schema.

Here all the data is stored in a single table which causes redundancy of data or say anomalies as SID
and Sname are repeated once for same CID . Let us discuss anomalies one by one. Due to
redundancy of data we may get the following problems, those are

1. Insertion anomalies: It may not be possible to store some information unless some other
information is stored as well.
2. Redundant storage: some information is stored repeatedly

3. Update anomalies: If one copy of redundant data is updated, then inconsistency is created unless
all redundant copies of data are updated.

4. Deletion anomalies: It may not be possible to delete some information without losing some other
information as well.

Problem in updation / updation anomaly – If there is updation in the fee from 5000 to 7000, then
we have to update FEE column in all the rows, else data will become inconsistent.

Insertion Anomaly and Deletion Anomaly- These anomalies exist only due to redundancy, otherwise
they do not exist.

Insertion Anomalies: New course is introduced C4, But no student is there who is having C4 subject.

Because of insertion of some data, It is forced to insert some other dummy data.

Deletion Anomaly: Deletion of S3 student cause the deletion of course. Because of deletion of some
data forced to delete some other useful data.
Solutions To Anomalies : Decomposition of Tables – Schema Refinement, shown below.

Purpose of Normalization:

 Minimize the redundancy in data.


 Remove insert, update, and delete anomalies during the database activities.
 Reduce the need to organize the data when it is modified or enhanced.
 Normalization reduces a complex user view to a set of small and sub groups of fields or
relations. This process helps to design a logical data model known as conceptual data model.

Advantages of Normalization:

1. Greater overall database organization will be gained.


2. The amount of unnecessary redundant data reduced.
3. Data integrity is easily maintained within the database.
4. The database & application design processes are much for flexible.
5. Security is easier to maintain or manage.
Disadvantages of Normalization:

1. The disadvantage of normalization is that it produces a lot of tables with a relatively small
number of columns. These columns then have to be joined using their primary/foreign key
relationship.
2. This has two disadvantages. Performance: all the joins required to merge data slow
processing & place additional stress on your hardware. Complex queries: developers have to
code complex queries in order to merge data from different tables.

Concept of Functional Dependency:

Functional Dependencies are fundamental to the process of Normalization i.e., Functional


Dependency plays key role in differentiating good database design from bad database designs. A
functional dependency is a “type of constraint that is a generalization of the notation of the key”.

Functional Dependency describes the relationship between attributes (columns) in a table.


Functional dependency is represented by an arrow sign (→).

In other words, a dependency FD: “X → Y” means that the values of Y are determined by the values
of X. Two tuples sharing the same values of X will necessarily have the same values of Y. An attribute
on left hand side is known as “Determinant”. Here X is a Determinant.

Example [Identifying the FD’s]

Case1: A →B

Here A1 belongs to B1 & B2. So A1 does not have unique value in B. So it is not in FD.

Case1: A →C

Here A1→C1 and A2, A3→C2. So A has unique values in B. So it is in FD.

Note: try to find all the possibilities. i.e., A→D, B→C, B→D, and C→D
Reasoning about functional dependencies:

Armstrong Axioms (Inference Rules ) : The term Armstrong axioms refers to the sound and
complete set of inference rules or axioms, introduced by William W. Armstrong, that is used to test
logical implication of functional dependencies.

Armstrong axioms define the set of rules for reasoning about functional dependencies and also to
infer all the functional dependencies on a relational database.

Various axioms rules or inference rules:

Primary axioms:

Secondary or derived axioms:

Closure of a Set of Attributes:

Attribute closure of an attribute set can be defined as set of attributes which can be functionally
determined from it.

The set of FD’s that is logically implied by F is called the closure of F and written as F+. And it is
defined as “If F is a set FD’s on a relation R, the F+, the closure of F by using the inferences axioms
that are not contained in F+.

Example: R (A, B, C, D) and set of Functional Dependencies are A→B, B→D, C→B then what is the
Closure of A, B, C, D?

Solution: A+ is

A+→ {A, B, D} i.e., A→B, B→D is exists and C is not FD on A. So it is eliminated.

B+→ { B, D} i.e., B→D is exists and A, C is not FD on A. So it is eliminated.


C+→ {C, B, D} i.e., C→B, B→D is exists and A is not FD on C. So it is eliminated.

Types of functional dependencies:

1. Fully Functional Dependency: A functional dependency is said to be full dependency “if and only if
the determinant of the functional dependency if either candidate key or super key, and the
dependent can be either prime or non-prime attribute”.

(OR)

Let’s take the functional dependency X → Y (i.e., X determines y). Here Y is said to be fully
determinant, if it cannot determine any subset of X.

Example: Consider the following determinant ABC → D i.e., ABC determines D but D is not
determined by any subset of A/ BC/C/B/AB i.e., BC→D, C→D, A→D Functional dependencies are not
exists. So D is Fully Functional Dependent.

2. Partial Functional Dependency: If a non-prime attribute of the relation is getting derived by only a
part of the candidate key, then such dependency is known as Partial Dependency.

(OR)

In a relation having more than one key field, a subset of non key fields may depend on all key fields
but another subset or a particular non-key field may depend on only one of the key fields. Such
dependency is defined as Partial Dependency.

Example: Consider the following determinants AC→P, A→D, D→P. From these determinants P is not
fully FD on AC. Because, If we find A+ (means A’s Closure) A→D, D→P i.e., A→P. But we don’t have
any requirement of C. C attribute is removed completely. So P is Partially Dependent on AC.

Under the following conditions a table cannot have partial F.D

(1) If primary key consists a single attribute

(2) If table consists only two attributes

(3) If all the attributes in the table are part of the primary key

3. Transitive Functional Dependency: If a non-prime attribute of a relation is getting derived by


either another non-prime attribute or the combination of the part of the candidate key along with
non-prime attribute, then such dependency is defined as Transitive dependency. i.e., in a relation,
there may be dependency among non-key fields. Such dependency is called Transitive Functional
Dependency.

Example: X→Y, and Y→Z then we can determine X→Z holds.

Under the following Circumstances, a table cannot have transitive F.D

(1) If table consists only two attributes

(2) If all the attributes in the table are part of the primary key.
4. Trivial Functional Dependency: It is basically related to Reflexive rule. i.e., if X is a set of
attributes, and Y is subset of X then X→Y holds.

Example: ABC→BC is a Trivial Dependency.

5. Multi-Valued Dependency: Consider 3 fields X, Y, and Z in a relation. If for each value of X, there is
a well-defined set of values Y and Well-defined set of values of Z and set of values of Y is
independent of the set values of Z. This dependency is Multi-valued Dependency. i.e., X →Y / Z.

Prime and non-prime attributes: Attributes which are parts of any candidate key of relation are
called as prime attribute, others are non-prime attributes.

Candidate Key: Candidate Key is minimal set of attributes of a relation which can be used to identify
a tuple uniquely.

Consider student table: student(sno, sname,sphone,age) we can take sno as candidate key.

We can have more than 1 candidate key in a table.

Types of candidate keys:

1. simple(having only one attribute)

2. composite(having multiple attributes as candidate key)

Super Key: Super Key is set of attributes of a relation which can be used to identify a tuple uniquely.

 Adding zero or more attributes to candidate key generates super key.


 A candidate key is a super key but vice versa is not true.

Consider student table: student(sno, sname,sphone,age)

we can take sno, (sno, sname) as super key

To identify the additional F.D’s : To check any F.D’s like AB can be determined from F1 or not.
Complete A+ from F1 is A+ includes B also then; AB can be derived as a F.D in F1.

Examples:

1. In a schema with attributes A,B,C,D and E the following set of attributes are given AB, AC,
CDE, BD, EA. Find CDAC determines from the given FDs or not.

2. Check DA can be derived from the following FDs or not ABC, BCAD, DE, CFB.
Identification of key by using closure set as attributes: A key attribute: An attribute that is capable of
identifying all other attributes in a given table.

(i) Primary key: It is an unique value attribute in a table to enforce entity integrity and ti identify
rows in the table uniquely.

(ii) Composite Primary Key: Sometimes single attribute is not sufficient to identify uniquely the rows
in the table so, we combine 2 or more attributes to identify the rows uniquely.

(iii) Candidate keys: Sometimes 2 or more independent attribute or attributes can be used to
identify the rows uniquely Eg :( vech no,veng no,purchase date) Either vehicle no or vehicle engine
no can be used as a key attribute then they are called as candidate keys one of the candidate key can
be elected as primary key.

Example 1: Find candidate keys for the relation R(ABCD) having following FD’s ABCD, CA, DA.
Normal Forms in DBMS

Normalization is the process of minimizing redundancy from a relation or set of relations.


Redundancy in relation may cause insertion, deletion and updation anomalies. So, it helps to
minimize the redundancy in relations. Normal forms are used to eliminate or reduce redundancy in
database tables.

1. First Normal Form – If a relation contain composite or multi-valued attribute, it violates first
normal form or a relation is in first normal form if it does not contain any composite or multi-valued
attribute. A relation is in first normal form if every attribute in that relation is singled valued
attribute.

Example 1 – Relation STUDENT in table 1 is not in 1NF because of multi-valued attribute


STUD_PHONE. Its decomposition into 1NF has been shown in table 2.
2. Second Normal Form –To be in second normal form, a relation must be in first normal form and
relation must not contain any partial dependency. A relation is in 2NF if it has No Partial
Dependency, i.e., no non-prime attribute (attributes which are not part of any candidate key) is
dependent on any proper subset of any candidate key of the table.

Partial Dependency – If the proper subset of candidate key determines non-prime attribute, it is
called partial dependency.
{Note that, there are many courses having the same course fee. }

Here, COURSE_FEE cannot alone decide the value of COURSE_NO or STUD_NO;

COURSE_FEE together with STUD_NO cannot decide the value of COURSE_NO;

COURSE_FEE together with COURSE_NO cannot decide the value of STUD_NO;

Hence,

COURSE_FEE would be a non-prime attribute, as it does not belong to the one only candidate key
{STUD_NO, COURSE_NO} ;

But, COURSE_NO -> COURSE_FEE , i.e., COURSE_FEE is dependent on COURSE_NO, which is a proper
subset of the candidate key. Non-prime attribute COURSE_FEE is dependent on a proper subset of
the candidate key, which is a partial dependency and so this relation is not in 2NF.

To convert the above relation to 2NF, we need to split the table into two tables such as:

Table 1: STUD_NO, COURSE_NO

Table 2: COURSE_NO, COURSE_FEE

STUD_NO COURSE_NO
1 C1
2 C2
1 C4
4 C3
4 C1
2 C5
COURSE_NO COURSE_FEE
C1 1000
C2 1500
C3 1000
C4 2000
C5 2000
NOTE: 2NF tries to reduce the redundant data getting stored in memory. For instance, if there are
100 students taking C1 course, we dont need to store its Fee as 1000 for all the 100 records, instead
once we can store it in the second table as the course fee for C1 is 1000.

Example 2 – Consider following functional dependencies in relation R (A, B , C, D )

• AB -> C [A and B together determine C]

BC -> D [B and C together determine D]

In the above relation, AB is the only candidate key and there is no partial dependency, i.e., any
proper subset of AB doesn’t determine any non-prime attribute.

3. Third Normal Form –A relation is in third normal form, if there is no transitive dependency for
non-prime attributes as well as it is in second normal form. A relation is in 3NF if at least one of the
following condition holds in every non-trivial function dependency X –> Y

1. X is a super key.

2. Y is a prime attribute (each element of Y is part of some candidate key).

Transitive dependency – If A->B and B->C are two FDs then A->C is called transitive dependency.
Example 1 – In relation STUDENT given in Table,

FD set: {STUD_NO -> STUD_NAME, STUD_NO -> STUD_STATE, STUD_STATE -> STUD_COUNTRY,
STUD_NO -> STUD_AGE}

Candidate Key: {STUD_NO}

For this relation in table, STUD_NO -> STUD_STATE and STUD_STATE -> STUD_COUNTRY are true. So
STUD_COUNTRY is transitively dependent on STUD_NO. It violates the third normal form. To convert
it in third normal form, we will decompose the relation STUDENT (STUD_NO, STUD_NAME,
STUD_PHONE, STUD_STATE, STUD_COUNTRY_STUD_AGE) as: STUDENT (STUD_NO, STUD_NAME,
STUD_PHONE, STUD_STATE, STUD_AGE) STATE_COUNTRY (STATE, COUNTRY)

Example 2 – Consider relation R(A, B, C, D, E)

A -> BC,

CD -> E,

B -> D,

E -> A
All possible candidate keys in above relation are {A, E, CD, BC} All attribute are on right sides of all
functional dependencies are prime.

4. Boyce-Codd Normal Form (BCNF) –

A relation R is in BCNF if R is in Third Normal Form and for every FD, LHS is super key. A relation is in
BCNF iff in every non-trivial functional dependency X –> Y, X is a super key.

Example 1 – Find the highest normal form of a relation R(A,B,C,D,E) with FD set as {BC->D, AC->BE, B-
>E}

Step 1. As we can see, (AC)+ ={A,C,B,E,D} but none of its subset can determine all attribute of
relation, So AC will be candidate key. A or C can’t be derived from any other attribute of the relation,
so there will be only 1 candidate key {AC}.

Step 2. Prime attributes are those attribute which are part of candidate key {A, C} in this example
and others will be non-prime {B, D, E} in this example

Step 3. The relation R is in 1st normal form as a relational DBMS does not allow multi-valued or
composite attribute.

The relation is in 2nd normal form because BC->D is in 2nd normal form (BC is not a proper subset of
candidate key AC) and AC->BE is in 2nd normal form (AC is candidate key) and B->E is in 2nd normal
form (B is not a proper subset of candidate key AC).

The relation is not in 3rd normal form because in BC->D (neither BC is a super key nor D is a prime
attribute) and in B->E (neither B is a super key nor E is a prime attribute) but to satisfy 3rd normal
for, either LHS of an FD should be super key or RHS should be prime attribute. So the highest normal
form of relation will be 2nd Normal form.

Example 2 –For example consider relation R(A, B, C)

A -> BC,

B ->A and B both are super keys so above relation is in BCNF

5.Multivalued Dependency (MVD)

MVD or multivalued dependency means that for a single value of attribute ‘a’ multiple values of
attribute ‘b’ exist. We write it as, a --> --> b

It is read as: a is multi-valued dependent on b.

Suppose a person named Geeks is working on 2 projects Microsoft and Oracle and has 2 hobbies
namely Reading and Music. This can be expressed in a tabular format in the following way.
Project and Hobby are multivalued attributes as they have more than one value for a single person
i.e., Geeks

Multi Valued Dependency (MVD):

We can say that multivalued dependency exists if the following conditions are met.

Conditions for MVD :

Any attribute say a multiple define another attribute b; if any legal relation r(R), for all pairs of tuples
t1 and t2 in r, such that, t1[a] = t2[a]

Then there exists t3 and t4 in r such that.

t1[a] = t2[a] = t3[a] = t4[a]

t1[b] = t3[b]; t2[b] = t4[b]

t1 = t4; t2 = t3

Then multivalued (MVD) dependency exists.

To check the MVD in given table, we apply the conditions stated above and we check it with the
values in the given table.
5. Fourth Normal Form-

A relation R is in 4NF if R is in BCNF and does not contain any multi-valued dependencies (MVDs)

Consider a university where a student can enroll in multiple courses, and a student can also
participate in multiple clubs. The two independent multi-valued attributes (Courses and Clubs) lead
to multi-valued dependency.

 Multi-valued Dependency (MVD):

 A student’s courses are independent of their clubs.


 {Student_ID} →→ {Course} (A student can take multiple courses independently)
 {Student_ID} →→ {Club} (A student can join multiple clubs independently)

 Causes redundancy: If a student joins an extra club, all their course records get duplicated.

Convert into 4NF

We decompose the table into two separate tables, one for student-course and another for
student-club.
 Since each independent multi-valued attribute(Course, Club) is now in it’s own table.
 Data is stored efficiently, preventing duplication.
 We can say that our table or relation is in 4NF.

6. Fifth Normal Form or Project-Join Normal Form(PJNF):

A table is in 5NF if it is already in 4NF and it does not have join dependency that cannot be
represented as the natural join of the smaller tables which eliminates redundancy caused by
complex multi-join relationships and prevents the loss of information when decomposing.

Join Dependency (JD)?

A join dependency means that a table can be split into two or more smaller tables and then
joined back without losing any data.

 If a table can be decomposed into smaller tables and reconstructed perfectly using a
natural join, it is said to have a valid join dependency.
 However, if a table cannot be represented by natural joins of smaller tables without losing
information or causing extra tuples (spurious tuples), it violates 5NF.

Example: A company assigns Projects to Vendors, and Vendors supply Parts for those projects.

 The Project, Vendor, and Part are interrelated, but their relationships are independent.
 A Project may require multiple Parts.
 A Vendor may supply multiple Parts.
 A Project may use multiple Vendors.

The problem is If a project stops using a part, but the vendor still supplies it to another project,
deleting a row may cause loss of information.

So we decompose the table to remove redundancy


We can say that the table is in 5NF as the original table can be reconstructed by joining these three
tables and there is no unnecessary redundancy and no loss of data or spurious tuples.

6.1 Dependency preserving decomposition

Let R is decomposed into {R1, R2,....,Rn} with projected FD set {F1,F2,......Fn}. This decomposition is
dependency preserving if F + ={F1 U F2 U.........Fn} + .

Example

Let the relation R{A,B,C,D,E} F:{AB->C, C->D, AB->D} R is decomposed to R1(A,B,C), R2(D,E). Prove
decomposition is dependency preserving.

Solution

F1={AB->C}

F2={C->D}
=> (F1 u F2) = {AB->C, C->D}

AB+ under (F1 U F2) = {A,B,C,D} => AB->D is under (F1 U F2)

F+ = (F1 U F2) +

=> Decomposition is dependency preserving.

Decomposition is not preserving

Now consider another example where decomposition is not preserved.

Let the relation R{A,B,C,D,E,F,G,H,I,J} where F: {AB->C, A->DE, B->F, F->GH. D->IJ}

R is decomposed to R1(A,B,C,D), R2(D,E), R3(B,F), R4(F,G,H) AND R5(D,I,J). Check decomposition is


dependency preserving or not.

Solution

F1={AB->C}

F2={}

F3={B->F}

F4={F->GH}

F5={D->IJ}

=> (F1 U F2 U F3 U F4 U F5) = {AB->C, B->F, F->GH, D->IJ}

A+ under (F1 U F2 U F3 U F4 U F5) = {AB->C, B->F, F->GH, D->IJ}

=>A->DE is not under (F1 U F2 UF3 U F4 U F5)

=>F+ ≠ (F1 U F2 U F3 U F4 U F5) +

=> Decomposition is not dependency preserving.

6.2 Lossless-join decomposition

It is a process in which a relation is decomposed into two or more relations. This property
guarantees that the extra or less tuple generation problem does not occur, and no information is lost
from the original relation during the decomposition. It is also known as non-additive join
decomposition.

When the sub relations combine again then the new relation must be the same as the original
relation was before decomposition.

Consider a relation R if we decomposed it into sub-parts relation R1 and relation R2. The
decomposition is lossless when it satisfies the following statement −

 whenever you are decomposing a table and finding union then R1UR2=R
and whenever you perform join operation R1∩R2 /= ∅
common attribute when decomposing a table should be a candidate key of

R1 or R2 or both.

 It should not have any spurious tuples

So by the end when you achieve same number of tuples as that of an original table after
decomposing them and performing join operation should result in same number of tuples.

Spurious tuples: Spurious tuples are incorrect or extra tuples that appear when joining decomposed
relations improperly. They do not exist in the original relation but are generated due to lossy
decomposition.

Follow the example I taught in class when I explained dependency preserving decomposition.

You might also like