0% found this document useful (0 votes)
3 views124 pages

DBMS Module4 Notes

This document discusses the basics of functional dependencies and normalization in relational databases, focusing on the importance of designing relation schemas that minimize redundancy and anomalies. It outlines informal guidelines for schema design, such as ensuring clear semantics, reducing NULL values, and preventing spurious tuples. Additionally, it explains normalization processes, including definitions and criteria for first, second, and third normal forms, to enhance the quality of database designs.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
3 views124 pages

DBMS Module4 Notes

This document discusses the basics of functional dependencies and normalization in relational databases, focusing on the importance of designing relation schemas that minimize redundancy and anomalies. It outlines informal guidelines for schema design, such as ensuring clear semantics, reducing NULL values, and preventing spurious tuples. Additionally, it explains normalization processes, including definitions and criteria for first, second, and third normal forms, to enhance the quality of database designs.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 124

Module 4

Basics of Functional Dependencies and Normalization for


Relational Databases

• There are two levels at which we can discuss the goodness of relation schemas.
– logical (or conceptual) level—how users interpret the relation schemas
and the meaning of their attributes.
– implementation (or physical storage) level—how the tuples in a base
relation are stored and updated.
• As with many design problems, database design may be performed using two
approaches:
– Bottom-up or Top-down.

– A bottom-up design methodology (also called design by synthesis)


considers the basic relationships among individual attributes as the
starting point and uses those to construct relation schemas.
• In contrast, a top-down design methodology (also called design by
analysis) starts with a number of groupings of attributes into relations that
exist together naturally, for example, on an invoice, a form, or a report.

• The relations are then analyzed individually and collectively, leading to


further decomposition until all desirable properties are met.

• Relational database design ultimately produces a set of relations.

– The implicit goals of the design activity are information preservation


and minimum redundancy.
Informal Design Guidelines for Relation Schemas

• Four informal guidelines that may be used as measure to determine the


quality of relation schema design:

■ Making sure that the semantics of the attributes is clear in the schema

■ Reducing the redundant information in tuples

■ Reducing the NULL values in tuples

■ Disallowing the possibility of generating spurious tuples


Imparting Clear Semantics to Attributes in Relations

• Whenever we group attributes to form a relation schema, we assume that attributes


belonging to one relation have certain real-world meaning and a proper interpretation
associated with them.
• The semantics of a relation refers to its meaning resulting from the interpretation of
attribute values in a tuple.
• The meaning of the EMPLOYEE relation schema is simple:
– Each tuple represents an employee, with values for the employee’s name (Ename),
Social Security number (Ssn), birth date (Bdate), and address (Address), and the
number of the department that the employee works for (Dnumber). The Dnumber
attribute is a foreign key that represents an implicit relationship between
EMPLOYEE and DEPARTMENT.
• The ease with which the meaning of a relation’s attributes can be explained is an
informal measure of how well the relation is designed.
Guideline 1:

– Design a relation schema so that it is easy to explain its meaning.

– Do not combine attributes from multiple entity types and relationship


types into a single relation.

– Only foreign keys should be used to refer to other entities

– Entity and relationship attributes should be kept apart as much as


possible
• Examples of Violating Guideline 1:
– The relation schemas in Figures 14.3(a) and 14.3(b) also have clear
semantics.

• They violate Guideline 1 by mixing attributes from distinct real-world entities:


entities
EMP_DEPT mixes attributes of employees and departments, and EMP_PROJ mixes
attributes of employees and projects and the WORKS_ON relationship.
Redundant Information in Tuples and Update Anomalies

• One goal of schema design is to minimize the storage space used by the
base relations (and hence the corresponding files).
– Grouping attributes into relation schemas has a significant effect on
storage space.
• For example, compare the space used by the two base relations
EMPLOYEE and DEPARTMENT with that for an EMP_DEPT base
relation, which is the result of applying the NATURAL JOIN operation to
EMPLOYEE and DEPARTMENT.
• Storing natural joins of base relations leads to an additional problem referred to as
update anomalies.
– These can be classified into insertion anomalies, deletion anomalies, and
modification anomalies.
• Consider the relation:
– EMP_PROJ( Emp#, Proj#, Ename, Pname, No_hours)
• Insertion Anomalies:
– Suppose you want to add new information in any relation because of some
constraints.
– Ex:
• Cannot insert a project unless an employee is assigned to it.
• Conversely
– Cannot insert an employee unless an he/she is assigned to a project.
• It is difficult to insert a new department that has no employees as yet in the
EMP_DEPT relation.
• The only way to do this is to place NULL values in the attributes for employee.
– This violates the entity integrity for EMP_DEPT because its primary key Ssn
cannot be null.

• This problem does not occur in the design of Figure 14.2 because a department is
entered in the DEPARTMENT relation whether or not any employees work for it,
and whenever an employee is assigned to that department, a corresponding tuple
is inserted in EMPLOYEE.
Deletion Anomalies:

– If we delete from EMP_DEPT an employee tuple that happens to


represent the last employee working for a particular department, the
information concerning that department is lost inadvertently from
the database.

– This problem does not occur in the database of Figure 14.2 because
DEPARTMENT tuples are stored separately.

Modification Anomalies:

– In EMP_DEPT, if we change the value of one of the attributes of a


particular department—say, the manager of department 5—we must
update the tuples of all employees who work in that department;
department
otherwise, the database will become inconsistent.
Guideline 2:
– Design the base relation schemas so that no insertion, deletion, or
modification anomalies are present in the relations.
– If any anomalies are present, note them clearly and make sure that the
programs that update the database will operate correctly.
NULL Values in Tuples

• In some schema designs we may group many attributes together into a “fat”
relation.
• If many of the attributes do not apply to all tuples in the relation, we end up
with many NULLs in those tuples.
tuples
• This can waste space at the storage level and may also lead to problems with
understanding the meaning of the attributes and with specifying JOIN
operations at the logical level.
• Another problem with NULLs is how to account for them when aggregate
operations such as COUNT or SUM are applied.
• SELECT and JOIN operations involve comparisons; if NULL values are
present, results will be unpredictable.
• NULLs can have multiple interpretations, such as attribute does not apply to
this tuple, attribute value for this tuple is unknown, value is known but absent;
Guideline 3:
• As far as possible, avoid placing attributes in a base relation whose
values may frequently be NULL.
• If NULLs are unavoidable, make sure that they apply in exceptional cases
only and do not apply to a majority of tuples in the relation.
Generation of Spurious Tuples
• Bad designs for a relational database may result in erroneous results for certain JOIN
operations
• A spurious tuple is, basically, a record in a database that gets created when two tables
are joined badly.
– spurious tuples are created when two tables are joined on attributes that are
neither primary keys nor foreign keys
• The "lossless join" property is used to guarantee meaningful results for join
operations .
Guideline 4:
• Design relation schemas so that they can be joined with equality conditions on
attributes that are appropriately related (primary key, foreign key) pairs in a way that
guarantees that no spurious tuples are generated.
• Avoid relations that contain matching attributes that are not (foreign key, primary
key) combinations because joining on such attributes may produce spurious tuples.
Functional Dependencies

• A functional dependency is a property of the semantics or meaning of the


attributes.

Definition of Functional Dependency:


– A functional dependency is a constraint between two sets of attributes from
the database.
– Suppose that our relational database schema R has n attributes A1, A2, … , An;
– A functional dependency, denoted by X → Y, between two sets of attributes
X and Y that are subsets of R specifies a constraint on the possible tuples that
can form a relation state r of R.
– The constraint is that, for any two tuples t1 and t2 in r that have t1[X] =t2[X],
they must also have t1[Y] = t2[Y].
– This means that the values of the Y component of a tuple in r depend
on, or are determined by, the values of the X component;
component alternatively,
the values of the X component of a tuple uniquely (or functionally)
determine the values of the Y component
• We say that there is a functional dependency from X to Y,
Y or that Y is
functionally dependent on X.
• The abbreviation for functional dependency is FD or f.d.

• The set of attributes X is called the left-hand side of the FD, and Y is called
the right-hand side.
• For example, {State, Driver_license_number} → EMP_ID
– Should normally hold for any adult in the United States and hence
should hold whenever these attributes appear in a relation.
• Consider the following relation schema:

• From the semantics of the attributes and the relation, we know that the
following functional dependencies should hold:
• These functional dependencies specify that:

– (a) the value of an employee’s Social Security number (Ssn) uniquely


determines the employee name (Ename),
– (b) the value of a project’s number (Pnumber) uniquely determines the
project name (Pname) and location (Plocation), and
– (c) a combination of Ssn and Pnumber values uniquely determines the
number of hours the employee currently works on the project per week
(Hours).
Normal Forms Based on Primary Keys

• Having introduced functional dependencies, we are now ready to use them to


specify how to use them to develop a formal methodology for testing and
improving relation schemas.
Normalization of Relations:
• Normalization of data can be considered a process of analyzing the given
relation schemas based on their FDs and primary keys to achieve the
desirable properties of (1) minimizing redundancy and (2) minimizing the
insertion, deletion, and update anomalies.
• It can be considered as a “filtering” or “purification” process to make the
design have successively better quality.
– An unsatisfactory relation schema that does not meet the condition for a
normal form tests are decomposed into smaller relation schemas that
satisfy tests and meet desirable properties.
Normalization procedure provides database designers with the following:
• A formal framework for analyzing relation schemas based on their keys and
on the functional dependencies among their attributes.
• A series of normal form tests that can be carried out on individual relation
schemas so that the relational database can be normalized to any desired
degree

Definition:
• The normal form of a relation refers to the highest normal form condition
that it meets, and hence indicates the degree to which it has been normalized.
First Normal Form

• First normal form (1NF) is now considered to be part of the formal


definition of a relation in the basic (flat) relational model.
• It disallows multi-valued attributes,
attributes composite attributes,
attributes and their
combinations.
combinations
• It states that the domain of an attribute must include only atomic (simple,
indivisible) values and that the value of any attribute in a tuple must be a
single value from the domain of that attribute.
• Hence, 1NF disallows having a set of values, a tuple of values, or a
combination of both as an attribute value for a single tuple.
• In other words, 1NF disallows relations within relations or relations as
attribute values within tuples.
• The only attribute values permitted by 1NF are single atomic (or
indivisible) values.
Second Normal Form

• Second normal form (2NF) is based on the concept of full functional


dependency.
• A functional dependency X → Y is a full functional dependency if
removal of any attribute A from X means that the dependency does not
hold anymore; that is, for any attribute A ε X, (X − {A}) does not
functionally determine Y.
• A functional dependency X → Y is a partial dependency if some attribute
A ε X can be removed from X and the dependency still holds; that is, for
some A ε X, (X − {A}) → Y.
• In {Ssn, Pnumber} → Hours is a full dependency (neither Ssn → Hours
nor Pnumber → Hours holds).
• However, the dependency {Ssn, Pnumber} → Ename is partial because
Ssn → Ename holds
• Definition: A relation schema R is in 2NF if every nonprime attribute A
in R is fully functionally dependent on the primary key of R.
Third Normal Form

• A database is in third normal form if it satisfies the following conditions:


– It is in second normal form
– There is no transitive functional dependency
• By transitive functional dependency, we mean we have the following
relationships in the table:
– A is functionally dependent on B, and B is functionally dependent on
C. In this case, C is transitively dependent on A via B.

• Definition: According to Codd’s original definition, a relation schema R is


in 3NF if it satisfies 2NF and no nonprime attribute of R is transitively
dependent on the primary key.
Vendor(ID, Name, Account_No, Bank_Code_No, Bank)

The attribute ID is the identification key. All attributes are single valued (1NF). The
table is also in 2NF.

The following dependencies exist:

1. Name, Account_No, Bank_Code_No are functionally dependent on ID


(ID --> Name, Account_No, Bank_Code_No)

2. Bank is functionally dependent on Bank_Code_No


(Bank_Code_No --> Bank)
• In the above table, [Book ID] determines [Genre ID], and [Genre ID]
determines [Genre Type]. Therefore, [Book ID] determines [Genre
Type] via [Genre ID] and we have transitive functional dependency,
dependency
and this structure does not satisfy third normal form.
• To bring this table to third normal form, we split the table into two
as follows:

• Now all non-key attributes are fully functional dependent only on


the primary key. In [TABLE_BOOK], both [Genre ID] and [Price] are
only dependent on [Book ID].
• In [TABLE_GENRE], [Genre Type] is only dependent on [Genre ID].
• The test for 2NF involves testing for functional dependencies whose
left-hand side attributes are part of the primary key.
• If the primary key contains a single attribute, the test need not be applied
at all.
• Consider the relation schema LOTS shown in Figure 14.12(a), which
describes parcels of land for sale in various counties of a state. Suppose
that there are two candidate keys: Property_id# and {County_name,
Lot#}; that is, lot numbers are unique only within each county, but
Property_id#numbers are unique across counties for the entire state.

(a) The LOTS relation with its functional dependencies FD1 through FD4.
• Suppose that the following two additional functional dependencies hold in
LOTS:

– dependency FD3 says that the tax rate is fixed for a given county (does not vary
lot by lot within the same county),
– whereas FD4 says that the price of a lot is determined by its area regardless of
which county it is in.

• The LOTS relation schema violates the general definition of 2NF because Tax_rate
is partially dependent on the candidate key {County_name, Lot#}, due to FD3.
• To normalize LOTS into 2NF, we decompose it into the two relations LOTS1 and
LOTS2, shown in Figure 14.12(b).
(b) Decomposing into the 2NF relations LOTS1 and LOTS2.

(c) Decomposing LOTS1 into the 3NF relations LOTS1A and LOTS1B.
General Definition of Third Normal Form:
• A relation schema R is in third normal form (3NF) if, whenever a nontrivial functional
dependency X → A holds in R, either

(a) X is a superkey of R, or

(b) A is a prime attribute of R. (each element of A is part of some candidate key)


• According to this definition, LOTS2 (Figure 14.12(b)) is in 3NF.
• FD4 in LOTS1 violates 3NF because
– Area is not a superkey and Price is not a prime attribute in LOTS1.
• To normalize LOTS1 into 3NF, decompose it into the relation schemas LOTS1A and
LOTS1B shown in Figure 14.12(c).
– Both LOTS1A and LOTS1B are in 3NF.
(d) Progressive normalization of LOTS into a 3NF design.
Assignment
convert the following invoice table to 1nf, 2nf & 3nf
1NF
2NF
3NF
• Identify candidate key in the below relation . Decompose the table into
1NF, 2NF, 3NF.
• STUD_NO -> STUD_STATE
• STUD_STATE -> STUD_COUNTRY are true.
– So STUD_COUNTRY is transitively dependent on STUD_NO. It violates the
third normal form.
• To convert it in third normal form, we will decompose the relation

STUDENT (STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE,


STUD_COUNTRY_STUD_AGE) as

STUDENT (STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE,


STUD_AGE)

STATE_COUNTRY (STATE, COUNTRY)


Boyce-Codd Normal Form

• Boyce-Codd normal form (BCNF) was proposed as a simpler form of


3NF, but it was found to be stricter than 3NF.
– That is, every relation in BCNF is also in 3NF; however, a relation in
3NF is not necessarily in BCNF.
– A table complies with BCNF if it is in 3NF and for every
functional dependency X->Y, X should be the super key of the table.
• A relation R(X) is in Boyce–Codd Normal Form if for every non-trivial
functional dependency Y → Z defined on it, Y contains a key K of R(X).
– That is, Y is a superkey for R(X). „

• Example:

– Person1(SI#, Name, Address)

• The only FD is SI# → Name, Address

– Since SI# is a key,

» Person1 is in BCNF
• In the table above:

– One student can enroll for multiple subjects. For example, student with student_id
101, has opted for subjects - Java & C++.

– For each subject, a professor is assigned to the student.

– And, there can be multiple professors teaching one subject like we have for Java.

• What should be the Primary Key?

– In the table above student_id, subject together form the primary key, key because
using student_id and subject, we can find all the columns of the table.

– One more important point to note here is, one professor teaches only one subject,
subject
but one subject may have two different professors.
professors
– Hence, there is a dependency between subject and professor here, where subject
depends on the professor name
Multivalued Dependency
and Fourth Normal Form

• Fourth Normal Form comes into picture when Multi-valued


Dependency occur in any relation.
relation
• For a table to satisfy the Fourth Normal Form, it should satisfy the following
two conditions:
• It should be in the Boyce-Codd Normal Form.

• And, the table should not have any Multi-valued Dependency.


What is Multi-valued Dependency?

A table is said to have multi-valued dependency, if the following conditions are true:
• For a dependency A → B, if for a single value of A, multiple value of B
exists, then the table may have multi-valued dependency.
• Also, a table should have at-least 3 columns for it to have a multi-valued
dependency.
• And, for a relation R(A,B,C), if there is a multi-valued dependency between,
A and B, then B and C should be independent of each other.
– If all these conditions are true for any relation(table), it is said to have multi-
valued dependency.
dependency
• For example: Consider a bike manufacture company, which produces two colors (Black and
Red) in each model every year

• Here columns manuf_year and color are independent of each other and dependent on
bike_model.
• In this case these two columns are said to be multivalued dependent on bike_model.
These dependencies can be represented like this:
» bike_model ->-> manuf_year
» bike_model ->-> color
4NF (Contd..)

• Below we have a college enrolment table with


columns s_id, course and hobby

• As you can see in the table above, student with s_id 1 has opted for two
courses, Science and Maths, and has two hobbies, Cricket and Hockey.
• What is the problem ???
• Two records for student with s_id 1, will give rise to two more records, as shown
below, because for one student, two hobbies exists, hence along with both the
courses, these hobbies should be specified.

• And, in the table above, there is no relationship between the


columns course and hobby. They are independent of each other.
– So there is multi-value dependency, which leads to un-necessary repetition of
data and other anomalies as well.
How to satisfy 4th Normal Form?
• To make the above relation satisfy the 4th normal form, we can decompose
the table into 2 tables.
• A table can also have functional dependency along with multi-valued
dependency.
– In that case, the functionally dependent columns are moved in a
separate table and the multi-valued dependent columns are moved
to separate tables.
– If you design your database carefully, you can easily avoid these issues.
Join Dependency And Fifth Normal Form

• Fifth normal form is satisfied when all tables are broken into as many tables as
possible in order to avoid redundancy.
• Also known as project join NF.
• Once it is in fifth normal form it cannot be broken into smaller relations
without changing the facts or the meaning.
• A relation decomposed into two relations must have loss-less join Property,
which ensures that no spurious or extra tuples are generated, when relations
are reunited through a natural join.
• Whenever a non-trivial join dependency *{R1 , R2 , …, Rn } holds in R,
implies every Ri (all the attributes of Ri ) is a superkey for R.
Example:
• Let us consider the relation STOCK(Agent, Company, Product)

• We assume that:

1. Agents represent companies.

2. Companies make products.

3. Agents sell products

4. If an agent sells a product and he represents the company making that


product, then he sells that product for that company.
• Note: It is an all key relation. There is no FD or MVD in the relation.
• The table is in 4NF because it contains no multi-valued dependency.
• Suppose that the table is decomposed into its three relations, P1, P2 and
P3
• But if we perform natural join between the above three relations then no
spurious (extra) rows are added so this decomposition is called lossless
decomposition.
• So above three tables P1, P2 and P3 are in 5 NF
Summary of keys
• A superkey of a relation schema R = {A1, A2, ... , An} is a set of
attributes S € R with the property that no two tuples ‘t1’ and ‘t2’ in any
legal relation state ‘r’ of ‘R’ will have t1[S] = t2[S].

• A key ‘K’ is a superkey with the additional property that removal of any
attribute from ‘K’ will cause ‘K’ not to be a superkey any more.

• The difference between a key and a superkey is that a key has to be


minimal; that is, if we have a key K = {A1, A2, ... , Ak} of ‘R’, then K -
{Ai} is not a key of ‘R’ for any Ai where 1 ≤ i ≤ k.
• In the following figure, {SSN} is a key for EMPLOYEE, whereas {SSN},
{SSN, ENAME}, {SSN, ENAME, BDATE}, and any set of attributes that
includes SSN are all superkeys.
superkeys
• If a relation schema has more than one key,
key each is called a candidate key.
One of the candidate keys is arbitrarily designated to be the primary key,
and the others are called secondary keys.
keys Each relation schema must have a
primary key.
– {SSN} is the only candidate key for EMPLOYEE, so it is also the
primary key.
• An attribute of relation schema R is called a prime attribute of R if it is a
member of some candidate key of R. R An attribute is called nonprime if it
is not a prime attribute-that
attribute is, if it is not a member of any candidate key.
– In the following figure, both SSN and PNUMBER are prime attributes
of WORKS_ON,
WORKS_ON whereas other attributes of WORKS_ON are
nonprime.
Summary of Normal Forms
Module 4

15.1.1 Inference Rules for Functional Dependencies


Inference Rules for Functional Dependencies

• We denote by F the set of functional dependencies that are specified on


relation schema R.
– Numerous other functional dependencies hold in all legal relation
instances among sets of attributes that can be derived from and satisfy
the dependencies in F.
– Those other dependencies can be inferred or deduced from the FDs in
F.
• We call them as inferred or implied functional dependencies.

Definition: An FD X → Y is inferred from or implied by a set of


dependencies F specified on R if X → Y holds in every legal relation state r
of R; that is, whenever r satisfies all the dependencies in F,
F X → Y also holds in
r.
• In real life, it is impossible to specify all possible functional dependencies for
a given situation.
– For example, if each department has one manager, so that Dept_no
uniquely determines Mgr_ssn (Dept_no → Mgr_ssn), and a manager has
a unique phone number called Mgr_phone (Mgr_ssn → Mgr_phone),
then these two dependencies together imply that Dept_no → Mgr_phone.
• This is an inferred or implied FD and need not be explicitly stated in
addition to the two given FDs.
– Therefore, it is useful to define a concept called closure formally that
includes all possible dependencies that can be inferred from the given
set F.

Definition: Formally, the set of all dependencies that include F as well as


all dependencies that can be inferred from F is called the closure of F; it
is denoted by F+.
• There are three steps to calculate closure of functional dependency:

– Step-1 : Add the attributes which are present on Left Hand Side in the
original functional dependency.
– Step-2 : Now, add the attributes present on the Right Hand Side of the
functional dependency.
– Step-3 : With the help of attributes present on Right Hand Side, check
the other attributes that can be derived from the other given functional
dependencies. Repeat this process until all the possible attributes which
can be derived are added in the closure.
Example
• Consider a relation R(A,B,C,D,E) having below mentioned functional
dependencies.
– FD1 : A BC
– FD2 : C B
– FD3 : D  E
– FD4 : E D
• F = {Ssn → {Ename, Bdate, Address, Dnumber}, Dnumber → {Dname, Dmgr_ssn} }
• Some of the additional functional dependencies that we can infer from F are the following:
– Ssn → {Dname, Dmgr_ssn}

– Ssn → Ssn

– Dnumber → Dname

• The closure F+ of F is the set of all functional dependencies that can be inferred from F.
• Set of inference rules can be used to infer new dependencies from a given set of
dependencies.
• We use the notation F |=X → Y to denote that the functional dependency X → Y is
inferred from the set of functional dependencies F.
• They were proposed first by Armstrong (1974) and hence are known as Armstrong’s axioms.
• Reflexive Rule (IR1)
– In the reflexive rule, if Y is a subset of X, then X determines Y.
• If X ⊇ Y then X → Y
• Example:
– X = {a, b, c, d, e}
– Y = {a, b, c}
• Augmentation Rule (IR2)
– The augmentation is also called as a partial dependency.
– In augmentation, if X determines Y, then XZ determines YZ for any Z
• If X → Y then XZ → YZ
• Example:
– For R(ABCD), if A → B then AC → BC
• Transitive Rule (IR3)
– In the transitive rule, if X determines Y and Y determine Z, then X
must also determine Z.
• If X → Y and Y → Z then X → Z
• Union or additive Rule (IR4)
– Union rule says, if X determines Y and X determines Z, then X must
also determine Y and Z
• If X → Y and X → Z then X → YZ
– Proof:
• Decomposition or project Rule (IR5)

– Decomposition rule is also known as project rule. It is the reverse of


union rule.
– This Rule says, if X determines Y and Z, then X determines Y and X
determines Z separately.
• If X → YZ then X → Y and X → Z
• Pseudo transitive Rule (IR6)

– In Pseudo transitive Rule, if X determines Y and WY determines Z,


then WX determines Z.
• If X → Y and WY → Z then WX → Z
• One important cautionary note regarding the use of these rules:

– Although X → A and X → B implies X → AB by the union rule stated


above, X → A and Y → B does imply that XY → AB.
– Also, XY → A does not necessarily imply either X → A or Y → A.
• A systematic way to determine these additional functional dependencies is
first to determine each set of attributes X that appears as a left-hand side
of some functional dependency in F and then to determine the set of all
attributes that are dependent on X.

Definition:
For each such set of attributes X, we determine the set X+ of attributes that
are functionally determined by X based on F; X+ is called the closure of X
under F.
Determining X+, the Closure of X under F
• Algorithm 15.1 starts by setting X+ to all the attributes in X. By IR1, we
know that all these attributes are functionally dependent on X.
X

• Using inference rules IR3 and IR4, we add attributes to X+, using each
functional dependency in F.

• We keep going through all the dependencies in F (the repeat loop) until no
more attributes are added to X+ during a complete cycle (of the for loop)
through the dependencies in F.
• For example, consider the following relation schema about classes held at a university in a
given academic year.

CLASS ( Classid, Course#, Instr_name, Credit_hrs, Text, Publisher, Classroom,


Capacity)
• Let F, the set of functional dependencies for the above relation include the following f.d.s:
– FD1: Classid → Course#, Instr_name, Credit_hrs, Text, Publisher, Classroom, Capacity;
– FD2: Course# → Credit_hrs;
– FD3: {Course#, Instr_name} → Text, Classroom;
– FD4: Text → Publisher
– FD5: Classroom → Capacity
• Using the inference rules about the FDs and applying the definition of
closure,
closure we can define the following closures:
– { Classid } + = { Classid , Course#, Instr_name, Credit_hrs, Text,
Publisher, Classroom, Capacity } = CLASS
– { Course#} + = { Course#, Credit_hrs}

– {Course#, Instr_name } + = { Course#, Credit_hrs, Text, Publisher,


Classroom, Capacity }
– {Text}+ ={Text,Publisher}

– {Classroom}+ ={Classroom,capacity}
consider the relation schema EMP_PROJ. From the semantics of the
attributes, we specify

•For example, the following set F of functional dependencies that should hold
on EMP_PROJ;

• SSN → ENAME

• PNUMBER → {PNAME, PLOCATION}

• {SSN, PNUMBER} → HOURS

• Using Algorithm 15.1, we calculate the following closure sets with respect
to F;
Equivalence of Sets of Functional Dependencies

Definition: A set of functional dependencies F is said to cover another set of


functional dependencies E if every FD in E is also in F+; that is, if every
dependency in E can be inferred from F;
F alternatively, we can say that E is covered
by F.

Definition: Two sets of functional dependencies E and F are equivalent if E+ = F+.


Therefore, equivalence means that every FD in E can be inferred
from F, and every FD in F can be inferred from E; that is, E is equivalent to F if both the
conditions—E covers F and F covers E—hold.
hold

We can determine whether F covers E by calculating X+ with respect to F for each FD X → Y


in E, and then checking whether this X+ includes the attributes in Y. If this is the case for
every FD in E, then F covers E.
Let us take an example to show the relationship between two FD sets. A relation
R(A,B,C,D) having two FD sets FD1 = {A->B, B->C, AB->D} and FD2 = {A->B, B->C,
A->C, A->D}

Step 1. Checking whether all FDs of FD1 are present in FD2


• A->B in set FD1 is present in set FD2.
• B->C in set FD1 is also present in set FD2.
• AB->D in present in set FD1 but not directly in FD2 but we will
check whether we can derive it or not.
not
– For set FD2, (AB)+ = {A,B,C,D}. It means that AB can
functionally determine A, B, C and D. So AB->D will also
hold in set FD2.
• As all FDs in set FD1 also hold in set FD2, FD2 ⊃ FD1 is true.
Step 2. Checking whether all FDs of FD2 are present in FD1
• A->B in set FD2 is present in set FD1.
• B->C in set FD2 is also present in set FD1.
• A->C is present in FD2 but not directly in FD1 but we will check
whether we can derive it or not.
– For set FD1, (A)+ = {A,B,C,D}. It means that A can functionally
determine A, B, C and D. SO A->C will also hold in set FD1.
• A->D is present in FD2 but not directly in FD1 but we will check
whether we can derive it or not.
– For set FD1, (A)+ = {A,B,C,D}. It means that A can functionally
determine A, B, C and D. SO A->D will also hold in set FD1.
– As all FDs in set FD2 also hold in set FD1, FD1 ⊃ FD2 is true

Step 3. As FD2 ⊃ FD1 and FD1 ⊃ FD2 both are true FD2 =FD1 is true. These
two FD sets are semantically equivalent.
Find whether the following FD’s are equivalent

1. F = {A → C, AC → D, E → AD, E → H} and G = {A → CD, E → AH}

2. A relation R2(A,B,C,D) having two FD sets FD1 = {A->B, B->C,A->C}


and FD2 = {A->B, B->C, A->D}
Minimal Sets of Functional Dependencies

• Minimal cover of a set of functional dependencies E is a set of


functional dependencies F that satisfies the property that every
dependency in E is in the closure F+ of F.
– In addition, this property is lost if any dependency from the set F is
removed;
– F must have no redundancies in it, and the dependencies in F are in a
standard form.
• The concept of an extraneous attribute in a functional dependency is used
for defining the minimum cover.

Definition: An attribute in a functional dependency is considered an extraneous attribute


if we can remove it without changing the closure of the set of dependencies. Formally,
given F, the set of functional dependencies, and a functional dependency X → A in F,
attribute Y is extraneous in X if Y ⊂ X, and F logically implies (F − (X → A) ∪ { (X −
Y) → A } ).
• We can formally define a set of functional dependencies F to be minimal
if it satisfies the following conditions:

1. Every dependency in F has a single attribute for its right-hand


side.

2. We cannot replace any dependency X → A in F with a dependency


Y → A, where Y is a proper subset of X, and still have a set of
dependencies that is equivalent to F.

3. We cannot remove any dependency from F and still have a set of


dependencies that is equivalent to F.
Definition:
A canonical cover of a set of functional dependencies F is a simplified set
of functional dependencies that has the same closure as the original set F.
F
• Canonical cover: A canonical cover Fc of a set of functional dependencies
F such that ALL the following properties are satisfied:
Algorithm 15.2. Finding a Minimal Cover F for a Set of
Functional Dependencies E
Example 1: Let the given set of FDs be E: {B → A, D → A, AB → D}. We have to
find the minimal cover of E.

• All above dependencies are in canonical form (that is, they have only one attribute on
the right-hand side), so we have completed step 1 of Algorithm 15.2 and can proceed to
step 2.
• In step 2 we need to determine if AB → D has any redundant (extraneous) attribute on
the left-hand side; that is, can it be replaced by B → D or A → D?
– Since B → A, by augmenting with B on both sides (IR2), we have BB → AB, or B
→ AB (i). However, AB → D as given (ii).
– Hence by the transitive rule (IR3), we get from (i) and (ii), B → D. Thus AB → D
may be replaced by B → D.
• We now have a set equivalent to original E, say E′: {B → A, D → A, B → D}.
• No further reduction is possible in step 2 since all FDs have a single attribute on the
left-hand side.
• In step 3 we look for a redundant FD in E′. By using the transitive rule on B → D and
D → A, we derive B → A. Hence B → A is redundant in E′ and can be eliminated.
• Therefore, the minimal cover of E is F: {B → D, D → A}.
Example 2: Let the given set of FDs be G: {A → BCDE, CD → E}.

• Here, the given FDs are NOT in the canonical form. So we first convert
them into:
• E: {A → B, A→ C, A→ D, A→ E, CD → E}.
• In step 2 of the algorithm, for CD → E, neither C nor D is extraneous on
the left-hand side, since we cannot show that C → E or D → E from the
given FDs. Hence we cannot replace it with either.
• In step 3, we want to see if any FD is redundant. Since A→ CD and CD →
E, by transitive rule (IR3), we get A→ E. Thus, A→ E is redundant in G.
• So we are left with the set F, equivalent to the original set G as: {A → B,
A→ C, A→ D, CD → E}. F is the minimum cover.

Using the union rule (IR5) the minimum cover is:


Minimum cover of G, F: {A → BCD, CD → E}.
Find minimal cover

• F = {A → C, AB → C, C → DI, CD → I, EC → AB, EI → C}
• F={BC->ADEF,F->DE}
Algorithm 15.2(a). Finding a Key K for R Given a Set F of Functional
Dependencies

Input: A relation R and a set of functional dependencies F on the attributes of R.


1. Set K := R.
2. For each attribute A in K
• {compute (K − A)+ with respect to F;
– if (K − A)+ contains all the attributes in R, then set K := K − {A} };

• we start by setting K to all the attributes of R; we can say that R


itself is always a default superkey.
• We then remove one attribute at a time and check whether the
remaining attributes still form a superkey.
• Algorithm 15.2(a) determines only one key out of the possible
candidate keys for R; the key returned depends on the order in which
attributes are removed from R in step 2.
Properties of Relational Decompositions
• Universal Relation Schema:
– A relation schema R = {A1, A2, …, An} that includes all the attributes of the database.
• Universal relation assumption:
– Every attribute name is unique.
• Decomposition:
– The process of decomposing the universal relation schema R into a set of relation
schemas D = {R1,R2, …, Rm} that will become the relational database schema by
using the functional dependencies
• Attribute preservation condition:
– Each attribute in R will appear in at least one relation schema Ri in the decomposition
so that no attributes are “lost”.
• Another goal of decomposition is to have each individual relation Ri in the decomposition
D be in BCNF or 3NF.
• Additional properties of decomposition are needed to prevent from generating spurious
tuples (loseless; non-additive)
Dependency Preservation Property
of a Decomposition

• It would be useful if each functional dependency X → Y specified in F


either appeared directly in one of the relation schemas Ri in the
decomposition D or could be inferred from the dependencies that appear in
some Ri. Informally, this is the dependency preservation condition.

Definition: Given a set of dependencies F on R, the projection of F on Ri,


denoted by πRi(F) where Ri is a subset of R, is the set of dependencies X → Y
in F+ such that the attributes in X ∪ Y are all contained in Ri.

Hence, the projection of F on each relation schema Ri in the decomposition D


is the set of functional dependencies in F+, the closure of F, such that all the
left- and right-hand-side attributes of those dependencies are in Ri.
We say that a decomposition D = {R1, R2, … , Rm} of R is dependency-
preserving with respect to F if the union of the projections of F on each Ri in
D is equivalent to F; that is, ((πR1(F)) ∪ K ∪ (πRm(F)))+ = F+.
• Claim: It is always possible to find a dependency-preserving decomposition
D with respect to F such that each relation is in 3NF
• Example:
– Assume R= ABCD, and F= {A→B, C → D}. The decomposition D1 =
{AB, CD} is clearly F-preserving. Observe the first rule is kept in R1,
while the second is preserved in R2. Also notice that schemas AB and CD
are in 3NF format.
• Example:
– Assume R= ABCD, and F= {A→B, B →C, C → D}. Evaluate the
decomposition D = {ABC, CD}.
Nonadditive (Lossless) Join Property
of a Decomposition

• Another property that a decomposition D should possess is the nonadditive


join property, which ensures that no spurious tuples are generated
when a NATURAL JOIN operation is applied to the relations resulting
from the decomposition.

Definition:
Formally, a decomposition D = {R1, R2, … , Rm} of R has the lossless
(nonadditive) join property with respect to the set of dependencies F on R if,
for every relation state r of R that satisfies F, the following holds, where
* is the NATURAL JOIN of all the relations in D: *(πR1(r), … , πRm(r))
= r.
• The word loss in lossless refers to loss of information, not to loss of tuples. If a
decomposition does not have the lossless join property, we may get additional spurious
tuples
• The nonadditive join property ensures that no spurious tuples result after the application of
PROJECT and JOIN operations.
Algorithm 15.3. Testing for Nonadditive Join Property
• Example: Consider the schema R=ABCD, subjected to FDs F= { A → B, B →
C }, and the Non-binary partition D1= {ACD, AB, BC}. Question Is D1a
Lossless decomposition?
• Example 8: Consider the schema R=ABCD, subjected to FDs F=
{A→B,B → C }, and the Non-binary partition D2= {ABC, AD}.
• Example: Consider the schema R=ABCDE, subjected to FDs F= {A→C,
B → C, C → D, DE →C, CE →A}, and the Non-binary partition D4= AD,
B, BE, CDE, AE}. Question Is D4 a Lossless decomposition?
• Example : –Your turn…

– Consider the schema R=ABCD, subjected to FDs F= {A→B, C → D},


and the Non-binary partitions D4= {AB, AC, AD} and D5= {AB, AC,
CD}. Question. Are partitions D4 and D5 Lossless decompositions?
Algorithms for Relational Database
Schema Design

• There are two algorithms for creating a relational decomposition from a


universal relation.
– The first algorithm decomposes a universal relation into dependency
preserving 3NF relations that also possess the non-additive join
property.
– The second algorithm decomposes a universal relation schema into
BCNF schemas that possess the non-additive join property.
Dependency-Preserving and Nonadditive (Lossless)
Join Decomposition into 3NF Schemas

• We know that it is not possible to have all three of the following:


– (1) guaranteed nonlossy (nonadditive) design, (2) guaranteed dependency
preservation, and (3) all relations in BCNF.
– the first condition is a must and cannot be compromised.

– The second condition is desirable, but not a must, and may have to be relaxed if
we insist on achieving BCNF.
– The original lost FDs can be recovered by a JOIN operation over the results of
decomposition.
• Now we give an algorithm where we achieve conditions 1 and 2 and only
guarantee 3NF.
• Algorithm 15.4 yields a decomposition D of R that does the following:

■ Preserves dependencies

■ Has the nonadditive join property

■ Is such that each resulting relation schema in the decomposition is in


3NF
Algorithm 15.4 Relational Synthesis into 3NF with Dependency Preservation
and Nonadditive Join Property

Input: A universal relation R and a set of functional dependencies F on the


attributes of R.
1. Find a minimal cover G for F (use Algorithm 15.2).
2. For each left-hand-side X of a functional dependency that appears in G,
create a relation schema in D with attributes {X ∪ {A1} ∪ {A2} … ∪ {Ak} },
where X → A1, X → A2, … , X → Ak are the only dependencies in G with X
as lefthand side (X is the key of this relation).
3. If none of the relation schemas in D contains a key of R, then create one
more relation schema in D that contains attributes that form a key of R.
(Algorithm 15.2(a) may be used to find a key.)
4. Eliminate redundant relations from the resulting set of relations in the
relational database schema. A relation R is considered redundant if R is a
projection of another relation S in the schema; alternately, R is subsumed by
S.
• Example: Consider the following universal relation:

– U (Emp_ssn, Pno, Esal, Ephone, Dno, Pname, Plocation)

• Emp_ssn, Esal, and Ephone refer to the Social Security number,


salary, and phone number of the employee.
• Pno, Pname, and Plocation refer to the number, name, and location of
the project.
• Dno is the department number.
• By virtue of FD3, the attribute set {Emp_ssn, Pno} represents a key of the
universal relation.
• Hence F, the set of given FDs, includes:
– {Emp_ssn → Esal, Ephone, Dno;
– Pno → Pname, Plocation;
– Emp_ssn, Pno → Esal, Ephone, Dno, Pname, Plocation}
• By applying the minimal cover Algorithm 15.2, in step 3 we see that Pno is
an extraneous attribute in Emp_ssn, Pno → Esal, Ephone, Dno.
– Moreover, Emp_ssn is extraneous in Emp_ssn, Pno → Pname,
Plocation.
– Hence the minimal cover consists of FD1 and FD2 only (FD3 being
completely redundant)
• The second step of Algorithm 15.4 produces relations R1 and R2 as:
– R1 (Emp_ssn, Esal, Ephone, Dno)

– R2 (Pno, Pname, Plocation)

• In step 3, we generate a relation corresponding to the key {Emp_ssn, Pno} of U.


• Hence, the resulting design contains:
– R1 (Emp_ssn, Esal, Ephone, Dno)

– R2 (Pno, Pname, Plocation)

– R3 (Emp_ssn, Pno)

• This design achieves both the desirable properties of dependency preservation and
nonadditive join.
Nonadditive Join Decomposition into BCNF Schemas

• The next algorithm decomposes a universal relation schema R = {A1, A2,


… , An} into a decomposition D = {R1, R2, … , Rm} such that each Ri is in
BCNF and the decomposition D has the lossless join property with respect
to F.
Domain-Key Normal Form

• DKNF stands for Domain Key Normal Form requires the database that contains no
constraints other than domain constraints and key constraints.
• In DKNF, it is easy to build a database.
• It avoids general constraints in the database which are not clear domain or key constraints.
• A constraint in this definition is any rule that’s precise enough that you can evaluate
whether or not it’s true. A key is a unique identifier of a row in a table. A domain is the set
of permitted values of an attribute.
• The 3NF, 4NF, 5NF and BCNF are special cases of the DKNF.
• It is achieved when every constraint on the relation is a logical consequence of the
definition.
• Look at this database, which is in 1NF, to see what you must do to put that
database in DK/NF.

Table: SALES (Customer_ID, Product, Price)


Key: Customer_ID
Constraints: Customer_ID determines Product
Product determines Price
Customer_ID must be an integer > 1000
• To enforce Constraint 3 (that Customer_ID must be an integer greater than
1000), you can simply define the domain for Customer_ID to incorporate this
constraint.
– That makes the constraint a logical consequence of the domain of
the CustomerID column.
• Product depends on Customer_ID, and Customer_ID is a key, so you have no
problem with Constraint 1, which is a logical consequence of the definition of the
key.
• Constraint 2 is a problem.
– Price depends on (is a logical consequence of) Product, and Product isn’t a key.
– The solution is to divide the SALES table into two tables.
– One table uses Customer_ID as a key, and the other uses Product as a key. The
database, besides being in 3NF, is also in DK/NF.

You might also like