DBMS Module4 Notes
DBMS Module4 Notes
• There are two levels at which we can discuss the goodness of relation schemas.
– logical (or conceptual) level—how users interpret the relation schemas
and the meaning of their attributes.
– implementation (or physical storage) level—how the tuples in a base
relation are stored and updated.
• As with many design problems, database design may be performed using two
approaches:
– Bottom-up or Top-down.
■ Making sure that the semantics of the attributes is clear in the schema
• One goal of schema design is to minimize the storage space used by the
base relations (and hence the corresponding files).
– Grouping attributes into relation schemas has a significant effect on
storage space.
• For example, compare the space used by the two base relations
EMPLOYEE and DEPARTMENT with that for an EMP_DEPT base
relation, which is the result of applying the NATURAL JOIN operation to
EMPLOYEE and DEPARTMENT.
• Storing natural joins of base relations leads to an additional problem referred to as
update anomalies.
– These can be classified into insertion anomalies, deletion anomalies, and
modification anomalies.
• Consider the relation:
– EMP_PROJ( Emp#, Proj#, Ename, Pname, No_hours)
• Insertion Anomalies:
– Suppose you want to add new information in any relation because of some
constraints.
– Ex:
• Cannot insert a project unless an employee is assigned to it.
• Conversely
– Cannot insert an employee unless an he/she is assigned to a project.
• It is difficult to insert a new department that has no employees as yet in the
EMP_DEPT relation.
• The only way to do this is to place NULL values in the attributes for employee.
– This violates the entity integrity for EMP_DEPT because its primary key Ssn
cannot be null.
• This problem does not occur in the design of Figure 14.2 because a department is
entered in the DEPARTMENT relation whether or not any employees work for it,
and whenever an employee is assigned to that department, a corresponding tuple
is inserted in EMPLOYEE.
Deletion Anomalies:
– This problem does not occur in the database of Figure 14.2 because
DEPARTMENT tuples are stored separately.
Modification Anomalies:
• In some schema designs we may group many attributes together into a “fat”
relation.
• If many of the attributes do not apply to all tuples in the relation, we end up
with many NULLs in those tuples.
tuples
• This can waste space at the storage level and may also lead to problems with
understanding the meaning of the attributes and with specifying JOIN
operations at the logical level.
• Another problem with NULLs is how to account for them when aggregate
operations such as COUNT or SUM are applied.
• SELECT and JOIN operations involve comparisons; if NULL values are
present, results will be unpredictable.
• NULLs can have multiple interpretations, such as attribute does not apply to
this tuple, attribute value for this tuple is unknown, value is known but absent;
Guideline 3:
• As far as possible, avoid placing attributes in a base relation whose
values may frequently be NULL.
• If NULLs are unavoidable, make sure that they apply in exceptional cases
only and do not apply to a majority of tuples in the relation.
Generation of Spurious Tuples
• Bad designs for a relational database may result in erroneous results for certain JOIN
operations
• A spurious tuple is, basically, a record in a database that gets created when two tables
are joined badly.
– spurious tuples are created when two tables are joined on attributes that are
neither primary keys nor foreign keys
• The "lossless join" property is used to guarantee meaningful results for join
operations .
Guideline 4:
• Design relation schemas so that they can be joined with equality conditions on
attributes that are appropriately related (primary key, foreign key) pairs in a way that
guarantees that no spurious tuples are generated.
• Avoid relations that contain matching attributes that are not (foreign key, primary
key) combinations because joining on such attributes may produce spurious tuples.
Functional Dependencies
• The set of attributes X is called the left-hand side of the FD, and Y is called
the right-hand side.
• For example, {State, Driver_license_number} → EMP_ID
– Should normally hold for any adult in the United States and hence
should hold whenever these attributes appear in a relation.
• Consider the following relation schema:
• From the semantics of the attributes and the relation, we know that the
following functional dependencies should hold:
• These functional dependencies specify that:
Definition:
• The normal form of a relation refers to the highest normal form condition
that it meets, and hence indicates the degree to which it has been normalized.
First Normal Form
The attribute ID is the identification key. All attributes are single valued (1NF). The
table is also in 2NF.
(a) The LOTS relation with its functional dependencies FD1 through FD4.
• Suppose that the following two additional functional dependencies hold in
LOTS:
– dependency FD3 says that the tax rate is fixed for a given county (does not vary
lot by lot within the same county),
– whereas FD4 says that the price of a lot is determined by its area regardless of
which county it is in.
• The LOTS relation schema violates the general definition of 2NF because Tax_rate
is partially dependent on the candidate key {County_name, Lot#}, due to FD3.
• To normalize LOTS into 2NF, we decompose it into the two relations LOTS1 and
LOTS2, shown in Figure 14.12(b).
(b) Decomposing into the 2NF relations LOTS1 and LOTS2.
(c) Decomposing LOTS1 into the 3NF relations LOTS1A and LOTS1B.
General Definition of Third Normal Form:
• A relation schema R is in third normal form (3NF) if, whenever a nontrivial functional
dependency X → A holds in R, either
(a) X is a superkey of R, or
• Example:
» Person1 is in BCNF
• In the table above:
– One student can enroll for multiple subjects. For example, student with student_id
101, has opted for subjects - Java & C++.
– And, there can be multiple professors teaching one subject like we have for Java.
– In the table above student_id, subject together form the primary key, key because
using student_id and subject, we can find all the columns of the table.
– One more important point to note here is, one professor teaches only one subject,
subject
but one subject may have two different professors.
professors
– Hence, there is a dependency between subject and professor here, where subject
depends on the professor name
Multivalued Dependency
and Fourth Normal Form
A table is said to have multi-valued dependency, if the following conditions are true:
• For a dependency A → B, if for a single value of A, multiple value of B
exists, then the table may have multi-valued dependency.
• Also, a table should have at-least 3 columns for it to have a multi-valued
dependency.
• And, for a relation R(A,B,C), if there is a multi-valued dependency between,
A and B, then B and C should be independent of each other.
– If all these conditions are true for any relation(table), it is said to have multi-
valued dependency.
dependency
• For example: Consider a bike manufacture company, which produces two colors (Black and
Red) in each model every year
• Here columns manuf_year and color are independent of each other and dependent on
bike_model.
• In this case these two columns are said to be multivalued dependent on bike_model.
These dependencies can be represented like this:
» bike_model ->-> manuf_year
» bike_model ->-> color
4NF (Contd..)
• As you can see in the table above, student with s_id 1 has opted for two
courses, Science and Maths, and has two hobbies, Cricket and Hockey.
• What is the problem ???
• Two records for student with s_id 1, will give rise to two more records, as shown
below, because for one student, two hobbies exists, hence along with both the
courses, these hobbies should be specified.
• Fifth normal form is satisfied when all tables are broken into as many tables as
possible in order to avoid redundancy.
• Also known as project join NF.
• Once it is in fifth normal form it cannot be broken into smaller relations
without changing the facts or the meaning.
• A relation decomposed into two relations must have loss-less join Property,
which ensures that no spurious or extra tuples are generated, when relations
are reunited through a natural join.
• Whenever a non-trivial join dependency *{R1 , R2 , …, Rn } holds in R,
implies every Ri (all the attributes of Ri ) is a superkey for R.
Example:
• Let us consider the relation STOCK(Agent, Company, Product)
• We assume that:
• A key ‘K’ is a superkey with the additional property that removal of any
attribute from ‘K’ will cause ‘K’ not to be a superkey any more.
– Step-1 : Add the attributes which are present on Left Hand Side in the
original functional dependency.
– Step-2 : Now, add the attributes present on the Right Hand Side of the
functional dependency.
– Step-3 : With the help of attributes present on Right Hand Side, check
the other attributes that can be derived from the other given functional
dependencies. Repeat this process until all the possible attributes which
can be derived are added in the closure.
Example
• Consider a relation R(A,B,C,D,E) having below mentioned functional
dependencies.
– FD1 : A BC
– FD2 : C B
– FD3 : D E
– FD4 : E D
• F = {Ssn → {Ename, Bdate, Address, Dnumber}, Dnumber → {Dname, Dmgr_ssn} }
• Some of the additional functional dependencies that we can infer from F are the following:
– Ssn → {Dname, Dmgr_ssn}
– Ssn → Ssn
– Dnumber → Dname
• The closure F+ of F is the set of all functional dependencies that can be inferred from F.
• Set of inference rules can be used to infer new dependencies from a given set of
dependencies.
• We use the notation F |=X → Y to denote that the functional dependency X → Y is
inferred from the set of functional dependencies F.
• They were proposed first by Armstrong (1974) and hence are known as Armstrong’s axioms.
• Reflexive Rule (IR1)
– In the reflexive rule, if Y is a subset of X, then X determines Y.
• If X ⊇ Y then X → Y
• Example:
– X = {a, b, c, d, e}
– Y = {a, b, c}
• Augmentation Rule (IR2)
– The augmentation is also called as a partial dependency.
– In augmentation, if X determines Y, then XZ determines YZ for any Z
• If X → Y then XZ → YZ
• Example:
– For R(ABCD), if A → B then AC → BC
• Transitive Rule (IR3)
– In the transitive rule, if X determines Y and Y determine Z, then X
must also determine Z.
• If X → Y and Y → Z then X → Z
• Union or additive Rule (IR4)
– Union rule says, if X determines Y and X determines Z, then X must
also determine Y and Z
• If X → Y and X → Z then X → YZ
– Proof:
• Decomposition or project Rule (IR5)
Definition:
For each such set of attributes X, we determine the set X+ of attributes that
are functionally determined by X based on F; X+ is called the closure of X
under F.
Determining X+, the Closure of X under F
• Algorithm 15.1 starts by setting X+ to all the attributes in X. By IR1, we
know that all these attributes are functionally dependent on X.
X
• Using inference rules IR3 and IR4, we add attributes to X+, using each
functional dependency in F.
• We keep going through all the dependencies in F (the repeat loop) until no
more attributes are added to X+ during a complete cycle (of the for loop)
through the dependencies in F.
• For example, consider the following relation schema about classes held at a university in a
given academic year.
– {Classroom}+ ={Classroom,capacity}
consider the relation schema EMP_PROJ. From the semantics of the
attributes, we specify
•For example, the following set F of functional dependencies that should hold
on EMP_PROJ;
• SSN → ENAME
• Using Algorithm 15.1, we calculate the following closure sets with respect
to F;
Equivalence of Sets of Functional Dependencies
Step 3. As FD2 ⊃ FD1 and FD1 ⊃ FD2 both are true FD2 =FD1 is true. These
two FD sets are semantically equivalent.
Find whether the following FD’s are equivalent
• All above dependencies are in canonical form (that is, they have only one attribute on
the right-hand side), so we have completed step 1 of Algorithm 15.2 and can proceed to
step 2.
• In step 2 we need to determine if AB → D has any redundant (extraneous) attribute on
the left-hand side; that is, can it be replaced by B → D or A → D?
– Since B → A, by augmenting with B on both sides (IR2), we have BB → AB, or B
→ AB (i). However, AB → D as given (ii).
– Hence by the transitive rule (IR3), we get from (i) and (ii), B → D. Thus AB → D
may be replaced by B → D.
• We now have a set equivalent to original E, say E′: {B → A, D → A, B → D}.
• No further reduction is possible in step 2 since all FDs have a single attribute on the
left-hand side.
• In step 3 we look for a redundant FD in E′. By using the transitive rule on B → D and
D → A, we derive B → A. Hence B → A is redundant in E′ and can be eliminated.
• Therefore, the minimal cover of E is F: {B → D, D → A}.
Example 2: Let the given set of FDs be G: {A → BCDE, CD → E}.
• Here, the given FDs are NOT in the canonical form. So we first convert
them into:
• E: {A → B, A→ C, A→ D, A→ E, CD → E}.
• In step 2 of the algorithm, for CD → E, neither C nor D is extraneous on
the left-hand side, since we cannot show that C → E or D → E from the
given FDs. Hence we cannot replace it with either.
• In step 3, we want to see if any FD is redundant. Since A→ CD and CD →
E, by transitive rule (IR3), we get A→ E. Thus, A→ E is redundant in G.
• So we are left with the set F, equivalent to the original set G as: {A → B,
A→ C, A→ D, CD → E}. F is the minimum cover.
• F = {A → C, AB → C, C → DI, CD → I, EC → AB, EI → C}
• F={BC->ADEF,F->DE}
Algorithm 15.2(a). Finding a Key K for R Given a Set F of Functional
Dependencies
Definition:
Formally, a decomposition D = {R1, R2, … , Rm} of R has the lossless
(nonadditive) join property with respect to the set of dependencies F on R if,
for every relation state r of R that satisfies F, the following holds, where
* is the NATURAL JOIN of all the relations in D: *(πR1(r), … , πRm(r))
= r.
• The word loss in lossless refers to loss of information, not to loss of tuples. If a
decomposition does not have the lossless join property, we may get additional spurious
tuples
• The nonadditive join property ensures that no spurious tuples result after the application of
PROJECT and JOIN operations.
Algorithm 15.3. Testing for Nonadditive Join Property
• Example: Consider the schema R=ABCD, subjected to FDs F= { A → B, B →
C }, and the Non-binary partition D1= {ACD, AB, BC}. Question Is D1a
Lossless decomposition?
• Example 8: Consider the schema R=ABCD, subjected to FDs F=
{A→B,B → C }, and the Non-binary partition D2= {ABC, AD}.
• Example: Consider the schema R=ABCDE, subjected to FDs F= {A→C,
B → C, C → D, DE →C, CE →A}, and the Non-binary partition D4= AD,
B, BE, CDE, AE}. Question Is D4 a Lossless decomposition?
• Example : –Your turn…
– The second condition is desirable, but not a must, and may have to be relaxed if
we insist on achieving BCNF.
– The original lost FDs can be recovered by a JOIN operation over the results of
decomposition.
• Now we give an algorithm where we achieve conditions 1 and 2 and only
guarantee 3NF.
• Algorithm 15.4 yields a decomposition D of R that does the following:
■ Preserves dependencies
– R3 (Emp_ssn, Pno)
• This design achieves both the desirable properties of dependency preservation and
nonadditive join.
Nonadditive Join Decomposition into BCNF Schemas
• DKNF stands for Domain Key Normal Form requires the database that contains no
constraints other than domain constraints and key constraints.
• In DKNF, it is easy to build a database.
• It avoids general constraints in the database which are not clear domain or key constraints.
• A constraint in this definition is any rule that’s precise enough that you can evaluate
whether or not it’s true. A key is a unique identifier of a row in a table. A domain is the set
of permitted values of an attribute.
• The 3NF, 4NF, 5NF and BCNF are special cases of the DKNF.
• It is achieved when every constraint on the relation is a logical consequence of the
definition.
• Look at this database, which is in 1NF, to see what you must do to put that
database in DK/NF.