Chapter6_NormalizationDatabaseTables_Part4 (2)
Normalization of Database Tables
CSC 3326
Learning Objectives
• After completing this chapter, you will be able to:
• Explain normalization and its role in the database design process
• Identify and describe each of the normal forms: 1NF, 2NF, 3NF, BCNF, and 4NF
• Explain how tables can be transformed from lower normal forms to higher normal forms
• Apply normalization rules to evaluate and correct table structures
• Use a data-modeling checklist to check that the ERD meets a set of minimum
requirements
Introduction
• Good database design must be matched to good table structures.
• Having good relational database software is not enough to avoid data redundancy.
• How do you recognize a poor table structure, and how do you produce a good one?
2) Modify existing data structures, which may take the form of flat files,
spreadsheets, or older database structures.
o The normalization process can be used to improve the existing data structure and
create an appropriate database design.
• Sample periodic report for a construction company that manages several
building projects:
Data is organized around projects
The Normalization Process
• Normalization is used to produce a set of normalized relations
(tables) that will be used to generate the required information.
• In normalization terminology:
Any attribute that is part of a candidate key is known as
a prime attribute, rather than the more common term key attribute.
A nonprime attribute, or a nonkey attribute, is not part of any
candidate key.
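The prime/nonprime distinction above can be sketched in a few lines of Python. This is an illustrative helper, not from the text; the sample attributes and keys mirror the CLASS table discussed later in this chapter.

```python
# Illustrative sketch: an attribute is prime if it appears in at least
# one candidate key; otherwise it is nonprime (nonkey).

def classify_attributes(attributes, candidate_keys):
    """Return (prime, nonprime) attribute sets for a table."""
    prime = {a for key in candidate_keys for a in key}
    nonprime = set(attributes) - prime
    return prime, nonprime

# CLASS-like table with two candidate keys: CLASS_CODE, and
# CRS_CODE + CLASS_SECTION (CLASS_TIME is an assumed extra attribute).
attrs = ["CLASS_CODE", "CRS_CODE", "CLASS_SECTION", "CLASS_TIME"]
keys = [("CLASS_CODE",), ("CRS_CODE", "CLASS_SECTION")]
prime, nonprime = classify_attributes(attrs, keys)
```

Here `prime` contains the three attributes that participate in some candidate key, and `nonprime` contains only `CLASS_TIME`.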
The Normalization Process
• The objective of normalization is to ensure that each table conforms to the
concept of well-formed tables:
• Each table represents a single subject
• Each row/column intersection contains only one value and not a group of values
• No data item will be unnecessarily stored in more than one table (tables have
minimum controlled redundancy) => data is updated in only one place.
• All nonprime attributes in a table are dependent on the primary key, the
entire primary key (in case of composite key) and nothing but the primary
key.
• Each table has no insertion, update, or deletion anomalies
The Normalization Process
Step 3) Identify all dependencies: anomalies may still exist because the table
contains other dependencies in addition to the primary key dependency.
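When identifying all dependencies, it can help to test whether a candidate functional dependency X -> Y actually holds in sample data. The helper below is a minimal sketch (not from the text); the JOB_CLASS/CHG_HOUR sample values are illustrative.

```python
# Illustrative check: does the functional dependency lhs -> rhs hold,
# i.e. does every distinct lhs value map to exactly one rhs value?

def fd_holds(rows, lhs, rhs):
    seen = {}
    for r in rows:
        key = tuple(r[a] for a in lhs)
        val = tuple(r[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False  # same determinant, two different dependents
    return True

rows = [
    {"JOB_CLASS": "Programmer", "CHG_HOUR": 35.75},
    {"JOB_CLASS": "Programmer", "CHG_HOUR": 35.75},
]
fd_holds(rows, ["JOB_CLASS"], ["CHG_HOUR"])  # holds for this sample
```

Note that sample data can only refute a dependency, never prove it; whether a dependency truly holds is a business-rule question.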
Conversion to Second Normal Form (2NF)
• Conversion to 2NF occurs only when the 1NF table has a composite primary key. If the 1NF table has a
single-attribute primary key, then it is automatically in 2NF (assuming the PK is the only candidate key).
• Step 1: Make new tables to eliminate partial dependencies
• For each component of the primary key that acts as a determinant in a partial dependency, create a new table
with a copy of that component as the primary key.
• The determinants must remain in the original table because they will be the foreign keys for the relationships
needed to relate these new tables to the original table.
• Step 2: Reassign corresponding dependent attributes:
• Use the dependency diagram in 1NF to determine attributes that are dependent in the partial dependencies
• The attributes that are dependent in a partial dependency are removed from the original table and placed in
the new table with the dependency’s determinant.
• Table is in 2NF when it:
Is in 1NF
and
Includes no partial dependencies
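The two 2NF steps above can be sketched with plain Python dictionaries. This is an illustrative example, not the book's dataset: the column names (PROJ_NUM, EMP_NUM, PROJ_NAME, HOURS) and sample values are assumed, matching the chapter's construction-projects scenario.

```python
# Sketch of 2NF conversion: the 1NF table has composite key
# (PROJ_NUM, EMP_NUM) and the partial dependency PROJ_NUM -> PROJ_NAME.

rows = [
    {"PROJ_NUM": 15, "EMP_NUM": 103, "PROJ_NAME": "Evergreen", "HOURS": 23.8},
    {"PROJ_NUM": 15, "EMP_NUM": 101, "PROJ_NAME": "Evergreen", "HOURS": 19.4},
    {"PROJ_NUM": 18, "EMP_NUM": 103, "PROJ_NAME": "Amber Wave", "HOURS": 12.6},
]

# Step 1: new table keyed by the partial dependency's determinant.
project = {r["PROJ_NUM"]: {"PROJ_NUM": r["PROJ_NUM"],
                           "PROJ_NAME": r["PROJ_NAME"]}
           for r in rows}

# Step 2: remove the dependent attribute from the original table;
# PROJ_NUM remains behind as the foreign key to PROJECT.
assignment = [{k: v for k, v in r.items() if k != "PROJ_NAME"}
              for r in rows]
```

After the split, PROJ_NAME is stored once per project instead of once per assignment row, which is exactly the redundancy 2NF removes.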
Conversion to Third Normal Form (3NF)
• Step 1: Make new tables to eliminate transitive dependencies
• For every transitive dependency, write a copy of its determinant as a primary key for a new table.
• The determinant must remain in the original table because it will be the foreign key for the relationship
needed to relate this new table to the original table.
• Step 2: Reassign corresponding dependent attributes
• Identify the attributes that are dependent on each determinant identified in Step 1
and move them to the new table.
• Table is in 3NF when it:
Is in 2NF
and
Contains no transitive dependencies
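The 3NF steps can be sketched the same way. The data below is illustrative (names and rates are assumed), using the JOB_CLASS and CHG_HOUR attributes this chapter discusses: JOB_CLASS -> CHG_HOUR is a transitive dependency inside EMPLOYEE.

```python
# Sketch of 3NF conversion: remove the transitive dependency
# JOB_CLASS -> CHG_HOUR by creating a JOB table; JOB_CLASS stays in
# EMPLOYEE as the foreign key.

employee = [
    {"EMP_NUM": 101, "EMP_NAME": "John News",
     "JOB_CLASS": "Programmer", "CHG_HOUR": 35.75},
    {"EMP_NUM": 103, "EMP_NAME": "June Arbough",
     "JOB_CLASS": "Elect. Engineer", "CHG_HOUR": 84.50},
]

# Step 1: new table keyed by the transitive dependency's determinant.
job = {r["JOB_CLASS"]: {"JOB_CLASS": r["JOB_CLASS"],
                        "CHG_HOUR": r["CHG_HOUR"]}
       for r in employee}

# Step 2: drop the transitively dependent attribute from EMPLOYEE.
employee = [{k: v for k, v in r.items() if k != "CHG_HOUR"}
            for r in employee]
```

Changing a job class's hourly rate now means updating one JOB row rather than every employee with that job class.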
Improving the design
• After eliminating the partial and transitive dependencies (3NF), the focus shifts to improving the
database’s ability to provide information and to enhancing its operational characteristics.
• Normalization is valuable because it helps eliminate data redundancies; still, several types of
issues must be addressed to produce a good normalized set of tables.
1) Evaluate PK assignments
Evaluate each PK against the desirable PK characteristics.
Consider the JOB_CLASS primary key (too much descriptive content to be usable as a PK) => risk of
referential integrity violations.
Therefore, it would be better to add a JOB_CODE attribute (surrogate key) to create a unique
identifier.
A surrogate key is an artificial PK introduced by the designer to simplify the
assignment of primary keys to tables.
Surrogate keys are usually numeric and are often generated automatically by the DBMS.
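The idea of a DBMS-generated surrogate key can be mimicked in a few lines. This is a hedged sketch: the counter below stands in for an auto-increment facility (e.g., an identity column), and the starting value 500 is arbitrary, not from the text.

```python
# Sketch: assigning a numeric surrogate key (JOB_CODE) to JOB rows,
# simulating what a DBMS auto-increment/identity column would do.
from itertools import count

next_code = count(start=500)  # arbitrary starting value for illustration

job_rows = [{"JOB_DESCRIPTION": d}
            for d in ("Systems Analyst", "Database Designer")]
for row in job_rows:
    row["JOB_CODE"] = next(next_code)  # unique, meaning-free identifier
```

Because JOB_CODE carries no descriptive content, it never needs to change when a job description is reworded, which avoids the referential-integrity risk of keying on JOB_CLASS.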
Improving the design
2) Naming conventions
• Entity name:
• Be descriptive of the objects in the business environment
• Use terminology that is familiar to the users
• Attribute name:
• Required to be descriptive of the data represented by the attribute
• A good practice to prefix the name of an attribute with the name or abbreviation of the
entity in which it occurs: CUSTOMER/CUS_CREDIT_NUMBER
=> CHG_HOUR will be changed to JOB_CHG_HOUR to indicate its association with the JOB
table.
=> The attribute name JOB_CLASS does not quite describe entries such as Systems Analyst,
Database Designer, and so on; the label JOB_DESCRIPTION is used instead.
Improving the design
3) Refine attribute atomicity
• An atomic attribute is an attribute that cannot be further subdivided to produce meaningful
components. For example, a person’s last name attribute cannot be meaningfully subdivided.
• By improving the degree of atomicity, querying flexibility is gained.
• In general, designers prefer to use simple, single-valued attributes, as indicated by the business rules
and processing requirements.
=> EMP_NAME in the EMPLOYEE table is not atomic because EMP_NAME can be decomposed into a last
name, a first name, and an initial.
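Decomposing EMP_NAME into atomic components can be sketched as below. This is a simplified illustration: it assumes names follow a fixed "First I. Last" pattern, and the sample name is invented; real name data is messier and would need more careful rules.

```python
# Sketch: splitting a non-atomic EMP_NAME into atomic attributes,
# assuming the input always has the form "First I. Last".

def split_name(emp_name):
    first, initial, last = emp_name.split()
    return {"EMP_FNAME": first,
            "EMP_INITIAL": initial.rstrip("."),
            "EMP_LNAME": last}

parts = split_name("William K. Smithfield")
```

With separate EMP_LNAME, EMP_FNAME, and EMP_INITIAL attributes, queries can sort or filter on last name alone, which is the querying flexibility atomicity buys.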
• 2NF
Partial dependencies should be identified for all candidate keys
• 3NF
The remaining non-prime attributes should not have transitive dependencies
The CLASS table has two candidate keys:
•CLASS_CODE
•CRS_CODE + CLASS_SECTION
The table is in 1NF because the key attributes are defined and all nonkey attributes are
determined by both candidate keys.
The table is in 2NF because it is in 1NF and there are no partial dependencies on either
candidate key.
Finally, the table is in 3NF because there are no transitive dependencies.
Normalization and Database Design
• Data modeling
• Naming conventions: all names should be limited in length
• Entity names:
• Should be nouns that are familiar to business and should be short and meaningful
• Should document abbreviations, synonyms, and aliases for each entity
• Should be unique within the model
• For composite entities, may include a combination of abbreviated names of the entities linked through the
composite entity
Data-Modeling Checklist
• Attribute names:
• Should be unique within the entity
• Should use the entity abbreviation as a prefix
• Should be descriptive of the characteristic
• Should use suffixes such as _ID, _NUM, or _CODE for the PK attribute
• Should not be a reserved word
• Should not contain spaces or special characters such as @, !, or &
• Relationship names:
• Should be active or passive verbs that clearly indicate the nature of the
relationship
Data-Modeling Checklist
• Entities:
• Each entity should represent a single subject
• Each entity should represent a set of distinguishable entity instances
• All entities should be in 3NF or higher
• Granularity of the entity instance should be clearly defined
• PK should be clearly defined and support the selected data granularity
Data-Modeling Checklist
• Attributes:
• Should be simple and single-valued (atomic data)
• Should document default values, constraints, synonyms, and aliases
• Derived attributes should be clearly identified and include source(s)
• Should not be redundant unless this is required for transaction accuracy,
performance, or maintaining a history
• Nonkey attributes must be fully dependent on the PK attribute
• Relationships:
• Should clearly identify relationship participants
• Should clearly define participation, connectivity, and document cardinality
Data-Modeling Checklist
• ER model:
• Should be validated against expected processes: inserts, updates,
and deletions
• Should evaluate where, when, and how to maintain a history
• Should minimize data redundancy to ensure single-place updates
• Should conform to the minimal data rule: All that is needed is there,
and all that is there is needed