
Chapter 6
Normalization of Database Tables
CSC 3326
Learning Objectives
• After completing this chapter, you will be able to:
• Explain normalization and its role in the database design process
• Identify and describe each of the normal forms: 1NF, 2NF, 3NF, BCNF, and 4NF
• Explain how tables can be transformed from lower normal forms to higher normal
forms
• Apply normalization rules to evaluate and correct table structures
• Use a data-modeling checklist to check that the ERD meets a set of minimum
requirements
Introduction
• Good database design must be matched to good table structures.

• Having good relational database software is not enough to avoid data redundancy.

• The table is the basic building block of database design.

• Ideally, the database design process yields good table structures.


• Yet, it is possible to create poor table structures that contain data redundancy.

• How do you recognize a poor table structure, and how do you produce a good one?

• Normalization is a process for evaluating and correcting table structures to minimize
data redundancies, thereby reducing the likelihood of data anomalies (update,
insertion, and deletion).
• The normalization process involves assigning attributes to tables based on the
concepts of determination and functional dependency.
Introduction
• Normalization works through a series of stages called normal forms:
• The first three stages are:
 First normal form (1NF)
 Second normal form (2NF)
 Third normal form (3NF)

• From a structural point of view, higher normal forms are better than lower normal forms.
Need for Normalization
• Database designers commonly use normalization in two situations:
1) When designing a new database structure based on the business requirements of
the end users.
o After the initial design is complete, the designer can use normalization to analyze
the relationships among the attributes within each entity and determine if the
structure can be improved through normalization.

2) When modifying existing data structures that can be in the form of flat files,
spreadsheets, or older database structures.
o Can use the normalization process to improve the existing data structure and
create an appropriate database design.
• Example: a construction company that manages several building projects produces a
sample periodic report in which the data is organized around projects.
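To make that structure concrete, the sketch below shows, in Python rather than a printed report, how such project-centric data might look; the attribute names and values are illustrative assumptions, not the report from the slides. Note that the employee details form a repeating group inside each project entry.

```python
# A minimal sketch (hypothetical attribute names and values) of a
# project-centric report structure with a repeating group of employees.
report = [
    {
        "PROJ_NUM": 15,
        "PROJ_NAME": "Evergreen",
        "EMPLOYEES": [   # repeating group: several employee entries per project
            {"EMP_NUM": 103, "EMP_NAME": "A. Smith",
             "JOB_CLASS": "Elect. Engineer", "CHG_HOUR": 84.50, "HOURS": 23.8},
            {"EMP_NUM": 101, "EMP_NAME": "B. Jones",
             "JOB_CLASS": "Database Designer", "CHG_HOUR": 105.00, "HOURS": 19.4},
        ],
    },
]
```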
The Normalization Process
• Normalization is used to produce a set of normalized relations
(tables) that will be used to generate the required information.
• In normalization terminology:
 Any attribute that is part of a candidate key is known as
a prime attribute instead of the more common term key attribute.
 A nonprime attribute, or a nonkey attribute, is not part of any
candidate key.
The Normalization Process
• The objective of normalization is to ensure that each table conforms to the
concept of well-formed tables:
• Each table represents a single subject
• Each row/column intersection contains only one value and not a group of values
• No data item will be unnecessarily stored in more than one table (tables have
minimum controlled redundancy), which ensures data is updated in only one place.
• All nonprime attributes in a table are dependent on the primary key, the
entire primary key (in case of composite key) and nothing but the primary
key.
• Each table has no insertion, update, or deletion anomalies
The Normalization Process

• The concepts of keys and functional dependencies are central to the normalization process.
• From the data modeler’s point of view, the objective of normalization is to
ensure that all tables are at least in 3NF.
• Normal forms such as the fifth normal form (5NF) and domain-key normal
form (DKNF) are not likely to be encountered in a business environment and
are mainly of theoretical interest.
=> Example (from the figure): STU_NUM is the determinant and STU_LNAME is the dependent.
=> The dependency shown is a functional dependency but not a full functional dependency.
The Normalization Process
• The normalization process works one relation at a time
 Identifies the dependencies of a relation (table)
 Progressively breaks the relation up into a new set of relations
• Two types of functional dependencies are of special interest in normalization.
• Assumption: one candidate key = primary key
 A partial dependency (applicable only to composite keys): exists when there is a
functional dependence in which the determinant is only part of the primary key.
If (A, B) → (C, D), B → C, and (A, B) is the primary key, then the functional dependence B → C is a
partial dependency because only part of the primary key (B) is needed to determine the value of C.
 Transitive dependency: attribute is dependent on another attribute that is not part of the
primary key
X → Y, Y → Z, and X is the primary key.
• Transitive dependencies are more difficult to identify among a set of data.
• They occur only when a functional dependence exists among nonprime attributes.
The normalization process takes a table through a series of steps that lead to successively higher normal forms.
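The following sketch is a minimal illustration under the slide's assumption that a single candidate key serves as the primary key: it classifies an already-listed functional dependency by looking only at its determinant, using the attribute names A, B, C, D from the example above.

```python
# A minimal sketch, assuming one candidate key = primary key and functional
# dependencies already identified and listed explicitly.
def classify_dependency(determinant, primary_key):
    """Classify a functional dependency by inspecting its determinant."""
    det, pk = set(determinant), set(primary_key)
    if det == pk:
        return "dependency on the entire primary key"
    if det < pk:                 # determinant is only part of a composite key
        return "partial dependency"
    if not det & pk:             # determinant is made up of nonprime attributes
        return "candidate transitive dependency"
    return "other"

pk = ("A", "B")
fds = {("A", "B"): ("C", "D"), ("B",): ("C",), ("C",): ("D",)}
for det, dependents in fds.items():
    print(det, "->", dependents, ":", classify_dependency(det, pk))
# ('A', 'B') -> entire key; ('B',) -> partial; ('C',) -> candidate transitive
```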
Conversion to First Normal Form (1NF)
• 1NF deals with the repeating groups and ensures that the table conforms to the
requirements for a relational table.
• A repeating group is a set of one or more multivalued attributes that are related.
• If repeating groups do exist, they must be eliminated by making sure that each row
defines a single entity instance and that each row-column intersection has only a
single value.
• Three step process:
Step 1) Eliminate the Repeating Groups: Start by presenting the data in a tabular
format, where each cell has a single value and there are no repeating groups.
Step 2) Identify the Primary Key.

Step 3) Identify all Dependencies: anomalies may still exist because there are
dependencies other than the primary key dependency.
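As a hypothetical illustration of Step 1, the sketch below flattens a nested, project-centric structure into one row per project-employee combination so that every row-column intersection holds a single value; the attribute names and values are assumptions, not the slides' data.

```python
# A minimal sketch of eliminating a repeating group (Step 1 of the 1NF conversion).
nested = {
    (15, "Evergreen"):  [(103, "Arbough", 23.8), (101, "News", 19.4)],
    (18, "Amber Wave"): [(105, "Johnson", 12.6)],
}

flat_rows = [
    {"PROJ_NUM": p_num, "PROJ_NAME": p_name,
     "EMP_NUM": e_num, "EMP_LNAME": e_lname, "HOURS": hours}
    for (p_num, p_name), employees in nested.items()
    for (e_num, e_lname, hours) in employees
]
# Step 2 would then identify the composite primary key (PROJ_NUM, EMP_NUM);
# Step 3 would note the remaining dependencies, e.g. PROJ_NUM -> PROJ_NAME.
```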
Conversion to First Normal Form (1NF)

• Dependency diagram: depicts all dependencies found within a given table structure
• Helps to get an overview of all relationships among the table's attributes
• Makes it less likely that an important dependency will be overlooked
1. The primary key attributes are bold, underlined, and in a different color.
2. The arrows above the attributes indicate all desirable dependencies, that is,
dependencies based on the primary key.
3. The arrows below the dependency diagram indicate less desirable dependencies.
Conversion to First Normal Form (1NF)

• 1NF describes a tabular format in which:
• All key attributes are defined
• There are no repeating groups in the table
• All attributes are dependent on the primary key

• All relational tables satisfy 1NF requirements
• However, a 1NF table may still contain partial and transitive dependencies
• These dependencies are a source of update, insertion, and deletion anomalies
Conversion to Second Normal Form (2NF)

• Conversion to 2NF applies only when a 1NF table has a composite primary key. If a 1NF table has a single-attribute
primary key, it is automatically in 2NF (under the assumption that one candidate key = PK).
• Step 1: Make new tables to eliminate partial dependencies
• For each component of the primary key that acts as a determinant in a partial dependency, create a new table
with a copy of that component as the primary key.
• The determinants must remain in the original table because they will be the foreign keys for the relationships
needed to relate these new tables to the original table.
• Step 2: Reassign corresponding dependent attributes:
• Use the dependency diagram in 1NF to determine attributes that are dependent in the partial dependencies
• The attributes that are dependent in a partial dependency are removed from the original table and placed in
the new table with the dependency’s determinant.
• Table is in 2NF when it:

Is in 1NF
and
Includes no partial dependencies
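A minimal sketch of this two-step conversion is shown below, continuing the flat project-employee rows used earlier; the attribute names and the assumed partial dependencies (PROJ_NUM -> PROJ_NAME and EMP_NUM -> EMP_LNAME) are illustrative, not taken from the slides.

```python
# A minimal sketch of a 2NF decomposition; composite PK is (PROJ_NUM, EMP_NUM).
flat_rows = [
    {"PROJ_NUM": 15, "PROJ_NAME": "Evergreen", "EMP_NUM": 103, "EMP_LNAME": "Arbough", "HOURS": 23.8},
    {"PROJ_NUM": 15, "PROJ_NAME": "Evergreen", "EMP_NUM": 101, "EMP_LNAME": "News",    "HOURS": 19.4},
]

# Step 1: each partial-dependency determinant becomes the PK of a new table.
# Step 2: the dependent attributes move to the new table with their determinant.
project  = {r["PROJ_NUM"]: {"PROJ_NAME": r["PROJ_NAME"]} for r in flat_rows}
employee = {r["EMP_NUM"]:  {"EMP_LNAME": r["EMP_LNAME"]} for r in flat_rows}

# The original table keeps the composite key (now also acting as foreign keys)
# plus the attribute that depends on the entire key.
assignment = [
    {"PROJ_NUM": r["PROJ_NUM"], "EMP_NUM": r["EMP_NUM"], "HOURS": r["HOURS"]}
    for r in flat_rows
]
```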
Conversion to Third Normal Form (3NF)
• Step 1: Make new tables to eliminate transitive dependencies

• For every transitive dependency, write a copy of its determinant as a primary key for a new table.

• The determinant must remain in the original table because it will be the foreign key for the relationship
needed to relate this new table to the original table.
• Step 2: Reassign corresponding dependent attributes

• Identify the attributes that are dependent on each determinant identified in Step 1 and move them to the new table with that determinant.

• A table is in third normal form (3NF) when:

It is in 2NF.
and
It contains no transitive dependencies.
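The sketch below gives a hypothetical 3NF conversion of an EMPLOYEE table in which CHG_HOUR depends on the nonprime attribute JOB_CLASS; the attribute names and values are assumptions used only to illustrate the two steps.

```python
# A minimal sketch of a 3NF decomposition. In this 2NF EMPLOYEE table,
# EMP_NUM -> JOB_CLASS and JOB_CLASS -> CHG_HOUR, so CHG_HOUR depends on the
# key only transitively.
employee_2nf = [
    {"EMP_NUM": 103, "EMP_LNAME": "Arbough", "JOB_CLASS": "Elect. Engineer",   "CHG_HOUR": 84.50},
    {"EMP_NUM": 101, "EMP_LNAME": "News",    "JOB_CLASS": "Database Designer", "CHG_HOUR": 105.00},
]

# Step 1: the transitive determinant (JOB_CLASS) becomes the PK of a new table.
job = {r["JOB_CLASS"]: {"CHG_HOUR": r["CHG_HOUR"]} for r in employee_2nf}

# Step 2: the dependent attribute moves with it; JOB_CLASS stays behind in
# EMPLOYEE as the foreign key to the new JOB table.
employee = [
    {"EMP_NUM": r["EMP_NUM"], "EMP_LNAME": r["EMP_LNAME"], "JOB_CLASS": r["JOB_CLASS"]}
    for r in employee_2nf
]
```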
Improving the design
• After cleaning up the partial and transitive dependencies (3NF), the focus shifts to improving the database's
ability to provide information and to enhancing its operational characteristics.

• Normalization is valuable because it helps eliminate data redundancies; however, several other issues
must still be addressed to produce a good set of normalized tables.

1) Evaluate PK assignments
Evaluate PK against the PK characteristics

Consider the JOB_CLASS primary key: it has too much descriptive content to be usable as a key and
creates a risk of referential integrity violations.
Therefore, it is better to add a JOB_CODE attribute (a surrogate key) to create a unique
identifier.
A surrogate key is an artificial PK introduced by the designer to simplify the
assignment of primary keys to tables.
Surrogate keys are usually numeric and are often generated automatically by the DBMS.
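A minimal sketch of the idea, with hypothetical starting value and job data, is shown below: the numeric JOB_CODE is generated automatically and becomes the identifier other tables reference, so the descriptive text can change freely.

```python
# A minimal sketch of introducing a numeric surrogate key for the JOB table.
from itertools import count

next_code = count(start=500)         # hypothetical auto-generated key values
job = {next(next_code): {"JOB_DESCRIPTION": desc, "JOB_CHG_HOUR": rate}
       for desc, rate in [("Elect. Engineer", 84.50),
                          ("Database Designer", 105.00)]}
# EMPLOYEE rows would store the numeric JOB_CODE as a foreign key instead of
# repeating the descriptive text.
```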
Improving the design
2) Naming conventions
• Entity name:
• Be descriptive of the objects in the business environment
• Use terminology that is familiar to the users

• Attribute name:
• Required to be descriptive of the data represented by the attribute
• A good practice to prefix the name of an attribute with the name or abbreviation of the
entity in which it occurs: CUSTOMER/CUS_CREDIT_NUMBER
=> CHG_HOUR will be changed to JOB_CHG_HOUR to indicate its association with the JOB
table.
=> Attribute name JOB_CLASS does not quite describe entries such as Systems Analyst,
Database Designer, and so on; the label JOB_DESCRIPTION is used.
Improving the design
3) Refine attribute atomicity
• An atomic attribute is an attribute that cannot be further subdivided to produce meaningful
components. For example, a person’s last name attribute cannot be meaningfully subdivided.
• By improving the degree of atomicity, querying flexibility is gained.
• In general, designers prefer to use simple, single-valued attributes, as indicated by the business rules
and processing requirements.
=> EMP_NAME in the EMPLOYEE table is not atomic because EMP_NAME can be decomposed into a last
name, a first name, and an initial.
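As a hypothetical illustration (assuming names are stored in the form "First I. Last"), the helper below splits EMP_NAME into atomic components so that queries can sort or search on the last name alone.

```python
# A minimal sketch of refining attribute atomicity for EMP_NAME.
def split_name(emp_name: str) -> dict:
    first, initial, last = emp_name.split(" ", 2)
    return {"EMP_FNAME": first,
            "EMP_INITIAL": initial.rstrip("."),
            "EMP_LNAME": last}

print(split_name("June E. Arbough"))
# -> {'EMP_FNAME': 'June', 'EMP_INITIAL': 'E', 'EMP_LNAME': 'Arbough'}
```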

4) Identify New Attributes


• Several other attributes would have to be added.
=> An employee hire date attribute (EMP_HIREDATE) could be used to track an employee’s job longevity.

5) Identify New Relationships


=> EMPLOYEE and PROJECT can be related through a manages relationship.
Improving the design
6) Refine primary keys as required for data granularity
• Granularity: Level of detail represented by the values stored in a table’s row
• Changing granularity requirements might dictate changes in primary key selection.
=> Does ASSIGN_HOURS represent the daily total, weekly total, monthly total, or yearly total?
=> Using a surrogate primary key such as ASSIGN_NUM provides lower granularity and yields greater flexibility.

7) Maintain Historical Accuracy


 Writing the job charge per hour into the ASSIGNMENT table is crucial to maintaining the historical accuracy of the
table’s data.

8) Evaluate Using Derived Attributes


 Use a derived attribute in the ASSIGNMENT table to store the actual charge made to a project.
 The derived attribute, named ASSIGN_CHARGE, is the result of multiplying ASSIGN_HOURS by
ASSIGN_CHG_HOUR.
 Storing the derived attribute in the table makes it easy to write the application software to produce the desired
results.
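The short sketch below, with hypothetical attribute names and values, ties points 7 and 8 together: the charge per hour is copied into the ASSIGNMENT row when the work is recorded, and ASSIGN_CHARGE is derived from it once.

```python
# A minimal sketch of a stored derived attribute in an ASSIGNMENT row.
assign_hours = 3.5
assign_chg_hour = 84.50              # copied from JOB at assignment time (point 7)
assignment_row = {
    "ASSIGN_HOURS": assign_hours,
    "ASSIGN_CHG_HOUR": assign_chg_hour,
    "ASSIGN_CHARGE": round(assign_hours * assign_chg_hour, 2),   # 295.75 (point 8)
}
```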
Multiple candidate keys
• The concept of keys is central to the normalization process.
• A candidate key has the same characteristics as a primary key, but for some reason, it was not chosen to be the
primary key.
• Normalization rules focus on candidate keys, not just the primary key.
• The previous normalization process (1NF, 2NF, 3NF) should be generalized for multiple candidate keys (instead of a single
candidate key = PK)
• 1NF
 Identify all candidate keys
 Make sure that all non-prime attributes are determined by all candidate keys.

• 2NF
 Partial dependencies should be identified for all candidate keys

• 3NF
 The remaining non-prime attributes should not have transitive dependencies
The CLASS table has two candidate keys:
•CLASS_CODE
•CRS_CODE + CLASS_SECTION

The table is in 1NF because the key attributes are defined and all nonkey attributes are
determined by both candidate keys.

The table is in 2NF because it is in 1NF and there are no partial dependencies on either
candidate key.
Finally, the table is in 3NF because there are no transitive dependencies.
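The sketch below uses hypothetical CLASS rows to show both candidate keys at work; it simply checks that each candidate key value is unique across the sample rows, which is the uniqueness property the reasoning above relies on.

```python
# A minimal sketch (hypothetical rows) of a CLASS table with two candidate keys:
# CLASS_CODE and (CRS_CODE, CLASS_SECTION).
class_rows = [
    {"CLASS_CODE": 10012, "CRS_CODE": "SALE-470", "CLASS_SECTION": 1,
     "CLASS_TIME": "MWF 8:00-8:50", "PROF_NUM": 228},
    {"CLASS_CODE": 10013, "CRS_CODE": "SALE-470", "CLASS_SECTION": 2,
     "CLASS_TIME": "TTh 6:00-7:15", "PROF_NUM": 114},
]

# Uniqueness check for both candidate keys on the sample rows.
assert len({r["CLASS_CODE"] for r in class_rows}) == len(class_rows)
assert len({(r["CRS_CODE"], r["CLASS_SECTION"]) for r in class_rows}) == len(class_rows)
```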
Normalization and Database Design

• Normalization should be part of the design process
• The proposed entities should meet the required normal form before the table structures are
created
• If the designer follows the design procedures, the likelihood of data anomalies will be small
• Even the best database designers are known to make occasional mistakes that come to
light during normalization checks.
• Designer should be aware of good design principles and procedures as well as normalization
procedures.
1) ERD is created through an iterative process => Macro view of an organization’s data
requirements and operations.
2) Normalization focuses on the characteristics of specific entities=> normalization represents
a micro view of the entities within the ERD.
Normalization and Database Design
(Example)
• Business rules for the contracting company:
• The company manages many projects.
• Each project requires the services of many employees.
• An employee may be assigned to several different projects.
• Some employees are not assigned to a project and perform duties not specifically related
to a project.
• Each employee has a single primary job classification, which determines the hourly billing
rate.
• Many employees can have the same job classification. For example, the company employs
more than one electrical engineer.
EMPLOYEE contains a transitive dependency.

The removal of EMPLOYEE’s transitive dependency yields three entities:


PROJECT (PROJ_NUM, PROJ_NAME)
EMPLOYEE (EMP_NUM, EMP_LNAME, EMP_FNAME, EMP_INITIAL, JOB_CODE)
JOB (JOB_CODE, JOB_DESCRIPTION, JOB_CHG_HOUR)
Data-Modeling Checklist
• Designer should go over this checklist to ensure that all modeling tasks were successfully done.
• Business rules
• Properly document and verify all business rules with the end users
• Ensure that all business rules are written precisely, clearly, and simply
• The business rules must help identify entities, attributes, relationships, and constraints
• Identify the source of all business rules, and ensure that each business rule is justified, dated, and signed off by an
approving authority

• Data modeling
• Naming conventions: all names should be limited in length

• Entity names:
• Should be nouns that are familiar to the business and should be short and meaningful
• Should document abbreviations, synonyms, and aliases for each entity
• Should be unique within the model
• For composite entities, may include a combination of abbreviated names of the entities linked through the
composite entity
Data-Modeling Checklist

• Attribute names:
• Should be unique within the entity
• Should use the entity abbreviation as a prefix
• Should be descriptive of the characteristic
• Should use suffixes such as _ID, _NUM, or _CODE for the PK attribute
• Should not be a reserved word
• Should not contain spaces or special characters such as @, !, or &

• Relationship names:
• Should be active or passive verbs that clearly indicate the nature of the
relationship
Data-Modeling Checklist
• Entities:
• Each entity should represent a single subject
• Each entity should represent a set of distinguishable entity instances
• All entities should be in 3NF or higher
• Granularity of the entity instance should be clearly defined
• PK should be clearly defined and support the selected data granularity
Data-Modeling Checklist
• Attributes:
• Should be simple and single-valued (atomic data)
• Should document default values, constraints, synonyms, and aliases
• Derived attributes should be clearly identified and include source(s)
• Should not be redundant unless this is required for transaction accuracy,
performance, or maintaining a history
• Nonkey attributes must be fully dependent on the PK attribute

• Relationships:
• Should clearly identify relationship participants
• Should clearly define participation and connectivity, and document cardinality
Data-Modeling Checklist
• ER model:
• Should be validated against expected processes: inserts, updates,
and deletions
• Should evaluate where, when, and how to maintain a history
• Should minimize data redundancy to ensure single-place updates
• Should conform to the minimal data rule: All that is needed is there,
and all that is there is needed
