RDBMS-Relational Database Made Easy
Unit I
Introduction: Database System Applications – Purpose of Database Systems –
View of Data– Database Languages – Relational Databases – Database Design – Object
based and semi structured databases – Data storage and Querying – Database Users and
Administrators– Transaction Management – Database users and Architectures – History
of Database System.
Entity-Relationship Model: E-R model – constraints – E-R diagrams – E-R design
issues – weak entity sets – Extended E-R features.
Unit II
Relational Database Design: Features of good Relational designs – Atomic
domains and First Normal Form – Decomposition using functional dependencies –
Functional dependency theory – Decomposition using multivalued dependencies – More normal forms – Database design process – Modeling temporal data.
Unit III
Database System Architecture: Centralized and Client-Server architecture –
Server system architecture – parallel systems – Distributed systems – Network types.
Parallel databases: I/O parallelism – Interquery Parallelism – Intraquery parallelism.
Distributed Databases: Homogeneous and Heterogeneous databases – Distributed Data
storage – Distributed transactions – Distributed query processing.
Unit IV
Schema Objects – Data Integrity – Creating and Maintaining Tables – Indexes – Sequences – Views – Users, Privileges and Roles – Synonyms.
Unit V
Text Books:
1. Silberschatz, Korth, Sudarshan, Database System Concepts, 5th Edition, McGraw-Hill Higher Education, 2006.
2. Jose A. Ramalho, Learn ORACLE 8i, BPB Publications, 2003.
UNIT I
Introduction
➢ Database System Applications
➢ Purpose of Database Systems
➢ View of Data
➢ Database Languages
➢ Relational Databases
➢ Database Design
➢ Object based and semi structured databases
➢ Data storage and Querying
➢ Database Users and Administrators
➢ Transaction Management
➢ Database users and Architectures
➢ History of Database System.
Entity-Relationship Model:
➢ E-R model
➢ Constraints
➢ E-R diagrams
➢ E-R design issues
➢ Weak entity sets
➢ Extended E-R features.
The term DBMS combines two concepts:
1. Database and
2. Management System
What is a Database?
To find out what database is, we have to start from data, which is the basic building block of any
DBMS.
Data: facts, figures, statistics, etc., having no particular meaning on their own (e.g., 1, ABC, 19).
Record: a collection of related data items. In the example above, the three data items had no meaning individually, but organized as shown below they collectively represent meaningful information.
Roll Name Age
1 ABC 19
The columns of this relation are called Fields, Attributes or Domains. The rows are called Tuples or Records.
T1                         T2
Roll  Name  Age            Roll  Address
1     ABC   19             1     KOL
2     DEF   22             2     DEL
3     XYZ   28             3     MUM
(Two further tables, T3 and T4, belong to the same collection but are not reproduced here.)
We now have a collection of four tables. They can be called a "related collection" because we can clearly see that some common attributes exist in selected pairs of tables. Because of these common attributes we may combine the data of two or more tables to find out the complete details of a student. Questions like "Which hostel does the youngest student live in?" can now be answered, although answering them requires combining data from more than one table.
A database in a DBMS could be viewed by lots of different people with different responsibilities.
For example, within a company there are different departments, as well as customers, who each
need to see different kinds of data. Each employee in the company will have different levels of access
to the database with their own customized front-end application.
In a database, data is organized strictly in row and column format. The rows are called Tuple or
Record. The data items within one row may belong to different data types. On the other hand, the
columns are often called Domain or Attribute. All the data items within a single attribute are of
the same data type.
The collection of data, usually referred to as the database, contains information relevant to
an enterprise. The primary goal of a DBMS is to provide a way to store and retrieve database
information that is both convenient and efficient. By data, we mean known facts that can be
recorded and that have implicit meaning.
Databases touch all aspects of our lives. Some of the major areas of application are as
follows:
1. Banking
2. Airlines
3. Universities
4. Manufacturing and selling
5. Human resources
Enterprise Information
◦ Sales: For customer, product, and purchase information.
◦ Accounting: For payments, receipts, account balances, assets and other accounting information.
◦ Human resources: For information about employees, salaries, payroll taxes, and benefits, and for
generation of paychecks.
◦ Manufacturing: For management of the supply chain and for tracking production of items in factories, inventories of items in warehouses and stores, and orders for items.
◦ Online retailers: For the sales data noted above plus online order tracking, generation of recommendation lists, and maintenance of online product evaluations.
◦ Universities: For student information, course registrations, and grades; e.g., to assign grades to students, compute grade point averages (GPA), and generate transcripts.
Purpose of Database Systems
Before DBMSs became common, organizations typically kept data in file-processing systems. Such systems have a number of major disadvantages:
Data redundancy and inconsistency. Since different programmers create the files and application programs over a long period, the various files are likely to have different structures and the programs may be written in several programming languages. Moreover, the same information may
be duplicated in several places (files). For example, if a student has a double major (say, music and
mathematics) the address and telephone number of that student may appear in a file that consists of
student records of students in the Music department and in a file that consists of student records of
students in the Mathematics department. This redundancy leads to higher storage and access
cost. In addition, it may lead to data inconsistency; that is, the various copies of the same data may
no longer agree. For example, a changed student address may be reflected in the Music
department records but not elsewhere in the system.
Difficulty in accessing data. Suppose that one of the university clerks needs to find out the
names of all students who live within a particular postal-code area. The clerk asks the data-
processing department to generate such a list. Because the designers of the original system did
not anticipate this request, there is no application program on hand to meet it. There is, however,
an application program to generate the list of all students.
Data isolation. Because data are scattered in various files, and files may be in different formats,
writing new application programs to retrieve the appropriate data is difficult.
Integrity problems. The data values stored in the database must satisfy certain types of
consistency constraints. Suppose the university maintains an account for each department,
and records the balance amount in each account. Suppose also that the university requires that the
account balance of a department may never fall below zero. Developers enforce these constraints in
the system by adding appropriate code in the various application programs. However, when new
constraints are added, it is difficult to change the programs to enforce them. The problem is
compounded when constraints involve several data items from different files.
Atomicity problems. A computer system, like any other device, is subject to failure. In many
applications, it is crucial that, if a failure occurs, the data be restored to the consistent state that
existed prior to the failure.
Concurrent-access anomalies. For the sake of overall performance of the system and faster
response, many systems allow multiple users to update the data simultaneously. Indeed, today, the
largest Internet retailers may have millions of accesses per day to their data by shoppers.
As another example, suppose a registration program maintains a count of students registered for a
course, in order to enforce limits on the number of students registered. When a student registers, the
program reads the current count for the courses, verifies that the count is not already at the limit, adds
one to the count, and stores the count back in the database. Suppose two students register
concurrently, with the count at (say) 39. The two program executions may both read the value 39,
and both would then write back 40, leading to an incorrect increase of only 1, even though two
students successfully registered for the course and the count should be 41. Furthermore, suppose the
course registration limit was 40; in the above case both students would be able to register, leading
to a violation of the limit of 40 students.
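The lost update above can be avoided by letting the DBMS perform the read-and-increment as one atomic statement. A minimal sketch, using a hypothetical section table with course_id and enrolled columns (names are illustrative, not from the text):

```sql
-- Unsafe pattern: read the count, add one in the application, write it back.
-- Two concurrent registrations can both read 39 and both write back 40.
-- SELECT enrolled FROM section WHERE course_id = 'CS101';
-- UPDATE section SET enrolled = 40 WHERE course_id = 'CS101';

-- Safer: one atomic UPDATE that increments and enforces the limit;
-- the affected-row count tells the application whether registration succeeded.
UPDATE section
SET    enrolled = enrolled + 1
WHERE  course_id = 'CS101'
  AND  enrolled < 40;
```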
Security problems. Not every user of the database system should be able to access all the data. For
example, in a university, payroll personnel need to see only that part of the database that has financial
information. They do not need access to information about academic records. But, since application
programs are added to the file-processing system in an ad hoc manner, enforcing such security
constraints is difficult.
These difficulties, among others, prompted the development of database systems. In what follows, we
shall see the concepts and algorithms that enable database systems to solve the problems with file-
processing systems.
Advantages of DBMS:
Controlling of Redundancy: Data redundancy refers to the duplication of data (i.e storing same data
multiple times). In a database system, by having a centralized database and centralized control of
data by the DBA the unnecessary duplication of data is avoided. It also eliminates the extra time for
processing the large volume of data. It results in saving the storage space.
Improved Data Sharing : DBMS allows a user to share the data in any number of application
programs.
Data Integrity : Integrity means that the data in the database is accurate. Centralized control of the
data helps in permitting the administrator to define integrity constraints to the data in the
database. For example, in a customer database we can enforce an integrity constraint that the system must accept customers only from Noida and Meerut city.
Security : Having complete authority over the operational data enables the DBA to ensure that the only means of access to the database is through proper channels. The DBA can define authorization checks to be carried out whenever access to sensitive data is attempted.
Data Consistency : By eliminating data redundancy, we greatly reduce the opportunities for inconsistency. For example, if a customer address is stored only once, we cannot have disagreement on the stored values. Also, updating data values is greatly simplified when each value is stored in one place only. Finally, we avoid the wasted storage that results from redundant data storage.
Efficient Data Access : In a database system, the data is managed by the DBMS, and all access to the data is through the DBMS, providing a key to effective data processing.
Enforcement of Standards : With centralized control of data, the DBA can establish and enforce data standards, which may include naming conventions, data quality standards, etc.
Data Independence : In a database system, the database management system provides the interface between the application programs and the data. When changes are made to the data representation, the metadata maintained by the DBMS changes, but the DBMS continues to provide the data to application programs in the previously used way. The DBMS handles the task of transformation of data wherever necessary.
Reduced Application Development and Maintenance Time : The DBMS supports many important functions that are common to applications accessing data stored in it, which facilitates quick application development.
Disadvantages of DBMS
1) It is a bit complex. Since it supports multiple functionalities to give the user the best experience, the underlying software has become complex. The designers and developers should have thorough knowledge of the software to get the most out of it.
2) Because of its complexity and functionality, it uses a large amount of storage, and it also needs a large amount of memory to run efficiently.
3) A DBMS often works as a centralized system, i.e., all the users from all over the world access this database. Hence any failure of the DBMS will impact all the users.
4) DBMS is generalized software, i.e., it is written to work for entire classes of systems rather than a specific one. Hence some applications may run slower than they would on purpose-built software.
View of Data
A database system is a collection of interrelated data and a set of programs that allow users to
access and modify these data. A major purpose of a database system is to provide users with an
abstract view of the data. That is, the system hides certain details of how the data are stored and
maintained.
Data Abstraction
For the system to be usable, it must retrieve data efficiently. The need for efficiency has led
designers to use complex data structures to represent data in the database. Since many
database-system users are not computer trained, developers hide the complexity from users
through several levels of abstraction, to simplify users’ interactions with the system:
• Physical level (or Internal View / Schema): The lowest level of abstraction describes how
the data are actually stored. The physical level describes complex low-level data structures in
detail.
• Logical level (or Conceptual View / Schema): The next-higher level of abstraction describes what
data are stored in the database, and what relationships exist among those data. The logical level
thus describes the entire database in terms of a small number of relatively simple structures.
Although implementation of the simple structures at the logical level may involve complex
physical-level structures, the user of the logical level does not need to be aware of this complexity.
This is referred to as physical data independence. Database administrators, who must decide what
information to keep in the database, use the logical level of abstraction.
• View level (or External View / Schema): The highest level of abstraction describes only part of
the entire database. Even though the logical level uses simpler structures, complexity remains
because of the variety of information stored in a large database. Many users of the database system
do not need all this information; instead, they need to access only a part of the database. The view
level of abstraction exists to simplify their interaction with the system. The system may provide
many views for the same database. Figure 1.2 shows the relationship among the three levels of
abstraction.
An analogy to the concept of data types in programming languages may clarify the distinction among
levels of abstraction. Many high-level programming languages support the notion of a structured
type. For example, we may describe a record as follows:
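The code listing itself is missing from these notes. In the textbook it is a Pascal-style type declaration along the following lines (field names and sizes as given there; treat this as a reconstruction):

```
type instructor = record
         ID        : char (5);
         name      : char (20);
         dept_name : char (20);
         salary    : numeric (8,2);
     end;
```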
This code defines a new record type called instructor with four fields. Each field has a name and a type associated with it. A university organization may have several such record types, including department, course, and student.
At the physical level, an instructor, department, or student record can be described as a block of
consecutive storage locations. The compiler hides this level of detail from programmers. Similarly,
the database system hides many of the lowest-level storage details from database programmers.
Database administrators, on the other hand, may be aware of certain details of the physical
organization of the data.
At the logical level, each such record is described by a type definition, as in the previous code segment,
and the interrelationship of these record types is defined as well. Programmers using a programming
language work at this level of abstraction. Similarly, database administrators usually work at this
level of abstraction.
Finally, at the view level, computer users see a set of application programs that hide details of the
data types. At the view level, several views of the database are defined, and a database user sees
some or all of these views. In addition
to hiding details of the logical level of the database, the views also provide a security mechanism to
prevent users from accessing certain parts of the database. For example, clerks in the university
registrar office can see only that part of the database that has information about students; they
cannot access information about salaries of instructors.
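As a sketch of how a view doubles as a security mechanism, the following SQL defines a view over a hypothetical instructor table that omits the salary column and grants clerks access to the view only (table, view, and role names are assumptions, not from the source):

```sql
-- Clerks see instructor names and departments, but never salaries.
CREATE VIEW faculty AS
SELECT ID, name, dept_name
FROM   instructor;

GRANT SELECT ON faculty TO registrar_clerk;
```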
Databases change over time as information is inserted and deleted. The collection of information
stored in the database at a particular moment is called an instance of the database.
The overall design of the database is called the database schema. Schemas are changed
infrequently, if at all. The concept of database schemas and instances can be understood by analogy
to a program written in a programming language.
A database schema corresponds to the variable declarations (along with associated type
definitions) in a program.
Each variable has a particular value at a given instant. The values of the variables in a program at a
point in time correspond to an instance of a database schema. Database systems have several
schemas, partitioned according to the levels of abstraction.
The physical schema describes the database design at the physical level, while the logical schema
describes the database design at the logical level. A database may also have several schemas at the
view level, sometimes called subschemas, which describe different views of the database. Of
these, the logical schema is by far the most important, in terms of its effect on application
programs, since programmers construct applications by using the logical schema. The physical
schema is hidden beneath the logical schema, and can usually be changed easily without affecting
application programs. Application programs are said to exhibit physical data independence if
they do not depend on the physical schema, and thus need not be rewritten if the physical schema
changes.
Data Models
Underlying the structure of a database is the data model: a collection of conceptual tools for
describing data, data relationships, data semantics, and consistency constraints. A data model
provides a way to describe the design of a database at the physical, logical, and view levels.
Relational Model. The relational model uses a collection of tables to represent both data and
the relationships among those data. Each table has multiple columns, and each column has a unique
name. Tables are also known as relations. The relational model is an example of a record-based
model.
Record-based models are so named because the database is structured in fixed-format records of
several types. Each table contains records of a particular type. Each record type defines a fixed
number of fields, or attributes. The columns of the table correspond to the attributes of the record
type. The relational data model is the most widely used data model, and a vast majority of current
database systems are based on the relational model.
Entity-Relationship Model. The entity-relationship (E-R) data model uses a collection of basic objects,
called entities, and relationships among these objects.
An entity is a “thing” or “object” in the real world that is distinguishable from other objects. The
entity- relationship model is widely used in database design.
Object-Based Data Model. Object-oriented programming (especially in Java, C++, or C#) has
become the dominant software-development methodology. This led to the development of an
object-oriented data model that can be seen as extending the E-R model with notions of
encapsulation, methods (functions), and object identity. The object-relational data model combines
features of the object-oriented data model and relational data model.
Semi-structured Data Model. The semi-structured data model permits the specification of
data where individual data items of the same type may have different sets of attributes. This is
in contrast to the data models mentioned earlier, where every data item of a particular type must
have the same set of attributes. The Extensible Markup Language (XML) is widely used to
represent semi-structured data.
Historically, the network data model and the hierarchical data model preceded the relational
data model. These models were tied closely to the underlying implementation, and complicated the
task of modeling data. As a result they are used little now, except in old database code that is still in
service in some places.
Database Languages
A database system provides a data-definition language to specify the database schema and a
data-manipulation language to express database queries and updates. In practice, the data-definition
and data-manipulation languages are not two separate languages; instead they simply form parts of a
single database language, such as the widely used SQL language.
Data-Manipulation Language
• Procedural DMLs require a user to specify what data are needed and how to get those data.
• Declarative DMLs (also referred to as nonprocedural DMLs) require a user to specify what data are
needed without specifying how to get those data.
Declarative DMLs are usually easier to learn and use than are procedural DMLs. However, since a
user does not have to specify how to get the data, the database system has to figure out an efficient
means of accessing data. A query is a statement requesting the retrieval of information. The
portion of a DML that involves information retrieval is called a query language. Although
technically incorrect, it is common practice to use the terms query language and data-manipulation
language synonymously.
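A one-line illustration of a declarative query, reusing the earlier postal-code request (table and column names are assumed): the clerk states what is wanted, and the DBMS works out how to retrieve it.

```sql
SELECT name
FROM   student
WHERE  postal_code = '560001';  -- the system chooses the access path
```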
We specify the storage structure and access methods used by the database system by a set of
statements in a special type of DDL called a data storage and definition language. These
statements define the implementation details of the database schemas, which are usually
hidden from the users.
The data values stored in the database must satisfy certain consistency constraints.
For example, suppose the university requires that the account balance of a department must never be
negative. The DDL provides facilities to specify such constraints. The database system checks these
constraints every time the database is updated. In general, a constraint can be an arbitrary predicate
pertaining to the database. However, arbitrary predicates may be costly to test. Thus, database
systems implement integrity constraints that can be tested with minimal overhead.
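A minimal sketch of such a constraint in SQL DDL, using the department-balance rule from the text (table and column names are assumptions):

```sql
CREATE TABLE department (
    dept_name VARCHAR(20) PRIMARY KEY,
    balance   NUMERIC(12,2) CHECK (balance >= 0)  -- may never fall below zero
);
```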
• Domain Constraints. A domain of possible values must be associated with every attribute (for
example, integer types, character types, date/time types). Declaring an attribute to be of a
particular domain acts as a constraint on the values that it can take. Domain constraints are the
most elementary form of integrity constraint. They are tested easily by the system whenever a
new data item is entered into the database.
• Referential Integrity. There are cases where we wish to ensure that a value that appears in one
relation for a given set of attributes also appears in a certain set of attributes in another relation
(referential integrity). For example, the department listed for each course must be one that actually
exists. More precisely, the dept name value in a course record must appear in the dept name
attribute of some record of the department relation.
Database modifications can cause violations of referential integrity. When a referential-
integrity constraint is violated, the normal procedure is to reject the action that caused the
violation.
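In SQL, referential integrity is expressed with a FOREIGN KEY clause. A sketch following the course/department example (column types are assumed; the department table is as sketched above):

```sql
CREATE TABLE course (
    course_id VARCHAR(8) PRIMARY KEY,
    title     VARCHAR(50),
    dept_name VARCHAR(20),
    -- every course's dept_name must appear in the department relation
    FOREIGN KEY (dept_name) REFERENCES department (dept_name)
);
```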
• Assertions. An assertion is any condition that the database must always satisfy. Domain
constraints and referential-integrity constraints are special forms of assertions. However, there are
many constraints that we cannot express by using only these special forms. For example, “Every
department must have at least five courses offered every semester” must be expressed as an
assertion. When an assertion is created, the system tests it for validity. If the assertion is valid, then
any future modification to the database is allowed only if it does not cause that assertion to be
violated.
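The SQL standard provides CREATE ASSERTION for such rules, though many systems do not implement it. A sketch, simplified to "every department offers at least five courses" (names are assumed):

```sql
CREATE ASSERTION dept_min_courses CHECK (
    NOT EXISTS (
        SELECT d.dept_name
        FROM   department d
        WHERE  (SELECT COUNT(*)
                FROM   course c
                WHERE  c.dept_name = d.dept_name) < 5
    )
);
```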
• Authorization. We may want to differentiate among the users as far as the type of access they are
permitted on various data values in the database. These differentiations are expressed in terms of
authorization, the most common being: read authorization, which allows reading, but not
modification, of data; insert authorization, which allows insertion of new data, but not
modification of existing data; update authorization, which allows modification, but not deletion, of
data; and delete authorization, which allows deletion of data. We may assign the user all, none,
or a combination of these types of authorization.
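In SQL these authorization types map onto GRANT statements; a sketch with illustrative user names:

```sql
GRANT SELECT ON department TO payroll_clerk;   -- read authorization
GRANT INSERT ON department TO data_entry;      -- insert authorization
GRANT UPDATE ON department TO accounts_head;   -- update authorization
GRANT DELETE ON department TO dba_assistant;   -- delete authorization
```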
The DDL, just like any other programming language, gets as input some instructions
(statements) and generates some output. The output of the DDL is placed in the data dictionary, which contains metadata, that is, data about data. The data dictionary is considered
to be a special type of table that can only be accessed and updated by the database system itself
(not a regular user). The database system consults the data dictionary before reading or
modifying actual data.
Data Dictionary
We can define a data dictionary as a DBMS component that stores the definitions of data characteristics and relationships. You may recall that such "data about data" were labeled metadata. The data dictionary provides the DBMS with its self-describing characteristic. In effect, the data dictionary resembles an X-ray of the company's entire data set, and is a crucial element in the data administration function.
Two main types of data dictionary exist: integrated and stand-alone. An integrated data dictionary is included with the DBMS. For example, all relational DBMSs include a built-in data dictionary or system catalog that is frequently accessed and updated by the RDBMS. Other DBMSs, especially older types, do not have a built-in data dictionary; instead, the DBA may use third-party stand-alone data dictionary systems.
Data dictionaries can also be classified as active or passive. An active data dictionary is automatically
updated by the DBMS with every database access, thereby keeping its access information up-to-
date. A passive data dictionary is not updated automatically and usually requires a batch process to
be run. Data dictionary access information is normally used by the DBMS for query optimization purposes.
A data dictionary typically stores:
• Data elements that are defined in all tables of all databases. Specifically, the data dictionary stores the name, datatype, display format, internal storage format, and validation rules; it tells where an element is used, by whom it is used, and so on.
• Tables defined in all databases. For example, the data dictionary is likely to store the name of the table creator, the date of creation, access authorizations, the number of columns, and so on.
• Indexes defined for each database table. For each index the DBMS stores at least the index name, the attributes used, the location, specific index characteristics, and the creation date.
• Defined databases: who created each database, the date of creation, where the database is located, who the DBA is, and so on.
• End users and administrators of the database.
• Programs that access the database, including screen formats, report formats, application formats, SQL queries, and so on.
• Access authorizations for all users of all databases.
• Relationships among data elements: which elements are involved, whether the relationships are mandatory or optional, the connectivity and cardinality, and so on.
A primary goal of a database system is to retrieve information from and store new information in the
database. People who work with a database can be categorized as database users or database
administrators.
There are four different types of database-system users, differentiated by the way they expect to
interact with the system. Different types of user interfaces have been designed for the different
types of users.
Naive users are unsophisticated users who interact with the system by invoking one of the
application programs that have been written previously. For example, a bank teller who needs to
transfer $50 from account A to account B invokes a program called transfer. This program asks the
teller for the amount of money to be transferred, the account from which the money is to be
transferred, and the account to which the money is to be transferred.
Application programmers are computer professionals who write application programs, often using tools that enable rapid development of forms and reports.
Sophisticated users interact with the system without writing programs. Instead, they form their
requests in a database query language. They submit each such query to a query processor,
whose function is to break down DML statements into instructions that the storage manager
understands. Analysts who submit queries to explore data in the database fall in this category.
Online analytical processing (OLAP) tools simplify analysts’ tasks by letting them view summaries
of data in different ways. For instance, an analyst can see total sales by region (for example, North,
South, East, and West), or by product, or by a combination of region and product (that is, total
sales of each product in each region). The tools also permit the analyst to select specific regions,
look at data in more detail (for example, sales by city within a region) or look at the data in less
detail (for example, aggregate products together by category).
Another class of tools for analysts is data mining tools, which help them find certain kinds of patterns in
data. Specialized users are sophisticated users who write specialized database applications that do not
fit into the traditional data-processing framework.
Among these applications are computer-aided design systems, knowledge base and expert systems,
systems that store data with complex data types (for example, graphics data and audio data), and
environment-modeling systems.
Database Architecture:
We are now in a position to provide a single picture (Figure 1.3) of the various components of a
database system and the connections among them.
The architecture of a database system is greatly influenced by the underlying computer system on
which the database system runs. Database systems can be centralized, or client-server, where
one server machine executes work on behalf of multiple client machines. Database systems can also
be designed to exploit parallel computer architectures. Distributed databases span multiple
geographically separated machines.
A database system is partitioned into modules that deal with each of the responsibilities of the
overall system. The functional components of a database system can be broadly divided into the
storage manager and the query processor components. The storage manager is important
because databases typically require a large amount of storage space. The query processor is
important because it helps the database system simplify and facilitate access to data.
It is the job of the database system to translate updates and queries written in a nonprocedural
language, at the logical level, into an efficient sequence of operations at the physical level.
Database applications are usually partitioned into two or three parts, as in Figure 1.4. In a two-tier
architecture, the application resides at the client machine, where it invokes database system
functionality at the server machine through query language statements. Application program
interface standards like ODBC and JDBC are used for interaction between the client and the server.
In contrast, in a three-tier architecture, the client machine acts as merely a front end and does not
contain any direct database calls. Instead, the client end communicates with an application
server, usually through a forms interface.
The application server in turn communicates with a database system to access data. The business
logic of the application, which says what actions to carry out under what conditions, is embedded in
the application server, instead of being distributed across multiple clients. Three-tier
applications are more appropriate for large applications, and for applications that run on the World Wide Web.
Query Processor:
· DDL interpreter, which interprets DDL statements and records the definitions in the data
dictionary.
· DML compiler, which translates DML statements in a query language into an evaluation plan
consisting of low-level instructions that the query evaluation engine understands.
A query can usually be translated into any of a number of alternative evaluation plans that all give
the same result. The DML compiler also performs query optimization, that is, it picks the lowest
cost evaluation plan from among the alternatives.
Query evaluation engine, which executes low-level instructions generated by the DML
compiler.
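Most SQL systems let you inspect the evaluation plan the optimizer picked. In MySQL or PostgreSQL, for example, prefixing a query with EXPLAIN prints the chosen plan (output format varies by system; table and column names are assumed):

```sql
EXPLAIN
SELECT name
FROM   student
WHERE  dept_name = 'Music';
```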
Storage Manager:
A storage manager is a program module that provides the interface between the low-level data
stored in the database and the application programs and queries submitted to the system. The
storage manager is responsible for the interaction with the file manager. The raw data are stored
on the disk using the file system, which is usually provided by a conventional operating system. The
storage manager translates the various DML statements into low-level file-system commands.
Thus, the storage manager is responsible for storing, retrieving, and updating data in the
database.
· Authorization and integrity manager, which tests for the satisfaction of integrity constraints
and checks the authority of users to access data.
· Transaction manager, which ensures that the database remains in a consistent (correct)
state despite system failures, and that concurrent transaction executions proceed without
conflicting.
· File manager, which manages the allocation of space on disk storage and the data
structures used to represent information stored on disk.
· Buffer manager, which is responsible for fetching data from disk storage into main memory,
and deciding what data to cache in main memory. The buffer manager is a critical part of the
database system, since it enables the database to handle data sizes that are much larger than the
size of main memory.
Transaction Manager: as noted above, the transaction manager ensures that the database remains in a consistent state despite system failures, and that concurrently executing transactions proceed without conflicting.
Entity-Relationship Model
Design Process
Modeling
Constraints
E-R Diagram
Design Issues
Database Design
UML
Modeling
An entity set is a set of entities of the same type that share the same
properties.
Relationship Sets
A relationship is an association among several entities; a relationship set is a set of relationships of the same type.
Relationship sets that involve two entity sets are binary (or of degree two). Relationships among more than two entity sets are rare; most relationship sets in a database system are binary.
Attributes
Attribute types:
➢ Derived attributes
➢ Composite attributes
Mapping cardinalities:
➢ One to one
➢ One to many
➢ Many to one
➢ Many to many
One employee is assigned only one parking space, and one parking space is assigned to only one employee. Hence it is a 1:1 relationship and the cardinality is One-to-One (1:1).
One organization can have many employees, but one employee works in only one organization. Hence it is a 1:N relationship and the cardinality is One-to-Many (1:N).
One employee works in only one organization, but one organization can have many employees. Hence, viewed from the employee side, it is an M:1 relationship and the cardinality is Many-to-One (M:1).
One student can enroll for many courses, and one course can be enrolled in by many students. Hence it is an M:N relationship and the cardinality is Many-to-Many (M:N); see the sketch below.
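A minimal relational sketch of the M:N case: the student/course relationship is usually realized as a third "junction" table whose composite key pairs one student with one course (all names are illustrative):

```sql
CREATE TABLE student (stu_id INT PRIMARY KEY, name VARCHAR(30));
CREATE TABLE course  (course_id VARCHAR(8) PRIMARY KEY, title VARCHAR(50));

CREATE TABLE enrolls (
    stu_id    INT        REFERENCES student (stu_id),
    course_id VARCHAR(8) REFERENCES course (course_id),
    PRIMARY KEY (stu_id, course_id)  -- one row per student-course pair
);
```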
Design Issues
The primary key of a weak entity set is formed by the primary key of the strong entity set on which the weak entity set is existence dependent, plus the weak entity set's discriminator.
Generalization
A bottom-up design process – combine a number of entity sets that share the same
features into a higher-level entity set.
Specialization and generalization are simple inversions of each other; they are represented in an E-R diagram in the same way.
The terms specialization and generalization are used interchangeably.
Can have multiple specializations of an entity set based on different
features.
The relational data model was introduced by E. F. Codd in 1970. Currently, it is the most
widely used data model. The relational data model describes the world as “a collection of
inter-related relations (or tables).” A relational data model involves the use of data tables
that collect groups of elements into relations. These models work based on the idea that
each table setup will include a primary key or identifier. Other tables use that identifier to
provide "relational" data links and results.
Today, there are many commercial Relational Database Management System (RDBMS),
such as Oracle, IBM DB2, and Microsoft SQL Server. There are also many free and open-source RDBMSs, such as MySQL, mSQL (mini-SQL) and the embedded Java DB (Apache
Derby). Database administrators use Structured Query Language (SQL) to retrieve data
elements from a relational database.
As mentioned, the primary key is a fundamental tool in creating and using relational data
models. It must be unique for each member of a data set. It must be populated for all
members. Inconsistencies can cause problems in how developers retrieve data. Other
issues with relational database designs include excessive duplication of data, faulty or
partial data, or improper links or associations between tables. A large part of routine
database administration involves evaluating all the data sets in a database to make sure
that they are consistently populated and will respond well to SQL or any other data
retrieval method.
• Eliminate Data Redundancy: the same piece of data shall not be stored in more than one place. This is because duplicate data not only wastes storage space but also easily leads to inconsistencies.
• Ensure Data Integrity and Accuracy: data integrity is the maintenance of, and the assurance of the accuracy and consistency of, data over its entire life-cycle, and is a critical aspect of the design, implementation, and usage of any system which stores, processes, or retrieves data.
• The standard database access language is Structured Query Language (SQL). Relational databases developed together with SQL, and the simplicity of SQL, where even a novice can learn to perform basic queries in a short period of time, is a large part of the reason for the popularity of the relational model.
The two tables below relate to each other through the product code field. Any two tables
can relate to each other simply by creating a field they have in common.
Table 1
3804 1 A416 15
3804 2 C923 24
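Assuming Table 1 holds order line items and the second (not reproduced) table describes products, a join over the shared product code field might look like this (all table and column names are hypothetical):

```sql
SELECT oi.order_no, oi.qty, p.description
FROM   order_item AS oi
JOIN   product    AS p
  ON   p.product_code = oi.product_code;  -- the common field relates the tables
```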
• Relations and attributes − The various tables and attributes related to each table
are identified. The tables represent entities, and the attributes represent the
properties of the respective entities.
• Primary keys − The attribute or set of attributes that help in uniquely identifying a
record is identified and assigned as the primary key.
• Relationships −The relationships between the various tables are established with
the help of foreign keys. Foreign keys are attributes occurring in a table that are
primary keys of another table. The types of relationships that can exist between the
relations (tables) are One to one, One to many, and Many to many
The main advantages of relational databases are that they enable users to easily categorize
and store data that can later be queried and filtered to extract specific information for
reports. Relational databases are also easy to extend and aren't reliant on the physical
organization. After the original database creation, a new data category can be added
without all existing applications being modified.
Normalization
o Normalization divides a larger table into smaller tables and links them using relationships.
o Normal forms are used to reduce redundancy in database tables.
Insertion Anomaly
Suppose that for a new admission, until and unless a student opts for a branch, the data of the student cannot be inserted, or else we will have to set the branch information to NULL. Also, if we have to insert the data of 100 students of the same branch, then the branch information will be repeated for all those 100 students. This is an Insertion anomaly.
Updation Anomaly
What if Mr. X leaves the college? or is no longer the HOD of computer science department?
In that case all the student records will have to be updated, and if by mistake we miss any
record, it will lead to data inconsistency. This is Updation anomaly.
Deletion Anomaly
In our Student table, two different kinds of information are kept together: Student information and Branch information. Hence, at the end of the academic year, if student records are deleted, we will also lose the branch information. This is a Deletion anomaly.
Normalization Rule
For a table to be in First Normal Form, it should follow the following four rules.
In the next tutorial, we will discuss the First Normal Form in detail.
To understand what Partial Dependency is and how to normalize a table to 2nd normal form, jump to the Second Normal Form tutorial.
Boyce and Codd Normal Form is a higher version of the Third Normal form. This form
deals with certain type of anomaly that is not handled by 3NF. A 3NF table which does not
have multiple overlapping candidate keys is said to be in BCNF. For a table to be in BCNF,
following conditions must be satisfied:
To learn about BCNF in detail with a very easy to understand example, head to the Boyce-Codd Normal Form tutorial.
Here is the Fourth Normal Form tutorial. But we suggest you to understand other normal
forms before you head over to the fourth normal form.
In this tutorial we will learn about the 1st(First) Normal Form which is more like the Step 1
of the Normalization process. The 1st Normal form expects you to design your table in such
a way that it can easily be extended and it is easier for you to retrieve data from it
whenever required.
In our last tutorial we learned and understood how data redundancy or repetition can lead
to several issues like Insertion, Deletion and Updation anomalies and
how Normalization can reduce data redundancy and make the data more meaningful.
Each column of your table should be single valued, which means it should not contain multiple values. We will explain this with the help of an example later; let's see the other rules for now.
This is more of a "Common Sense" rule. In each column the values stored must be of the
same kind or type.
For example: If you have a column dob to save date of births of a set of people, then you
cannot or you must not save 'names' of some of them in that column along with 'date of
birth' of others in that column. It should hold only 'date of birth' for all the records/rows.
This rule expects that each column in a table should have a unique name. This is to avoid
confusion at the time of retrieving data or performing any other operation on the stored
data.
If two or more columns have the same name, then the DBMS will be unable to tell them apart.
This rule says that the order in which you store the data in your table doesn't matter.
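A small sketch of rule 1 (single-valued columns): rather than packing several phone numbers into one column of a hypothetical student table, a 1NF design stores one value per row in a separate table (all names are illustrative):

```sql
-- Violates 1NF: a phones column holding '98765, 91234' packs two values
-- into one field. The 1NF fix stores one phone number per row:
CREATE TABLE student_phone (
    roll  INT,
    phone VARCHAR(15),
    PRIMARY KEY (roll, phone)  -- each row holds exactly one phone number
);
```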
When the existence of one or more rows in a table implies one or more other rows in the same table, a multi-valued dependency occurs; the notation A →→ B is read "A multidetermines B".
Example
DeptId = Department ID
DeptName = Department Name
The DeptId is our primary key. Here, DeptId uniquely identifies the DeptName attribute.
This is because if you want to know the department name, then at first you need to have
the DeptId.
DeptId DeptName
001 Finance
002 Marketing
003 HR
Therefore, the above functional dependency between DeptId and DeptName can be written as DeptId → DeptName; that is, DeptName is functionally dependent on DeptId.
Trivial functional dependency: a dependency A → B is trivial if B is a subset of A.
Example: considering the same <Department> table with its two attributes, the dependency {DeptId, DeptName} → DeptId is trivial, since DeptId is a subset of {DeptId, DeptName}.
Non-trivial functional dependency: a dependency A → B is non-trivial if B is not a subset of A.
Example: DeptId → DeptName is a non-trivial dependency.
Armstrong's axioms suggest rules that hold for all functional dependencies:
• Reflexivity
If B is a subset of A, then A → B.
• Augmentation
If A → B, then AC → BC.
• Transitivity
If A → B and B → C, then A → C, i.e., → behaves as a transitive relation.
• The words normalization and normal form refer to the structure of a database.
Example: let's assume a school stores the data of teachers and the subjects they teach. In a school, a teacher can teach more than one subject.
TEACHER table
TEACHER_ID SUBJECT TEACHER_AGE
25 Chemistry 30
25 Biology 30
47 English 35
83 Math 38
83 Computer 38
The candidate key is {TEACHER_ID, SUBJECT}, but the non-prime attribute TEACHER_AGE depends on TEACHER_ID alone. This partial dependency means the table is not in 2NF.
To convert the given table into 2NF, we decompose it into two tables:
TEACHER_DETAIL table:
TEACHER_ID TEACHER_AGE
25 30
47 35
83 38
TEACHER_SUBJECT table:
TEACHER_ID SUBJECT
25 Chemistry
25 Biology
47 English
83 Math
83 Computer
o A relation will be in 3NF if it is in 2NF and does not contain any transitive dependency for non-prime attributes.
o 3NF is used to reduce data duplication. It is also used to achieve data integrity.
o If there is no transitive dependency for non-prime attributes, then the relation is in third normal form.
A relation is in third normal form if it holds at least one of the following conditions for every non-trivial functional dependency X → Y:
1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.
Example:
EMPLOYEE_DETAIL table:
EMP_ID EMP_NAME EMP_ZIP EMP_STATE EMP_CITY
Here EMP_ID → EMP_ZIP and EMP_ZIP → {EMP_STATE, EMP_CITY}, so EMP_STATE and EMP_CITY depend transitively on EMP_ID.
Non-prime attributes: in the given table, all attributes except EMP_ID are non-prime.
That's why we need to move EMP_CITY and EMP_STATE to the new <EMPLOYEE_ZIP> table, with EMP_ZIP as a primary key.
EMPLOYEE table: (EMP_ID, EMP_NAME, EMP_ZIP)
EMPLOYEE_ZIP table:
EMP_ZIP EMP_STATE EMP_CITY
201010 UP Noida
02228 US Boston
60007 US Chicago
06389 UK Norwich
462007 MP Bhopal
o For BCNF, the table should be in 3NF, and for every FD, LHS is super key.
Example: Let's assume there is a company where employees work in more than one
department.
EMPLOYEE table:
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a key.
To convert the given table into BCNF, we decompose it into three tables:
EMP_COUNTRY table:
EMP_ID EMP_COUNTRY
264 India
EMP_DEPT table:
DEPT_TYPE EMP_DEPT_NO
D394 283
D394 300
D283 232
D283 549
EMP_DEPT_MAPPING table:
EMP_ID EMP_DEPT
Functional dependencies:
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate keys:
For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}
Now, this is in BCNF because the left side of both functional dependencies is a key.
o A relation will be in 4NF if it is in Boyce Codd normal form and has no multi-valued
dependency.
Example
STUDENT
STU_ID COURSE HOBBY
21 Computer Dancing
21 Math Singing
34 Chemistry Dancing
74 Biology Cricket
59 Physics Hockey
The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent entities; there is no relationship between COURSE and HOBBY.
In the STUDENT relation, a student with STU_ID, 21 contains two
courses, Computer and Math and two hobbies, Dancing and Singing. So there is a Multi-
valued dependency on STU_ID, which leads to unnecessary repetition of data.
So to make the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE
STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics
STUDENT_HOBBY
STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
o A relation is in 5NF if it is in 4NF, does not contain any join dependency, and all joins are lossless.
o 5NF is satisfied when all the tables are broken into as many tables as possible in
order to avoid redundancy.
Example
SUBJECT LECTURER SEMESTER
Computer Anshika Semester 1
Computer John Semester 1
Math John Semester 1
Math Akash Semester 2
Chemistry Praveen Semester 1
In the above table, John takes both Computer and Math classes for Semester 1 but doesn't take a Math class for Semester 2. In this case, a combination of all these fields is required to identify valid data.
Suppose we add a new semester, Semester 3, but do not yet know the subject or who will be teaching it, so we leave Lecturer and Subject as NULL. But all three columns together act as a primary key, so we can't leave the other two columns blank.
So to make the above table into 5NF, we can decompose it into three relations P1, P2 & P3:
P1
SEMESTER SUBJECT
Semester 1 Computer
Semester 1 Math
Semester 1 Chemistry
Semester 2 Math
P2
SUBJECT LECTURER
Computer Anshika
Computer John
Math John
Math Akash
Chemistry Praveen
P3
SEMESTER LECTURER
Semester 1 Anshika
Semester 1 John
Semester 1 John
Semester 2 Akash
Semester 1 Praveen
The main objectives of database design are to produce logical and physical design models of the proposed database system.
The logical model concentrates on the data requirements and the data to be stored
independent of physical considerations. It does not concern itself with how the data will be
stored or where it will be stored physically.
The physical data design model involves translating the logical design of the database onto
physical media using hardware resources and software systems such as database
management systems (DBMS).
Data Models
Data Model is the modeling of the data description, data semantics, and consistency
constraints of the data. It provides the conceptual tools for describing the design of a
database at each level of data abstraction. Therefore, there are following four data models
used for understanding the structure of the database:
1) Relational Data Model: This type of model designs the data in the form of rows and
columns within a table. Thus, a relational model uses tables for representing data and in-
between relationships. Tables are also called relations. This model was initially described
by Edgar F. Codd, in 1969. The relational data model is the widely used model which is
primarily used by commercial data processing applications.
2) Entity-Relationship Data Model: as described earlier, this model uses a collection of basic objects called entities, and relationships among those objects; it is widely used in database design.
3) Object-Based Data Model: as described earlier, this model extends the E-R model with notions of encapsulation, methods (functions), and object identity.
4) Semistructured Data Model: This type of data model is different from the other three data models (explained above). The semistructured data model allows data specifications at places where individual data items of the same type may have different sets of attributes. The Extensible Markup Language, also known as XML, is widely used for representing semistructured data. Although XML was initially designed for including markup information in text documents, it gained importance because of its application in the exchange of data.
TEMPORAL DATA
Temporal databases support managing and accessing temporal data by providing one or
more of the following features:
• A time period datatype, including the ability to represent time periods with no end
(infinity or forever)
• The ability to define valid and transaction time period attributes and bitemporal
relations
• System-maintained transaction time
• Temporal primary keys, including non-overlapping period constraints
• Temporal constraints, including non-overlapping uniqueness and referential
integrity
• Update and deletion of temporal records with automatic splitting and coalescing of
time periods
• Temporal queries at current time, time points in the past or future, or over
durations
• Predicates for querying time periods, often based on Allen’s interval relations
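As a sketch of the first and fourth features above, SQL:2011 syntax (implemented, with variations, in systems such as IBM DB2) declares an application-time period and a non-overlapping temporal primary key roughly as follows (table and column names are assumptions):

```sql
CREATE TABLE emp (
    eno    INT,
    salary NUMERIC(9,2),
    estart DATE,
    eend   DATE,
    PERIOD FOR emp_period (estart, eend),           -- valid-time period
    PRIMARY KEY (eno, emp_period WITHOUT OVERLAPS)  -- temporal primary key
);
```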
UNIT III
Database System Architecture:
➢ Centralized and Client-Server architecture
➢ Server system architecture
➢ Parallel systems
➢ Distributed systems
Network types. Parallel databases:
➢ I/O parallelism
➢ Interquery Parallelism
➢ Intraquery parallelism
Distributed Databases:
➢ Homogeneous and Heterogeneous databases
➢ Distributed Data storage
➢ Distributed transactions
➢ Distributed query processing.
Database System Architecture
DBMS Architecture
o The DBMS design depends upon its architecture. The basic client/server
architecture is used to deal with a large number of PCs, web servers, database
servers and other components that are connected with networks.
o The client/server architecture consists of many PCs and a workstation which are
connected via the network.
o DBMS architecture depends upon how users are connected to the database to get
their request done.
1-Tier Architecture
o In this architecture, the database is directly available to the user. It means the user can directly sit at the DBMS and use it.
o Any changes done here will be made directly on the database itself. It doesn't provide a handy tool for end users.
o The 1-Tier architecture is used for development of local applications, where programmers can communicate directly with the database for a quick response.
2-Tier Architecture
o The user interfaces and application programs are run on the client-side.
o The server side is responsible for providing functionalities like query processing and transaction management.
3-Tier Architecture
o The 3-Tier architecture contains another layer between the client and the server. In this architecture, the client can't directly communicate with the server.
o The application on the client-end interacts with an application server which further
communicates with the database system.
o The end user has no idea about the existence of the database beyond the application server, and the database has no idea about any user beyond the application.
Architectures for DBMSs have followed trends similar to those for general computer
system architectures. Earlier architectures used mainframe computers to provide the main
processing for all system functions, including user application programs and user interface
programs, as well as all the DBMS functionality. The reason was that most users accessed
such systems via computer terminals that did not have processing power and only
provided display capabilities. Therefore, all processing was performed remotely on the
computer system, and only display information and controls were sent from the computer
to the display terminals, which were connected to the central computer via various types of
communications networks.
As prices of hardware declined, most users replaced their terminals with PCs and
workstations. At first, database systems used these computers similarly to how they had
used display terminals, so that the DBMS itself was still a centralized DBMS in which all
the DBMS functionality, application program execution, and user interface processing
were carried out on one machine. Figure 2.4 illustrates the physical components in a
centralized architecture. Gradually, DBMS systems started to exploit the available
processing power at the user side, which led to client/server DBMS architectures.
2. Client/Server Architectures
Web servers, e-mail servers, and other software and equipment are connected via a
network. The idea is to define specialized servers with specific functionalities. For
example, it is possible to connect a number of PCs or small workstations as clients to a file
server that maintains the files of the client machines. Another machine can be designated
as a printer server by being connected to various printers; all print requests by the clients
are forwarded to this machine. Web servers or e-mail servers also fall into the specialized
server category. The resources provided by specialized servers can be accessed by many
client machines. The client machines provide the user with the appropriate interfaces to
utilize these servers, as well as with local processing power to run local applications. This
concept can be carried over to other software packages, with specialized programs—such
as a CAD (computer-aided design) package—being stored on specific server machines and
being made accessible to multiple clients. Figure 2.5 illustrates client/server architecture at
the logical level; Figure 2.6 is a simplified diagram that shows the physical architecture.
Some machines would be client sites only (for example, diskless work-stations or
workstations/PCs with disks that have only client software installed).
Other machines would be dedicated servers, and others would have both client and server
functionality.
PARALLEL SYSTEMS
Companies need to handle huge amount of data with high data transfer rate. The client
server and centralized system is not much efficient. The need to improve the efficiency
gave birth to the concept of Parallel Databases.
It also performs many parallelization operations like, data loading and query processing.
DISTRIBUTED SYSTEMS
A distributed database is a collection of multiple interconnected databases, spread physically across various locations, that communicate via a computer network. Its features are:
• Databases in the collection are logically interrelated with each other. Often they
represent a single logical database.
• Data is physically stored across multiple sites. Data in each site can be managed by
a DBMS independent of the other sites.
• The processors in the sites are connected via a network. They do not have any
multiprocessor configuration.
NETWORK TYPES
Network database management systems (Network DBMSs) are based on a network data
model that allows each record to have multiple parents and multiple child records. A
network database allows flexible relationship model between entities.
There are several types of database management systems such as relational, network,
graph, and hierarchical.
The following diagram represents a network data model that shows that the Stores entity
has relationship with multiple child entities and the Transactions entity has relationships
with multiple parent entities. In other words, a network database model allows one parent
to have multiple child record sets and each record set can be linked to multiple nodes
(parents) and children.
The network model was developed and presented by Charles Bachman in 1969. The network model is often used to build computer network systems and is an enhancement to the hierarchical database model.
The key advantage of a network database model is its supports many-to-many relationship
and hence provides greater flexibility and accessibility. The result is a faster data access,
search, and navigation.
I/O PARALLELISM
Reduce the time required to retrieve relations from disk by partitioning the relations on
multiple disks.
• Horizontal partitioning – tuples of a relation are divided among many disks such that each
tuple resides on one disk.
• Partitioning techniques (number of disks = n): Round-robin : Send the ith tuple inserted
in the relation to disk i mod n.
Hash partitioning :
– Choose one or more attributes as the partitioning attributes.
– Choose hash function h with range 0 ...n − 1.
– Let i denote result of hash function h applied to the partitioning attribute value of a tuple.
Send tuple to disk i
Range partitioning :
– Choose an attribute as the partitioning attribute.
– A partitioning vector [v0, v1,...,vn−2] is chosen
– Let v be the partitioning attribute value of a tuple.
Tuples such that vi ≤ v < vi+1 go to disk i + 1.
Tuples with v < v0 go to disk 0 and tuples with v ≥ vn−2 go to disk n − 1. E.g., with a
partitioning vector [5,11], a tuple with partitioning attribute value of 2 will go to disk 0, a
tuple with value 8 will go to disk 1, while a tuple with value 20 will go to disk 2.
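The three partitioning rules can be sketched as disk-selection functions. The following PL/SQL block is only an illustration and is not from the text: it assumes n = 3 disks and the partitioning vector [5, 11] used above, and it uses MOD as a stand-in for the hash function h.
DECLARE
   n CONSTANT PLS_INTEGER := 3;
   -- Round-robin: the ith tuple inserted goes to disk i mod n.
   FUNCTION rr_disk (i PLS_INTEGER) RETURN PLS_INTEGER IS
   BEGIN
      RETURN MOD(i, n);
   END;
   -- Hash partitioning: h maps the partitioning attribute value into 0..n-1.
   FUNCTION hash_disk (v PLS_INTEGER) RETURN PLS_INTEGER IS
   BEGIN
      RETURN MOD(v, n);   -- MOD used as a stand-in hash function h
   END;
   -- Range partitioning with vector [5, 11]:
   -- v < 5 -> disk 0; 5 <= v < 11 -> disk 1; v >= 11 -> disk 2.
   FUNCTION range_disk (v NUMBER) RETURN PLS_INTEGER IS
   BEGIN
      IF v < 5 THEN RETURN 0;
      ELSIF v < 11 THEN RETURN 1;
      ELSE RETURN 2;
      END IF;
   END;
BEGIN
   DBMS_OUTPUT.PUT_LINE(rr_disk(7));      -- prints 1 (7 mod 3)
   DBMS_OUTPUT.PUT_LINE(hash_disk(25));   -- prints 1 (25 mod 3)
   DBMS_OUTPUT.PUT_LINE(range_disk(2));   -- prints 0
   DBMS_OUTPUT.PUT_LINE(range_disk(8));   -- prints 1
   DBMS_OUTPUT.PUT_LINE(range_disk(20));  -- prints 2
END;
/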
Evaluate how well partitioning techniques support the following types of data access:
1. Scanning the entire relation.
2. Locating a tuple associatively – point queries. – E.g., r.A = 25.
3. Locating all tuples such that the value of a given attribute lies within a specified range –
range queries. – E.g., 10 ≤ r.A < 25
Round-robin.
– Best suited for sequential scan of entire relation on each query.
∗ All disks have almost an equal number of tuples; retrieval work is thus well balanced
between disks.
– Range queries are difficult to process
∗ No clustering – tuples are scattered across all disks
Hash partitioning.
– Good for sequential access
∗ Assuming hash function is good, and partitioning attributes form a key, tuples will be
equally distributed between disks
∗ Retrieval work is then well balanced between disks.
– Good for point queries on partitioning attribute
∗ Can lookup single disk, leaving others available for answering other queries.
∗ Index on partitioning attribute can be local to disk, making lookup and update more
efficient
– No clustering, so difficult to answer range queries
Range partitioning.
– Provides data clustering by partitioning attribute value.
– Good for sequential access.
– Good for point queries on the partitioning attribute
∗ Only one disk needs to be accessed.
– Good for range queries on the partitioning attribute
∗ One or a few disks need to be accessed; the remaining disks stay available for other
queries.
∗ If many tuples fall within a few partitions, however, those disks become a bottleneck
(execution skew).
Interquery Parallelism
In interquery parallelism, different queries or transactions execute in parallel with one
another. This increases transaction throughput (the number of transactions completed per
second), although the response time of an individual transaction is no faster than if it ran
alone. It is the easiest form of parallelism to support and is used primarily to scale up
transaction-processing systems.
Intraquery Parallelism
In intraquery parallelism, a single query is executed in parallel on multiple processors
and disks, which is important for speeding up long-running queries. It takes two forms:
intraoperation parallelism, where each individual operation (such as a sort, selection, or
join) is parallelized, and interoperation parallelism, where the different operations in a
query expression execute in parallel.
DISTRIBUTED DATABASES:
Features
• It is used in application areas where large volumes of data are processed and
accessed by numerous users simultaneously.
• Need for Sharing of Data − The multiple organizational units often need to
communicate with each other and share their data and resources. This demands
common databases or replicated databases that should be used in a synchronized
manner.
• Support for Both OLTP and OLAP − Online Transaction Processing (OLTP) and
Online Analytical Processing (OLAP) work upon diversified systems that may
have common data. Distributed database systems aid both kinds of processing by
providing synchronized data.
Advantages of Distributed Databases
• More Reliable − In case of database failures, the total system of centralized databases
comes to a halt. However, in distributed systems, when a component fails, the functioning
of the system continues, possibly at reduced performance. Hence DDBMS is more reliable.
• Better Response − If data is distributed in an efficient manner, then user requests can be
met from local data itself, thus providing a faster response. On the other hand, in
centralized systems, all queries have to pass through the central computer for processing,
which increases the response time.
• Lower Communication Cost − In distributed database systems, if data is located locally
where it is mostly used, then the communication costs for data manipulation can be
minimized. This is not feasible in centralized systems.
Disadvantages of Distributed Databases
• Need for complex and expensive software − DDBMS demands complex and often
expensive software to provide data transparency and coordination across the
several sites.
• Data integrity − The need for updating data in multiple sites poses problems of data
integrity.
Homogeneous Distributed Databases
In a homogeneous distributed database, all the sites use identical DBMS and operating
systems. Its properties are −
• The sites use identical DBMS or DBMS from the same vendor.
• Each site is aware of all other sites and cooperates with other sites to process user
requests.
Heterogeneous Distributed Databases
In a heterogeneous distributed database, different sites may use different DBMS products,
operating systems, and data models. Its properties are −
• Autonomous − Each database is independent and functions on its own. The
databases are integrated by a controlling application and use message passing to
share data updates.
• A site may not be aware of other sites, and so there is limited co-operation in
processing user requests.
Architectural Models
Some of the common architectural models are described below. They differ along two
dimensions −
• Distribution − It states the physical distribution of data across the different sites.
• Autonomy − It indicates the distribution of control of the database system and the
degree to which each constituent DBMS can operate independently.
Client-Server Architecture − This is a two-level architecture where the functionality is
divided into servers and clients. The server functions primarily encompass data
management, query processing, optimization, and transaction management. Client
functions mainly include the user interface, but clients may also carry out some functions
like consistency checking and transaction management.
Peer-to-Peer Architecture − In these systems, each peer acts both as a client and a server
for imparting database services. The peers share their resources with other peers and
coordinate their activities.
In a peer-to-peer system, the schema is organized into levels that include −
• Local database Conceptual Level − Depicts local data organization at each site.
• Local database Internal Level − Depicts physical data organization at each site.
Distribution Design Alternatives
The distribution design alternatives for the tables in a DDBMS are as follows −
• Non-replicated and non-fragmented
• Fully replicated
• Partially replicated
• Fragmented
• Mixed
Non-replicated and Non-fragmented
In this design alternative, different tables are placed at different sites. Data is placed so
that it is in close proximity to the site where it is used most. It is most suitable for
database systems where the percentage of queries needed to join information in tables
placed at different sites is low. If an appropriate distribution strategy is adopted, then this
design alternative helps to reduce the communication cost during data processing.
Fully Replicated
In this design alternative, one copy of all the database tables is stored at each site. Since
each site has its own copy of the entire database, queries are very fast, requiring negligible
communication cost. On the contrary, the massive redundancy in data requires huge cost
during update operations. Hence, this is suitable for systems where a large number of
queries are required to be handled whereas the number of database updates is low.
Partially Replicated
Copies of tables or portions of tables are stored at different sites. The distribution of the
tables is done in accordance with the frequency of access. This takes into consideration the
fact that the frequency of accessing the tables varies considerably from site to site. The
number of copies of the tables (or portions) depends on how frequently the access queries
execute and the sites that generate the access queries.
Fragmented
In this design, a table is divided into two or more pieces referred to as fragments or
partitions, and each fragment can be stored at different sites. This considers the fact that it
seldom happens that all data stored in a table is required at a given site. Moreover,
fragmentation increases parallelism and provides better disaster recovery. Here, there is
only one copy of each fragment in the system, i.e. no redundant data. Fragmentation can
be done in three ways −
• Vertical fragmentation
• Horizontal fragmentation
• Hybrid fragmentation
Mixed Distribution
This is a combination of fragmentation and partial replication. Here, the tables are
initially fragmented in any form (horizontal or vertical), and then these fragments are
partially replicated across the different sites according to the frequency of accessing the
fragments.
DISTRIBUTED TRANSACTIONS
Distributed transaction management deals with the problems of always providing a
consistent distributed database in the presence of a large number of transactions (local
and global) and failures (communication link and/or site failures). This is accomplished
through
(i) distributed commit protocols that guarantee the atomicity property;
(ii) distributed concurrency control techniques to ensure the consistency and
isolation properties; and
(iii) distributed recovery methods to preserve consistency and durability when
failures occur.
A transaction is a sequence of actions on a database that forms a basic unit of reliable and
consistent computing, and satisfies the ACID property. In a distributed database system
(DDBS), transactions may be local or global. In local transactions, the actions access and
update data in a single site only, and hence it is straightforward to ensure the ACID
property.
DISTRIBUTED QUERY PROCESSING
Distributed query processing is the procedure of answering queries (which means mainly
read operations on large data sets) in a distributed environment where data is managed
at multiple sites in a computer network. Query processing involves the transformation of
a high-level query (e.g., formulated in SQL) into a query execution plan (consisting of
lower-level query operators in some variation of relational algebra) as well as the
execution of this plan. The goal of the transformation is to produce a plan which is
equivalent to the original query (returning the same result) and efficient, i.e., to minimize
resource consumption like total costs or response time.
Distributed Query Processing Architecture
The process of mapping global queries to local ones can be realized as follows −
• The tables required in a global query have fragments distributed across multiple
sites. The local databases have information only about local data. The controlling
site uses the global data dictionary to gather information about the distribution and
reconstructs the global view from the fragments.
• If there is no replication, the global optimizer runs local queries at the sites where
the fragments are stored. If there is replication, the global optimizer selects the site
based upon communication cost, workload, and server speed.
• The global optimizer generates a distributed execution plan so that the least amount of
data transfer occurs across the sites. The plan states the location of the fragments, the
order in which query steps need to be executed, and the processes involved in
transferring intermediate results.
• The local queries are optimized by the local database servers. Finally, the local
query results are merged together through a union operation in the case of horizontal
fragments and a join operation in the case of vertical fragments.
For example, consider a Project schema that is horizontally fragmented according to City,
the cities being New Delhi, Kolkata and Hyderabad.
Distributed query optimization requires evaluation of a large number of query trees, each
of which produces the required results of a query. This is primarily due to the presence of
large amounts of replicated and fragmented data. Hence, the target is to find an optimal
solution instead of the best solution. Two ways to approach this, discussed below, are
optimal resource utilization and query trading.
A distributed system has a number of database servers in the various sites to perform the
operations pertaining to a query. Following are the approaches for optimal resource
utilization −
Operation Shipping − In operation shipping, the operation is run at the site where the
data is stored and not at the client site. The results are then transferred to the client site.
This is appropriate for operations where the operands are available at the same site.
Example: Select and Project operations.
Data Shipping − In data shipping, the data fragments are transferred to the database
server, where the operations are executed. This is used in operations where the operands
are distributed at different sites. This is also appropriate in systems where the
communication costs are low, and local processors are much slower than the client server.
Hybrid Shipping − This is a combination of data and operation shipping. Here, data
fragments are transferred to the high-speed processors, where the operation runs. The
results are then sent to the client site.
Query Trading
In query trading algorithm for distributed database systems, the controlling/client site for
a distributed query is called the buyer and the sites where the local queries execute are
called sellers. The buyer formulates a number of alternatives for choosing sellers and for
reconstructing the global results. The target of the buyer is to achieve the optimal cost.
The algorithm starts with the buyer assigning sub-queries to the seller sites. The optimal
plan is created from local optimized query plans proposed by the sellers combined with
the communication cost for reconstructing the final result. Once the global optimal plan is
formulated, the query is executed.
Finding an optimal solution generally involves reducing the solution space so that the cost
of queries and data transfer is reduced. This can be achieved through a set of heuristic
rules, just as heuristics are applied in centralized systems −
• Perform selection and projection operations as early as possible. This reduces the
data flow over the communication network.
• Use semi-join operations to qualify tuples that are to be joined. This reduces the
amount of data transfer, which in turn reduces communication cost (see the sketch
below).
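A semi-join can be written in SQL with IN or EXISTS. The following sketch is illustrative only (emp and dept stand for fragments stored at two different sites): it qualifies the joining emp tuples using just the join-column values of dept, so only those values, not whole rows, need to be shipped between sites.
-- Semi-join of emp with dept: only dept.deptno values need to travel
-- to the site holding emp to qualify the emp tuples for the join.
SELECT e.*
FROM emp e
WHERE e.deptno IN (SELECT d.deptno FROM dept d);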
1. DATA INTEGRITY
2. CREATING AND MAINTAINING TABLES
3. INDEXES
4. SEQUENCES
5. VIEWS
6. USERS, PRIVILEGES, AND ROLES
7. SYNONYMS
DATA INTEGRITY
Data integrity is normally enforced in a database system by a series of integrity constraints
or rules.
For example, an employee table can be created as follows:
CREATE TABLE emp_tab (
empname VARCHAR2(80),
empno INTEGER,
deptno INTEGER
);
Next, create a constraint to enforce the rule that all department numbers in the
department table are unique, and a constraint to enforce the rule that every employee
must work for a valid department.
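The statements themselves are not shown above; a minimal sketch consistent with the description, assuming a department table dept_tab with a deptno column (the constraint names are also assumptions), is:
-- All department numbers in dept_tab must be unique:
ALTER TABLE dept_tab ADD CONSTRAINT dept_pk PRIMARY KEY (deptno);
-- Every employee must work for a valid department:
ALTER TABLE emp_tab ADD CONSTRAINT emp_dept_fk
FOREIGN KEY (deptno) REFERENCES dept_tab (deptno);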
Now, whenever you insert an employee record into emp_tab, Oracle Database checks that
its deptno value appears in dept_tab.
PRIMARY KEY Constraints
A table can have at most one primary key, but that key can have multiple columns (that is, it can be a
composite key). To designate a primary key, use the PRIMARY KEY constraint.
UNIQUE Constraints
Use a UNIQUE constraint (which designates a unique key) on any column or combination of columns
(except the primary key) where duplicate non-NULL values are not allowed. For example:
A row that duplicates a non-NULL value already present in the constrained column violates the
constraint; a row with a distinct value (or with NULL) satisfies it.
FOREIGN KEY Constraints
Designate one table as the referenced or parent table and the other as the dependent or child table. In
the parent table, define either a PRIMARY KEY or UNIQUE constraint on the shared columns. In the child
table, define a FOREIGN KEY constraint on the shared columns. The shared columns now comprise
a foreign key. Defining additional constraints on the foreign key affects the parent-child relationship.
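Putting the three constraint types together, here is a minimal sketch with hypothetical table names (dept_demo and emp_demo are not from the text):
-- Parent table: PRIMARY KEY and UNIQUE constraints declared inline.
CREATE TABLE dept_demo (
deptno INTEGER PRIMARY KEY,   -- at most one primary key per table
dname VARCHAR2(30) UNIQUE     -- duplicate non-NULL names are rejected
);
-- Child table: the deptno column is a foreign key into the parent.
CREATE TABLE emp_demo (
empno INTEGER PRIMARY KEY,
ename VARCHAR2(80),
deptno INTEGER REFERENCES dept_demo (deptno)
);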
SYNTAX:
CREATE TABLE table_name
( field1 datatype [ NOT NULL ],
field2 datatype [ NOT NULL ],
field3 datatype [ NOT NULL ]...)
A simple example of a CREATE TABLE statement follows.
INPUT/OUTPUT:
SQL> CREATE TABLE BILLS (
2 NAME CHAR(30),
3 AMOUNT NUMBER,
4 ACCOUNT_ID NUMBER);
Table created.
ANALYSIS:
This statement creates a table named BILLS. Within the BILLS table are three
fields: NAME, AMOUNT, and ACCOUNT_ID. The NAME field has a data type of character
and can store strings up to 30 characters long. The AMOUNT and ACCOUNT_ID fields
can contain number values only.
When creating a table, several constraints apply when naming the table. First, the
table name can be no more than 30 characters long. Because Oracle is case
insensitive, you can use either uppercase or lowercase for the individual
characters. However, the first character of the name must be a letter
between A and Z. The remaining characters can be letters or the symbols _, #, $,
and @. Of course, the table name must be unique within its schema. The name
also cannot be one of the Oracle or SQL reserved words (such as SELECT).
The same constraints that apply to the table name also apply to the field name.
However, a field name can be duplicated within the database. The restriction is
that the field name must be unique within its table.
If you have ever programmed in any language, you are familiar with the concept
of data types, or the type of data that is to be stored in a specific field. For
instance, a character data type constitutes a field that stores only character string
data.
SYNTAX:
CREATE TABLE NEW_TABLE(FIELD1, FIELD2, FIELD3)
AS (SELECT FIELD1, FIELD2, FIELD3
FROM OLD_TABLE <WHERE...>);
EXAMPLE:
SQL> CREATE TABLE NEW_BILLS(NAME, AMOUNT, ACCOUNT_ID)
2 AS (SELECT * FROM BILLS WHERE AMOUNT < 50);
Table created.
SYNTAX:
ALTER TABLE table_name
<ADD column_name data_type; |
MODIFY column_name data_type;>
The following command changes the NAME field of the BILLS table to hold 40
characters:
SQL> ALTER TABLE BILLS
2 MODIFY NAME CHAR(40);
Table altered.
SQL provides a command to completely remove a table from a database. The DROP
TABLE command deletes a table along with all its associated views and indexes.
SYNTAX:
DROP TABLE table_name;
EXAMPLE:
SQL> DROP TABLE NEW_BILLS;
Table dropped.
INDEXES
Data can be retrieved from a database using two methods. The first method,
often called the Sequential Access Method, requires SQL to go through each
record looking for a match. This search method is inefficient. Adding indexes to
your database enables SQL to use the Direct Access Method. SQL uses a treelike
structure to store and retrieve the index's data. Pointers to a group of data are
stored at the top of the tree. These groups are called nodes. Each node contains
pointers to other nodes. Nodes to the left contain values that are less than the parent
node's value; nodes to the right contain values greater than the parent node's value.
The database system starts its search at the top node and simply follows the
pointers until it is successful.
SYNTAX:
CREATE [UNIQUE | DISTINCT] [CLUSTER] INDEX index_name
ON table_name (column_name [ASC | DESC],
column_name [ASC | DESC]...)
Notice that all of these implementations have several things in common, starting
with the basic statement
CREATE INDEX index_name ON table_name (column_name, ...)
EXAMPLE:
SQL> SELECT * FROM BILLS;
NAME AMOUNT ACCOUNT_ID
Phone Company 125 1
Power Company 75 1
Record Club 25 2
Software Company 250 1
Cable TV Company 35 3
Joe's Car Palace 350 5
S.C. Student Loan 200 6
Florida Water Company 20 1
U-O-Us Insurance Company 125 5
Debtor's Credit Card 35 4
10 rows selected.
SQL> CREATE INDEX ID_INDEX ON BILLS (ACCOUNT_ID);
Index created.
SQL> SELECT * FROM BILLS;
(The same ten rows are returned, now ordered by ACCOUNT_ID.)
10 rows selected.
The BILLS table is sorted by the ACCOUNT_ID field until the index is dropped using
the DROP INDEX statement. As usual, the DROP INDEX statement is very
straightforward:
SYNTAX:
SQL> DROP INDEX index_name;
EXAMPLE:
SQL> DROP INDEX ID_INDEX;
Index dropped.
SEQUENCES
Sequences are database objects from which multiple users can generate unique
integers. You can use sequences to automatically generate primary key values.
Creating Sequences
To create a sequence in your schema, you must have the CREATE SEQUENCE system
privilege. To create a sequence in another user's schema, you must have
the CREATE ANY SEQUENCE privilege.
The CACHE option pre-allocates a set of sequence numbers and keeps them in
memory so that sequence numbers can be accessed faster.
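The source does not show a CREATE SEQUENCE statement; a minimal sketch for the emp_sequence used in the next example might be (the specific option values are illustrative):
CREATE SEQUENCE emp_sequence
START WITH 1        -- first number to generate
INCREMENT BY 1      -- step between successive numbers
NOMAXVALUE          -- no upper bound
NOCYCLE             -- do not restart after the last value
CACHE 20;           -- keep 20 pre-allocated numbers in memory
Values are then drawn with emp_sequence.NEXTVAL, for example inside an INSERT statement.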
Altering Sequences
To alter a sequence, your schema must contain the sequence, or you must have
the ALTER ANY SEQUENCE system privilege. You can alter a sequence to change any of
the parameters that define how it generates sequence numbers except the
sequence's starting number. To change the starting point of a sequence, drop the
sequence and then re-create it.
Alter a sequence using the ALTER SEQUENCE statement. For example, the following
statement alters the emp_sequence:
ALTER SEQUENCE emp_sequence
INCREMENT BY 10
MAXVALUE 10000
CYCLE
CACHE 20;
Dropping Sequences
You can drop any sequence in your schema. To drop a sequence in another
schema, you must have the DROP ANY SEQUENCE system privilege. If a sequence is no
longer required, you can drop the sequence using the DROP SEQUENCE statement.
For example, the following statement drops the order_seq sequence:
DROP SEQUENCE order_seq;
VIEWS
A view is a logical representation of another table or combination of tables. A view derives its data from
the tables on which it is based. These tables are called base tables. Base tables might in turn be actual
tables or might be views themselves. All operations performed on a view actually affect the base table of
the view. You can use views in almost the same way as tables. You can query, update, insert into, and
delete from views, just as you can standard tables.
Views can provide a different representation (such as subsets or supersets) of the data that resides
within other tables and views. Views are very powerful because they allow you to tailor the presentation
of data to different types of users.
Creating Views
To create a view in your own schema, you must have the CREATE VIEW privilege.
The following statement creates a view on a subset of data in the emp table:
CREATE VIEW sales_staff AS SELECT empno, ename, deptno FROM emp WHERE deptno = 10;
Join Views
You can also create views that specify more than one base table or view in the FROM clause. These are
called join views. The following statement creates the division1_staff view that joins data from
the emp and dept tables (the WHERE clause supplies the join condition relating the two tables):
CREATE VIEW division1_staff AS SELECT ename, empno, job, dname FROM emp, dept WHERE emp.deptno = dept.deptno;
USERS
Oracle relies on a mechanism that allows you to register a person, called a user.
Each registered user has an access password, which must be provided in various
situations. Each user is then assigned individual privileges or roles.
The types of users and their roles and responsibilities depend on the database
site. A small site can have one database administrator who administers the
database for application developers and users.
If a newly created user, say john, tries to log in before being granted any
privileges, the attempt fails:
ERROR: ORA-01045:
user JOHN lacks CREATE SESSION privilege; logon denied
To enable the user john to log in, you need to grant the CREATE SESSION system
privilege to the user john by using the following statement:
GRANT CREATE SESSION TO john;
After the grant, john can connect:
Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production
PRIVILEGES
A user privilege is a right to execute a particular type of SQL statement, or a right
to access another user's object. The types of privileges are defined by Oracle.
System Privileges
There are over 100 distinct system privileges. Each system privilege allows a
user to perform a particular database operation or class of database operations.
Object Privileges
Each object privilege allows a user to perform a particular action on a specific
object, such as a table, view, sequence, or procedure.
You can specify ALL [PRIVILEGES] to grant or revoke all available object privileges
for an object. ALL is not a privilege; rather, it is a shortcut, or a way of granting or
revoking all object privileges with one word in GRANT and REVOKE statements.
USER ROLES
A role groups several privileges and roles, so that they can be granted to and
revoked from users simultaneously. A role must be enabled for a user before it
can be used by the user.
You can grant system privileges and roles to other users and roles using
the GRANT statement. The following privileges are required:
• To grant a system privilege, you must have been granted the system
privilege with the ADMIN OPTION or have been granted the GRANT ANY
PRIVILEGE system privilege.
• To grant a role, you must have been granted the role with the ADMIN
OPTION or have been granted the GRANT ANY ROLE system privilege.
The following statement grants the system privilege CREATE SESSION and
the accts_pay role to the user jward:
GRANT CREATE SESSION, accts_pay TO jward;
The following statement grants the SELECT, INSERT, and DELETE object privileges for
all columns of the emp table to the users jfee and tsmith:
GRANT SELECT, INSERT, DELETE ON emp TO jfee, tsmith;
To grant all object privileges on the salary view to the user jfee, use
the ALL keyword, as shown in the following example:
GRANT ALL ON salary TO jfee;
You can revoke system privileges and roles using the SQL statement REVOKE.
Any user with the ADMIN OPTION for a system privilege or role can revoke the
privilege or role from any other database user or role. The revoker does not have
to be the user that originally granted the privilege or role. Users with GRANT ANY
ROLE can revoke any role.
The following statement revokes the CREATE TABLE system privilege and
the accts_rec role from tsmith:
REVOKE CREATE TABLE, accts_rec FROM tsmith;
You can only revoke the privileges that you, the grantor, directly authorized, not
the grants made by other users to whom you granted the GRANT OPTION.
However, there is a cascading effect. The object privilege grants propagated
using the GRANT OPTION are revoked if a grantor's object privilege is revoked.
Assuming you are the original grantor, the following statement revokes
the SELECT and INSERT privileges on the emp table from the users jfee and tsmith:
REVOKE SELECT, INSERT ON emp FROM jfee, tsmith;
The following statement revokes all object privileges for the dept table that you
originally granted to the human_resources role:
REVOKE ALL ON dept FROM human_resources;
SYNONYMS
A synonym is an alias for a schema object. Synonyms can provide a level of
security by masking the name and owner of an object and by providing location
transparency for remote objects of a distributed database. Also, they are
convenient to use and reduce the complexity of SQL statements for database
users.
Synonyms allow underlying objects to be renamed or moved. You can create both
public and private synonyms.
A public synonym is owned by the special user group named PUBLIC and is
accessible to every user in a database. A private synonym is contained in the
schema of a specific user and available only to the user and the user's grantees.
Creating Synonyms
To create a private synonym in your own schema, you must have the CREATE
SYNONYM privilege. To create a private synonym in another user's schema, you
must have the CREATE ANY SYNONYM privilege. To create a public synonym, you must
have the CREATE PUBLIC SYNONYM system privilege.
Create a synonym using the CREATE SYNONYM statement. The underlying schema
object need not exist, nor do you need privileges to access the object. The
following statement creates a public synonym named public_emp on the emp table
contained in the schema of jward:
CREATE PUBLIC SYNONYM public_emp FOR jward.emp;
Dropping Synonyms
You can drop any private synonym in your own schema. To drop a private
synonym in another user's schema, you must have the DROP ANY SYNONYM system
privilege. To drop a public synonym, you must have the DROP PUBLIC
SYNONYM system privilege.
Drop a synonym that is no longer required using DROP SYNONYM statement. To
drop a private synonym, omit the PUBLIC keyword. To drop a public synonym,
include the PUBLIC keyword.
For example, the following statement drops the private synonym named emp:
DROP SYNONYM emp;
1. PL/SQL
2. TRIGGERS
3. STORED PROCEDURE AND FUNCTIONS
4. PACKAGE
5. CURSORS
6. TRANSACTIONS
PL/SQL
PL/SQL, the Oracle procedural extension of SQL, is a completely portable, high-performance
transaction-processing language.
PL/SQL combines the data-manipulating power of SQL with the processing power of procedural
languages.
When a problem can be solved using SQL, you can issue SQL statements from your PL/SQL
programs, without learning new APIs.
Like other procedural programming languages, PL/SQL lets you declare constants and variables,
control program flow, define subprograms, and trap run-time errors.
You can break complex problems into easily understandable subprograms, which you can reuse in
multiple applications.
PL/SQL Blocks
The basic unit of a PL/SQL source program is the block, which groups related declarations and
statements.
A PL/SQL block is defined by the keywords DECLARE, BEGIN, EXCEPTION, and END. These keywords
partition the block into a declarative part, an executable part, and an exception-handling part. Only
the executable part is required.
Declarations are local to the block and cease to exist when the block completes execution, helping
to avoid cluttered namespaces for variables and subprograms.
The skeleton of a block is therefore:
DECLARE
   -- declarative part (optional)
BEGIN
   -- executable part (required)
EXCEPTION
   -- exception-handling part (optional)
END;
Declaring PL/SQL Variables
A PL/SQL variable can have any SQL data type (such as CHAR, DATE, or NUMBER) or a PL/SQL-only
data type (such as BOOLEAN or PLS_INTEGER).
Example 1-2 declares several PL/SQL variables. One has a PL/SQL-only data type; the others have
SQL data types.
DECLARE
   wages NUMBER;
   country VARCHAR2(128);
   counter NUMBER := 0;
   done BOOLEAN;
BEGIN
   country := 'France';
   country := UPPER('Canada');
END;
%TYPE Attribute
The %TYPE attribute provides the data type of a variable or database column. This is particularly useful when
declaring variables that will hold database values. For example, assume there is a column
named last_name in a table named employees. To declare a variable named v_last_name that has the
same data type as column last_name, use dot notation and the %TYPE attribute, as follows:
v_last_name employees.last_name%TYPE;
%ROWTYPE Attribute
In PL/SQL, records are used to group data. A record consists of a number of related fields in which data values
can be stored. The %ROWTYPE attribute provides a record type that represents a row in a table. The record can
store an entire row of data selected from the table or fetched from a cursor or cursor variable. See Cursors.
Columns in a row and corresponding fields in a record have the same names and data types. In the following
example, you declare a record named dept_rec, whose fields have the same names and data types as the
columns in the departments table:
DECLARE
   dept_rec departments%ROWTYPE;   -- one field per column of departments
   v_deptid departments.department_id%TYPE;
BEGIN
   SELECT * INTO dept_rec FROM departments WHERE ROWNUM = 1;  -- fetch one row (illustrative)
   v_deptid := dept_rec.department_id;
END;
Conditional Control
IF statements let you execute a sequence of statements conditionally, depending on the
value of a Boolean expression. The IF clause runs its statements when the condition is
TRUE; optional ELSIF and ELSE clauses handle the remaining cases.
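A minimal runnable sketch of an IF statement (the variable names and values are illustrative):
DECLARE
   sales NUMBER := 1200;
   quota NUMBER := 1000;
   bonus NUMBER;
BEGIN
   IF sales > quota THEN
      bonus := 500;
   ELSIF sales > quota / 2 THEN
      bonus := 250;
   ELSE
      bonus := 0;
   END IF;
   DBMS_OUTPUT.PUT_LINE('Bonus: ' || bonus);   -- prints Bonus: 500
END;
/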
Iterative Control
LOOP statements let you execute a sequence of statements multiple times. You place the
keyword LOOP before the first statement in the sequence and the keywords END LOOP after the last
statement in the sequence. The following example shows the simplest kind of loop, which repeats a
sequence of statements continually:
LOOP
-- sequence of statements
END LOOP;
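As written, such a loop repeats forever; in practice an EXIT condition is added. A small runnable sketch, assuming server output is enabled:
DECLARE
   counter NUMBER := 0;
BEGIN
   LOOP
      counter := counter + 1;
      EXIT WHEN counter >= 5;   -- leave the loop after five iterations
   END LOOP;
   DBMS_OUTPUT.PUT_LINE('Final value: ' || counter);
END;
/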
TRIGGERS
A trigger is a named program unit that is stored in the database and fired (executed) in
response to a specified event. The specified event is associated with either a table, a view, a
schema, or the database, and it is one of the following:
• A DML statement (INSERT, UPDATE, or DELETE)
• A DDL statement (such as CREATE or ALTER)
• A database operation (such as SERVERERROR, LOGON, LOGOFF, STARTUP, or SHUTDOWN)
Trigger Types
A DML trigger is fired by a DML statement, a DDL trigger is fired by a DDL statement,
a DELETE trigger is fired by a DELETE statement, and so on.
An INSTEAD OF trigger is a DML trigger that is defined on a view (not a table). The database fires
the INSTEAD OF trigger instead of executing the triggering DML statement. For more information,
see Modifying Complex Views (INSTEAD OF Triggers).
A system trigger is defined on a schema or the database. A trigger defined on a schema fires for each event
associated with the owner of the schema (the current user). A trigger defined on a database fires for each event
associated with all users.
A simple trigger can fire at exactly one of the following timing points:
• Before the triggering statement executes
• After the triggering statement executes
• Before each row that the statement affects
• After each row that the statement affects
A compound trigger can fire at more than one timing point. Compound triggers make it easier to program an
approach where you want the actions you implement for the various timing points to share common data. For
more information, see Compound Triggers.
Trigger States
A trigger can be in either of two states:
Enabled. An enabled trigger executes its trigger body if a triggering statement is entered and the trigger
restriction (if any) evaluates to TRUE.
Disabled. A disabled trigger does not execute its trigger body, even if a triggering statement is entered and the
trigger restriction (if any) evaluates to TRUE.
Uses of Triggers
Triggers supplement the standard capabilities of your database to provide a highly customized
database management system. For example, you can use triggers to:
• Automatically generate derived column values
• Enforce complex business rules and security authorizations
• Enforce referential integrity across nodes of a distributed database
• Provide transparent event logging and auditing
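The source leaves the example as a placeholder, so the following row-level DML trigger is only a sketch (the emp table and ename column are assumed from earlier examples):
-- Fires once for each row inserted into emp and normalizes the name.
CREATE OR REPLACE TRIGGER emp_name_trg
BEFORE INSERT ON emp
FOR EACH ROW
BEGIN
   :NEW.ename := UPPER(:NEW.ename);
END;
/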
STORED PROCEDURE AND FUNCTION
A stored procedure is prepared SQL code that you can save, so the code can be reused
over and over again.
So if you have an SQL query that you write over and over again, save it as a stored
procedure, and then just call it to execute it.
You can also pass parameters to a stored procedure, so that the stored procedure can act
based on the parameter value(s) that is passed.
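The example is likewise omitted in the source; a minimal sketch, assuming an emp table with empno and sal columns, is:
-- Raises the salary of one employee by a given amount.
CREATE OR REPLACE PROCEDURE raise_salary (
   p_empno  IN NUMBER,
   p_amount IN NUMBER
) IS
BEGIN
   UPDATE emp
   SET sal = sal + p_amount
   WHERE empno = p_empno;
END;
/
It could then be called, for example, with EXECUTE raise_salary(7369, 500);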
FUNCTION
A function is the same as a procedure, but it returns a value.
The CREATE FUNCTION statement creates or replaces a standalone stored function or a call
specification.
A standalone stored function is a function (a subprogram that returns a single value) that is stored
in the database.
Creating a Function: Examples The following statement creates the function get_bal on the
sample table oe.orders:
CREATE FUNCTION get_bal (acc_no IN NUMBER)
RETURN NUMBER
IS acc_bal NUMBER(11,2);
BEGIN
SELECT order_total
INTO acc_bal
FROM orders
WHERE customer_id = acc_no;
RETURN(acc_bal);
END;
When you call the function, you must specify the argument acc_no, the number of the account
whose balance is sought. The data type of acc_no is NUMBER.
The function returns the account balance. The RETURN clause of the CREATE FUNCTION statement
specifies the data type of the return value to be NUMBER.
The function uses a SELECT statement to select the balance column from the row identified by the
argument acc_no in the orders table. The function uses a RETURN statement to return this value to
the environment in which the function is called.
The function created in the preceding example can be used in a SQL statement. For example:
SELECT get_bal(165) FROM DUAL;

GET_BAL(165)
------------
2519
PL/SQL PACKAGE
A package is a schema object that groups logically related PL/SQL types, variables, and
subprograms. Packages usually have two parts, a specification ("spec") and a body; sometimes the
body is unnecessary.
The specification is the interface to the package. It declares the types, variables, constants,
exceptions, cursors, and subprograms that can be referenced from outside the package. The body
defines the queries for the cursors and the code for the subprograms.
You can think of the spec as an interface and of the body as a black box. You can debug, enhance,
or replace a package body without changing the package spec.
To create a package spec, use the CREATE PACKAGE Statement. To create a package body, use
the CREATE PACKAGE BODY Statement.
The spec holds public declarations, which are visible to stored subprograms and other code outside
the package. You must declare subprograms at the end of the spec after all other items (except
pragmas that name a specific function; such pragmas must follow the function spec).
The body holds implementation details and private declarations, which are hidden from code outside
the package. Following the declarative part of the package body is the optional initialization part,
which holds statements that initialize package variables and do any other one-time setup steps.
The package specification contains public declarations. The declared items are accessible from
anywhere in the package and to any other subprograms in the same schema
The package body contains the implementation of every cursor and subprogram declared in the
package spec. Subprograms defined in a package body are accessible outside the package only if
their specs also appear in the package spec. If a subprogram spec is not included in the package
spec, that subprogram can only be invoked by other subprograms in the same package. A package
body must be in the same schema as the package spec.
For example, a package spec and its matching body:
CREATE PACKAGE emp_bonus AS
   PROCEDURE calc_bonus (date_hired employees.hire_date%TYPE);
END emp_bonus;
/
CREATE PACKAGE BODY emp_bonus AS
   PROCEDURE calc_bonus
   (date_hired employees.hire_date%TYPE) IS
   BEGIN
      DBMS_OUTPUT.PUT_LINE   -- message text is illustrative
      ('Employees hired on ' || date_hired || ' get a bonus.');
   END;
END emp_bonus;
/
DBMS_OUTPUT Package
The DBMS_OUTPUT package enables you to display output from PL/SQL blocks, subprograms, packages,
and triggers. The package is especially useful for displaying PL/SQL debugging information. The
procedure PUT_LINE outputs information to a buffer that can be read by another trigger,
subprogram, or package. You display the information by invoking the procedure GET_LINE or by
setting SERVEROUTPUT ON in SQL*Plus. Example 10-4 shows how to display output from a PL/SQL
block.
SET SERVEROUTPUT ON
BEGIN
   DBMS_OUTPUT.PUT_LINE ('These are the tables that ' || USER || ' owns:');  -- heading text is illustrative
   FOR item IN (SELECT table_name FROM user_tables)
   LOOP
      DBMS_OUTPUT.PUT_LINE(item.table_name);
   END LOOP;
END;
/
CURSORS
When Oracle processes an SQL statement, it creates a memory area known as the context
area, which contains all the information needed for processing the statement. A cursor is
a pointer to this context area. PL/SQL controls the context area through a cursor. A cursor
holds the rows (one or more) returned by a SQL statement. The set of rows the cursor
holds is referred to as the active set.
You can name a cursor so that it could be referred to in a program to fetch and process
the rows returned by the SQL statement, one at a time. There are two types of cursors
−
• Implicit cursors
• Explicit cursors
Implicit Cursors
Implicit cursors are automatically created by Oracle whenever an SQL statement is
executed, when there is no explicit cursor for the statement. Programmers cannot
control implicit cursors or the information in them.
Whenever a DML statement (INSERT, UPDATE and DELETE) is issued, an implicit
cursor is associated with this statement. For INSERT operations, the cursor holds the
data that needs to be inserted. For UPDATE and DELETE operations, the cursor
identifies the rows that would be affected.
In PL/SQL, you can refer to the most recent implicit cursor as the SQL cursor, which
always has attributes such as %FOUND, %ISOPEN, %NOTFOUND,
and %ROWCOUNT. The SQL cursor has additional
attributes, %BULK_ROWCOUNT and %BULK_EXCEPTIONS, designed for use with
the FORALL statement. The following table provides the description of the most used
attributes −
1. %FOUND − Returns TRUE if an INSERT, UPDATE, or DELETE statement affected one or
more rows, or a SELECT INTO statement returned one or more rows. Otherwise, it
returns FALSE.
2. %NOTFOUND − The logical opposite of %FOUND. It returns TRUE if an INSERT, UPDATE,
or DELETE statement affected no rows, or a SELECT INTO statement returned no rows.
Otherwise, it returns FALSE.
3. %ISOPEN − Always returns FALSE for implicit cursors, because Oracle closes the SQL
cursor automatically after executing its associated SQL statement.
4. %ROWCOUNT − Returns the number of rows affected by an INSERT, UPDATE, or DELETE
statement, or returned by a SELECT INTO statement.
Explicit Cursors
Explicit cursors are programmer-defined cursors for gaining more control over
the context area. An explicit cursor should be defined in the declaration section of the
PL/SQL block. It is created on a SELECT statement which returns more than one row.
The syntax for creating an explicit cursor is −
CURSOR cursor_name IS select_statement;
Working with an explicit cursor includes the following steps −
• Declaring the cursor for initializing the memory
• Opening the cursor for allocating the memory
• Fetching the cursor for retrieving the data
• Closing the cursor to release the allocated memory
Example
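The example itself is not included in the source; a minimal sketch that walks through the four steps, assuming the emp table used elsewhere in this unit, is:
DECLARE
   CURSOR emp_cur IS
      SELECT empno, ename FROM emp;          -- 1. declare the cursor
   v_empno emp.empno%TYPE;
   v_ename emp.ename%TYPE;
BEGIN
   OPEN emp_cur;                              -- 2. open it
   LOOP
      FETCH emp_cur INTO v_empno, v_ename;    -- 3. fetch rows one at a time
      EXIT WHEN emp_cur%NOTFOUND;
      DBMS_OUTPUT.PUT_LINE(v_empno || ' ' || v_ename);
   END LOOP;
   CLOSE emp_cur;                             -- 4. close it
END;
/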
TRANSACTIONS
A transaction is a logical unit of work that contains one or more SQL statements. A transaction is an
atomic unit. The effects of all the SQL statements in a transaction can be either
all committed (applied to the database) or all rolled back (undone from the database).
A transaction begins with the first executable SQL statement. A transaction ends when it is
committed or rolled back, either explicitly with a COMMIT or ROLLBACK statement or implicitly when a
DDL statement is issued.
To illustrate the concept of a transaction, consider a banking database. When a bank customer
transfers money from a savings account to a checking account, the transaction can consist of three
separate operations:
• Decrement the savings account
• Increment the checking account
• Record the transaction in the transaction journal
Oracle Database must allow for two situations. If all three SQL statements can be performed to
maintain the accounts in proper balance, the effects of the transaction can be applied to the
database. However, if a problem such as insufficient funds, invalid account number, or a hardware
failure prevents one or two of the statements in the transaction from completing, the entire
transaction must be rolled back so that the balance of all accounts is correct.
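Expressed in SQL, the successful case might look like the following sketch (the table and column names are hypothetical):
-- Move 100 from savings account 3209 to checking account 3208,
-- then record the transfer; COMMIT makes all three changes permanent.
UPDATE savings_accounts SET balance = balance - 100 WHERE account_id = 3209;
UPDATE checking_accounts SET balance = balance + 100 WHERE account_id = 3208;
INSERT INTO journal (account_id, action, amount) VALUES (3209, 'TRANSFER', 100);
COMMIT;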
Before a transaction that modifies data is committed, the following has occurred:
• Oracle Database has generated undo information. The undo information contains the old data values
changed by the SQL statements of the transaction.
• Oracle Database has generated redo log entries in the redo log buffer of the SGA. The redo log record
contains the change to the data block and the change to the rollback block. These changes may go to
disk before a transaction is committed.
• The changes have been made to the database buffers of the SGA. These changes may go to disk before
a transaction is committed.
Rollback of Transactions
Rolling back means undoing any changes to data that have been performed by SQL statements
within an uncommitted transaction. Oracle Database uses undo tablespaces (or rollback segments)
to store old values. The redo log contains a record of changes.
Oracle Database lets you roll back an entire uncommitted transaction. Alternatively, you can roll
back the trailing portion of an uncommitted transaction to a marker called a savepoint.
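A savepoint rollback, as a minimal sketch (assuming the emp table with a sal column):
UPDATE emp SET sal = sal * 1.10;     -- first change
SAVEPOINT before_cut;                -- marker inside the transaction
DELETE FROM emp WHERE sal > 5000;    -- trailing change
ROLLBACK TO SAVEPOINT before_cut;    -- undoes only the DELETE
COMMIT;                              -- makes the UPDATE permanent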
All types of rollback use the same procedures, whether a single statement is rolled back,
the transaction is rolled back to a savepoint, or the entire transaction is rolled back.
A distributed transaction is a transaction that includes one or more statements that update data on
two or more distinct nodes of a distributed database.
A two-phase commit mechanism guarantees that all database servers participating in a distributed
transaction either all commit or all undo the statements in the transaction. A two-phase commit
mechanism also protects implicit DML operations performed by integrity constraints, remote
procedure calls, and triggers.
The Oracle Database two-phase commit mechanism is completely transparent to users who issue
distributed transactions.