CS2255 Notes
CS2255 Notes
com
CS 2255 DATABASE MANAGEMENT SYSTEMS
m
UNIT1. INTRODUCTION
co
A database is a structured collection of data. The data are typically organized to model relevant
aspects of reality (for example, the availability of rooms in hotels), in a way that supports
s.
processes requiring this information (for example, finding a hotel with vacancies).
The term database is correctly applied to the data and their supporting data structures, and not to
te
the database management system (DBMS). The database data collection with DBMS is called a
database system.
The term database system implies that the data are managed to some level of quality (measured
no
in terms of accuracy, availability, usability, and resilience) and this in turn often implies the use
of a general-purpose database management system (DBMS).[1] A general-purpose DBMS is
typically a complex software system that meets many usage requirements to properly maintain
ar
This is specially the case with client-server, near-real time transactional systems, in which
t
multiple users have access to data, data is concurrently entered and inquired for in ways that
5s
preclude single-thread batch processing. Most of the complexity of those requirements are still
present with personal, desktop-based database systems.
Well known DBMSs include Oracle, FoxPro, IBM DB2, Linter, Microsoft Access, Microsoft
w.
SQL Server, MySQL,PostgreSQL and SQLite. A database is not generally portable across
different DBMS, but different DBMSs can inter-operate to some degree by using standards like
SQL and ODBC together to support a single application built over more than one database. A
ww
DBMS also needs to provide effective run-time execution to properly support (e.g., in terms of
performance, availability, and security) as many database end-users as needed.
A way to classify databases involves the type of their contents, for example: bibliographic,
document-text, statistical, or multimedia objects. Another way is by their application area, for
example: accounting, music compositions, movies, banking, manufacturing, or insurance.
www.5starnotes.com
www.5starnotes.com
The term database may be narrowed to specify particular aspects of organized collection of data
and may refer to the logical database, to the physical database as data content in computer data
m
storage or to many other database sub-definitions.
co
Purpose of Database System
A DBMS has evolved into a complex software system and its development typically requires
s.
thousands of person-years of development effort. Some general-purpose DBMSs, like Oracle,
Microsoft SQL Server, FoxPro, and IBM DB2, have been undergoing upgrades for thirty years
or more. General-purpose DBMSs aim to satisfy as many applications as possible, which
te
typically makes them even more complex than special-purpose databases. However, the fact that
they can be used "off the shelf", as well as their amortized cost over many applications and
instances, makes them an attractive alternative (Vs. one-time development) whenever they meet
no
an application's requirements.
Though attractive in many cases, a general-purpose DBMS is not always the optimal solution:
ar
When certain applications are pervasive with many operating instances, each with many users, a
general-purpose DBMS may introduce unnecessary overhead and too large "footprint" (too large
amount of unnecessary, unutilized software code). Such applications usually justify dedicated
t
development. Typical examples are email systems, though they need to possess certain DBMS
5s
properties: email systems are built in a way that optimizes email messages handling and
managing, and do not need significant portions of a general-purpose DBMS functionality.
Views of data
w.
In database theory, a view consists of a stored [clarify] query accessible as a virtual [clarify] table in a
relational database or a set of documents in a document-oriented database composed of the result
ww
set of a query or map-and-reduce functions. Unlike ordinary tables (base tables) in a relational
database, a view does not form part of the physical schema: it is a dynamic, virtual table
computed or collated from data in the database. Changing the data in a table alters the data
shown in subsequent invocations of the view. In some NoSQL databases views are the only way
to query data.
www.5starnotes.com
www.5starnotes.com
Views can provide advantages over tables:
m
Views can represent a subset of the data contained in a table
Views can join and simplify multiple tables into a single virtual table
co
Views can act as aggregated tables, where the database engine aggregates data (sum,
average etc.) and presents the calculated results as part of the data
Views can hide the complexity of data; for example a view could appear as Sales2000 or
s.
Sales2001, transparently partitioning the actual underlying table
Views take very little space to store; the database contains only the definition of a view,
not a copy of all the data it presents
te
Depending on the SQL engine used, views can provide extra security
Views can limit the degree of exposure of a table or tables to the outer world
no
Just as functions (in programming) can provide abstraction, so database users can create
abstraction by using views. In another parallel with functions, database users can manipulate
nested views, thus one view can aggregate data from other views. Without the use of views the
ar
normalization of databases above second normal form would become much more difficult.
Views can make it easier to create lossless join decomposition.
t
Just as rows in a base table lack any defined ordering, rows available through a view do not
appear with any default sorting. A view is a relational table, and the relational model defines a
5s
table as a set of rows. Since sets are not ordered - by definition - the rows in a view are not
ordered, either. Therefore, an ORDER BY clause in the view definition is meaningless. The SQL
standard (SQL:2003) does not allow an ORDER BY clause in a subselect [clarify] in a CREATE
w.
VIEW statement, just as it is not allowed in a CREATE TABLE statement. However, sorted data
can be obtained from a view, in the same way as any other table - as part of a query statement.
ww
Nevertheless, some DBMS (such as Oracle Database and SQL Server[ambiguous]) allow a view to
be created with an ORDER BY clause in a subquery, affecting how data is displayed.
Data models A data model is an abstract structure that provides the means to effectively
describe specific data structures needed to model an application. As such a data model needs
sufficient expressive power to capture the needed aspects of applications. These applications are
www.5starnotes.com
www.5starnotes.com
often typical to commercial companies and other organizations (like manufacturing, human-
resources, stock, banking, etc.). For effective utilization and handling it is desired that a data
m
model is relatively simple and intuitive. This may be in conflict with high expressive power
co
needed to deal with certain complex applications. Thus any popular general-purpose data model
usually well balances between being intuitive and relatively simple, and very complex with high
expressive power. The application's semantics is usually not explicitly expressed in the model,
s.
but rather implicit (and detailed by documentation external to the model) and hinted to by data
item types' names (e.g., "part-number") and their connections (as expressed by generic data
structure types provided by each specific model).
te
Hierarchical model
In the Hierarchical model different record types (representing real-world entities) are embedded
no
in a predefined hierarchical (tree-like) structure. This hierarchy is used as the physical order of
records in storage. Record access is done by navigating through the data structure using pointers
combined with sequential accessing.
ar
This model has been supported primarily by the IBM IMS DBMS, one of the earliest DBMSs.
Various limitations of the model have been compensated at later IMS versions by additional
t
Network model
In this model a hierarchical relationship between two record types (representing real-world
w.
entities) is established by the set construct. A set consists of circular linked lists where one
record type, the set owner or parent, appears once in each circle, and a second record type, the
subordinate or child, may appear multiple times in each circle. In this way a hierarchy may be
ww
established between any two record types, e.g., type A is the owner of B. At the same time
another set may be defined where B is the owner of A. Thus all the sets comprise a general
directed graph (ownership defines a direction), or network construct. Access to records is either
sequential (usually in each record type) or by navigation in the circular linked lists.
www.5starnotes.com
www.5starnotes.com
This model is more general and powerful than the hierarchical, and has been the most popular
before being replaced by the Relational model. It has been standardized by CODASYL. Popular
m
DBMS products that utilized it were Cincom Systems' Total and Cullinet's IDMS. IDMS gained
co
a considerable customer base and exists and supported until today. In the 1980s it has adopted
the Relational model and SQL in addition to its original tools and languages.
An inverted file or inverted index of a first file, by a field in this file (the inversion field), is a
s.
second file in which this field is the key. A record in the second file includes a key and pointers
to records in the first file where the inversion field has the value of the key. This is also the
logical structure of contemporary database indexes. The related Inverted file data model utilizes
te
inverted files of primary database files to efficiently directly access needed records in these files.
Notable for using this data model is the ADABAS DBMS of Software AG, introduced in 1970.
no
ADABAS has gained considerable customer base and exists and supported until today. In the
1980s it has adopted the Relational model and SQL in addition to its original tools and
languages.
ar
Relational model
t
The relational model is a simple model that provides flexibility. It organizes data based on two-
dimensional arrays known as relations, or tables as related to databases. These relations consist
5s
of a heading and a set of zero or more tuples in arbitrary order. The heading is an unordered set
of zero or more attributes, or columns of the table. The tuples are a set of unique attributes
mapped to values, or the rows of data in the table. Data can be associated across multiple tables
w.
with a key. A key is a single, or set of multiple, attribute(s) that is common to both tables. The
most common language associated with the relational model is the Structured Query Language
(SQL), though it differs in some places.
ww
Entity-relationship model
been applied in areas such as engineering and spatial databases, telecommunications and in
various scientific domains. The conglomeration of object oriented programming and database
technology led to this new kind of database. These databases attempt to bring the database world
www.5starnotes.com
www.5starnotes.com
and the application-programming world closer together, in particular by ensuring that the
database uses the same type system as the application program. This aims to avoid the overhead
m
(sometimes referred to as the impedance mismatch) of converting information between its
co
representation in the database (for example as rows in tables) and its representation in the
application program (typically as objects). At the same time, object databases attempt to
introduce key ideas of object programming, such as encapsulation and polymorphism, into the
s.
world of databases.
A variety of these ways have been tried[by whom?] for storing objects in a database. Some products
have approached the problem from the application-programming side, by making the objects
te
manipulated by the program persistent. This also typically requires the addition of some kind of
query language, since conventional programming languages do not provide language-level
functionality for finding objects based on their information content. Others[which?] have attacked
no
the problem from the database end, by defining an object-oriented data model for the database,
and defining a database programming language that allows full programming capabilities as well
as traditional query facilities.
ar
the relational model are sometimes classified as post-relational.[7] Alternate terms include
"hybrid database", "Object-enhanced RDBMS" and others. The data model in such products
5s
incorporates relations but is not constrained by E.F. Codd's Information Principle, which requires
that
w.
all information in the database must be cast explicitly in terms of values in relations and in no
other way
ww
Some of these extensions to the relational model integrate concepts from technologies that pre-
date the relational model. For example, they allow representation of a directed graph with trees
on the nodes. The German company sones implements this concept in its GraphDB.
Some post-relational products extend relational systems with non-relational features. Others
arrived in much the same place by adding relational features to pre-relational systems.
www.5starnotes.com
www.5starnotes.com
Paradoxically, this allows products that are historically pre-relational, such as PICK and
MUMPS, to make a plausible claim to be post-relational.
m
The resource space model (RSM) is a non-relational data model based on multi-dimensional
co
classification.
Database languages
s.
Database languages are dedicated programming languages, tailored and utilized to
te
define a database (i.e., its specific data types and the relationships among them),
manipulate its content (e.g., insert new data occurrences, and update or delete existing
ones), and
no
query it (i.e., request information: compute and retrieve any information based on its
data).
Database languages are data-model-specific, i.e., each language assumes and is based on a
ar
certain structure of the data (which typically differs among different data models). They typically
have commands to instruct execution of the desired operations in the database. Each such
command is equivalent to a complex expression (program) in a regular programming language,
t
and thus programming in dedicated (database) languages simplifies the task of handling
5s
examples:
A major Relational model language supported by all the relational DBMSs and a standard.
SQL was one of the first commercial languages for the relational model. Despite not adhering to
the relational model as described by Codd, it has become the most widely used database
language.[10][11] Though often described as, and to a great extent is a declarative language, SQL
also includes procedural elements. SQL became a standard of the American National Standards
www.5starnotes.com
www.5starnotes.com
Institute (ANSI) in 1986, and of the International Organization for Standards (ISO) in 1987.
Since then the standard has been enhanced several times with added features. However, issues of
m
SQL code portability between major RDBMS products still exist due to lack of full compliance
co
with, or different interpretations of the standard. Among the reasons mentioned are the large size,
and incomplete specification of the standard, as well as vendor lock-in.
s.
An object model language standard (by the Object Data Management Group) that has influenced
the design of some of the newer query languages like JDOQL and EJB QL, though they cannot
te
be considered as different flavors of OQL.
Database architecture (to be distinguished from DBMS architecture; see below) may be viewed,
t
of different end-users from a same database, as well as for other benefits. For example, a
financial department of a company needs the payment details of all employees as part of the
company's expenses, but not other many details about employees, that are the interest of the
w.
human resources department. Thus different departments need different views of the company's
database, that both include the employees' payments, possibly in a different level of detail (and
presented in different visual forms). To meet such requirement effectively database architecture
ww
consists of three levels: external, conceptual and internal. Clearly separating the three levels was
a major feature of the relational database model implementations that dominate 21st century
databases.[13]
www.5starnotes.com
www.5starnotes.com
The external level defines how each end-user type understands the organization of its
respective relevant data in the database, i.e., the different needed end-user views. A single
m
database can have any number of views at the external level.
co
The conceptual level unifies the various external views into a coherent whole, global
view.[13] It provides the common-denominator of all the external views. It comprises all
the end-user needed generic data, i.e., all the data from which any view may be
s.
derived/computed. It is provided in the simplest possible way of such generic data, and
comprises the back-bone of the database. It is out of the scope of the various database
end-users, and serves database application developers and defined by database
te
administrators that build the database.
The Internal level (or Physical level) is as a matter of fact part of the database
implementation inside a DBMS (see Implementation section below). It is concerned with
no
cost, performance, scalability and other operational matters. It deals with storage layout
of the conceptual level, provides supporting storage-structures like indexes, to enhance
performance, and occasionally stores data of individual views (materialized views),
ar
computed from generic data, if performance justification exists for such redundancy. It
balances all the external views' performance requirements, possibly conflicting, in
attempt to optimize the overall database usage by all its end-uses according to the
t
All the three levels are maintained and updated according to changing needs by database
administrators who often also participate in the database design.
w.
The above three-level database architecture also relates to and being motivated by the concept of
data independence which has been described for long time as a desired database property and
was one of the major initial driving forces of the Relational model. In the context of the above
ww
architecture it means that changes made at a certain level do not affect definitions and software
developed with higher level interfaces, and are being incorporated at the higher level
automatically. For example, changes in the internal level do not affect application programs
written using conceptual level interfaces, which saves substantial change work that would be
needed otherwise.
www.5starnotes.com
www.5starnotes.com
In summary, the conceptual is a level of indirection between internal and external. On one hand
it provides a common view of the database, independent of different external view structures,
m
and on the other hand it is uncomplicated by details of how the data are stored or managed
co
(internal level). In principle every level, and even every external view, can be presented by a
different data model. In practice usually a given DBMS uses the same data model for both the
external and the conceptual levels (e.g., relational model). The internal level, which is hidden
s.
inside the DBMS and depends on its implementation (see Implementation section below),
requires a different level of detail and uses its own data structure types, typically different in
nature from the structures of the external and conceptual levels which are exposed to DBMS
te
users (e.g., the data models above): While the external and conceptual levels are focused on and
serve DBMS users, the concern of the internal level is effective implementation details
The role includes the development and design of database strategies, system monitoring and
improving database performance and capacity, and planning for future expansion requirements.
They may also plan, co-ordinate and implement security measures to safeguard the database.
t
5s
an Entity – Relationship model (ER model for short) is an abstract way to describe a database.
It usually starts with a relational database, which stores data in tables. Some of the data in these
w.
tables point to data in other tables - for instance, your entry in the database could point to several
entries for each of the phone numbers that are yours. The ER model would say that you are an
ww
entity, and each phone number is an entity, and the relationship between you and the phone
numbers is 'has a phone number'. Diagrams created to design these entities and relationships are
called entity–relationship diagrams or ER diagrams.
This article refers to the techniques proposed in Peter Chen's 1976 paper.[1] However, variants of
the idea existed previously,[2] and have been devised subsequently such as supertype and subtype
www.5starnotes.com
www.5starnotes.com
[3]
data entities and commonality relationships (an example with additional concepts is the
enhanced entity–relationship model).
m
Using the three schema approach to software engineering, there are three levels of ER models
co
that may be developed. The conceptual data model is the highest level ER model in that it
contains the least granular detail but establishes the overall scope of what is to be included within
the model set. The conceptual ER model normally defines master reference data entities that are
s.
commonly used by the organization. Developing an enterprise-wide conceptual ER model is
useful to support documenting the data architecture for an organization.
te
A conceptual ER model may be used as the foundation for one or more logical data models. The
purpose of the conceptual ER model is then to establish structural metadata commonality for the
master data entities between the set of logical ER models. The conceptual data model may be
no
used to form commonality relationships between ER models as a basis for data model
integration.
ar
A logical ER model does not require a conceptual ER model especially if the scope of the logical
ER model is to develop a single disparate information system. The logical ER model contains
more detail than the conceptual ER model. In addition to master data entities, operational and
t
transactional data entities are now defined. The details of each data entity are developed and the
entity relationships between these data entities are established. The logical ER model is however
5s
One or more physical ER models may be developed from each logical ER model. The physical
w.
The physical model is normally forward engineered to instantiate the structural metadata into a
database management system as relational database objects such as database tables, database
indexes such as unique key indexes, and database constraints such as a foreign key constraint or
a commonality constraint. The ER model is also normally used to design modifications to the
relational database objects and to maintain the structural metadata of the database.
www.5starnotes.com
www.5starnotes.com
The first stage of information system design uses these models during the requirements analysis
to describe information needs or the type of information that is to be stored in a database. The
m
data modeling technique can be used to describe any ontology (i.e. an overview and
co
classifications of used terms and their relationships) for a certain area of interest. In the case of
the design of an information system that is based on a database, the conceptual data model is, at a
later stage (usually called logical design), mapped to a logical data model, such as the relational
s.
model; this in turn is mapped to a physical model during physical design. Note that sometimes,
both of these phases are referred to as "physical design".
E-R Diagrams
te
no
ar
Two related entities
t
Primary key
www.5starnotes.com
www.5starnotes.com
of a domain. When we speak of an entity, we normally speak of some aspect of the real world
which can be distinguished from other aspects of the real world.[4]
m
An entity may be a physical object such as a house or a car, an event such as a house sale or a car
co
service, or a concept such as a customer transaction or order. Although the term entity is the one
most commonly used, following Chen we should really distinguish between an entity and an
entity-type. An entity-type is a category. An entity, strictly speaking, is an instance of a given
s.
entity-type. There are usually many instances of an entity-type. Because the term entity-type is
somewhat cumbersome, most people tend to use the term entity as a synonym for this term.
te
Entities can be thought of as nouns. Examples: a computer, an employee, a song, a mathematical
theorem.
over other models like the hierarchical database model or the network model.
5s
The relational database was first defined in June 1970 by Edgar Codd, of IBM's San Jose
Research Laboratory.
w.
Terminology
ww
www.5starnotes.com
www.5starnotes.com
Relational database terminology.
m
Relational database theory uses a set of mathematical terms, which are roughly equivalent to
SQL database terminology. The table below summarizes some of the most important relational
co
database terms and their SQL database equivalents.
s.
Relational term SQL equivalent
te
derived relvar view, query result, result set
tuple
no row
The relational model for database management is a database model based on first-order
predicate logic, first formulated and proposed in 1969 by Edgar F. Codd.[1][2] In the relational
model of a database, all data is represented in terms of tuples, grouped into relations. A database
organized in terms of the relational model is a relational database.
www.5starnotes.com
www.5starnotes.com
m
co
s.
te
no
Diagram of an example database according to the Relational model.
t ar
5s
w.
ww
In the relational model, related records are linked together with a "key".
The purpose of the relational model is to provide a declarative method for specifying data and
queries: users directly state what information the database contains and what information they
www.5starnotes.com
www.5starnotes.com
want from it, and let the database management system software take care of describing data
structures for storing the data and retrieval procedures for answering queries.
m
Most implementations of the relational model use the SQL data definition and query language. A
co
table in an SQL database schema corresponds to a predicate variable; the contents of a table to a
relation; key constraints, other constraints, and SQL queries correspond to predicates. However,
SQL databases, including DB2, deviate from the relational model in many details; Codd fiercely
s.
argued against deviations that compromise the original principles.
The catalog:
te
Types – Keys
* Alternate key - An alternate key is any candidate key which is not selected to be the primary
no
key
* Candidate key - A candidate key is a field or combination of fields that can act as a primary
key field for that table to uniquely identify each record in that table.
ar
For Eg:
The table:
t
Emloyee(Name,Address,Ssn,Employee_Idprimary_key,Phone_ext)
In the above example Ssn no. and employee identity are ccandidate keys.
5s
* Compound key - compound key (also called a composite key or concatenated key) is a key
that consists of 2 or more attributes.
w.
* Primary key - a primary key is a value that can be used to identify a unique row in a table.
Attributes are associated with it. Examples of primary keys are Social Security numbers
(associated to a specific person) or ISBNs (associated to a specific book).
ww
In the relational model of data, a primary key is a candidate key chosen as the main method of
uniquely identifying a tuple in a relation.
For Eg:
Emloyee(Name,Address,Ssn,Employee_Idprimary_key,Phone_ext)
www.5starnotes.com
www.5starnotes.com
* Superkey - A superkey is defined in the relational model as a set of attributes of a relation
variable (relvar) for which it holds that in all relations assigned to that variable there are no two
m
distinct tuples (rows) that have the same values for the attributes in this set. Equivalently a
co
superkey can also be defined as a set of attributes of a relvar upon which all attributes of the
relvar are functionally dependent.
For Eg:
s.
Emloyee(Name,Address,Ssn,Employee_Idprimary_key,Phone_ext)
<Ssn,Name,Address>
<Ssn,Name>
te
<Ssn>
All the above are super keys.
no
* Foreign key - a foreign key (FK) is a field or group of fields in a database record that points to
a key field or group of fields forming a key of another database record in some (usually
different) table. Usually a foreign key in one table refers to the primary key (PK) of another
ar
table. This way references can be made to link
For Eg:
For a Student....
t
School(Name,Address,Phone,School_Reg_noprimary_key
5s
Relational algebra
In computer science, relational algebra is an offshoot of first-order logic and of algebra of sets
w.
concerned with operations over finitary relations, usually made more convenient to work with by
identifying the components of a tuple by a name (called attribute) rather than by a numeric
ww
The main application of relational algebra is providing a theoretical foundation for relational
databases, particularly query languages for such databases, chiefly among which is
www.5starnotes.com
www.5starnotes.com
Introduction
m
Relational algebra received little attention outside of pure mathematics until the publication of
E.F. Codd's relational model of data in 1970. Codd proposed such an algebra as a basis for
co
database query languages. (See section Implementations.)
Both a named and a unnamed perspective are possible for relational algebra, depending on
s.
whether the tuples are endowed with component names or not. In the unnamed perspective, a
tuple is simply a member of a Cartesian product. In the named perspective, tuples are functions
from a finite set U of attributes (of the relation) to a domain of values (assumed distinct from
te
U).[1] The relational algebras obtained from the two perspectives are equivalent.[2] The typical
undergraduate textbooks present only the named perspective though,[3][4] and this article follows
suit. no
Relational algebra is essentially equivalent in expressive power to relational calculus (and thus
first-order logic); this result is known as Codd's theorem. One must be careful to avoid a
ar
mismatch that may arise between the two languages because negation, applied to a formula of
the calculus, constructs a formula that may be true on an infinite set of possible tuples, while the
difference operator of relational algebra always returns a finite result. To overcome these
t
difficulties, Codd restricted the operands of relational algebra to finite relations only and also
proposed restricted support for negation (NOT) and disjunction (OR). Analogous restrictions are
5s
found in many other logic-based computer languages. Codd defined the term relational
completeness to refer to a language that is complete with respect to first-order predicate calculus
apart from the restrictions he proposed. In practice the restrictions have no adverse effect on the
w.
Primitive operations
ww
As in any algebra, some operators are primitive and the others are derived in terms of the
primitive ones. It is useful if the choice of primitive operators parallels the usual choice of
primitive logical operators.
www.5starnotes.com
www.5starnotes.com
Five primitive operators of Codd's algebra are the selection, the projection, the Cartesian
product (also called the cross product or cross join), the set union, and the set difference.
m
Another operator, rename was not noted by Codd, but the need for it is shown by the inventors of
co
ISBL. These six operators are fundamental in the sense that omitting any one of them causes a
loss of expressive power. Many other operators have been defined in terms of these six. Among
the most important are set intersection, division, and the natural join. In fact ISBL made a
s.
compelling case for replacing the Cartesian product with the natural join, of which the Cartesian
product is a degenerate case.
Altogether, the operators of relational algebra have identical expressive power to that of domain
te
relational calculus or tuple relational calculus. However, for the reasons given in section
Introduction, relational algebra is less expressive than first-order predicate calculus without
function symbols. Relational algebra corresponds to a subset of first-order logic, namely Horn
no
clauses without recursion and negation.
Set operators
ar
Although three of the six basic operators are taken from set theory, there are additional
constraints that are present in their relational algebra counterparts: For set union and set
t
difference, the two relations involved must be union-compatible—that is, the two relations must
have the same set of attributes. Because set intersection can be defined in terms of set difference,
5s
The Cartesian product is defined differently from the one in set theory in the sense that tuples are
w.
considered to be 'shallow' for the purposes of the operation. That is, the Cartesian product of an
n-tuple by an m-tuple has the 2-tuple "flattened" into an (n + m)-tuple. In set theory, the
Cartesian product is a set of 2-tuples. More formally, R × S is defined as follows:
ww
R × S = {(r1, r2, ..., rn, s1, s2, ..., sm) | (r1, r2, ..., rn) ∈ R, (s1, s2, ..., sm) ∈ S}
Like the Cartesian product, the cardinality of the result is the product of the cardinalities of its
factors, i.e., |R × S| = |R| × |S|. In addition, for the Cartesian product to be defined, the two
www.5starnotes.com
www.5starnotes.com
relations involved must have disjoint headers—that is, they must not have a common attribute
name.
m
Projection (π)
co
A projection is a unary operation written as where is a set of attribute
names. The result of such projection is defined as the set that is obtained when all tuples in R are
s.
restricted to the set .
This specifies the specific subset of columns (attributes of each tuple) to be retrieved. To obtain
te
the names and phone numbers from an address book, the projection might be written
Selection (σ)
ar
(and), (or) and (negation). This selection selects all those tuples in R for which holds.
5s
To obtain a listing of all friends or business associates in an address book, the selection might be
relation containing every attribute of every unique record where isFriend is true or where
isBusinessContact is true.
Rename (ρ)
ww
A rename is a unary operation written as where the result is identical to R except that
the b attribute in all tuples is renamed to an a attribute. This is simply used to rename the
attribute of a relation or the relation itself.
www.5starnotes.com
www.5starnotes.com
To rename the 'isFriend' attribute to 'is BusinessContact' in a relation,
might be used.
m
co
Domain relational calculus
In computer science, domain relational calculus (DRC) is a calculus that was introduced by
s.
Michel Lacroix and Alain Pirotte as a declarative database query language for the relational data
model.[1]
te
In DRC, queries have the form:
no
where each Xi is either a domain variable or constant, and denotes a
DRC formula. The result of the query is the set of tuples Xi to Xn which makes the DRC formula
true.
ar
This language uses the same operators as tuple calculus, the logical connectives ∧ (and), ∨ (or)
and ¬ (not). The existential quantifier (∃) and the universal quantifier (∀) can be used to bind the
t
variables.
5s
Examples
w.
In this example, A, B, C denotes both the result set and a set in the table Enterprise.
www.5starnotes.com
www.5starnotes.com
Find Names of Enterprise crewmembers who are in Stellar Cartography:
m
co
In this example, we're only looking for the name, and that's B. F = C is a requirement, because
s.
we need to find Enterprise crew members AND they are in the Stellar Cartography Department.
te
Tuple calculus is a calculus that was introduced by Edgar F. Codd as part of the relational
model, in order to provide a declarative database-query language for this data model. It formed
no
the inspiration for the database-query languages QUEL and SQL, of which the latter, although
far less faithful to the original relational model and calculus, is now the de-facto-standard
database-query language; viz., a dialect of SQL is used by nearly every relational-database-
ar
management system. Lacroix and Pirotte proposed domain calculus, which is closer to first-order
logic and which showed that both of these calculi (as well as relational algebra) are equivalent in
expressive power. Subsequently, query languages for the relational model were called
t
Relational database
Since the calculus is a query language for relational databases we first have to define a relational
database. The basic relational building block is the domain, or data type. A tuple is an ordered
multiset of attributes, which are ordered pairs of domain and value; or just a row. A relvar
ww
(relation variable) is a set of ordered pairs of domain and name, which serves as the header for a
relation. A relation is a set of tuples. Although these relational concepts are mathematically
defined, those definitions map loosely to traditional database concepts. A table is an accepted
visual representation of a relation; a tuple is similar to the concept of row.
We first assume the existence of a set C of column names, examples of which are "name",
"author", "address" et cetera. We define headers as finite subsets of C. A relational database
schema is defined as a tuple S = (D, R, h) where D is the domain of atomic values (see relational
www.5starnotes.com
www.5starnotes.com
model for more on the notions of domain and atomic value), R is a finite set of relation names,
and
m
h : R → 2C
co
a function that associates a header with each relation name in R. (Note that this is a simplification
from the full relational model where there is more than one domain and a header is not just a set
of column names but also maps these column names to a domain.) Given a domain D we define
a tuple over D as a partial function
s.
t:C→D
that maps some column names to an atomic value in D. An example would be (name : "Harry",
te
age : 25).
The set of all tuples over D is denoted as TD. The subset of C for which a tuple t is defined is
called the domain of t (not to be confused with the domain in the schema) and denoted as dom(t).
no
Finally we define a relational database given a schema S = (D, R, h) as a function
db : R → 2TD
ar
that maps the relation names in R to finite subsets of TD, such that for every relation name r in R
and tuple t in db(r) it holds that
dom(t) = h(r).
t
The latter requirement simply says that all the tuples in a relation should contain the same
5s
Originally based upon relational algebra and tuple relational calculus, its scope includes data
insert, query, update and delete, schema creation and modification, and data access control.
ww
SQL was one of the first commercial languages for Edgar F. Codd's relational model, as
described in his influential 1970 paper, "A Relational Model of Data for Large Shared Data
Banks".[4] Despite not adhering to the relational model as described by Codd, it became the most
widely used database language.[5][6] Although SQL is often described as, and to a great extent is,
a declarative language, it also includes procedural elements. SQL became a standard of the
American National Standards Institute (ANSI) in 1986, and of the International Organization for
Standards (ISO) in 1987. Since then, the standard has been enhanced several times with added
www.5starnotes.com
www.5starnotes.com
features. However, issues of SQL code portability between major RDBMS products still exist
due to lack of full compliance with, or different interpretations of, the standard. Among the
reasons mentioned are the large size and incomplete specification of the standard, as well as
m
vendor lock-in.
co
SQL fundamentals
Language elements
s.
The SQL language is subdivided into several language elements, including:
Clauses, which are constituent components of statements and queries. (In some cases,
these are optional.)[10]
te
Expressions, which can produce either scalar values or tables consisting of columns and
rows of data.
Predicates, which specify conditions that can be evaluated to SQL three-valued logic
(3VL) or Boolean (true/false/unknown) truth values and which are used to limit the
no
effects of statements and queries, or to change program flow.
Queries, which retrieve the data based on specific criteria. This is the most important
element of SQL.
Statements, which may have a persistent effect on schemata and data, or which may
control transactions, program flow, connections, sessions, or diagnostics.
ar
o SQL statements also include the semicolon (";") statement terminator. Though not
required on every platform, it is defined as a standard part of the SQL grammar.
Integrity
t
5s
In computing, data integrity refers to maintaining and assuring the accuracy and consistency of
data over its entire life-cycle,[1] and is an especially important feature of a database or RDBMS
system. Data warehousing and business intelligence in general demand the accuracy, validity and
correctness of data despite hardware failures, software bugs or human error. Data that has
integrity is identically maintained during any operation, such as transfer, storage or retrieval.
w.
All characteristics of data, including business rules, rules for how pieces of data relate, dates,
definitions and lineage must be correct for its data integrity to be complete. When functions
operate on the data, the functions must ensure integrity. Examples include transforming the data,
ww
www.5starnotes.com
www.5starnotes.com
Entity integrity concerns the concept of a primary key. Entity integrity is an integrity rule which
states that every table must have a primary key and that the column or columns chosen to be
the primary key should be unique and not null.
m
Referential integrity concerns the concept of a foreign key. The referential integrity rule states
that any foreign-key value can only be in one of two states. The usual state of affairs is that the
co
foreign key value refers to a primary key value of some table in the database. Occasionally, and
this will depend on the rules of the data owner, a foreign-key value can be null. In this case we
are explicitly saying that either there is no relationship between the objects represented in the
database or that this relationship is unknown.
Domain integrity specifies that all columns in relational database must be declared upon a
s.
defined domain. The primary unit of data in the relational data model is the data item. Such
data items are said to be non-decomposable or atomic. A domain is a set of values of the same
type. Domains are therefore pools of values from which actual values appearing in the columns
of a table are drawn.
te
If a database supports these features it is the responsibility of the database to insure data integrity
as well as the consistency model for the data storage and retrieval. If a database does not support
these features it is the responsibility of the applications to insure data integrity while the database
no
supports the consistency model for the data storage and retrieval.
As of 2012, since all modern databases support these features (see Comparison of relational
database management systems), it has become the de-facto responsibility of the database to
5s
ensure data integrity. Out-dated and legacy systems that use file systems (text, spreadsheets,
ISAM, flat files, etc.) for their consistency model lack any[citation needed] kind of data-integrity
model. This requires organizations to invest a large amount of time, money, and personnel in
building data-integrity systems on a per-application basis that effectively just duplicate the
w.
existing data integrity systems found in modern databases. Many companies, and indeed many
database systems themselves, offer products and services to migrate out-dated and legacy
systems to modern databases to provide these data-integrity features. This offers organizations
substantial savings in time, money, and resources because they do not have to develop per-
ww
application data-integrity systems that must be re-factored each time business requirements
change.
Trigger
www.5starnotes.com
www.5starnotes.com
new worker) is added to the employees table, new records should also be created in the tables of
the taxes, vacations and salaries.
m
Triggers in Microsoft SQL Server
co
Microsoft SQL Server supports triggers either after or instead of an insert, update or delete
operation. They can be set on tables and views with the constraint that a view can be referenced
only by an INSTEAD OF trigger.
s.
Microsoft SQL Server 2005 introduced support for Data Definition Language (DDL) triggers,
which can fire in reaction to a very wide range of events, including:
Drop table
te
Create table
Alter table
Login events
By default, both DML and DDL triggers execute under the context of the user that calls the
trigger. The caller of a trigger is the user that executes the statement that causes the trigger to
run. For example, if user Mary executes a DELETE statement that causes DML trigger
5s
DML_trigMary to run, the code inside DML_trigMary executes in the context of the user
privileges for Mary. This default behavior can be exploited by users who want to introduce
malicious code in the database or server instance. For example, the following DDL trigger is
created by user JohnDoe:
w.
ON DATABASE
ww
FOR ALTER_TABLE
AS
GO
www.5starnotes.com
www.5starnotes.com
What this trigger means is that as soon as a user that has permission to execute a GRANT
CONTROL SERVER statement, such as a member of the sysadmin fixed server role, executes
an ALTER TABLE statement, JohnDoe is granted CONTROL SERVER permission. In other
m
words, although JohnDoe cannot grant CONTROL SERVER permission to himself, he enabled
the trigger code that grants him this permission to execute under escalated privileges. Both DML
co
and DDL triggers are open to this kind of security threat.
s.
Simple Features (officially Simple Feature Access) is both an OpenGIS and ISO standard (ISO
19125) that specifies a common storage model of mostly two-dimensional geographical data
(point, line, polygon, multi-point, multi-line, etc.)
te
The ISO 19125 standard comes in two parts. Part one, ISO 19125-1 (SFA-CA for "common
architecture"), defines a model for two-dimensional simple features, with linear interpolation
between vertices. The data model defined in SFA-CA is a hierarchy of classes. This part also
defines representation using Well-Known Text (and Binary). Part 2 of the standard, ISO 19125-2
(SFA-SQL), defines an implementation using SQL.[1] The OpenGIS standard(s) cover
no
implementations in CORBA and OLE/COM as well, although these have lagged behind the SQL
one and are not standardized by ISO.
The ISO/IEC 13249-3 SQL/MM Spatial extends the Simple Features data model mainly with
ar
circular interpolations (e.g. circular arcs) and adds other features like coordinate transformations
and methods for validating geometries as well as Geography Markup Language support.
Embedded SQL
t
and the database manipulation capabilities of SQL. Embedded SQL statements are SQL
statements written inline with the program source code of the host language. The embedded SQL
statements are parsed by an embedded SQL preprocessor and replaced by host-language calls to
a code library. The output from the preprocessor is then compiled by the host compiler. This
w.
allows programmers to embed SQL statements in programs written in any number of languages
such as: C/C++, COBOL and Fortran.
The ANKITA SQL standards committee defined the embedded SQL standard in two steps: a
formalism called Module Language was defined, then the embedded SQL standard was derived
ww
from Module Language.[1] The SQL standard defines embedding of SQL as embedded SQL and
the language in which SQL queries are embedded is referred to as the host language. A popular
host language is C. The mixed C and embedded SQL is called Pro*C in Oracle and Sybase
database management systems. In the PostgreSQL database management system this
precompiler is called ECPG. Other embedded SQL precompilers are Pro*Ada, Pro*COBOL,
Pro*FORTRAN, Pro*Pascal, and Pro*PL/I.
www.5starnotes.com
www.5starnotes.com
PL/SQL supports variables, conditions, loops and exceptions. Arrays are also supported, though
in a somewhat unusual way, involving the use of PL/SQL collections. PL/SQL collections is a
slightly advanced topic.
m
Implementations from version 8 of Oracle Database onwards have included features associated
co
with object-orientation.
Dynamic SQL
s.
Once the program units have been stored into the database, they become available for execution
at a later time.
While programmers can readily embed Data Manipulation Language (DML) statements directly
te
into their PL/SQL code using straightforward SQL statements, Data Definition Language (DDL)
requires more complex "Dynamic SQL" statements to be written in the PL/SQL code. However,
DML statements underpin the majority of PL/SQL code in typical software applications.
In the case of PL/SQL dynamic SQL, early versions of the Oracle Database required the use of a
no
complicated Oracle DBMS_SQL package library. More recent versions have however
introduced a simpler "Native Dynamic SQL", along with an associated EXECUTE
IMMEDIATE syntax.
ar
Oracle Corporation customarily extends package functionality with each successive release of
the Oracle Database.
A distributed database is a database in which storage devices are not all attached to a common
5s
processing unit such as the CPU. It may be stored in multiple computers located in the same
physical location, or may be dispersed over a network of interconnected computers. Unlike
parallel systems, in which the processors are tightly coupled and constitute a single database
w.
system, a distributed database system consists of loosely coupled sites that share no physical
components.
Collections of data (e.g. in a database) can be distributed across multiple physical locations. A
ww
distributed database can reside on network servers on the Internet, on corporate intranets or
extranets, or on other company networks. The replication and distribution of databases improves
database performance at end-user worksites. [1][clarification needed]
www.5starnotes.com
www.5starnotes.com
To ensure that the distributive databases are up to date and current, there are two processes:
replication and duplication. Replication involves using specialized software that looks for
m
changes in the distributive database. Once the changes have been identified, the replication
co
process makes all the databases look the same. The replication process can be very complex and
time consuming depending on the size and number of the distributive databases. This process can
also require a lot of time and computer resources. Duplication on the other hand is not as
s.
complicated. It basically identifies one database as a master and then duplicates that database.
The duplication process is normally done at a set time after hours. This is to ensure that each
distributed location has the same data. In the duplication process, changes to the master database
te
only are allowed. This is to ensure that local data will not be overwritten. Both of the processes
can keep the data current in all distributive locations.[2]
Besides distributed database replication and fragmentation, there are many other distributed
no
database design technologies. For example, local autonomy, synchronous and asynchronous
distributed database technologies. These technologies' implementation can and does depend on
the needs of the business and the sensitivity/confidentiality of the data to be stored in the
ar
database, and hence the price the business is willing to spend on ensuring data security,
consistency and integrity.
t
5s
Functional dependencies
w.
projection is a function, i.e. Y is a function of X.[1][2] In simple words, if the values for
the X attributes are known (say they are x), then the values for the Y attributes corresponding to x
www.5starnotes.com
www.5starnotes.com
can be determined by looking them up in any tuple of R containing x. Customarily X is called the
determinant set and Y the dependent set. A functional dependency FD: X → Y is called trivial if
m
Y is a subset of X.
co
The determination of functional dependencies is an important part of designing databases in the
relational model, and in database normalization and denormalization. A simple application of
functional dependencies is Heath’s theorem; it says that a relation R over an attribute set U and
s.
satisfying a functional dependency X → Y can be safely split in two relations having the lossless-
te
the rest of the attributes. (Unions of attribute sets are customarily denoted by mere juxtapositions
in database theory.) An important notion in this context is a candidate key, defined as a minimal
set of attributes that functionally determine all of the attributes in a relation. The functional
no
dependencies, along with the attribute domains, are selected so as to generate constraints that
would exclude as much data inappropriate to the user domain from the system as possible.
A notion of logical implication is defined for functional dependencies in the following way: a set
ar
. The notion of logical implication for functional dependencies admits a sound and
complete finite axiomatization, known as Armstrong's axioms.
5s
Non-loss decomposition
w.
Functional dependency
projection is a function, i.e. Y is a function of X.[1][2] In simple words, if the values for
www.5starnotes.com
www.5starnotes.com
the X attributes are known (say they are x), then the values for the Y attributes corresponding to x
can be determined by looking them up in any tuple of R containing x. Customarily X is called the
m
determinant set and Y the dependent set. A functional dependency FD: X → Y is called trivial if
co
Y is a subset of X.
s.
functional dependencies is Heath’s theorem; it says that a relation R over an attribute set U and
satisfying a functional dependency X → Y can be safely split in two relations having the lossless-
te
join decomposition property, namely into where Z = U − XY are
the rest of the attributes. (Unions of attribute sets are customarily denoted by mere juxtapositions
in database theory.) An important notion in this context is a candidate key, defined as a minimal
no
set of attributes that functionally determine all of the attributes in a relation. The functional
dependencies, along with the attribute domains, are selected so as to generate constraints that
would exclude as much data inappropriate to the user domain from the system as possible.
ar
A notion of logical implication is defined for functional dependencies in the following way: a set
of functional dependencies logically implies another set of dependencies , if any relation R
satisfying all dependencies from also satisfies all dependencies from ; this is usually written
t
. The notion of logical implication for functional dependencies admits a sound and
5s
In the process of efficiently storing data, and eliminating redundancy, tables in a database are
w.
designed and created to be in one of five possible normal forms. Each normal form contains and
enforces the rules of the previous form, and, in turn, applies some stricter rules on the design of
tables.
ww
www.5starnotes.com
www.5starnotes.com
- The table has a primary key.
- No single attribute (column) has multiple values.
m
- The non-key attributes (columns) depend on the primary key.
co
Some examples of placing a table in first normal form are:
s.
author_id: stories:
000024 novelist, playwright // multiple values
000034 magazine columnist
te
002345 novella, newpaper columnist // multiple values
author_id: stories:
no
000024 novelist
ar
000024 playwright
000034 magazine columnist
002345 novella
t
=================
Tables are said to be in second normal form when:
- The tables meet the criteria for first normal form.
ww
www.5starnotes.com
www.5starnotes.com
Third Normal Form:
m
================
co
Tables are said to be in third normal form when:
- The tables meet the criteria for second normal form.
- Each non-key attribute in a row does not depend on the entry in
s.
another key column.
Dependency preservation
A decomposition is said to be dependency-preserving if every functional dependency of the
original schema can be enforced by checking the dependencies projected onto the individual
decomposed relations, without having to join them back together.
Databases can be described as all of the following:
Computer data – information in a form suitable for use with a computer. Data is often
distinguished from programs. A program is a sequence of instructions that detail a task for the
computer to perform. In this sense, data is everything in software that is not program code.
Boyce–Codd normal form
Boyce–Codd normal form (BCNF) was developed in 1974 by Raymond F. Boyce and Edgar F.
Codd to address certain types of anomaly not dealt with by 3NF as originally defined.[1]
Chris Date has pointed out that a definition of what we now know as BCNF appeared in a paper
by Ian Heath in 1971.[2] Date writes:
"Since that definition predated Boyce and Codd's own definition by some three years, it seems to
me that BCNF ought by rights to be called Heath normal form. But it isn't."[3]
If a relational schema is in BCNF then all redundancy based on functional dependency has been
removed, although other types of redundancy may still exist. A relational schema R is in Boyce–
Codd normal form if and only if, for every one of its dependencies X → Y, at least one of the
following conditions holds:[4]
X → Y is a trivial functional dependency (Y ⊆ X)
X is a superkey for schema R
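A standard textbook illustration (not part of the original notes): consider Enrollment(student,
course, instructor) in which each instructor teaches only one course, so instructor → course, and
the candidate key is {student, course}. The dependency instructor → course is non-trivial and
instructor is not a superkey, so the relation is not in BCNF; decomposing it into
(student, instructor) and (instructor, course) removes the redundancy.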
Multivalued dependency
In database theory, multivalued dependency is a full constraint between two sets of attributes in
a relation.
In contrast to the functional dependency, the multivalued dependency requires that certain
tuples be present in a relation. Therefore, a multivalued dependency is a special case of tuple-
generating dependency. The multivalued dependency plays a role in the 4NF database
normalization.
A multivalued dependency is a special case of a join dependency, with only two sets of values
involved, i.e. it is a 2-ary join dependency.
Formal definition
Let R be a relation schema and let X ⊆ R and Y ⊆ R be sets of attributes. The multivalued
dependency X ↠ Y (which can be read as "X multidetermines Y") holds on R if, in any legal
relation r(R), for all pairs of tuples t1 and t2 in r such that t1[X] = t2[X], there exist tuples
t3 and t4 in r such that:
t3[X] = t4[X] = t1[X] = t2[X]
t3[Y] = t1[Y] and t3[R − Y] = t2[R − Y]
t4[Y] = t2[Y] and t4[R − Y] = t1[R − Y]
In simpler words the above condition can be expressed as follows: if we denote by (x, y, z) the
tuple having values for X, Y, R − X − Y collectively equal to x, y, z respectively, then whenever
the tuples (a, b, c) and (a, d, e) exist in r, the tuples (a, b, e) and (a, d, c) should also exist in r.
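A classic illustration: in a relation Course(course, teacher, book) where the teachers of a course
and its recommended books are independent of each other, the multivalued dependency
course ↠ teacher holds. If the tuples (databases, Smith, Ullman) and (databases, Jones, Date) are
present, the definition forces (databases, Smith, Date) and (databases, Jones, Ullman) to be
present as well.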
Fourth normal form
Fourth normal form (4NF) is a normal form used in database normalization. Introduced by
Ronald Fagin in 1977, 4NF is the next level of normalization after Boyce–Codd normal form
(BCNF). Whereas the second, third, and Boyce–Codd normal forms are concerned with
functional dependencies, 4NF is concerned with a more general type of dependency known as a
multivalued dependency. A table is in 4NF if and only if, for every one of its non-trivial
multivalued dependencies X ↠ Y, X is a superkey; that is, X is either a candidate key or a
superset thereof.[1]
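
A minimal sketch of the corresponding 4NF decomposition, reusing the illustrative
course/teacher/book relation from above: because course ↠ teacher holds, the table splits into
(course, teacher) and (course, book), and joining the projections back on course loses nothing:

rows = {
    ("databases", "Smith", "Ullman"),
    ("databases", "Smith", "Date"),
    ("databases", "Jones", "Ullman"),
    ("databases", "Jones", "Date"),
}
course_teacher = {(c, t) for c, t, _ in rows}  # projection onto (course, teacher)
course_book    = {(c, b) for c, _, b in rows}  # projection onto (course, book)
rejoined = {(c, t, b)
            for c, t in course_teacher
            for c2, b in course_book
            if c == c2}
print(rejoined == rows)  # True: the split on the MVD is lossless

The same project-and-join check extends to splits into more than two projections, which is
exactly the situation described by the join dependencies of the next section.
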
Join dependencies and fifth normal form.
A join dependency is a constraint on the set of legal relations over a database scheme. A table T
is subject to a join dependency if T can always be recreated by joining multiple tables each
having a subset of the attributes of T. If one of the tables in the join has all the attributes of the
table T, the join dependency is called trivial.
The join dependency plays an important role in the fifth normal form, also known as project-join
normal form, because it can be proven that if a scheme R is decomposed in tables R1 to Rn, the
decomposition will be a lossless-join decomposition if the legal relations on R are restricted to a
join dependency on R called *(R1, R2, ..., Rn).
Another way to describe a join dependency is to say that the set of relationships in the join
dependency is independent of each other.
Unlike in the case of functional dependencies, there is no sound and complete axiomatization for
join dependencies.
Fifth normal form (5NF), also known as Project-join normal form (PJ/NF) is a level of
database normalization designed to reduce redundancy in relational databases recording multi-
valued facts by isolating semantically related multiple relationships. A table is said to be in the
5NF if and only if every join dependency in it is implied by the candidate keys.
A join dependency *{A, B, … Z} on R is implied by the candidate key(s) of R if and only if each
of A, B, …, Z is a superkey for R.[1]
Only in rare situations does a 4NF table not conform to 5NF. These are situations in which a
complex real-world constraint governing the valid combinations of attribute values in the 4NF
table is not implicit in the structure of that table. If such a table is not normalized to 5NF, the
burden of maintaining the logical consistency of the data within the table must be carried partly
by the application responsible for insertions, deletions, and updates to it; and there is a
heightened risk that the data within the table will become inconsistent. In contrast, the 5NF
design excludes the possibility of such inconsistencies.
UNIT IV TRANSACTIONS
Transaction concepts
A transaction comprises a unit of work performed within a database management system (or
similar system) against a database, and treated in a coherent and reliable way independent of
other transactions. Transactions in a database environment have two main purposes:
1. To provide reliable units of work that allow correct recovery from failures and keep a
database consistent even in cases of system failure, when execution stops (completely or
partially) and many operations upon a database remain uncompleted, with unclear status.
2. To provide isolation between programs accessing a database concurrently. If this
isolation is not provided, the programs' outcomes are possibly erroneous.
Transactions provide an "all-or-nothing" proposition, stating that each work-unit performed in a
database must either complete in its entirety or have no effect whatsoever. Further, the system
must isolate each transaction from other transactions, results must conform to existing
constraints in the database, and transactions that complete successfully must get written to
durable storage.
Transaction recovery
A transactional database is a DBMS in which write transactions on the database can be rolled
back if they are not completed properly (e.g. due to power or connectivity loss).
Most modern relational database management systems fall into the category of databases that
support transactions.
In a database system a transaction might consist of one or more data-manipulation statements
and queries, each reading and/or writing information in the database. Users of database systems
consider consistency and integrity of data as highly important. A simple transaction is usually
issued to the database system in a language like SQL wrapped in a transaction, using a pattern
similar to the following:
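
The pattern itself is essentially: begin the transaction, execute the data-manipulation statements
and queries, commit if no errors occur, and roll back otherwise. A minimal sketch using Python's
built-in sqlite3 module (the accounts table and the transfer logic are illustrative assumptions):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    # Move `amount` from account `src` to `dst` inside a single transaction.
    try:
        cur = conn.cursor()
        cur.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                    (amount, src))
        cur.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                    (amount, dst))
        if cur.execute("SELECT COUNT(*) FROM accounts "
                       "WHERE balance < 0").fetchone()[0]:
            raise ValueError("insufficient funds")
        conn.commit()    # persist all changes atomically
    except Exception:
        conn.rollback()  # undo every change made within this transaction
        raise

transfer(conn, 1, 2, 30)   # commits: balances become 70 and 80
# transfer(conn, 1, 2, 999) would raise and leave both balances unchanged
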
If no errors occurred during the execution of the transaction then the system commits the
transaction. A transaction commit operation applies all data manipulations within the scope of
the transaction and persists the results to the database. If an error occurs during the transaction,
or if the user specifies a rollback operation, the data manipulations within the transaction are not
persisted to the database. In no case can a partial transaction be committed to the database, since
that would leave the database in an inconsistent state.
Internally, multi-user databases store and process transactions, often by using a transaction ID or
XID.
There are multiple ways in which transactions can be implemented other than the simple way
documented above. Nested transactions, for example, are transactions which contain statements
within them that start new transactions (i.e. sub-transactions). Multi-level transactions are
similar but have a few extra properties. Another type of transaction is the compensating
transaction.
ACID properties
In computer science, ACID (atomicity, consistency, isolation, durability) is a set of properties
that guarantee that database transactions are processed reliably. In the context of databases, a
single logical operation on the data is called a transaction. For example, a transfer of funds from
one bank account to another, even involving multiple changes such as debiting one account and
crediting another, is a single transaction.
Jim Gray defined these properties of a reliable transaction system in the late 1970s and
developed technologies to achieve them automatically.[1] In 1983, Andreas Reuter and Theo
Härder coined the acronym ACID to describe them.
Atomicity
Atomicity requires that each transaction is "all or nothing": if one part of the transaction fails, the
entire transaction fails, and the database state is left unchanged. An atomic system must
guarantee atomicity in each and every situation, including power failures, errors, and crashes.
Consistency
The consistency property ensures that any transaction will bring the database from one valid state
to another. Any data written to the database must be valid according to all defined rules,
including but not limited to constraints, cascades, triggers, and any combination thereof.
Isolation
The isolation property ensures that the concurrent execution of transactions results in a system
state that would be obtained if the transactions were executed serially, i.e. one after the other.
Each transaction has to execute in total isolation, i.e. if T1 and T2 are being executed
concurrently then each of them should remain unaware of the other's presence.
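
If this guarantee is dropped, the classic lost-update anomaly can occur. A small illustrative
sketch in plain Python (no DBMS involved), simulating two interleaved transactions that both
read before either writes:

balance = 100
t1_read = balance        # T1 reads 100
t2_read = balance        # T2 also reads 100, before T1 writes
balance = t1_read + 10   # T1 writes 110
balance = t2_read + 10   # T2 overwrites with 110: T1's update is lost
print(balance)           # 110, not the 120 a serial execution would give
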
Durability
Durability means that once a transaction has been committed, it will remain so, even in the event
of power loss, crashes, or errors. In a relational database, for instance, once a group of SQL
statements execute, the results need to be stored permanently (even if the database crashes
immediately thereafter).
System recovery
UNIT V IMPLEMENTATION TECHNIQUES
Overview of Physical Storage Media
Several types of data storage exist in most computer systems. They vary in speed of access, cost
per unit of data, and reliability.
Cache: the most costly and fastest form of storage. Usually very small, and managed by the
computer system hardware.
Main Memory (MM): the storage area for data available to be operated on.
Usually too small (even with megabytes) and too expensive to store the entire database.
Flash memory: reading data from flash memory takes about 10 nanoseconds (roughly as fast as
from main memory), while writing data into flash memory is more complicated: a write-once
takes about 4-10 microseconds.
To overwrite what has been written, one has to first erase the entire bank of the memory. Flash
memory may support only a limited number of erase cycles (on the order of 10^4 to 10^6).
It has found its popularity as a replacement for disks for storing small volumes of data (5-10
megabytes).
Magnetic-disk storage: primary medium for long-term storage.
Data must be moved from disk to main memory in order for the data to be operated on.
After operations are performed, data must be copied back to disk if any changes were made.
Disk storage is called direct access storage as it is possible to read data on the disk in any order
(unlike sequential access).
Magnetic-tape storage: used primarily for backup and archival data. Cheaper, but much slower
access, since tape must be read sequentially from the beginning.
The storage device hierarchy is presented in Figure 10.1, where the higher levels are expensive
(cost per bit), fast (access time), but the capacity is smaller.
Figure: Storage-device hierarchy
Secondary (or on-line) storage: the next level of the hierarchy, e.g., magnetic disks.
Tertiary (or off-line) storage: magnetic tapes and optical disk juke boxes.
Volatility of storage. Volatile storage loses its contents when the power is removed. Without
power backup, data in the volatile storage (the part of the hierarchy from main memory up) must
be written to nonvolatile storage for safekeeping.
Magnetic Disks
Magnetic storage and magnetic recording are terms from engineering referring to the storage of
data on a magnetized medium. Magnetic storage media, primarily
hard disks, are widely used to store computer data as well as audio and video signals. In the field
of computing, the term magnetic storage is preferred and in the field of audio and video
production, the term magnetic recording is more commonly used. The distinction is less
technical and more a matter of preference. Other examples of magnetic storage media include
floppy disks, magnetic recording tape, and magnetic stripes on credit cards.
Information is written to and read from the storage medium as it moves past devices called read-
and-write heads that operate very close (often tens of nanometers) over the magnetic surface.
The read-and-write head is used to detect and modify the magnetization of the material
immediately under it.
The magnetic surface is conceptually divided into many small sub-micrometer-sized magnetic
regions, referred to as magnetic domains, (although these are not magnetic domains in a rigorous
physical sense), each of which has a mostly uniform magnetization. Due to the polycrystalline
nature of the magnetic material each of these magnetic regions is composed of a few hundred
s.
magnetic grains. Magnetic grains are typically 10 nm in size and each form a single true
magnetic domain. Each magnetic region in total forms a magnetic dipole which generates a
magnetic field. In older hard disk drive (HDD) designs the regions were oriented horizontally
and parallel to the disk surface, but beginning about 2005, the orientation was changed to
perpendicular to allow for closer magnetic domain spacing.
For reliable storage of data, the recording material needs to resist self-demagnetization, which
occurs when the magnetic domains repel each other. Magnetic domains written too densely
together to a weakly magnetizable material will degrade over time due to rotation of the
magnetic moment of one or more domains to cancel out these forces. The domains rotate sideways
to a halfway position that weakens the readability of the domain and relieves the magnetic
stresses. Older hard disk drives used iron(III) oxide as the magnetic material, but current disks
use a cobalt-based alloy.[1]
A write head magnetizes a region by generating a strong local magnetic field, and a read head
detects the magnetization of the regions. Early HDDs used an electromagnet both to magnetize
the region and to then read its magnetic field by using electromagnetic induction. Later versions
of inductive heads included metal in Gap (MIG) heads and thin film heads. As data density
increased, read heads using magnetoresistance (MR) came into use; the electrical resistance of
the head changed according to the strength of the magnetism from the platter. Later development
made use of spintronics; in read heads, the magnetoresistive effect was much greater than in
earlier types, and was dubbed "giant" magnetoresistance (GMR). In today's heads, the read and
write elements are separate, but in close proximity, on the head portion of an actuator arm. The
read element is typically magneto-resistive while the write element is typically thin-film
inductive.[2]
The heads are kept from contacting the platter surface by the air that is extremely close to the
platter; that air moves at or near the platter speed. The record and playback head are mounted on
a block called a slider, and the surface next to the platter is shaped to keep it just barely out of
contact, forming a type of air bearing.
RAID
RAID (redundant array of independent disks, originally redundant array of inexpensive disks) is
a storage technology that combines multiple disk drive components into a logical unit, with data
distributed across the drives in one of several ways depending on what level of redundancy and
performance (via parallel communication) is required. In October 1986, the IBM S/38 announced
"checksum", an implementation of RAID in the operating system; it was software only and had a
minimum of 10% overhead. The S/38 "scatter loaded" all data for performance. The downside
was that the loss of any single disk required a total system restore for all disks. Under checksum,
when a disk failed, the system halted and was then shut down. Under maintenance, the bad disk
was replaced and then a parity-bit disk recovery was run. The system was restarted using a
recovery procedure similar to the one run after a power failure. While difficult, the recovery
from a drive failure was much shorter and easier than without checksum.
RAID is an example of storage virtualization and was first defined by David Patterson, Garth A.
Gibson, and Randy Katz at the University of California, Berkeley in 1987.[3] Marketers
representing industry RAID manufacturers later attempted to reinvent the term to describe a
redundant array of independent disks as a means of disassociating a low-cost expectation from
RAID technology.[4]
RAID is now used as an umbrella term for computer data storage schemes that can divide and
replicate data among multiple physical drives. The physical drives are said to be "in a RAID",
however the more common, incorrect parlance is to say that they are "in a RAID array".[5] The
array can then be accessed by the operating system as one single drive. The different schemes or
architectures are named by the word RAID followed by a number (e.g., RAID 0, RAID 1). Each
scheme provides a different balance between three key goals: resiliency, performance, and
capacity.
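
As an illustration of how parity-based schemes (the mechanism behind RAID levels such as 4
and 5) achieve redundancy, here is a minimal sketch of XOR parity with made-up block contents:

def xor_blocks(a, b):
    # Bytewise XOR of two equal-length blocks.
    return bytes(x ^ y for x, y in zip(a, b))

d0, d1, d2 = b"alpha---", b"bravo---", b"charlie-"   # three data "drives"
parity = xor_blocks(xor_blocks(d0, d1), d2)          # stored on a parity drive

# If drive 1 is lost, its block is rebuilt from the survivors plus parity.
rebuilt = xor_blocks(xor_blocks(d0, d2), parity)
print(rebuilt == d1)  # True
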
Tertiary Storage
Tertiary storage, or tertiary memory,[4] provides a third level of storage. Typically it involves a
robotic mechanism which will mount (insert) and dismount removable mass storage media into a
storage device according to the system's demands; these data are often copied to secondary
storage before use. It is primarily used for archiving rarely accessed information since it is much
slower than secondary storage (e.g. 5–60 seconds vs. 1–10 milliseconds). This is primarily useful
for extraordinarily large data stores, accessed without human operators. Typical examples
include tape libraries and optical jukeboxes.
When a computer needs to read information from the tertiary storage, it will first consult a
catalog database to determine which tape or disc contains the information. Next, the computer
will instruct a robotic arm to fetch the medium and place it in a drive. When the computer has
finished reading the information, the robotic arm will return the medium to its place in the
library.
File Organization
The arrangement of records in a file is known as file organization. It determines how data items
are arranged in a file on a secondary storage device. The main objectives of file organization are
as follows:
1. To provide an efficient method to locate records needed for processing.
2. To facilitate file creation and future updates.
For organizing records efficiently in the form of a computer file, following things are important:
1. A logical method should be observed to organize records in a file.
2. File structure should be designed so that it allows quick access to needed data items.
3. Means of adding or deleting data items or records from files must be present.
Organization of Records in Files
1. Serial file
2. Sequential file
3. Direct file
4. Indexed-sequential file
Serial file
In a serial file the records are placed one after another; however, there is no specific order in the
arrangement of these records. On a magnetic tape storage device, the records are stored in a
serial manner.
Sequential file
A sequential file is a file in which the records are stored in some order, i.e. either ascending or
descending, on a key field.
Direct file
A sequential file is not suitable for on-line enquiry, because in sequential access we have to
traverse the records from the beginning to the end. Random access file organization is best suited
for on-line processing systems where current information is the one that is always required.
Index-sequential file
Index-sequential files are also known as indexed sequential access method (ISAM) files. When
files must support both batch and on-line processing, ISAM files are used. These files are
basically sequential files organized serially on key fields. In addition, an index is maintained
which speeds up the access of isolated records. The file is divided into a number of blocks and
the highest key in each block is indexed. Within each block the record is searched sequentially.
This method is much faster than searching the entire file sequentially.
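
A minimal sketch of this lookup scheme (block size, keys, and records are illustrative): the index
holds the highest key of each block; a lookup picks the block from the index, then searches that
block sequentially:

from bisect import bisect_left

BLOCK_SIZE = 3
records = [(k, "record-%d" % k) for k in (2, 5, 8, 11, 14, 17, 20, 23)]
blocks = [records[i:i + BLOCK_SIZE] for i in range(0, len(records), BLOCK_SIZE)]
index = [block[-1][0] for block in blocks]   # highest key in each block

def isam_lookup(key):
    b = bisect_left(index, key)       # first block whose highest key >= key
    if b == len(blocks):
        return None                   # key is beyond every indexed key
    for k, value in blocks[b]:        # sequential search within the block
        if k == key:
            return value
    return None

print(isam_lookup(11))  # record-11
print(isam_lookup(12))  # None: key absent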