Chapter 2 Data Models Final
Data Models
A data model describes how data is presented to the user and programmer for
access; it is a conceptual description of how the database works. In this
chapter, students will learn about the following data models:
Network Model
Hierarchical Model
Relational Model
Object-Oriented Model
Object-Relational Model
2.0 INTRODUCTION
Today, the importance and impact of databases are unquestioned, as government
organizations, academic institutions, and business entities create and maintain extensive
databases containing all kinds of information ranging from natural-language text
documents, statistical tables, financial data, and multimedia objects to data of a scientific
and technical nature. Many databases are composed of metadata, which means the
records hold "data about data" such as information about the size and character of another
database rather than primary source content such as a person's name and address.
Database technologies, including architecture and access methods, are rapidly developing
to keep pace with this demand for information management mechanisms.
Database designers and managers face many challenges that reflect the complexity of the
burgeoning information environment. Database technologies must handle massive
amounts of data, extract useful information from these repositories, and have the ability
to reflect relationships between data maintained in different databases. In addition, the
architecture and system must provide integrity, recovery, concurrency, and security.
There are a variety of ways to organize data within a database. The particular
method selected for structuring information within a database is called a data model.
A data model describes how the data is presented to the user and programmer for
access; it is a conceptual description of how the database works. It fixes the
structure of the data, which includes the data types, relationships, and
constraints that should hold on the data. Thus, a logical data model specifies:
A collection of operators or rules of inference, which can be applied to any
valid instance of the data types.
A collection of general integrity rules, which implicitly or explicitly define the
set of consistent data base states or change of state or both.
Various models have been proposed for these activities. These models can be divided into
two categories:
Record-based data models: Record-based logical models describe data at the conceptual
and view levels. They are used to specify the overall logical structure of the database and
provide a higher-level description of the implementation. Examples are:
Hierarchical Model
Network Model
Relational Model
Object-based data models: The object-based model is based on a collection of objects.
Examples are:
The systematic development of databases started with the hierarchical and network
models, which were popular in the 1960s and 1970s. The relational model was introduced
next and is still of current interest; most database applications are based
on it. The object-oriented model is the latest development in database
technology. The object-relational model is a hybrid of the relational and object-oriented
models. The development of database models is represented in Fig. 2.1.
and IBM. Many organizations still use the IMS database for their
applications.
In the hierarchical model, data elements are connected to one another through links. Records
are arranged in a top-down, tree-like structure. The top node is called the root, the bottom
nodes are called leaves, and each intermediate node has one parent node and one or more child
nodes. The root can have any number of child nodes, but a child node can have only one
parent node. Data are related in a nested set of one-to-many relationships, while
many-to-many relationships cannot be directly expressed. To understand this in practice, consider
the example given here.
Example:
Customer-order-item database:
There are three data types (record types) in the database: customers, orders, and line
items. For each customer, there may be several orders, but for each order, there is just one
customer. Likewise, for each order, there may be many line items, but each line item
occurs in just one order. (This is the schema for the database.)
So, each customer record is the root of a tree, with the orders as children. The children of
the orders are the line items.
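The customer-order-item schema above can be sketched in a few lines of Python. This is only an illustration of the tree shape, not how a hierarchical DBMS such as IMS stores data; the class names, field names, and sample values are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class LineItem:
    product: str
    quantity: int


@dataclass
class Order:
    order_no: int
    # Each order owns its line items; a line item occurs in just one order.
    line_items: list[LineItem] = field(default_factory=list)


@dataclass
class Customer:
    name: str
    # Each customer record is the root of a tree, with orders as children.
    orders: list[Order] = field(default_factory=list)


def total_quantity(customer: Customer) -> int:
    """Walk the tree top-down: customer -> orders -> line items."""
    return sum(
        item.quantity
        for order in customer.orders
        for item in order.line_items
    )


# Hypothetical sample data: one root with two orders.
alice = Customer("Alice", [
    Order(1, [LineItem("pen", 10), LineItem("ink", 2)]),
    Order(2, [LineItem("paper", 5)]),
])
print(total_quantity(alice))  # 17
```

Every access starts at the root and follows the parent-to-child links downward, which is the access pattern the hierarchical model favors.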
Consider one more example of hierarchical model as represented in fig. 2.2.
The physical structure of the data on the disk does not matter under the hierarchical
model; the DBMS can (and usually does) store the data as a linked list of fields, with
pointers that go from parent node to child node, ending in a null or terminal-pointer leaf. It
quickly becomes obvious that this design makes it easy to add new fields at any level, as
the DBMS only has to change the null terminal pointer to a pointer to the new sibling node
(field) in the list. This gives us a great deal of flexibility.
The general schema for a hierarchical system contains many rules for data manipulation
and data integrity; however, these differ from system to system. For
example, IMS has its own data manipulation and integrity rules.
Every record occurrence (except the root) must have a parent occurrence. This has
several implications:
No record occurrence, except a root record, can exist without being linked to a
parent-record occurrence. This means that a child record cannot be inserted unless
it is linked to a parent record, and also that deletion of a parent record causes
automatic deletion of all linked child records.
If a child record type has two or more parent record types, then a child record
must be duplicated once for each parent record.
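The insertion and deletion rules above can be made concrete with a toy in-memory store. This is a minimal sketch under the assumption that records are identified by simple string ids; the class `HierarchicalStore` and its methods are hypothetical and do not correspond to the interface of IMS or any other product.

```python
class HierarchicalStore:
    """Toy store in which every non-root record has exactly one parent."""

    def __init__(self):
        self.parent_of = {}   # record id -> parent id (None for a root)
        self.children = {}    # record id -> list of child ids

    def insert(self, rec_id, parent_id=None):
        # A child cannot be inserted unless its parent already exists.
        if parent_id is not None and parent_id not in self.parent_of:
            raise KeyError(f"parent {parent_id!r} does not exist")
        self.parent_of[rec_id] = parent_id
        self.children[rec_id] = []
        if parent_id is not None:
            self.children[parent_id].append(rec_id)

    def delete(self, rec_id):
        # Deleting a parent automatically deletes all linked child records.
        for child in list(self.children[rec_id]):
            self.delete(child)
        parent = self.parent_of.pop(rec_id)
        del self.children[rec_id]
        if parent is not None:
            self.children[parent].remove(rec_id)


store = HierarchicalStore()
store.insert("customer-1")
store.insert("order-1", parent_id="customer-1")
store.insert("item-1", parent_id="order-1")
store.delete("customer-1")   # cascades to order-1 and item-1
print(store.parent_of)       # {}
```

Trying to insert a record under a parent that does not exist raises `KeyError`, which mirrors the rule that no non-root record can exist without a parent.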
Advantages
The hierarchical model had many advantages over the file system it replaced. It can be
said that the advantages and features of the hierarchical database system were the reason
for the development of the database models that followed it. Because of the hierarchical
structure of the model, the design of the database is simple. It was also the first model
to provide data security. Data integrity follows from the parent/child
relationship, since a child segment is always automatically linked to its parent
segment.
Besides the points explained above, other advantages of this model are:
Data Integrity – Since the hierarchical model is based on the parent/child
relationship, there is always a link between the parent segment and the child
segments.
Disadvantages
The first problem is that the initial structure of the database is arbitrary and must be
defined by the programmers when the database is created. From that point on, the
parent-child relationships cannot be changed without redesigning the whole structure. To
make such a change, the programmer has to create an entirely new structure, with new
parent-child relationships, and copy the data from the original database to the proper
locations in the new one.
The next disadvantage of this model is that it has no easy way to define many-to-many
relationships. The most common approach to solving the many-to-many relationship
problem is to add secondary parent-child and sibling pointers to the hierarchical structure.
This method creates numerous circular relationships; as these relationships become more
complex, the database structure eventually evolves into the next model.
Hierarchical databases are among the oldest database architectures, and many still exist in
larger organizations today. This technology is best applied when the conceptual data model also
resembles a tree and when most data access begins with the same (root) file. Hierarchical
database technology is used for high-volume transaction processing and MIS
applications. Few new databases are developed with hierarchical DBMSs since newer
summarization of transaction data.
Thus, a DBMS may STORE, DELETE or MODIFY records within a database. In this
way, a number of records within a network database are dynamically changed.
In accordance with the Data Base Task Group (DBTG) proposals, each record directly
corresponds to a concrete entity, but relationships between the records are implemented
by means of a special logical construction. This logical construction is called a Data Set
(or simply a Set).
Each set includes exactly one record of the first type. This record is called an
Owner of the set.
Each set may include 0 (i.e. an Empty set occurrence), 1 or N records of the same
type. These records are called members of the data set.
All members within one set occurrence have a fixed order (are sorted).
There may be two or more sets consisting of records of the same types and describing the
same relationship between records. In the simplest case, each set (or more precisely, set
occurrence) consists of records of two different types (e.g. Father and Child).
The member sets belonging to different owners are disjoint. To define a network
database, one needs to define:
The record types, which consist of data items, and
The set types.
In other words, according to the network data model the information within a database is
arranged as a collection of record occurrences and a collection of set occurrences.
All record types and all set types must be described in the data base schema. The data
base consists of record occurrences and of set occurrences of such types as were
previously defined in the data base schema.
The network model can be represented graphically using a box-type diagram called a
Bachman diagram. The notations of this diagram are as follows:
Two or more different records within a network database may have duplicate values of all
data items. The Data Base Key (DBK) is conceptually a data item, whose value is
associated with each stored record in the database. We can think of it as a unique internal
record identifier used inside a database to distinguish one record from another. Each
record is assigned a data base key value when it is stored in the database for the first time.
A record retains a value of the Data Base Key even if the record is modified until the
record is finally deleted from the database. In some ways, the data base key for the
CODASYL record is like a unique roll number for every student of a class or a personal
identification number.
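The record, set, and database key concepts described above can be sketched as follows. This is an illustrative model only: the class names, the counter used to hand out database keys, and the sample Father/Child data are assumptions for the example, not part of the DBTG proposals.

```python
from dataclasses import dataclass, field
from itertools import count

# Assumption of this sketch: database keys are handed out by a simple counter.
_next_key = count(1)


@dataclass
class Record:
    record_type: str
    data: dict
    # Assigned once, when the record is first stored, and kept until deletion.
    db_key: int = field(default_factory=lambda: next(_next_key))


@dataclass
class SetOccurrence:
    set_type: str
    owner: Record                                 # exactly one owner
    members: list = field(default_factory=list)   # zero or more, in fixed order

    def connect(self, member: Record) -> None:
        self.members.append(member)


father = Record("Father", {"name": "John"})
family = SetOccurrence("Father-Child", owner=father)   # empty set occurrence
family.connect(Record("Child", {"name": "Ann"}))
family.connect(Record("Child", {"name": "Ann"}))       # duplicate data values

keys = [m.db_key for m in family.members]
print(keys[0] != keys[1])  # True
```

The two child records hold identical data items, yet they remain distinguishable because each carries its own database key.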
Like the hierarchical model, the network model includes built-in support for certain types
of referential integrity, by virtue of its primary data structure, the link. It is therefore
possible to enforce rules similar to those of the hierarchical model, e.g., a child record
cannot be inserted without a parent record.
Advantages
The network model can handle one-to-many and many-to-many relationships, which makes
database modeling easier. Data access is easier in the network model than in the
hierarchical model. The owner record and all its member records can be accessed through
the application program. Moreover, the sets and the relationships between the record types
involved in each set are predefined. The predefined relationships are usually implemented
at the physical level with link structures, which results in faster access to records.
All network database systems are based on the standard formed by the DBTG and
augmented by ANSI/SPARC, which provides a universal standard for such databases.
Disadvantages
The query language used to manipulate the data in a network database is procedural and
requires the user to navigate through the database by specifying sets, owners, and
members. The user must therefore be very well versed in the structure of the database.
7654 Martin Salesman 7698 28-SEP-81 1250 1400 30
7698 Blake Manager 7839 01-MAY-81 2850 30
Table 2 An example of EMP relation
The relational model uses Structured Query Language (SQL) as the primary interface
language for interacting with the database. Using SQL, we can define the relations and their
relationships. We can also manipulate the data (insert, modify, and delete records)
and query it. Queries range from single-table queries to complex
multi-table queries involving joins, nesting, set operators, etc.
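As a small illustration of defining, populating, and querying a relation with SQL, the sketch below loads the two EMP rows shown above into an in-memory SQLite database through Python's standard `sqlite3` module. The column names (`empno`, `ename`, `job`, `mgr`, `hiredate`, `sal`, `comm`, `deptno`) are an assumption, since the table header is not shown here, and SQLite simply stands in for any relational DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Define the relation (column names are assumed for this sketch).
conn.execute("""
    CREATE TABLE emp (
        empno INTEGER PRIMARY KEY,
        ename TEXT, job TEXT, mgr INTEGER,
        hiredate TEXT, sal INTEGER, comm INTEGER, deptno INTEGER
    )
""")

# Insert the two rows from the EMP example; Blake has no commission.
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (7654, "Martin", "Salesman", 7698, "28-SEP-81", 1250, 1400, 30),
        (7698, "Blake", "Manager", 7839, "01-MAY-81", 2850, None, 30),
    ],
)

# A join of the table with itself: pair each employee with their manager.
rows = conn.execute("""
    SELECT e.ename, m.ename
    FROM emp AS e JOIN emp AS m ON e.mgr = m.empno
""").fetchall()
print(rows)  # [('Martin', 'Blake')]
```

The query states only which pairs are wanted (employees whose `mgr` value matches another row's `empno`); how the rows are located is left to the database. Blake does not appear on the left because manager 7839 is not among the two rows loaded.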
Advantages
Unlike the hierarchical and network models, where all the relationships are predefined, the
relational model derives new relations on user command, which makes it easier to use and
more flexible than the other two.
The relational model is independent of data storage details. Changes in the
physical database structure do not affect data access. Moreover, because designers are free from
physical storage details, they can concentrate more on the logical view of the database.
SQL is the main strength of the relational model. It is a powerful, flexible, and easy-to-use
language for making queries. Using SQL, users can specify what information they want and
leave the details of how to get the information to the database.
Disadvantages
The main disadvantage of the relational model is that it hides the physical data storage
details, which may lead to poor database design. As the database grows, a poorly
designed database will slow down the system and may result in performance degradation
and data corruption.
Hierarchical model: [...] explicitly with the limitation that it can have only one real parent and one virtual parent in the hierarchical model.
Network model: [...] between two record types are explicitly represented by the set type construct, and the DBMS physically connects related records together in a set instance.
Relational model: [...] foreign key attributes in one relation that reference the primary key of another relation. Individual tuples that have matching values in the foreign and primary key attributes are logically related, even though they are not physically connected.
Table 3 Comparison between the three models
The main difference between the approaches is that the structure of the hierarchical and
network models encourages navigational, record-at-a-time processing. This means that
the network and hierarchical models provide almost no separation between conceptual and
physical schemas (very little abstraction); the relational model, however, does support
such separation (it is not navigational). DBMSs based on the hierarchical and network
models generally provide procedural DMLs. In contrast, relational systems work with entire
tables and encourage the production of non-procedural interfaces. Note, however, that until quite
recently relational systems were poor at directly supporting entity and referential
integrity. In contrast, because of the inherent pointer structures in hierarchical systems,
certain forms of referential integrity are automatically supported.
As the figure moves from left to right, the OODBMS gets more complex. The
left side represents the OODBMS in its earlier stage, where object-oriented languages
have been extended to provide simple persistence so that application objects persist
between user sessions. At the mid-point, the same features are present,
but the database products are now capable enough to handle complex data and to
support complex data management applications. Finally, database products with
declarative semantics can greatly reduce development effort and
enforce uniformity in the application of these semantics. Most OODBMS products are
currently in the middle, with a few of them exhibiting declarative semantics such as
constraints, referential integrity rules, and security capabilities.
The desire to represent complex objects has led to the development of object-oriented
systems. It is not just about new data types. The new distributed applications will be built
more and more on modular, object-oriented architectures. Object-oriented or object-based
architectures are very appropriate for managing complexity (e.g., complex data
relationships). Competitive business pressures today are increasing the level of
complexity that business software must model and support. As more and more
applications are implemented in these object-oriented or object-based architectures, there
will be increasing pressures on application developers to have high-performance storage
mechanisms that are fundamentally compatible with the object model. Hence this demand
makes object-oriented database management systems (OODBMSs) necessary.
OODBMSs are defined and implemented explicitly to provide efficient storage for
object-oriented applications.
The following principles apply to object-oriented database models (and to object-oriented
programming):
Encapsulation: An application (another object) can only communicate with an
object via messages. The operations provided by an object define the set of
messages which can be understood by it; no other operations can be applied to an
object.
Inheritance: New object classes can be derived from another class (the
super-class) by inheritance. The new classes inherit the attributes and methods of the
super-class and offer additional attributes and operations. The relation between a
derived class and its super-class is called the "isA" relation because an instance of the
derived class is also an instance of the super-class.
Aggregation: Composite objects may be constructed as consisting of a set of
elementary objects. The container object can communicate with its contained
objects via their methods. The relation between the container object and its
components is called the "partOf" relation because a component is a part of the
container object.
A quick example of the OODBMS approach is an automobile vendor. Each specific
car, such as a Jaguar, a Ford Escort ZX2, or a Dodge Ram, is an object, while the type of
an object, such as car or truck, is a class. A class defines the data members and methods
that are associated with objects of the class. An object belongs to one and only
one class. An object is said to be an instance of a class.
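The three principles and the automobile example can be sketched in an object-oriented language. The Python classes below are illustrative assumptions (including the `Dealership` container and the payload figure); they show the ideas in a programming language rather than in an actual OODBMS.

```python
class Vehicle:
    """Super-class: state is private by convention and reached only via methods."""

    def __init__(self, model):
        self._model = model

    def describe(self):
        # The class name is looked up at run time, so subclasses reuse this method.
        return f"{self._model} is a {type(self).__name__.lower()}"


class Car(Vehicle):      # isA: every Car is also a Vehicle
    pass


class Truck(Vehicle):    # isA, plus an additional attribute and operation
    def __init__(self, model, payload_kg):
        super().__init__(model)
        self._payload_kg = payload_kg

    def can_carry(self, load_kg):
        return load_kg <= self._payload_kg


class Dealership:        # partOf: the vehicles are components of the dealership
    def __init__(self):
        self._stock = []

    def add(self, vehicle):
        self._stock.append(vehicle)

    def describe_stock(self):
        # The container talks to its parts only through their methods.
        return [v.describe() for v in self._stock]


dealer = Dealership()
dealer.add(Car("Jaguar"))
dealer.add(Truck("Dodge Ram", payload_kg=1000))  # payload figure is made up
print(dealer.describe_stock())
# ['Jaguar is a car', 'Dodge Ram is a truck']
```

Each object is an instance of exactly one class (`Car` or `Truck`), while the isA relation makes a `Truck` usable wherever a `Vehicle` is expected.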
data model. The data model consists of data types, type constructors, etc., and is similar
to the SQL report that describes the standard model for relational databases. The ODL is
designed to support the semantic constructs of the ODMG 2.0 object model. It is
independent of any programming language. The ODL is used to create object
specifications. The OQL is designed to work closely with the programming languages for
which an ODMG binding is defined, such as C++, Java, and Smalltalk. The syntax of
OQL queries is similar to the syntax of SQL (a query language for relational
databases), with some additional features such as object identity, complex objects,
inheritance, polymorphism, and relationships. An object-oriented language is the language
for both the application and the database. OODBMSs have been integrated with C++, C,
Java, and LISP. The primary interface in an OODBMS for creating and modifying objects
is directly via the object language (C++, Java, etc.) using the native language syntax. A
key difference between relational databases and OO databases is the way in which
relationships are handled. In OO databases, the relationships are represented explicitly
with object identifiers (OIDs), which improves data access performance. In relational
databases, relationships among tuples are specified by attributes having the same domain.
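The difference in how relationships are handled can be illustrated with a small sketch. Here a Python object reference plays the role of an OID, and plain dictionaries play the role of tuples; the `Department` and `Employee` classes and the department name are hypothetical, chosen only for the example.

```python
from dataclasses import dataclass


@dataclass
class Department:
    deptno: int
    name: str


@dataclass
class Employee:
    ename: str
    dept: Department   # direct reference, standing in for an OID


# Object-oriented style: follow the reference from one object to the other.
sales = Department(30, "Sales")
blake = Employee("Blake", sales)
print(blake.dept.name)  # Sales

# Relational style: tuples are related only by matching attribute values.
dept_rows = [{"deptno": 30, "dname": "Sales"}]
emp_rows = [{"ename": "Blake", "deptno": 30}]
joined = [
    (e["ename"], d["dname"])
    for e in emp_rows
    for d in dept_rows
    if e["deptno"] == d["deptno"]
]
print(joined)  # [('Blake', 'Sales')]
```

In the first half the related object is reached in one step through the stored reference; in the second half the relationship has to be recomputed by comparing values drawn from the same domain.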
Object-oriented databases can be used more easily by any application than
relational databases, where all the operations have to be realized in programs outside the
database. However, object-oriented databases are not yet as widely used in commercial
products as relational databases. Even though the "relational" and the "object-oriented"
principles contradict each other (e.g., it is hard to imagine encapsulation in a relational
database where the data belonging to an object can be distributed across a considerably large
number of tables), there are efforts under way to combine the advantages of the wide
acceptance of relational databases and the benefits of the object-oriented paradigm in
object-relational database systems (described in the next section).
Object relational database management systems (ORDBMS) are an attempt to meet the
demands of more complex data representation through extending relational database
systems with object-oriented technologies. The main contribution of the ORDBMS is its
ability to handle complex, object-centric, persistent data while maintaining the easy-to-
use RDBMS querying methods (e.g. SQL-3) to operate on that data. Not only are more
complex data types supported, but user-defined types can be supported as well, greatly
increasing the application domains to which an ORDBMS can be applied. ORDBMSs are
often considered the bridge between RDBMSs and OODBMSs, implementing the ease of
use of the RDBMS and the flexibility of the OODBMS to handle complex data types. An
ORDBMS is employed in application domains where complex data types are required
and simply cannot be managed using a traditional RDBMS and its accompanying query
language.
The main objective of ORDBMS design was to achieve the benefits of both the relational
and the object models, such as scalability and support for rich data types. ORDBMSs
employ a data model that attempts to incorporate OO features into RDBMSs. All
database information is stored in tables, but some of the tabular entries may have a richer
data structure, termed abstract data types (ADTs). An ORDBMS supports an extended
form of SQL called SQL3 that is still in the development stages. The extensions are
needed because ORDBMSs have to support ADTs. The ORDBMS retains the relational
model because the data is stored in tables having rows and columns,
SQL is used as the query language, and the result of a query is also a table of tuples (rows).
But the relational model has to be drastically modified in order to support the classic
features of object-oriented programming. Hence the characteristics of an ORDBMS are:
Base datatype extension,
Support complex objects,
Inheritance, and
Rule Systems.
ORDBMSs allow users to define datatypes, functions, and operators. As a result, the
functionality of ORDBMSs increases along with their performance. An example
schema of an employee relation that an ORDBMS supports is:
EMP(empno, ename, job, hiredate, deptno, location, picture)
Notice the extra attributes "location" and "picture", which are not present in the traditional
EMP relation of an RDBMS. The datatype of "location" is "geographic point" and that of
"picture" is "image".
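The idea of a user-defined column type can be imitated with Python's standard `sqlite3` module, which lets an application register an adapter and a converter for its own type. SQLite is not an ORDBMS, so this is only a rough analogy; the `Point` class, the `point` type name, the text encoding, and the sample values are assumptions made for the sketch.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Point:
    """Stand-in for the "geographic point" datatype of the location attribute."""
    lat: float
    lon: float


def adapt_point(p):
    # Python object -> value the database can store (here, a text encoding).
    return f"{p.lat};{p.lon}"


def convert_point(raw):
    # Stored bytes -> Python object, applied to columns declared as "point".
    lat, lon = raw.decode().split(";")
    return Point(float(lat), float(lon))


sqlite3.register_adapter(Point, adapt_point)
sqlite3.register_converter("point", convert_point)

conn = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
conn.execute(
    "CREATE TABLE emp (empno INTEGER, ename TEXT, location point, picture BLOB)"
)
conn.execute(
    "INSERT INTO emp VALUES (?, ?, ?, ?)",
    (7698, "Blake", Point(51.5, -0.1), b"\x89PNG"),  # made-up sample values
)

row = conn.execute("SELECT ename, location FROM emp").fetchone()
print(row)  # ('Blake', Point(lat=51.5, lon=-0.1))
```

The data still lives in an ordinary table and is retrieved with ordinary SQL, but the `location` entry comes back as a structured value rather than a plain number or string. A real ORDBMS would additionally let functions and operators on such types be used inside the query itself.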
Performance
RDBMS: Very good performance
OODBMS: Relatively lower performance
ORDBMS: Expected to perform very well
Product maturity
RDBMS: Relatively old and so very mature
OODBMS: This concept is a few years old and so relatively mature
ORDBMS: Still in the development stage and so immature
The use of SQL
RDBMS: Extensive support for SQL
OODBMS: OQL is similar to SQL, but with additional features like complex objects and object-oriented features
ORDBMS: SQL3 is being developed with OO features incorporated in it
Advantages
RDBMS: Its dependence on SQL, relatively simple query optimization, hence good performance
OODBMS: It can handle all types of complex applications, reusability of code, less coding
ORDBMS: Ability to query complex applications and ability to handle large and complex applications
The application domain of an ORDBMS consists of the instances where complex data
simply cannot be represented using a traditional RDBMS and its accompanying query
language. The field of multimedia is the quintessential example of this application
domain. In this domain, the data types themselves may be audio clips, video segments, or
a combination of the two. However, even with video and audio media, the
applications that ORDBMSs are used for require frequent querying and updating of
large collections of data. This implies that although complex data types are
involved, there is still a need for relational-type access to the data.
Although OODBMSs are geared towards complex data types much the same as
ORDBMSs, these systems are more oriented to performing more complex operations on
smaller (overall) collections of data, and performing querying less frequently. A common
application of an OODBMS is in the field of computer aided design (CAD). In this case,
the access model is to retrieve some datum (perhaps a large circuit layout), perform many
complex operations on it, and update (or replace) the original version. As can be seen,
although the data types supported by both ORDBMSs and OODBMSs can be considered
equivalent, the access patterns of the data are considerably different.
Since RDBMSs were the first on the market, there are several well-established
development tools available. First and foremost is the availability and integration of the
SQL (or SQL2) query language for modifying and viewing elements of the database.
Next, there are several client side tools, which provide a means for data entry and display.
These tools are often referred to as fourth generation languages (4GLs). For security
reasons, a RDBMS runs in a different address space (usually on a server that is physically
a different machine) than the client accessing it.
Although ORDBMSs are an extension of traditional RDBMSs, there are some significant
differences which must be considered. As previously mentioned an ORDBMS supports
complex and user-defined data types. These types are not supported in the traditional
SQL query language. Therefore, another querying language supporting these data types,
such as SQL-3 or OQL, must be employed.
The client interface tools for an ORDBMS are also different from those used for a
RDBMS. Whereas in a RDBMS a 4GL can be used to construct queries, an ORDBMS
requires additional facilities to support visual and audio data components. In other words,
there must be client side facilities, which allow the complex data types to be requested
and displayed appropriately.
considering if security is necessary for the given DBMS, since it could instead be traded
off for performance reasons.
Client-side tools are synonymous with the programming language used. Whereas in an RDBMS or
ORDBMS a separate client application is written, perhaps in a 4GL, OODBMSs
use the OO language of the system to implement the client applications. This,
however, leads to possible security issues. If the client and server exist in
separate address spaces (which is an acceptably secure solution), then the implementation
of persistent data can result in commands running up to 2-3 orders of magnitude slower
than in the non-persistent case. However, persistence is the major point of OODBMSs.
Hence, security is often traded off so that the client and server run in the same address
space in order to achieve an acceptable level of performance.
EXERCISE
1. What is the need for a data model in a DBMS?
2. Differentiate between the three data models, namely the Network, Hierarchical, and Relational
models.
3. What are the advantages of the Object-Relational model over other available data
models?