Unit-02 Different Types of Database
2.0 Introduction
2.1 Unit Objectives
2.2 Data Models
2.2.1 Object-based Data Models
2.2.2 Record-based Data Models
2.2.3 Physical Data Models
2.2.4 Conceptual Modeling
2.3 Relational Database
2.4 Distributed Database
2.5 Centralized Database
2.6 Difference between Centralized and Distributed Databases
2.7 Summary
2.8 Key Terms
2.9 Check Your Progress
2.0 Introduction:- The classification of database systems is based on several criteria. The first important criterion is the
data model on which DBMS is based. The relational data model is the most widely used data model for many
commercial DBMSs. various older applications run on the database systems based on hierarchical and network data
models. Another data model that was limited to some commercial systems is the object data model. The relational
DBMSs are evolving continuously, and have been incorporating many of the concepts that were developed in object
databases. This has led to a new class of DBMSs called object-relational DBMSs. Hence DBMSs can be
categorized on the basis of data models: relational, object, object-relational, hierarchical, network, and others. The
second criterion used to classify DBMSs is the number of users supported by the system. Single-user systems
support only one user at a time and are mostly used with personal computers. Multiuser systems, which include the
majority of DBMSs, support multiple users. A third criterion is the number of sites over which the database is
distributed. A DBMS is centralized if the data is stored at a single computer site. A centralized DBMS can support
multiple users, but the DBMS and the database themselves reside totally at a single computer site. A distributed
DBMS (DDBMS) can have the actual database and DBMS software distributed over many sites, connected by a
computer network. Homogeneous DDBMSs use the same DBMS software at multiple sites. A recent trend is to
develop software to access several autonomous pre-existing databases stored under heterogeneous DBMSs. This
leads to a federated DBMS (or multi-database system), in which the participating DBMSs are loosely coupled and
have a degree of local autonomy. Many DBMSs use client-server architecture. This unit describes various data
models and the basic types of databases, like relational, distributed, and centralized databases.
2.1 Unit Objectives:- After completing this unit, the reader will be able to:
Learn about the different types of data models.
Describe the classification of databases.
Illustrate the features of relational, centralized, and distributed databases.
2.2 Data Models:- The data model is an integrated collection of concepts for describing and manipulating data,
relationships between data, and constraints on the data in an organization. A model is a representation of real-world
objects and events and their associations. It is an abstraction that concentrates on the essential, inherent aspects of
an organization and ignores the accidental properties. A data model represents the organization itself. It should
provide the basic concepts and notations that will allow database designers and end-users to communicate
unambiguously and accurately their understanding of the organizational data. A data model can be thought of as
comprising three components:
1. A structural part, consisting of a set of rules according to which databases can be constructed.
2. A manipulative part, defining the types of operations that are allowed on the data (this includes the operations that
are used for updating or retrieving data from the database and for changing the structure of the database).
3. A set of integrity constraints, which ensures that the data is accurate. The purpose of a data model is to represent
data and to make the data understandable. If it does this, then it can be easily used to design a database. There
have been many data models proposed in the literature. They fall into three broad categories: object-based, record-
based, and physical data models. The first two are used to describe data at the conceptual and external levels, and
the third is used to describe data at the internal level.
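The three components above can be illustrated with a small SQL sketch, here run through Python's built-in sqlite3 module; the table and its columns are invented purely for illustration:

```python
import sqlite3

# Hypothetical single-table schema, used only to show the three parts of a
# data model; the table name and columns are invented for this sketch.
conn = sqlite3.connect(":memory:")

# 1. Structural part: rules for constructing the database (tables, columns, types).
conn.execute("""
    CREATE TABLE staff (
        staff_no  TEXT PRIMARY KEY,
        name      TEXT NOT NULL,
        salary    REAL CHECK (salary >= 0)   -- 3. an integrity constraint
    )
""")

# 2. Manipulative part: operations for updating and retrieving data.
conn.execute("INSERT INTO staff VALUES ('SL21', 'John White', 30000)")
rows = conn.execute("SELECT name, salary FROM staff").fetchall()
print(rows)  # [('John White', 30000.0)]

# 3. Integrity constraints in action: a negative salary violates the CHECK rule.
try:
    conn.execute("INSERT INTO staff VALUES ('SG37', 'Ann Beech', -1)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

The CREATE TABLE statement is the structural part, INSERT and SELECT are the manipulative part, and the PRIMARY KEY, NOT NULL, and CHECK clauses are integrity constraints that keep the stored data accurate.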
2.2.1 Object-based Data Models:- Object-based data models use concepts such as entities, attributes, and relationships.
An entity is a distinct object (a person, place, thing, concept, or event) in the organization that is to be represented
in the database. An attribute is a property that describes some aspect of the object that we wish to record, and a
relationship is an association between entities. Some of the more common types of object-based data models are:
Entity-Relationship (ER):- The entity-relationship (ER) data model uses a collection of basic objects, called entities,
and relationships among these objects. An entity is a “thing” or “object” in the real world that is distinguishable from
other objects. The entity-relationship model is widely used in database design.
Object-Oriented: Object-oriented programming (especially in Java, C++, or C#) has become the dominant software
development methodology. This led to the development of an object-oriented data model that can be seen as
extending the E-R model with notions of encapsulation, methods (functions), and object identity. The object-
relational data model combines features of the object-oriented data model and the relational data model. The ER model
has emerged as one of the main techniques for database design and forms the basis for the database design
methodology used in this unit. The object-oriented data model extends the definition of an entity to include not only
the attributes that describe the state of the object but also the actions that are associated with the object, that is, its
behaviour. The object is said to encapsulate both state and behaviour.
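As a sketch of this idea, the hypothetical class below encapsulates both state (attributes) and behaviour (methods); the class, its fields, and the figures are invented for illustration:

```python
# A minimal sketch of the object-oriented view of an entity: the Staff class
# below is hypothetical and encapsulates state (attributes) together with the
# behaviour (methods) that acts on that state.
class Staff:
    def __init__(self, name: str, position: str, salary: float):
        self.name = name          # state
        self.position = position  # state
        self.salary = salary      # state

    def give_raise(self, amount: float) -> None:
        """Behaviour associated with the object: update its own state."""
        self.salary += amount

    def is_manager(self) -> bool:
        """Behaviour: a query on the object's state."""
        return self.position == "Manager"

john = Staff("John White", "Manager", 30000)
john.give_raise(2000)
print(john.salary, john.is_manager())  # 32000 True
```

In the record-based models discussed below, only the attribute values would be stored; the object-oriented model keeps the operations with the data they act on.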
For example, the Staff and Branch tables in Figure 2.1 show that employee John White is a manager with a salary of
£30,000, who works at branch (branchNo) B005, which, from the Branch table, is at 22 Deer Rd in London. It is important to note that there is a
relationship between Staff and Branch: a branch office has staff. However, there is no explicit link between these two
tables; it is only by knowing that the attribute branchNo in the Staff relation is the same as the branchNo of the
Branch relation that we can establish that a relationship exists.
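That implicit link can be made explicit with a join. The sketch below rebuilds a minimal version of the two tables in sqlite3; only the rows mentioned in the text are included, and the remaining column values are reduced to essentials:

```python
import sqlite3

# A sketch of the Staff/Branch example: the only link between the two tables
# is the shared branchNo value, which the join below makes explicit. Table
# contents are reduced to the single rows mentioned in the text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Branch (branchNo TEXT PRIMARY KEY, street TEXT, city TEXT);
    CREATE TABLE Staff  (staffNo TEXT PRIMARY KEY, name TEXT, position TEXT,
                         salary REAL, branchNo TEXT REFERENCES Branch(branchNo));
    INSERT INTO Branch VALUES ('B005', '22 Deer Rd', 'London');
    INSERT INTO Staff  VALUES ('SL21', 'John White', 'Manager', 30000, 'B005');
""")

# The relationship "a branch office has staff" exists only through matching
# branchNo values in the two tables.
row = conn.execute("""
    SELECT s.name, s.position, b.street, b.city
    FROM Staff s JOIN Branch b ON s.branchNo = b.branchNo
""").fetchone()
print(row)  # ('John White', 'Manager', '22 Deer Rd', 'London')
```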
Hierarchical data model:- The hierarchical model is a restricted type of network model. Again, data is represented
as collections of records, and relationships are represented by sets. However, the hierarchical model allows a node
to have only one parent. A hierarchical model can be represented as a tree graph, with records appearing as nodes
(also called segments) and sets as edges. Figure 2.3 illustrates an instance of a hierarchical schema for the same
data set presented in Figure 2.1.
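Because every record has at most one parent, a hierarchical instance can be sketched as nested structures; the field names and the second staff record below are invented to show a one-to-many parent/child set:

```python
# A sketch of a hierarchical (tree-structured) instance: each record has at
# most one parent, so nesting plain dictionaries is enough. The branch/staff
# data extends the running example; field names and the second staff member
# are invented for illustration.
hierarchy = {
    "branchNo": "B005", "street": "22 Deer Rd", "city": "London",
    "staff": [   # child segments: each staff record has exactly one parent branch
        {"staffNo": "SL21", "name": "John White", "position": "Manager"},
        {"staffNo": "SL41", "name": "Julie Lee",  "position": "Assistant"},
    ],
}

def staff_of(branch: dict) -> list[str]:
    """Navigate parent-to-child, the only access path a hierarchy offers."""
    return [s["name"] for s in branch["staff"]]

print(staff_of(hierarchy))  # ['John White', 'Julie Lee']
```

Note that a staff member can only be reached through its parent branch; there is no way to attach the same staff record to two branches, which is exactly the restriction that distinguishes the hierarchical model from the network model.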
2.2.4 Conceptual Modeling:- From an examination of the three-level architecture, we see that the conceptual schema is
the heart of the database. It supports all the external views and is, in turn, supported by the internal schema.
However, the internal schema is merely the physical implementation of the conceptual schema. The conceptual
schema should be a complete and accurate representation of the data requirements of the enterprise (business
organizations). If this is not the case, some information about the enterprise will be missing or incorrectly
represented and we will have difficulty fully implementing one or more of the external views.
Conceptual modelling or conceptual database design is the process of constructing a model of the
information used in an enterprise that is independent of implementation details, such as the target DBMS,
application programs, programming languages, or any other physical considerations. This model is called a
conceptual data model. Conceptual models are also referred to as “logical models” in the literature. However, the
conceptual model is independent of all implementation details, whereas the logical model assumes knowledge of the
underlying data model of the target DBMS.
2.3 Relational Databases:- The relational model is a collection of conceptual tools for describing data, data
relationships, data semantics, and consistency constraints. A relational database uses a collection of tables to
represent both data and the relationships among those data, and a relational DBMS also provides a DML and a DDL.
The relational model is an example of a record-based model. Record-based models are so named because
the database is structured in fixed-format records of several types. Each table contains records of a particular type.
Each record type defines a fixed number of fields or attributes. The columns of the table correspond to the attributes
of the record type.
A relational database consists of a collection of tables, each of which is assigned a unique name. For
example, consider the instructor table of Figure 2.4 (a), which stores information about instructors. The table has
four column headers: ID, name, dept_name, and salary. Each row of this table records information about an
instructor, consisting of the instructor’s ID, name, dept_name, and salary. Similarly, the course table of Figure 2.4 (b)
stores information about courses, consisting of a course_id, title, dept_name, and credits, for each course. Note that
each instructor is identified by the value of the column ID, while each course is identified by the value of the column
course_id. Figure 2.4 (c) shows a third table, prereq, which stores the prerequisite courses for each course. The
table has two columns, course_id and prereq_id. Each row consists of a pair of course identifiers such that the
second course is a prerequisite for the first course. Thus, a row in the prereq table indicates that two courses are
related in the sense that one course is a prerequisite for the other. As another example, consider the table
instructor: a row in the table can be thought of as representing the relationship between a specified ID and the
corresponding values of name, dept_name, and salary.
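A runnable sketch of this schema is given below, using sqlite3; the sample rows are invented, since Figure 2.4's actual contents are not reproduced here:

```python
import sqlite3

# A sketch of the instructor/course/prereq schema described above. The sample
# rows are invented; Figure 2.4's actual contents are not reproduced here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE instructor (ID TEXT PRIMARY KEY, name TEXT, dept_name TEXT, salary REAL);
    CREATE TABLE course     (course_id TEXT PRIMARY KEY, title TEXT, dept_name TEXT, credits INT);
    CREATE TABLE prereq     (course_id TEXT, prereq_id TEXT);
    INSERT INTO course VALUES ('CS-101', 'Intro. to Computer Science', 'Comp. Sci.', 4);
    INSERT INTO course VALUES ('CS-347', 'Database System Concepts',  'Comp. Sci.', 3);
    INSERT INTO prereq VALUES ('CS-347', 'CS-101');  -- CS-101 is a prerequisite of CS-347
""")

# Each prereq row relates two courses: the second is a prerequisite of the first.
rows = conn.execute("""
    SELECT c.title FROM prereq p JOIN course c ON c.course_id = p.prereq_id
    WHERE p.course_id = 'CS-347'
""").fetchall()
print(rows)  # [('Intro. to Computer Science',)]
```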
In general, a row in a table represents a relationship among a set of values. Since a table is a collection of
such relationships, there is a close correspondence between the concept of the table and the mathematical concept
of relation, from which the relational data model takes its name. In mathematical terminology, a tuple is simply a
sequence (or list) of values. A relationship between n values is represented mathematically by an n-tuple of values,
i.e., a tuple with n values, which corresponds to a row in a table. The order in which tuples appear in a relation is
irrelevant, since a relation is a set of tuples.
Thus, in the relational model, the term relation is used to refer to a table, while the term tuple is used to refer
to a row. Similarly, the term attribute refers to a column of a table. From Figure 2.4 (a), we can see that the relation
instructor has four attributes: ID, name, dept_name, and salary. For each attribute of a relation, there is a set of
permitted values, called the domain of that attribute. Thus, the domain of the salary attribute of the instructor relation
is the set of all possible salary values, while the domain of the name attribute is the set of all possible instructor
names.
We require that, for all relations r, the domains of all attributes of r be atomic. A domain is atomic if elements
of the domain are considered to be indivisible units. For example, suppose the table instructor in Figure 2.4 (a) had
an attribute phone_number, which can store a set of phone numbers corresponding to the instructor. Then the
domain of phone_number would not be atomic, since an element of the domain is a set of phone numbers, and it
has subparts, namely the individual phone numbers in the set. The null value is a special value that signifies that the
value is unknown or does not exist.
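The usual fix for a set-valued attribute such as phone_number is a separate table holding one row per phone number, so that every stored value stays indivisible. The sketch below uses invented names and numbers:

```python
import sqlite3

# A sketch of restoring atomic domains: a set of phone numbers stored in one
# column would have subparts, so instead a separate table holds one row per
# number. The instructor and the phone numbers are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE instructor (ID TEXT PRIMARY KEY, name TEXT);
    -- one row per phone number keeps every stored value indivisible (atomic)
    CREATE TABLE instructor_phone (ID TEXT REFERENCES instructor(ID),
                                   phone_number TEXT);
    INSERT INTO instructor VALUES ('10101', 'Srinivasan');
    INSERT INTO instructor_phone VALUES ('10101', '555-0101');
    INSERT INTO instructor_phone VALUES ('10101', '555-0102');
""")

phones = [r[0] for r in conn.execute(
    "SELECT phone_number FROM instructor_phone WHERE ID = '10101' "
    "ORDER BY phone_number")]
print(phones)  # ['555-0101', '555-0102']
```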
2.4 Distributed Databases:- A major motivation behind the development of database systems is the desire to integrate
the operational data of an organization and to provide controlled access to the data. Although we may think that
integration and controlled access implies centralization, this is not the intention. In fact, the development of computer
networks promotes a decentralized mode of work. This decentralized approach mirrors the organizational structure
of many companies, which are logically distributed into divisions, departments, projects, and so on, and physically
distributed into offices, plants, or factories, where each unit maintains its own operational data. The development of
a distributed DBMS that reflects this organizational structure, makes the data in all units accessible, and stores the
data close to the location where it is most frequently used should improve our ability to share the data and the
efficiency with which we can access it.
A distributed database is a logically interrelated collection of shared data (and a description of this data),
physically distributed over a computer network. Distributed DBMS is the software system that permits the
management of the distributed database and makes the distribution transparent to users.
In a distributed database management system (DDBMS), a single logical database is split into a
number of fragments. Each fragment is stored on one or more computers (as replicas) under the control of a separate
DBMS, with the computers connected by a communications network. Each site is capable of independently
processing user requests that require access to local data (that is, each site has some degree of local autonomy)
and is also capable of processing data stored on other computers in the network.
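The ideas of fragments, replicas, and allocation to sites can be sketched in a few lines. The site names, fragmentation rule, and routing logic below are invented for illustration and ignore real concerns such as network communication and concurrency:

```python
# A toy sketch of fragmentation and replication in a DDBMS: a logical "staff"
# relation is split by branch into fragments, each allocated to one or more
# sites. All names and the routing rule are invented for this sketch.
FRAGMENTS = {
    "staff_london":  [{"staffNo": "SL21", "branchNo": "B005"}],
    "staff_glasgow": [{"staffNo": "SG37", "branchNo": "B003"}],
}
# A fragment may be replicated at several sites.
ALLOCATION = {
    "staff_london":  ["site_london", "site_glasgow"],   # replicated
    "staff_glasgow": ["site_glasgow"],
}

def sites_for(fragment: str) -> list[str]:
    """Sites where a query on this fragment can be answered locally."""
    return ALLOCATION[fragment]

def query(branch_no: str) -> list[dict]:
    """Reassemble the logical relation: scan every fragment, filter by branch."""
    return [row for frag in FRAGMENTS.values() for row in frag
            if row["branchNo"] == branch_no]

print(sites_for("staff_london"))  # ['site_london', 'site_glasgow']
print(query("B003"))              # [{'staffNo': 'SG37', 'branchNo': 'B003'}]
```

The point of the sketch is distribution transparency: a user of query() sees one logical relation and never needs to know which fragment, replica, or site supplied the rows.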
Users access the distributed database via applications. Applications are classified as those that do not
require data from other sites (local applications) and those that do require data from other sites (global applications).
We require a DDBMS to have at least one global application. A DDBMS, therefore, has the following characteristics:
1. A collection of logically related shared data.
2. The data is split into a number of fragments.
3. Fragments may be replicated.
4. Fragments and replicas are allocated to sites.
5. The sites are linked by a communications network.
6. The data at each site is under the control of a DBMS.
7. The DBMS at each site can handle local applications autonomously.
8. Each DBMS participates in at least one global application.
2.5 Centralized Databases:- Centralized database systems are those that run on a single computer system and do not
interact with other computer systems. Such database systems span a range from single-user database systems
running on personal computers to high-performance database systems running on high-end server systems.
A modern, general-purpose computer system consists of one to a few processors and a number of device
controllers that are connected through a common bus that provides access to shared memory, as shown in Figure
2.7. The processors have local cache memories that store local copies of parts of the memory, to speed up access
to data. Each processor may have several independent cores, each of which can execute a separate instruction
stream. Each device controller is in charge of a specific type of device (for example, a disk drive, an audio device, or
a video display). The processors and the device controllers can execute concurrently, competing for memory
access. Cache memory reduces the contention for memory access since it reduces the number of times that the
processor needs to access the shared memory.
Computers can be used in two distinct ways: as single-user systems and as multiuser systems. Personal computers
and workstations fall into the first category. A typical single-user system is a desktop unit used by a single person,
usually with only one processor and one or two hard disks, and usually only one person using the machine at a time.
A typical multiuser system, on the other hand, has more disks and more memory and may have multiple processors.
It serves a large number of users who are connected to the system remotely.
Database systems designed for use by single users usually do not provide many of the facilities that a
multiuser database provides. In particular, they may not support concurrency control, which is not required when
only a single user can generate updates. Provisions for crash recovery in such systems are either absent or primitive
—for example, they may consist of simply making a backup of the database before any update. In contrast,
database systems designed for multiuser systems support the full transactional features that we have studied earlier.
Although most general-purpose computer systems in use today have multiple processors, they have coarse-
granularity parallelism, with only a few processors (about two to four, typically), all sharing the main memory.
Databases running on such machines usually do not attempt to partition a single query among the processors;
instead, they run each query on a single processor, allowing multiple queries to run concurrently. Thus, such
systems support a higher throughput; that is, they allow a greater number of transactions to run per second,
although individual transactions do not run any faster.
Databases designed for single-processor machines already provide multitasking, allowing multiple
processes to run on the same processor in a time-shared manner, giving a view to the user of multiple processes
running in parallel. Thus, coarse-granularity parallel machines logically appear to be identical to single-processor
machines, and database systems designed for time-shared machines can be easily adapted to run on them.
In contrast, machines with fine-granularity parallelism have a large number of processors, and database
systems running on such machines attempt to parallelize single tasks (queries, for example) submitted by users.
Parallelism is emerging as a critical issue in the future design of database systems. Whereas today those computer
systems with multicore processors have only a few cores, future processors will have large numbers of cores. As a
result, parallel database systems, which once were specialized systems running on specially designed hardware, will
become the norm.
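The contrast between the two uses of parallelism can be sketched with Python's concurrent.futures, using a trivial sum as a stand-in for query evaluation; the data and partition sizes are invented:

```python
# A sketch contrasting the two uses of parallelism described above, with a
# simulated "query" (summing a list of invented data). Coarse-grained
# (inter-query) parallelism runs whole queries on separate workers, raising
# throughput; fine-grained (intra-query) parallelism splits one query into
# partitions, speeding up that single query.
from concurrent.futures import ThreadPoolExecutor

def run_query(rows: list[int]) -> int:
    return sum(rows)            # stand-in for evaluating one query

data = list(range(1000))

with ThreadPoolExecutor(max_workers=4) as pool:
    # Coarse-grained: several independent queries in flight at once.
    q1 = pool.submit(run_query, data)
    q2 = pool.submit(run_query, data[:500])

    # Fine-grained: one query partitioned across workers, results combined.
    parts = [data[i:i + 250] for i in range(0, 1000, 250)]
    partial_sums = list(pool.map(run_query, parts))

print(q1.result(), q2.result())   # 499500 124750
print(sum(partial_sums))          # 499500
```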
Client-Server Systems:- As personal computers became faster, more powerful, and cheaper, there was a shift
away from the centralized system architecture. Personal computers supplanted terminals connected to centralized
systems. Correspondingly, personal computers assumed the user-interface functionality that used to be handled
directly by the centralized systems. As a result, centralized systems today act as server systems that satisfy
requests generated by client systems. Figure 2.8 shows the general structure of a client-server system.
The functionality provided by database systems can be broadly divided into two parts: the front end and the
back end. The back end manages access structures, query evaluation and optimization, concurrency control, and
recovery. The front end of a database system consists of tools such as the SQL user interface, forms interfaces,
report generation tools, and data mining and analysis tools. The interface between the front end and the back end is
through SQL, or through an application program. Certain application programs, such as spreadsheets and statistical-
analysis packages, use the client–server interface directly to access data from a back-end server. In effect, they
provide front ends specialized for particular tasks.
Some transaction-processing systems provide a transactional remote procedure call interface to connect
clients with a server. These calls appear like ordinary procedure calls to the programmer, but all the remote
procedure calls from a client are enclosed in a single transaction at the server end. Thus, if the transaction aborts,
the server can undo the effects of the individual remote procedure calls.
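The effect can be sketched with an ordinary local transaction standing in for the transactional remote procedure call interface; the account table and transfer routine are invented, and a real system would issue these calls over the network:

```python
import sqlite3

# A sketch of the transactional idea behind a remote procedure call interface:
# several client calls are grouped so that an abort undoes all of them. The
# server is simulated by a local sqlite3 connection; names are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY,"
             " balance REAL CHECK (balance >= 0))")
conn.execute("INSERT INTO account VALUES ('A', 100), ('B', 50)")
conn.commit()

def transfer(amount: float) -> None:
    """Two 'remote calls' enclosed in one transaction at the server end."""
    try:
        conn.execute("UPDATE account SET balance = balance - ?"
                     " WHERE name = 'A'", (amount,))
        conn.execute("UPDATE account SET balance = balance + ?"
                     " WHERE name = 'B'", (amount,))
        conn.commit()
    except sqlite3.IntegrityError:
        conn.rollback()   # abort: the effects of both calls are undone

transfer(500)             # would drive A negative, so the whole transfer aborts
print(conn.execute("SELECT balance FROM account ORDER BY name").fetchall())
```

Because the failed debit aborts the enclosing transaction, neither account changes; both balances remain as they were before the call.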
2.6 Difference between Centralized and Distributed Databases:- We have already learned that a centralized
database is a database that is stored, located
as well as maintained at a single location only. This type of database is modified and managed from that location
itself. This location is typically a central computer or database system. The central location is accessed via a
network connection (LAN, WAN, etc.). This type of database is mainly used by institutions or
organizations.
On the other hand, a distributed database is basically a type of database which consists of multiple
databases that are connected with each other and are spread across different physical locations. The data stored at
each physical location can be managed independently of the other locations. Communication between the databases
at different locations takes place over a computer network. A comparison
between centralized and distributed databases is illustrated in Table 2.1.
2.7 Summary:-
The data model is an integrated collection of concepts for describing and manipulating data, relationships between
data, and constraints on the data in an organization.
Object-based data models use concepts such as entities, attributes, and relationships. In a record-based model, the
database consists of a number of fixed-format records, possibly of differing types. Physical data models describe
how data is stored in the computer, representing information such as record structures, record orderings, and access
paths.
A relational database is based on the relational model and uses a collection of tables to represent both data and the
relationships among those data.
A distributed database is a logically interrelated collection of shared data (and a description of this data), physically
distributed over a computer network.
Centralized database systems are those that run on a single computer system and do not interact with other
computer systems. Such database systems span a range from single-user database systems running on personal
computers to high-performance database systems running on high-end server systems.
2.8 Key Terms:-
Conceptual Modeling: It is also known as conceptual database design. It is the process of constructing a model of
the information used in an enterprise that is independent of implementation details.
Tuple: In the relational model, a tuple is a sequence (or list) of values that corresponds to a row of a table.
Domain: For each attribute of a relation, there is a set of permitted values, called the domain of that attribute.
Granularity: In parallel computing, granularity (or grain size) of a task is a measure of the amount of work (or
computation) that is performed by that task.
Coarse-grained parallelism: In coarse-grained parallelism, a program is split into large tasks. Due to this, a large
amount of computation takes place in processors.
Fine-grained parallelism: In fine-grained parallelism, a program is broken down into a large number of small tasks.
These tasks are assigned individually to many processors.