0% found this document useful (0 votes)
72 views45 pages

BPA DBMS Chapter1 - DBMS Overview

The document discusses database concepts and history. It provides definitions of key terms like database, data, information and metadata. It describes database management systems (DBMS), their functions, advantages and applications. The document also discusses different methods of record organization like flat files and inverted indexes. It covers the history and evolution of databases and various file organization techniques.

Uploaded by

CD PA
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
72 views45 pages

BPA DBMS Chapter1 - DBMS Overview

The document discusses database concepts and history. It provides definitions of key terms like database, data, information and metadata. It describes database management systems (DBMS), their functions, advantages and applications. The document also discusses different methods of record organization like flat files and inverted indexes. It covers the history and evolution of databases and various file organization techniques.

Uploaded by

CD PA
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 45

Prabin Babu Dhakal, CDPA, TU

Database Concepts

दत्तांशसांग्रह अवधतरणत
Prabin Babu Dhakal, CDPA, TU

IceCube Neutrino
Observatory

Big science is data


driven. 2
Prabin Babu Dhakal, CDPA, TU

How CBS uses data?


• CBS collects, stores,
analyzes, and
disseminate data.
• Data necessary for
every decision
making
• Government and
every other
organizations guilty
of using lackluster
evidence to justify
agendas or
assumptions in
absence of data
Prabin Babu Dhakal, CDPA, TU

Increasingly many companies see


themselves as data driven.
4
Prabin Babu Dhakal, CDPA, TU

THE WORLD IS INCREASINGLY


DRIVEN BY DATA…

This class teaches the basics of


how to use & manage data

5
Prabin Babu Dhakal, CDPA, TU

Turing Award for Innovations in


RDBMSs in 2014
“The Nobel of Computing”
• 2014 Turing Award Winner: Michael
Stonebraker
• Helped to invent many RDBMS (Relational
DBMS) concepts:
– Query modification
– The Object-Relational model
– More recently: work on column-store,
streaming data
• Made / helped to start many popular
RDBMS implementations:
– Postgres, Vertica, Streambase, VoltDB, …

The relational data model is one of the most important concepts in computing! 6
Prabin Babu Dhakal, CDPA, TU

Nepal Government Data ecosystem


Prabin Babu Dhakal, CDPA, TU

Why should you study databases?


• Mercenary- make more $$$:
– Startups need DB talent right away
• Intellectual:
– Science: data poor to data rich
• No idea how to handle the data!
– Fundamental ideas to all the field of study
– More important for interdisciplinary fields

Many great computer systems ideas started in DB.


8
Prabin Babu Dhakal, CDPA, TU

What is expected from you


• Attend lectures
– If you don’t, it’s at your own peril
• Be active and think critically
– Ask questions
• Do programming and homework projects
– Start early and be honest
• Study for tests and exams

9
Prabin Babu Dhakal, CDPA, TU

What this course is (and is not)


• Discuss fundamentals of data management
– Different data storage technologies
– How to design databases, query databases
– Accessing data from Programming to use in MIS
– Not how to be a DBA or how to tune Oracle 12g.
• We’ll cover how database management
systems work
• But not in-depth principles of how to build
them
10
Prabin Babu Dhakal, CDPA, TU

Introduction to DBMS
• Characteristics and definition of database and
database management system
• Database history
• Organization of records
• File organizations

• Read: https://en.wikipedia.org/wiki/Database
Prabin Babu Dhakal, CDPA, TU

Definitions
• Database: organized collection of logically related
data
• Data: stored representations of meaningful objects
and events
– Structured: numbers, text, dates
– Unstructured: images, video, documents
• Information: processed data that increase
knowledge in the person using the data
• Metadata: data that describes the properties and
context of user data
Prabin Babu Dhakal, CDPA, TU

Database (दत्तांशनिधध)
• A Database is a collection of related data
• Organized in a way that data can be easily accessed,
managed and updated
• DB is large and integrated collection of data
• Any piece of information can be a data, for example
name of your University – TU
• Database is actually a place where related piece of
information is stored and various operations can be
performed on it
• Models a real-world enterprise
– Entities (generally name e.g., Students study Courses)
– Relationships (generally verb e.g., Teacher teach student)
Prabin Babu Dhakal, CDPA, TU

Database Management System


• DBMS is a software that allows creation,
definition, manipulation, and administration of
database
• DBMS is actually a tool used to perform any kind
of operation on data in database.
• Provides protection and security to database
• Maintains data consistency in case of multiple
users
• Example of popular DBMS - Oracle, MySQL,
MsSQL, Sybase, PostgreSQL, Microsoft Access,
IBM DB2 etc.
Prabin Babu Dhakal, CDPA, TU

Database Management System


• A software system that is used to create, maintain, and
provide controlled access to user databases
Order Filing
System

Invoicing Central database


System DBMS
Contains employee,
order, inventory,
pricing, and
Payroll
customer data
System

• DBMS manages data resources like an operating system


manages hardware resources
Prabin Babu Dhakal, CDPA, TU

Components of Database systems


Users Database
Management Database
Database System
Application

• Users : Users may be of various type such as DB


administrator, System developer and End users.
• Database application : Database application may be
Personal, Departmental, Enterprise and Internal E.g.
Websites, E-governance portal, MIS, DSS etc.
• DBMS : Software that allow users to define, create and
manages database, E.g: MySql, Oracle etc.
• Database : Integrated collection of data
Prabin Babu Dhakal, CDPA, TU

Functions of DBMS
• Data Independence
• Concurrency Control
• Recovery services
• Utility services
• Provides a clear and logical view of the
process that manipulates data
Prabin Babu Dhakal, CDPA, TU

Advantages of DBMS
• Segregation of application and data
• Minimal and planned data redundancy
(duplication)
• Improved data consistency
• Easier data retrieval, sharing, accessibility and
responsiveness
• Increased productivity
• Enforcement of security and standards
• Reduced development time and maintenance
need
Prabin Babu Dhakal, CDPA, TU

Disadvantages of DBMS
• New specialized manpower needed
• Installation and management cost and
complexity increased
• Complexity
• Large in size
• Overkill for small processes
Prabin Babu Dhakal, CDPA, TU

Applications of DBMS
• Support internal operations of organizations,
online interactions with customers and suppliers
etc.
• Hold administrative information and more
specialized data, such as engineering data or
economic models
• Examples: computerized library systems, flight
reservation systems, computerized parts
inventory systems, content management systems
that store websites as collections of webpages in
a database
Prabin Babu Dhakal, CDPA, TU

History
• Navigational Databases developed in 1960s
hierarchical and network models
• RDBMS developed in 1970s to manage tabular
data and later SQL developed
• Widespread deployment of RDBMS in 1980s
• Object-relational databases developed in 1980s
• RDBMS dominated database world since 1990s
• New generation NoSQL database developed in
later 2000s
Prabin Babu Dhakal, CDPA, TU

History and Evolution


Prabin Babu Dhakal, CDPA, TU

Record organization
• Flat file
– Plain text file or Binary file; usually one record per line
– Delimited, fixed length or indexed
– Used in smaller system and simple data
• Inverted Index
– index data structure
– Stores mapping from words or numbers, to its locations
in database file, or document
– Allow fast full text searches
– Used on a large scale e.g. in search engines
Prabin Babu Dhakal, CDPA, TU

File and record organization


• Many alternative file organizations exist, each
appropriate in some situation
• If selection queries are frequent, sorting the
file or building an index is important
• Indexes support efficient retrieval of records
based on the values in some fields
• Understanding the nature of the workload for
the application, and the performance goals, is
essential to developing a good design
Prabin Babu Dhakal, CDPA, TU

File organization
• Method of arranging the records in a file when
the file is stored on disk. A relation is typically
stored as a file of records

• Store records in a file in a collection of disk


pages
• Keep track of pages allocated to each file.
• Track available space within pages allocated to
the file
Prabin Babu Dhakal, CDPA, TU

Think
• Do you think it is a good idea to sort the
records stored in a file? Why?
Prabin Babu Dhakal, CDPA, TU

Data dictionary
• Dictionary about the data that we store in the database
• Contains all the information about the data objects
• It is like storing all up-to-date information about the objects
like tables, columns, index, constraints, functions etc.
• Makes us easily identify access and understand the factors
about the object.
• In the case of a table, data dictionary provides information
about
– Its name
– Security information like who is the owner of the table, when
was it created, and when it was last accessed.
– Physical information like where is the data stored for this table
– Structural information like its attribute names and its data types,
constraints and indexes.
Prabin Babu Dhakal, CDPA, TU

Record organization in a file


• Heap File Organization
• Sequential File Organization
• Indexed sequential file organization
• Hash File Organization
• B+ Tree file organization
• Clustered File Organization
Prabin Babu Dhakal, CDPA, TU

Heap File Organization


• When a file is created using Heap File
Organization, the Operating System allocates
memory area to that file without any further
accounting details.
• Records can be placed anywhere in that
memory area.
• It is the responsibility of the software to
manage the records.
• Heap File does not support any ordering,
sequencing, or indexing on its own.
Prabin Babu Dhakal, CDPA, TU

Sequential File Organization


• Every file record contains a data field
(attribute) to uniquely identify that record.
• Records are placed in the file in some
sequential order based on the unique key field
or search key. E.g. date, id or name.
• Practically, it is not possible to store all the
records sequentially in physical form.
• Sorting the records makes it easier for search
Prabin Babu Dhakal, CDPA, TU

What is an index

1 Record for index 2


2
3 Record for index 1
4 Record for index 3
5 Record for index 5
6
Record for index 4
7
Record for index 6
Record for index 7
Prabin Babu Dhakal, CDPA, TU

Indexed File Organization


• Contain two sets of information – Index, Record
• Record can be anywhere in the file.
• Record location is given by index
• Index is sorted
• Easier to sort only index compared to sorting
whole records
• Faster access
Prabin Babu Dhakal, CDPA, TU

Hash File Organization


• Hash File Organization uses Hash function
computation on some fields of the records.
• The output of the hash function determines
the location of disk block where the records
are to be placed.
• Unique identifier field in record is used to
compute hash
• No need to store index separately. Hash does
the work.
Prabin Babu Dhakal, CDPA, TU

Tree file organization


• Data stored hierarchically, sorted like in tree
• Contains two parts
– Nodes
– Edges
• Nodes are connected by edges
Prabin Babu Dhakal, CDPA, TU

Clustered File Organization


• In this mechanism, related records from one
or more relations are kept in the same disk
block, that is, the ordering of records is not
based on primary key or search key.
• Ordering based on common property of data
e.g.
– Near each other
– Related organization
– Range of data etc.
– Related to each other in some apect
Sequential Heap/Direct Hash ISAM B+ tree
Prabin Babu Dhakal,
Cluster
CDPA, TU

Method of Stored as they Stored at the end Stored at the hash Address index is Stored in a tree like Frequently joined
storing come of the file. address generated appended to the structure tables are clubbed
record into one file based
on cluster key
Types Simple file and Static and dynamic Dense, Sparse, Indexed and Hash
sorted file hashing multilevel indexing
Design Simple Simplest Medium Complex Complex Simple

Storage Cost Cheap Cheap Medium Costlier Costlier Medium

Advantage Fast and Best suited for Faster Access Searching records is Searching range of data Best suited for
efficient when bulk insertion, No Need to Sort faster. Suitable for efficient. No frequently joined
there is large and small Handles multiple large database. Any performance degrades tables.
volumes of files/tables transactions of the columns can when there is insert / Suitable for 1:M
data, Report Suitable for Online be used as key delete / update. mappings
generation, transactions column. Searching Grows and shrinks with
statistical range of data & data. Works well in
calculations etc partial data are secondary storage.
efficient. Searching is easy.
All data sorted
Dis- Sorting of data Records are Use of Memory is Extra cost to Not suitable for static Not suitable for
advantage each time for scattered and inefficient maintain index. tables large database.
insert/delete/ memory Searching range of File reconstruction Suitable only for
update takes inefficiently used. data, partial data, is needed as the joins on which
time and makes Proper memory non-hash key insert/update/delet clustering is done.
system slow. management is column, or e. Less frequently
needed. Not frequently updated Does not grow with used joins and 1: 1
suitable for large column as hash key data. Mapping are
tables. are inefficient. inefficient.
Prabin Babu Dhakal, CDPA, TU

Disadvantages of File Processing


• Program-Data Dependence
– All programs maintain metadata for each file they use
• Duplication of Data
– Different systems/programs have separate copies of the same data
• Limited Data Sharing
– No centralized control of data
• Lengthy Development Times
– Programmers must design their own file formats
• Excessive Program Maintenance
– 80% of information systems budget
Prabin Babu Dhakal, CDPA, TU

Problems with Data Dependency


• Each application programmer must maintain
his/her own data
• Each application program needs to include
code for the metadata of each file
• Each application program must have its own
processing routines for reading, inserting,
updating, and deleting data
• Lack of coordination and central control
• Non-standard file formats
Prabin Babu Dhakal, CDPA, TU

Problems with Data Redundancy


• Waste of space to have duplicate data
• Causes more maintenance headaches
• The biggest problem:
– Data changes in one file could cause
inconsistencies
– Compromises in data integrity (प्रमाणिकता
वा पववत्रता)
Prabin Babu Dhakal, CDPA, TU

Old file processing system


Duplicate Data

Prabin Babu Dhakal, CDPA, TU

The Database approach


• Central repository of shared data
• Data is managed by a controlling agent
• Stored in a standardized, convenient form

Requires Database Management System (DBMS)


Prabin Babu Dhakal, CDPA, TU

English to Nepali
• Data: दत्, दत्तांश
• Information: सच ू ित, सचू ्तांश (computer)
• Database: दत्तांशनिधध
• Entity: सत्व
• Relation: सांवन्ध
• Entity relationship diagram: व््भिलेख (computer)
• Schema: प्रबन्न्धकत (computer)
• Syntax: शब्दतिश ु तसि
• SQL: सां्ोगितषत, सांववपष ृ त
Prabin Babu Dhakal, CDPA, TU

DBMS viva questions


• Ques 1. Define dbms and file management system ?
Ans- Database management system (DBMS) is a collection of interrelated
data and a set of programs to access those data. Some of the very well
known DBMS are PostgreSQL, Mysql, Oracle, Microsoft Access, Microsoft
SQL Server, SAP, dBASE, FoxPro, IBM dB2, SQLite etc.
A file management system is an abstraction to store, retrieve,
management and update a set of files. A File Management System keep
track on the files and also manage them. Let’s take an example of
hierarchical Management System using its directories manage the
different files in a tree structure.
• Ques 2. What are the disadvantages of file management system over
DBMS ?
Ans- The disadvantages of file management systems over DBMS are:
a) Data redundancy and inconsistency
b) Difficulty in accessing data
c) Data isolation
d) Integrity problems
e) Atomicity problems
f) Concurrent access anomalies
Prabin Babu Dhakal, CDPA, TU

Viva questions
• Ques 3. Which have more quick response DBMS or file management system ? how ?
Ans- Fast response to information requests: Because data are integrated into a single database,
complex requests can be handled much more rapidly if the data were located in separate, non-
integrated files. In many businesses, faster response means better customer service.
• Ques 4. How DBMS provide program data insulation and data abstraction?
Ans- DBMS provide following levels of data abstraction
a) Physical level
b) logical level
c) view level
• Ques 5. Is there any difference between DBMS and file management system in terms of ACID
properties?
Ans- DBMS ensures data integrity by managing transactions through ACID test = atomicity,
consistency, isolation, durability. While such integrity is absent in file management system.
• Ques 6. What are the advantages of DBMS over file management system ?
Ans- The advantages of DBMS over file management system are:
a) Control redundancy
b) Restrict unauthorized access
c) Provide multiple user interfaces
d) Enforce integrity constraints.
e) Provide backup and recover
Prabin Babu Dhakal, CDPA, TU

Viva questions
• Ques 7. What are basic differences between DBMS and file management system ?
Ans- These are basic differences between DBMS and file management
1) Flexibility
2) Fast response to information requests
3) Multiple access
4) Lower user training costs
5) Less storage
• Ques 8. Is there any goal differences in file management system and DBMS ?
Ans- Following are the goal differences between File Management System and DBMS-
1) Data Management- An FMS should provide data management services to the application.
Generality with respect to storage devices. The FMS data abstractions and access methods should
remain unchanged irrespective of the devices involved in data storage.
2) Validity- An FMS should guarantee that at any given moment the stored data reflect the
operations performed on them.
3) Protection- Illegal or potentially dangerous operations on the data should be controlled by the
FMS.
4) Concurrency- In multiprogramming systems, concurrent access to the data should be allowed
with minimal differences.
5) Performance- Compromise data access speed and data transfer rate with functionality.
• Ques 9. Which one take utilization of resources DBMS or file management system and how ?
Ans- Database is multi user and provides utilization of resources whereas in file management there
is no utilization of resources.
• Ques 10. How data redundancy is controlled in DBMS while not in file management system ?
Ans- It reduces data redundancy (duplication of data) and maximizes database integrity (data
without errors). So DBMS is less redundant than file management system.

You might also like