Introduction To Databases: Name: Akanksha Sharma

The document provides an introduction to database concepts and relational database management systems (RDBMS). It defines key terms like data, information, and knowledge. It then explains what a DBMS is and typical database applications. It describes different data models including flat file, hierarchical, network, and relational models. It provides examples of using Oracle PL/SQL including basic structure, variables and types, simple programs, and control flow structures.

Uploaded by

Siddhant Jain
Copyright
© Attribution Non-Commercial (BY-NC)

Introduction to Databases
Name: Akanksha Sharma
Database Concepts
Data?
Data refers to a collection of descriptors of natural phenomena, including the
results of experience, observation or experiment, or a set of premises. It may
consist of numbers, words, or images, particularly as measurements or
observations of a set of variables.
Information?
Information is a quality of a message from a sender to one or more receivers.
Information is always about something, such as the size of a parameter or the
occurrence of an event. Information does not have to be accurate. It may be a
truth or a lie, or just the sound of a falling tree.
Knowledge?
Knowledge is the confident understanding of a subject, with the ability to use
it for a specific purpose when appropriate.
DBMS
A DBMS is a set of software programs that control the organization,
storage, management, and retrieval of data in a database.

A DBMS includes:
• A modeling language to define the schema of each database
hosted in the DBMS, according to the DBMS data model.
• Data structures (fields, records, files and objects) optimized to deal
with very large amounts of data stored on a permanent data storage
device.
• A database query language and report writer that let users
interactively interrogate the database, analyze its data and update it
according to the users' privileges on the data.
• A transaction mechanism that guarantees the ACID properties
(atomicity, consistency, isolation, durability) in order to ensure data
integrity despite concurrent user accesses (concurrency control) and
faults (fault tolerance).
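
The atomicity part of the ACID guarantee can be sketched in a few lines. This is a minimal illustration using Python's built-in sqlite3 module rather than Oracle; the account table, the names and the amounts are hypothetical and are not part of the original slides.

```python
import sqlite3

# Hypothetical two-account table, used only to illustrate atomicity.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates succeed or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst))
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "alice", "bob", 30)       # both updates are committed
failed = transfer(conn, "alice", "bob", 500)  # violates the CHECK, rolled back
balances = dict(conn.execute("SELECT name, balance FROM account"))
```

The second transfer credits bob before the debit on alice fails, yet the rollback undoes the credit as well, so no partial update survives.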
Typical Database Applications
• Traditional (Employee, student, product database)
• Online Shopping
• Search Engines
• Data Warehousing (OLAP)
• Data Mining
• Geographical Information Systems
Data-Level Models
• Flat File Structure
A database with a single table is called a flat file structure. A flat-file structure is good only
for extremely simple databases and is not practical for most business applications. Many
spreadsheets include some database features, such as sorting entries and counting or
summarizing entries that meet certain criteria.
• Hierarchical Data Model
The hierarchical data model is set up like a "forest", or collection of tree structures. The
hierarchical data model is a special case of the network data model. This data model is
very efficient for certain kinds of applications where the data to be modeled is also like a
tree. The best-known hierarchical database management system is IBM's IMS.
• Network Data Model
The network data model is similar to the entity-relationship model with all relationships
restricted to be binary, many-one relationships. This restriction allows a simple directed
graph model to be used. The network data model is fast, but it is difficult to conceptualize
complex data structures using this model. An example of a network database management
system is IDMS.
• Relational Data Model
The relational model is based on predicate logic and set theory. You have sets of statements
of fact, and the underlying system can determine new sets of facts. The real power comes
from your complete control over determining new facts. All relationships between facts are
explicit in the database, and the command language can use and manipulate them. The
mathematics behind the model makes this manipulation feasible.
RDBMS
• Relation: Two-dimensional table
The relation itself corresponds to our familiar notion of a table.
A relation is a collection of tuples, each of which contains
values for a fixed number of attributes. Relations are
sometimes referred to as flat files, because of their
resemblance to an unstructured sequence of records. Each
tuple in a relation must be unique; that is, there can be no
duplicates.
• Attribute: Table column
Other commonly used terms for attribute are 'property' and
'field'. The set of permissible values for each attribute is called
the domain for that attribute.
• Tuple: Table row
A tuple is an instance of an entity or relationship or whatever
is represented by the relation.
• Key: A single attribute or combination of attributes whose
values uniquely identify the tuples of the relation. That is,
each row has a different value for the key attribute(s). The
relational model requires that every relation have a key and
that for any tuple in the relation, the key fields have non-null
values: no two tuples may have the same key value and
every tuple must have a value for the key attribute.

Example relation:
Vendor          Global Revenue
Oracle          7,312
IBM             3,483
Microsoft       3,052
Sybase          524
NCR Teradata    457
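
The two key rules (no duplicate key values, no null key values) can be checked against a live table. The sketch below uses Python's sqlite3 module and the vendor figures from this slide; treating the vendor name as the key, and the table and column names, are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Vendor names and revenue figures come from the slide's example relation;
# choosing vendor as the key is an assumption made for this sketch.
conn.execute(
    "CREATE TABLE vendor_revenue ("
    "vendor TEXT PRIMARY KEY NOT NULL, global_revenue INTEGER)"
)
conn.executemany("INSERT INTO vendor_revenue VALUES (?, ?)",
                 [("Oracle", 7312), ("IBM", 3483), ("Microsoft", 3052)])


def try_insert(row):
    """Return True if the row is accepted, False if it violates the key."""
    try:
        conn.execute("INSERT INTO vendor_revenue VALUES (?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_ok = try_insert(("Oracle", 1))  # same key value as an existing tuple
null_key_ok = try_insert((None, 524))     # key attribute has no value
new_ok = try_insert(("Sybase", 524))      # fresh, non-null key
```

Only the last insert is accepted; the first two are rejected by the key constraint.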
Case Study : Oracle
• Oracle Database Fundamentals
Oracle stores each data item in its own field. The fields relating to a particular person, thing, or event are
bundled together to form a single, complete unit of data, called a record. Each record is made up of a number of
fields. No two fields in a record can have the same field name. Oracle stores records relating to each other in a
table. A table consists of a number of records. Each field occupies one column and each record occupies one row.
Different tables are created for the various groups of information. A table normally has a field or a
combination of fields, called the primary key, that uniquely identifies each record in the table. When a field in
one table matches the primary key of another table, the field is referred to as a foreign key. The table that a
foreign key refers to is sometimes called a lookup table.

• Creating Database Tables
- create table tablename (columnname type, columnname type ...);
- describe department;
- alter table employee add ("Joining Date" date);
- alter table employee modify (Phone number);
- alter table tablename drop column columnname;

• Inserting Data
- insert into tablename (columnname, columnname, ...) values (somevalue, somevalue, ...);

• Selecting Data
- select columnname, columnname... from tablename;
- select "First Name"||' '||"Last Name" "Full Name" from employee where deptid=1 and salary>5000;

• Updating and Deleting Data
- update tablename set columnname=somevalue where conditions;
- delete from tablename where conditions;
- drop table tablename;
Case Study : Microsoft
RDBMS Concepts
Using Oracle PL/SQL
• Basic Structure of PL/SQL
• Variables and Types
• Simple PL/SQL Programs
• Control Flow in PL/SQL
Basic Structure
PL/SQL stands for Procedural Language/SQL. It extends SQL by adding constructs found
in procedural languages, resulting in a structured language that is more powerful than SQL.
The basic unit in PL/SQL is a block. All PL/SQL programs are made up of blocks, which
can be nested within each other, and each block performs a logical action in the program. A
block has the following structure:

DECLARE
/* Declarative section: variables, types, and local subprograms. */
BEGIN
/* Executable section: procedural and SQL statements go here. */
/* This is the only section of the block that is required. */
EXCEPTION
/* Exception handling section: error handling statements go here. */
END;

Only the executable section is required. The other sections are optional.
The only SQL statements allowed in a PL/SQL program are SELECT, INSERT, UPDATE,
DELETE and several other data manipulation statements plus some transaction control.
Variables
• The DECLARE section defines and (optionally) initialises
variables. If not initialised explicitly, they default to NULL.

DECLARE
number1 NUMBER(2);
number2 NUMBER(2) := 17;
text VARCHAR2(12) := 'Hello world';
today DATE := SYSDATE; -- current date and time
BEGIN
SELECT street_number
INTO number1
FROM address
WHERE name = 'Smith';
END;

• The symbol := is the assignment operator; it stores a value in a variable.

• The major datatypes in PL/SQL include NUMBER, INTEGER,
CHAR, VARCHAR2, DATE, TIMESTAMP, etc.
Simple Program in PL/SQL
CREATE TABLE T1(
e INTEGER,
f INTEGER
);

DELETE FROM T1;

INSERT INTO T1 VALUES(1, 3);

INSERT INTO T1 VALUES(2, 4);

/* Above is plain SQL; below is the PL/SQL program. */

DECLARE
a NUMBER;
b NUMBER;
BEGIN
SELECT e,f INTO a,b FROM T1 WHERE e>1;
INSERT INTO T1 VALUES(b,a);
END;
.
run;
Control Flow in PL/SQL
An IF statement looks like:

IF <condition> THEN <statement_list> ELSE <statement_list> END IF;

The ELSE part is optional. If you want a multiway branch, use:


IF <condition_1> THEN ...
ELSIF <condition_2> THEN ...
... ...
ELSIF <condition_n> THEN ...
ELSE ...
END IF;
Loops are created with the following:

LOOP
<loop_body> /* A list of statements. */
END LOOP;

At least one of the statements in <loop_body> should be an EXIT
statement of the form
EXIT WHEN <condition>;
The loop exits when <condition> is true.
Examples
CONDITIONAL:

DECLARE
a NUMBER;
b NUMBER;
BEGIN
SELECT e,f INTO a,b FROM T1 WHERE e>1;
IF b=1 THEN
INSERT INTO T1 VALUES(b,a);
ELSE
INSERT INTO T1 VALUES(b+10,a+10);
END IF;
END;
.
run;

LOOPING:

DECLARE
i NUMBER := 1;
BEGIN
LOOP
INSERT INTO T1 VALUES(i,i);
i := i+1;
EXIT WHEN i>100;
END LOOP;
END;
.
run;
Joins
• CROSS JOIN (Cartesian product) is the simplest join;
• INNER JOIN (most often an equi-join), where tables are
combined based on a common column;
• OUTER JOIN, which combines all rows of one table with
only the matching rows from the other table;
• SELF JOIN, which is a table joined to itself.
Cross Join
A cross join returns the Cartesian product of the sets of records
from the two joined tables. If A and B are two sets, then cross
join = A × B, so the result has |A| × |B| rows.

Examples:

Explicit:
SELECT *
FROM employee CROSS JOIN department;

Implicit:
SELECT *
FROM employee, department;
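
The equivalence of the explicit and implicit forms can be checked with a small run. This sketch uses Python's sqlite3 module; only the table names come from the slide, and the columns and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample rows; only the table names come from the slide.
conn.executescript("""
    CREATE TABLE employee (LastName TEXT, DepartmentID INTEGER);
    CREATE TABLE department (DepartmentID INTEGER, DepartmentName TEXT);
    INSERT INTO employee VALUES ('Smith', 1), ('Jones', 2), ('Rivers', 2);
    INSERT INTO department VALUES (1, 'Sales'), (2, 'Engineering');
""")

explicit = conn.execute(
    "SELECT * FROM employee CROSS JOIN department").fetchall()
implicit = conn.execute(
    "SELECT * FROM employee, department").fetchall()
```

With 3 employees and 2 departments, both queries return the same 6 rows, one for every employee/department pairing.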
Inner Joins
• An equi-join is a specific type of comparator-based join,
or theta join, that uses only equality comparisons in the
join predicate. Using other comparison operators (such as <)
disqualifies a join as an equi-join.
Example:
SELECT *
FROM employee
INNER JOIN department
ON employee.DepartmentID = department.DepartmentID;
• Natural join: an equi-join on all columns that have the same
name in both tables, with each such column appearing once in
the result.
Outer Joins
• Left outer join
The result of a left outer join for tables A and B always contains
all records of the "left" table (A), even if the join condition does
not find any matching record in the "right" table (B). This means
that a left outer join returns all the values from the left table, plus
matched values from the right table (or NULL in case of no
matching join predicate).

Example:

SELECT *
FROM employee LEFT OUTER JOIN department
ON employee.DepartmentID = department.DepartmentID

• Right outer join
Every record from the "right" table (B) will appear in the joined
table at least once. If no matching row from the "left" table (A)
exists, NULL will appear in columns from A for those records that
have no match in A. A right outer join returns all the values from
the right table and matched values from the left table (NULL in
case of no matching join predicate).

Example:

SELECT *
FROM employee RIGHT OUTER JOIN department
ON employee.DepartmentID = department.DepartmentID
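
The difference between inner, left outer and right outer joins shows up as soon as some rows have no match. The sketch below uses Python's sqlite3 module with hypothetical sample rows. Because older SQLite versions lack RIGHT OUTER JOIN, the right outer join is written here as a LEFT OUTER JOIN with the tables swapped, which returns the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows: 'Rivers' has no department, 'Marketing' has no employee.
conn.executescript("""
    CREATE TABLE employee (LastName TEXT, DepartmentID INTEGER);
    CREATE TABLE department (DepartmentID INTEGER, DepartmentName TEXT);
    INSERT INTO employee VALUES ('Smith', 1), ('Jones', 2), ('Rivers', NULL);
    INSERT INTO department VALUES
        (1, 'Sales'), (2, 'Engineering'), (3, 'Marketing');
""")

inner = conn.execute("""
    SELECT e.LastName, d.DepartmentName
    FROM employee e INNER JOIN department d
    ON e.DepartmentID = d.DepartmentID
    ORDER BY e.LastName""").fetchall()

left = conn.execute("""
    SELECT e.LastName, d.DepartmentName
    FROM employee e LEFT OUTER JOIN department d
    ON e.DepartmentID = d.DepartmentID
    ORDER BY e.LastName""").fetchall()

# Swapping the table order makes this LEFT OUTER JOIN equivalent to
# employee RIGHT OUTER JOIN department.
right = conn.execute("""
    SELECT e.LastName, d.DepartmentName
    FROM department d LEFT OUTER JOIN employee e
    ON e.DepartmentID = d.DepartmentID
    ORDER BY d.DepartmentName""").fetchall()
```

The inner join drops both unmatched rows, the left join keeps 'Rivers' with a NULL department, and the right join keeps 'Marketing' with a NULL employee.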
Self Join
• A self-join is simply a normal SQL join that joins
one table to itself. This is accomplished by using
table name aliases to give each "instance" of the table
a separate name.

Employees
EmployeeID  EmployeeName        ManagerID
61          Sue Smith           (null)
62          David Jones         61
63          Troy Parker         61
64          Claire Smith-Jones  63
65          Grover Rivers       63

Example:
SELECT E1.EmployeeName AS Employee,
E2.EmployeeName AS Manager
FROM Employees AS E1 INNER JOIN Employees AS E2
ON E1.ManagerID = E2.EmployeeID
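
Running the self-join on the Employees rows from the slide shows what it returns. This is a sketch using Python's sqlite3 module; the ORDER BY clause is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees ("
    "EmployeeID INTEGER PRIMARY KEY, EmployeeName TEXT, ManagerID INTEGER)"
)
# Rows copied from the Employees table on the slide.
conn.executemany("INSERT INTO Employees VALUES (?, ?, ?)", [
    (61, "Sue Smith", None),
    (62, "David Jones", 61),
    (63, "Troy Parker", 61),
    (64, "Claire Smith-Jones", 63),
    (65, "Grover Rivers", 63),
])

pairs = conn.execute("""
    SELECT E1.EmployeeName AS Employee, E2.EmployeeName AS Manager
    FROM Employees AS E1 INNER JOIN Employees AS E2
    ON E1.ManagerID = E2.EmployeeID
    ORDER BY E1.EmployeeID""").fetchall()
```

Sue Smith has a NULL ManagerID, so the inner join leaves her out of the Employee column; she appears only as a manager.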
Normalization
Introduction

Entity: The word ‘entity’ is the general name for the information that is to be stored within a
single table. Information about the entities is known as attributes.
Primary key: A primary key uniquely identifies a row of data found within a table. When
multiple attributes are used to derive a primary key, this key is known as a concatenated
primary key.
Relationship:
one-to-one (1:1) - A one-to-one relationship signifies that each instance of a given entity
relates to exactly one instance of another entity.
one-to-many (1:M) - A one-to-many relationship signifies that each instance of a given
entity relates to one or more instances of another entity.
many-to-many (M:N) - A many-to-many relationship signifies that many instances of a
given entity relate to many instances of another entity.

Foreign key: A foreign key forms the basis of a 1:M relationship between two tables. The
foreign key can be found within the M table, and maps to the primary key found in the 1
table.
The Three Normal Forms
• First Normal Form
A table is in First Normal Form (1NF) if there are no repeating
groups.

• How to Normalize?
- Remove the repeating group of attributes to form a new entity
- Add to it the original key

• Second Normal Form
A table is in Second Normal Form (2NF) if it is in 1NF and each
non-key field is functionally dependent on the entire primary key.

• How to Normalize?
- Examine tables with a composite key (a key made up of two
or more parts)
- For each non-key attribute, determine whether it depends on the
first part of the key, on the second part, or on both parts together
- Remove each partial key and its dependents to form a new table

• Third Normal Form
A table is in Third Normal Form (3NF) if it is in 2NF and there are
no transitive dependencies.

• How to Normalize?
- Identify any dependencies between non-key attributes within
each table
- Remove them to form a new table
- Promote one of the attributes to be the key of the new table
- This becomes the Foreign Key link in the original table (shown
with a *).
Example
Department: ( DepartmentName, SupervisorNumber ) – SupervisorNumber is a
foreign key
Supervisor: ( SupervisorNumber, SupervisorName )
EmployeeDepartment: ( DepartmentName, EmployeeNumber, StartDate )
Employee: ( EmployeeNumber, EmployeeName )
EmployeeProject: ( EmployeeNumber, ProjectNumber, StartDate )
Project: ( ProjectNumber, ProjectName )

To check whether these tables are in normal form, answer the following questions:
1. Does the table contain any repeating groups?
If not, and the table has a primary key, then it is in first normal form (1NF).
2. Does the table contain any partial dependencies?
If not, and it is in 1NF, then it is in 2NF.
3. Does the table contain any transitive dependencies or derived attributes?
If not, and it is in 2NF, then it is in 3NF.
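
Question 2 can be explored on sample data by testing which functional dependencies hold. The sketch below is a rough aid in Python, not a normalization tool. The `holds` helper and the sample rows are hypothetical, and a dependency that holds in a sample is only a hint, since the real rule comes from the meaning of the data.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The same lhs value must always map to the same rhs value.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical unnormalized table; key is (EmployeeNumber, ProjectNumber).
rows = [
    {"EmployeeNumber": 1, "ProjectNumber": 10,
     "EmployeeName": "Ann", "StartDate": "2020-01-01"},
    {"EmployeeNumber": 1, "ProjectNumber": 20,
     "EmployeeName": "Ann", "StartDate": "2021-06-01"},
    {"EmployeeNumber": 2, "ProjectNumber": 10,
     "EmployeeName": "Bob", "StartDate": "2020-03-15"},
]

# EmployeeName depends on only part of the key: a partial dependency.
partial = holds(rows, ["EmployeeNumber"], ["EmployeeName"])
# StartDate is not determined by EmployeeNumber alone.
start_needs_full_key = not holds(rows, ["EmployeeNumber"], ["StartDate"])
```

Here EmployeeName would move to an Employee table keyed on EmployeeNumber, while StartDate stays with the composite key, matching the Employee and EmployeeProject tables in the example.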
Indexing
• Types of Single-level Ordered Indexes
  - Primary Indexes
  - Clustering Indexes
  - Secondary Indexes
• Multilevel Indexes
Types of Single-Level Indexes
• Primary Index
  - Defined on an ordered data file
  - The data file is ordered on a key field
  - Includes one index entry for each block in the data file; the
    index entry has the key field value for the first record in the
    block, which is called the block anchor
  - A similar scheme can use the last record in a block
  - A primary index is a nondense (sparse) index, since it
    includes one entry per disk block of the data file, holding the
    key of its anchor record, rather than one entry per search value
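
A lookup through a sparse primary index can be sketched as a binary search over the block anchors followed by a scan of one block. The block contents, the block size and the `lookup` helper below are hypothetical; the blocks stand in for disk blocks.

```python
from bisect import bisect_right

# Hypothetical data file: records sorted on the key, three records per block.
blocks = [[2, 5, 8], [12, 15, 19], [23, 27, 31], [40, 44, 50]]

# Sparse primary index: one (anchor key, block number) entry per block.
index = [(block[0], i) for i, block in enumerate(blocks)]
anchors = [key for key, _ in index]


def lookup(key):
    """Return (block number, found) for key using the primary index."""
    pos = bisect_right(anchors, key) - 1  # last anchor <= key
    if pos < 0:
        return None, False                # key is smaller than every anchor
    block_no = index[pos][1]
    return block_no, key in blocks[block_no]


hit = lookup(15)
miss = lookup(20)
too_small = lookup(1)
```

The index holds 4 entries for 12 records, which is what makes it sparse: the search narrows to one block and only that block is read.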
[Figure: Primary index on the ordering key field]
• Clustering Index
  - Defined on an ordered data file
  - The data file is ordered on a non-key field, unlike a primary
    index, which requires that the ordering field of the data file
    have a distinct value for each record
  - Includes one index entry for each distinct value of the field;
    the index entry points to the first data block that contains
    records with that field value
  - It is another example of a nondense index. Insertion and
    deletion are relatively straightforward with a clustering index.
[Figure: A clustering index on the DEPTNUMBER ordering nonkey field of an EMPLOYEE file]
[Figure: Clustering index with a separate block cluster for each group of records that share the same value for the clustering field]
• Secondary Index
  - A secondary index provides a secondary means of accessing a
    file for which some primary access already exists.
  - The secondary index may be on a field which is a candidate key
    and has a unique value in every record, or on a nonkey field with
    duplicate values.
  - The index is an ordered file with two fields.
    - The first field is of the same data type as some
      nonordering field of the data file that is an indexing field.
    - The second field is either a block pointer or a record
      pointer. There can be many secondary indexes (and
      hence, indexing fields) for the same file.
  - Includes one entry for each record in the data file; hence, it is a
    dense index
[Figure: A dense secondary index (with block pointers) on a nonordering key field of a file]
[Figure: A secondary index (with record pointers) on a nonkey field, implemented using one level of indirection so that index entries are of fixed length and have unique field values]
Multi-Level Indexes
• Because a single-level index is an ordered file, we can
create a primary index to the index itself; in this case, the
original index file is called the first-level index and the
index to the index is called the second-level index.
• We can repeat the process, creating a third, fourth, ..., top
level until all entries of the top level fit in one disk block.
• A multi-level index can be created for any type of first-level
index (primary, secondary, clustering) as long as the first-level
index consists of more than one disk block.
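
The "repeat until the top level fits in one block" rule can be turned into a short calculation. The figures below (30,000 first-level entries, 68 entries per block) are illustrative assumptions, not values from the slides, and `index_levels` is a hypothetical helper.

```python
from math import ceil


def index_levels(first_level_entries, fan_out):
    """Return the number of blocks at each index level, first level first."""
    levels = []
    entries = first_level_entries
    while True:
        blocks = ceil(entries / fan_out)  # blocks needed for these entries
        levels.append(blocks)
        if blocks == 1:                   # top level fits in one disk block
            return levels
        entries = blocks                  # next level: one entry per block


levels = index_levels(first_level_entries=30000, fan_out=68)
```

With these numbers the first level needs 442 blocks, the second 7 and the third 1, so a lookup reads three index blocks, compared with about nine for a binary search over the 442 first-level blocks.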
[Figure: A two-level primary index resembling ISAM (Indexed Sequential Access Method) organization]
Data Warehousing & Business Intelligence
Introduction
A Data Warehouse (DW) is a subject-oriented, integrated, nonvolatile,
time-variant collection of data in support of management's decisions.

Business intelligence (BI):
• BI systems provide managers with actionable information and knowledge
- at the right time
- at the right location
- in the right form
• The knowledge derived from analyzing an organization's information
• Technologies for gathering, storing, analyzing and providing access to data
to help enterprise users make better business decisions
Characteristics of a DW

[Figure: Operational systems compared with the data warehouse. Labels: Leads, Inventory, Customers, Products, Quotes, Regions, Time, Orders]

Focus is on subject areas rather than applications.


On-Line Transaction Processing (OLTP)
• Database management systems are typically used for on-line
transaction processing.
• OLTP applications normally automate clerical data processing
tasks of an organization, like data entry and enquiry, transaction
handling, etc. (access, read, update)
• Database is current, and consistency and recoverability are
critical. Records are accessed one at a time.
• OLTP Operations:
- are structured and repetitive
- require detailed and up-to-date data
- are short, atomic and isolated transactions
On-Line Analytical Processing (OLAP)
• On-line analytical processing is essential for decision support.
• OLAP is supported by data warehouses.
• A data warehouse is a consolidation of operational databases.
• Owing to the hierarchical nature of the dimensions, OLAP
operations view the data flexibly from different perspectives
(different levels of abstraction).
• OLAP operations:
- Roll-up: Increase the level of abstraction
- Drill-down: Decrease the level of abstraction
- Slice and dice: Selection and projection
- Pivot: Re-orient the multi-dimensional view
- Drill-through: Links to the raw data
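
Roll-up, drill-down and slice map naturally onto GROUP BY and WHERE clauses over a fact table. The sketch below uses Python's sqlite3 module; the sales table, its region/city hierarchy and all figures are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical fact table with a region > city hierarchy on one dimension.
conn.execute(
    "CREATE TABLE sales (region TEXT, city TEXT, year INTEGER, revenue INTEGER)"
)
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ("North", "Delhi", 2007, 100), ("North", "Delhi", 2008, 120),
    ("North", "Jaipur", 2007, 40), ("South", "Chennai", 2007, 70),
    ("South", "Chennai", 2008, 90),
])

# Drill-down level: revenue per city.
by_city = conn.execute(
    "SELECT city, SUM(revenue) FROM sales "
    "GROUP BY city ORDER BY city").fetchall()

# Roll-up: climb the hierarchy from city to region.
by_region = conn.execute(
    "SELECT region, SUM(revenue) FROM sales "
    "GROUP BY region ORDER BY region").fetchall()

# Slice: fix one dimension (year = 2007) and keep the others.
slice_2007 = conn.execute(
    "SELECT region, SUM(revenue) FROM sales WHERE year = 2007 "
    "GROUP BY region ORDER BY region").fetchall()
```

Going from `by_city` to `by_region` is a roll-up, the reverse direction is a drill-down, and `slice_2007` restricts the cube to a single value of the year dimension.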
DW - Benefits
• Increase customer profitability

• Cost effective decision making

• Manage customer and business partner relationships

• Manage risk, assets and liabilities

• Integrate inventory, operations and manufacturing

• Reduction in time to locate, access, and analyze information
(link multiple locations and geographies)

• Identify developing trends and reduce time to market

• Strategic advantage over competitors


Warehouse Architecture

[Figure: Warehouse architecture diagram. Labels: Operational Systems/Data, Data Preparation, Select, Extract, Transform, Integrate, Maintain, Data Warehouse, Enterprise Data Warehouse, Metadata, Middleware/API, EIS/DSS, Query Tools, OLAP/ROLAP, Web Browsers]


DW Architecture Components
[Figure: DW architecture components. Labels: Source Databases, Data Cleansing Tools, Data Modeling Tool, ETL Tool, Central Metadata, Local metadata, Central Warehouse (RDBMS), ROLAP Engine, MDDB, Architected Datamarts, Warehouse Admin Tool, Warehouse Databases, Data Access and Analysis Tools (Managed Query, Desktop OLAP, ROLAP, MOLAP, Data Mining)]

Data Warehouse Is Not Just About Data... But Tools Too


DW/BI Tools
• ETL Tools
Extract, Transform, and Load (ETL) is a process in data
warehousing that involves:
1. Extracting data from outside sources,
2. Transforming it to fit business needs (which can include
quality levels), and ultimately
3. Loading it into the end target, i.e. the data warehouse.

Data warehouses are typically fed asynchronously by a variety of sources
that each serve a different purpose, resulting in, for example, different
reference data. ETL is a key process for bringing heterogeneous and
asynchronous source extracts into a homogeneous environment.
Typically, the known ETL tools are intended for use in batch mode,
pulling large volumes from different platforms and systems at
scheduled times and transforming and integrating the data until it
fits the format to be loaded into a (corporate) multi-dimensional
data warehouse.
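
The three ETL steps can be sketched end to end on a toy scale. This is a minimal illustration in Python, not how a commercial ETL tool works. The two source extracts, their column names and the `fact_sales` target table are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical source extracts holding the same kind of fact in two formats.
source_a = "customer,amount\nacme,100.50\nglobex,20\n"
source_b = "CUSTOMER;AMOUNT_CENTS\nAcme ;2500\nInitech;999\n"


def extract(text, delimiter):
    """Extract: read one source extract into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def transform(rows_a, rows_b):
    """Transform: one common format (upper-case name, amount in cents)."""
    out = [(r["customer"].strip().upper(), round(float(r["amount"]) * 100))
           for r in rows_a]
    out += [(r["CUSTOMER"].strip().upper(), int(r["AMOUNT_CENTS"]))
            for r in rows_b]
    return out


def load(conn, rows):
    """Load: write the homogeneous rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_sales "
        "(customer TEXT, amount_cents INTEGER)")
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract(source_a, ","), extract(source_b, ";")))
totals = warehouse.execute(
    "SELECT customer, SUM(amount_cents) FROM fact_sales "
    "GROUP BY customer ORDER BY customer").fetchall()
```

After the transform step, "acme" from one source and "Acme " from the other resolve to the same customer, which is the homogeneous view the warehouse needs.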
• Data mart - a subset of the data warehouse in which only a focused
portion of the data warehouse information is kept
• Other technical components of business intelligence include tools
such as
  - Data mining
  - Automatic exception detection with proactive alerting and
    automatic recipient determination
  - Automatic learning
• Data-mining tool - a software tool you use to query information in a
data warehouse
BO-Designer:
• Lets you create the semantic layer, i.e. the UNIVERSE, which isolates end
users from the technical issues of the database structure
• Used to create, manage and distribute universes for a particular group of BO and
WebIntelligence users

The building blocks of a Designer are:
• Classes: Logical grouping of objects
• Objects: Most refined component of the Universe. An Object maps to data
or a derivation of data in the database
  - Dimension: Parameters for the analysis (e.g. City)
  - Detail: Description of a dimension (e.g. Phone #)
  - Measure: Numeric information by which a Dimension object can be
    measured (e.g. Sales Revenue)
ODS Development Case Study
End-to-End Process Diagram

[Figure: End-to-end process diagram with stages Source Systems, ETL Process and Target DW. Labels: Oracle Apps, Siebel, Excel Files, Flat Files, ODS, Intermediate Tables, Teradata EDW]


Thanks
