DBMS Queries Overview
• Data
• Information
• Database
• DBMS
• Table/relation
• Oracle
• SQL
What is SQL?
• SQL (Structured Query Language) is a
computer language for storing, manipulating,
and retrieving data stored in relational
databases.
• SQL is the standard language for relational
database systems. All relational database
management systems, such as MySQL, MS Access,
Oracle, Sybase, Informix, and SQL Server, use
SQL as the standard database language.
SQL Components
• DDL: CREATE TABLE / RENAME / ALTER / DROP /
TRUNCATE
• DML: INSERT / UPDATE / DELETE
• DCL: GRANT / REVOKE
• TCL: COMMIT / ROLLBACK
• DQL: SELECT
(one example of each is sketched below)
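A minimal sketch with one command from each component; the emp table and the user scott are illustrative, not part of the slides:

-- DDL: define a table
create table emp (eno number(5), ename varchar2(15), esal number(8,2));
-- DML: change its data
insert into emp (eno, ename, esal) values (1, 'Abid', 25000);
-- DQL: read the data back
select eno, ename from emp;
-- DCL: grant another (hypothetical) user read access
grant select on emp to scott;
-- TCL: make the changes permanent
commit;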
Basic data types
• CHAR
• VARCHAR(size) / VARCHAR2(size)
• DATE
• NUMBER
• LONG
• RAW / LONG RAW
Create Command
• This command is used to create new tables.
• Syntax
CREATE TABLE table_name
(column_name datatype(size), column_name datatype(size),
... );
• Example
CREATE TABLE Student
(std_name varchar2(20), father_name varchar2(20), DOB
number(10), address varchar2(20));
Insert Command
• This command is used to insert rows into a table whose
columns have already been defined.
• Syntax
INSERT INTO table_name
(column_name, column_name, ...)
VALUES (expression, expression, ...);
• Example
INSERT INTO Student
(std_name, father_name, DOB, address)
VALUES ('Abid', 'Ibrahim', 1984, 'KARAK');
Select Command
• This command is used for viewing/retrieving data from a
table.
• Syntax
SELECT column_name, column_name, ...
FROM table_name;
• Examples
• Retrieving student names and father names from Student
SELECT std_name, father_name
FROM Student;
• Retrieving all records from Student
SELECT * FROM Student;
• Selected columns and all rows
SELECT std_name, father_name
FROM Student;
• Selected rows and all columns
SELECT * FROM Student WHERE DOB=1984;
• Selected columns and selected rows
SELECT std_name, DOB FROM Student WHERE
address='KARAK';
• Elimination of duplicates from the result
SELECT DISTINCT * FROM Student;
Rename Command
• Syntax
RENAME old_table_name TO new_table_name;
• Example
RENAME Student TO Personal_Data;
Drop Command
• Syntax
DROP TABLE table_name;
• Example
DROP TABLE Student;
Describe Command
• Syntax
DESCRIBE table_name;
• Example
DESCRIBE Student;
• SQL Components: DDL/DML/DCL/TCL
• Worked examples:

create table emp
(rno number(5),
name varchar2(15),
marks number(5));

update emp set marks=null where marks=55;

update emp set marks=null;

select rowid from emp;

delete from emp where rowid='AAADVTAABAAAKS6AAC';
Integrity constraints
• Primary key
• Foreign key
• Check
• Not null
• Unique
• Default
• NOT NULL Constraint: Ensures that a column cannot have a
NULL value.
• DEFAULT Constraint: Provides a default value for a column
when none is specified.
• UNIQUE Constraint: Ensures that all values in a column are
different.
• PRIMARY Key: Uniquely identifies each row/record in a
database table.
• FOREIGN Key: References a row/record in another
database table.
• CHECK Constraint: The CHECK constraint ensures that all
values in a column satisfy certain conditions.
Primary key
• alter table emp
• add primary key(rno);
Default
• alter table emp
• modify name default 'aaa';
Check
• alter table emp
• add check(rno>5);
Foreign key
• Alter table emp add foreign key (dno)
references dept(depno);
Unique
• ALTER TABLE Persons
ADD UNIQUE (P_Id);
Not null
• Alter table emp
modify esal number(5) not null;
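The same constraints can also be declared when the table is created. A minimal sketch; the emp2 table and its email column are illustrative, and dept(depno) is assumed to exist as in the foreign key example above:

create table emp2
(rno   number(5) primary key,
 name  varchar2(15) default 'aaa',
 esal  number(5) not null,
 email varchar2(30) unique,
 dno   number(3) references dept(depno),
 check (rno > 5));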
Joins
• 1. The purpose of a join is to combine data
across tables.
• 2. A join is actually performed by the WHERE
clause, which combines the specified rows of the
tables.
• 3. If a join involves more than two tables,
Oracle joins the first two tables based on the
join condition, then compares the result
with the next table, and so on.
Types of Joins
• Natural join
• Inner join
• Outer join
✓ left outer join
✓ right outer join
✓ full outer join
• Self join
• Cross join (Cartesian product)
• Equi & Non Equi joins
Assume that we have emp and dept tables; the examples below join them on deptno.
• 1. EQUI JOIN
• A join whose join condition contains an
equal-to '=' operator.
• Ex: SQL> select eno,ename,esal,dname from
emp, dept where emp.deptno=dept.deptno;
• 2. NON-EQUI JOIN
• A join whose join condition contains an operator
other than equal-to '='.
• Ex: SQL> select eno,ename,esal,dname from
emp, dept where emp.deptno>dept.deptno;
• 3. SELF JOIN
Joining a table with itself is called a self join (sketched below).
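A minimal self-join sketch, assuming emp also carries an mgr column holding the employee number of each row's manager (the mgr column and the aliases are illustrative):

select w.ename as employee, m.ename as manager
from emp w, emp m
where w.mgr = m.eno;   -- match each employee row to its manager row

Since the outer joins listed above have no example, here is a left outer join that keeps employees whose deptno matches no department, in ANSI syntax and in Oracle's classic (+) notation:

select e.ename, d.dname
from emp e left outer join dept d on e.deptno = d.deptno;

select e.ename, d.dname
from emp e, dept d
where e.deptno = d.deptno(+);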
Scenario 1
[Diagram: branches in Delhi, Chennai, and Banglore; the sales manager needs sales per item type per branch for the first quarter.]
Solution 1: ABC Pvt Ltd.
[Diagram: data from the Delhi, Chennai, and Banglore branches is loaded into a data warehouse; the sales manager uses query and analysis tools on the warehouse to produce the report.]
Scenario 2
[Diagram: reports are compiled manually by data entry operators.]
Scenario 3
[Diagram: sales improving over time.]
Data warehousing
• It is the process which prepares the basic
repository of data that becomes the data
source from which we extract information.
• A data warehouse is a subject-oriented,
integrated, time-variant, and nonvolatile
collection of data that supports
management's decision-making process. Let's
explore this definition of a data warehouse.
• A data warehouse is built by extracting data
from multiple heterogeneous and external
sources, cleansing the data to detect errors
and rectify them wherever possible,
integrating and transforming the data from legacy
formats to the warehouse format, and then loading
the data after sorting and summarizing it.
What is a Data Warehouse?
• Defined in many different ways, but not rigorously.
– A decision support database that is maintained separately from the
organization’s operational database
– Support information processing by providing a solid platform of
consolidated, historical data for analysis.
• “A data warehouse is a subject-oriented, integrated, time-variant, and
nonvolatile collection of data in support of management’s decision-
making process.”—W. H. Inmon
• Data warehousing:
– The process of constructing and using data warehouses
Data Warehouse—Subject-Oriented
• Organized around major subjects, such as customer, product,
sales
• The data warehouse is subject-oriented because it provides
information around a subject rather than the organization's
ongoing operations.
• These subjects can be product, customers, suppliers, sales,
revenue, etc.
• The data warehouse does not focus on ongoing operations;
rather, it focuses on modelling and analysis of data for decision
making.
Data Warehouse—Integrated
• Constructed by integrating multiple, heterogeneous data
sources such as relational databases, flat files, and on-line
transaction records. This integration enables effective
analysis of the data.
• Data cleaning and data integration techniques are applied.
– Ensure consistency in naming conventions, encoding
structures, attribute measures, etc. among different data
sources
• E.g., Hotel price: currency, tax, breakfast covered, etc.
– When data is moved to the warehouse, it is converted.
Data Warehouse—Time Variant
• The data in a data warehouse is identified with a particular time
period and provides information from a
historical point of view.
• The time horizon for the data warehouse is significantly longer than
that of operational systems
– Operational database: current value data
– Data warehouse data: provide information from a historical
perspective (e.g., past 5-10 years)
• Every key structure in the data warehouse
– Contains an element of time, explicitly or implicitly
– But the key of operational data may or may not contain “time
element”
Data Warehouse—Nonvolatile
• Nonvolatile means that previous data is not removed when
new data is added. The data warehouse is kept separate from
the operational database, so frequent changes in the operational
database are not reflected in the data warehouse.
• A physically separate store of data transformed from the
operational environment
• Operational update of data does not occur in the data warehouse
environment
– Does not require transaction processing, recovery, and
concurrency control mechanisms
– Requires only two operations in data accessing:
• initial loading of data and access of data
• Metadata: Metadata is simply defined as data about data.
Data that is used to describe other data is
metadata. For example, the index of a book serves as
metadata for the contents of the book. In other words, we can
say that metadata is the summarized data that leads us to the
detailed data.
A Data Warehouse Is A Process
[Diagram: source data (raw detail, no/minimal history) flows through an extraction stage (design, mapping, scrub, transform) into a central repository (integrated, scrubbed, with history and summaries), where it is loaded, replicated, indexed, and aggregated, and is finally accessed and analyzed from end-user workstations as targeted, specialized (OLAP) data sets; metadata and system monitoring span the whole process.]
There Are Many Options
[Diagram: operational source systems feed extraction systems, which can populate an operational data store, a data warehouse with architected data marts, or independent data marts, all accessed from user workstations.]
OLTP vs. OLAP

                    OLTP                          OLAP
users               clerk, IT professional        knowledge worker
function            day-to-day operations         decision support
DB design           application-oriented          subject-oriented
data                current, up-to-date,          historical, summarized,
                    detailed, flat relational,    multidimensional,
                    isolated                      integrated, consolidated
usage               repetitive                    ad-hoc
access              read/write,                   lots of scans
                    index/hash on primary key
unit of work        short, simple transaction     complex query
# records accessed  tens                          millions
# users             thousands                     hundreds
DB size             100MB-GB                      100GB-TB
metric              transaction throughput        query throughput, response
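The contrast shows up in the queries themselves. A sketch; the accounts and sales tables and their columns are illustrative:

-- OLTP: a short read/write transaction touching a few rows by key
update accounts set balance = balance - 100 where acc_no = 1234;
commit;

-- OLAP: a long scan summarizing many rows for decision support
select region, item_type, sum(amount)
from sales
group by region, item_type;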
Why a Separate Data Warehouse?
• High performance for both systems
– DBMS: tuned for OLTP (access methods, indexing, concurrency control,
recovery)
– Warehouse: tuned for OLAP (complex OLAP queries, multidimensional
view, consolidation)
• Different functions and different data:
– missing data: decision support requires historical data which
operational DBs do not typically maintain
– data consolidation: DS requires consolidation (aggregation,
summarization) of data from heterogeneous sources
– data quality: different sources typically use inconsistent data
representations, codes, and formats which have to be reconciled
• Note: there are more and more systems which perform OLAP analysis
directly on relational databases
Overview of ETL
• In computing, Extract, Transform and Load (ETL)
refers to:
– Extracting data from outside sources
– Transforming it to fit business needs
– Loading it into the end target
• Load
The load phase loads the data into the end target, usually the data warehouse (DW),
but it can be any other format such as a flat file or a relational database.
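When both the staging area and the target are relational, the transform and load steps can be sketched in SQL itself. The stg_sales staging table and dw_sales target table below are hypothetical:

insert into dw_sales (region, item_type, sale_month, total_amount)
select upper(trim(region)),          -- transform: normalize inconsistent region codes
       item_type,
       trunc(sale_date, 'MM'),       -- transform: roll daily dates up to months
       sum(amount)                   -- summarize before loading
from   stg_sales
where  amount is not null            -- skip rows that failed cleansing
group  by upper(trim(region)), item_type, trunc(sale_date, 'MM');
commit;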
Overview of ETL
• Performance
ETL vendors benchmark their record systems at multiple TB (terabytes) per hour
(or ~1 GB per second) using powerful servers with multiple CPUs, multiple hard
drives, multiple gigabit network connections, and lots of memory. The fastest ETL
record is currently held by Syncsort, Vertica, and HP at 5.4TB in under an hour,
which is more than twice as fast as the earlier record held by Microsoft and Unisys.
Overview of ETL
▪ Parallel Processing
ETL software often uses parallel processing, which enables a number of methods to
improve the overall performance of ETL processes when dealing with large volumes of
data.
ETL applications implement three main types of parallelism:
– Data: splitting a single sequential file into smaller data files to provide
parallel access.
– Pipeline: allowing the simultaneous running of several components on the
same data stream, for example looking up a value on record 1 at the same
time as adding two fields on record 2.
– Component: the simultaneous running of multiple processes on different data
streams in the same job, for example sorting one input file while removing
duplicates on another file.
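In Oracle, for example, data parallelism for a single statement can be requested with a hint. A sketch; the sales table and the degree of 4 are illustrative:

-- ask the optimizer to scan and aggregate sales with 4 parallel processes
select /*+ parallel(s, 4) */ region, sum(amount)
from sales s
group by region;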
Extraction, Transformation, and Loading (ETL)
• Data extraction
– get data from multiple, heterogeneous, and external sources
• Data cleaning
– detect errors in the data and rectify them when possible
• Data transformation
– convert data from legacy or host format to warehouse format
• Load
– sort, summarize, consolidate, compute views, check integrity,
and build indices and partitions
• Refresh
– propagate the updates from the data sources to the
warehouse
Metadata Repository
Metadata is the data defining warehouse objects. It stores:
• Description of the structure of the data warehouse
– schema, view, dimensions, hierarchies, derived data definitions, data mart
locations and contents
• Operational metadata
– data lineage (history of migrated data and transformation path), currency
of data (active, archived, or purged), monitoring information (warehouse
usage statistics, error reports, audit trails)
• The algorithms used for summarization
• The mapping from the operational environment to the data warehouse
• Data related to system performance
– warehouse schema, view and derived data definitions
• Business data
– business terms and definitions, ownership of data, charging policies
Data Warehouse: A Multi-Tiered Architecture
[Diagram: operational DBs and other sources are extracted, transformed, loaded, and refreshed, with a monitor & integrator and a metadata store, into the data warehouse and data marts; an OLAP server serves the warehouse to front-end tools for query, reports, analysis, and data mining.]
Data Mining
• Alternative names
– Knowledge discovery (mining) in databases (KDD), knowledge
extraction, data/pattern analysis, data archeology, data dredging,
information harvesting, business intelligence, etc.
Knowledge Discovery (KDD) Process
• This is a view from typical database systems and data warehousing
communities
• Data mining plays an essential role in the knowledge discovery process
[Diagram: Databases → Data Cleaning → Data Integration → Task-relevant Data → Data Mining → Pattern Evaluation]
• 1. Data cleaning (to remove noise and inconsistent
data)
• 2. Data integration (where multiple data sources
may be combined)
• 3. Data selection (where data relevant to the analysis
task are retrieved from the database)
• 4. Data transformation (where data are transformed
or consolidated into forms appropriate for mining by
performing summary or aggregation operations, for
instance)
• 5. Data mining (an essential process where
intelligent methods are applied in order to extract
data patterns)
• 6. Pattern evaluation (to identify the truly interesting
patterns representing knowledge based on some
interestingness measures)
• 7. Knowledge presentation (where visualization and
knowledge representation techniques are used to
present the mined knowledge to the user)
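When the data already sits in a relational store, steps 1, 3, and 4 often reduce to a single query that prepares the mining input. A sketch; the sales table, its columns, and the cutoff date are illustrative:

create table mining_input as
select region,
       item_type,
       sum(amount) as total_amount        -- step 4: transform by aggregation
from   sales
where  amount is not null                 -- step 1: drop rows with missing values
and    sale_date >= date '2010-01-01'     -- step 3: select task-relevant data
group  by region, item_type;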
ARCHITECTURE OF DATA MINING
REPRESENTATION FOR VISUALIZING THE DISCOVERED PATTERNS
• Data cleaning
– Fill in missing values, smooth noisy data, identify or remove outliers,
and resolve inconsistencies
• Data integration
– Integration of multiple databases, data cubes, or files
• Data transformation
– Normalization and aggregation (sketched below)
• Data reduction
– Obtains a reduced representation in volume but produces the same or
similar analytical results
• Data discretization
– Part of data reduction but with particular importance, especially for
numerical data
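Transformation and discretization can be sketched directly in SQL over the emp table from the earlier examples; the bin boundaries are illustrative:

select rno,
       -- min-max normalization of marks to the [0,1] range
       (marks - min(marks) over ()) /
       nullif(max(marks) over () - min(marks) over (), 0) as marks_norm,
       -- discretization of marks into three bins
       case when marks < 40 then 'low'
            when marks < 70 then 'medium'
            else 'high'
       end as marks_band
from emp;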
Forms of data preprocessing
Data Cleaning