DBMS Queries Overview

INTRODUCTION

• Data
• Information
• Database
• DBMS
• Table/relation
• Oracle
• SQL
What is SQL?
• SQL (Structured Query Language) is a
computer language for storing, manipulating
and retrieving data stored in relational
databases.
• SQL is the standard language for Relational
Database Systems. All relational database
management systems, such as MySQL, MS Access,
Oracle, Sybase, Informix and SQL Server, use
SQL as the standard database language.
SQL Components
• DDL – create table / rename / alter / drop /
truncate
• DML – insert / update / delete
• DCL – grant / revoke
• TCL – commit / rollback
• DQL – select
Basic data types
• Char
• Varchar(size)/varchar2
• Date
• Number
• Long
• Raw/long raw
Create Command
• This command is used to create/generate new tables.
• Syntax
CREATE TABLE table_name
(column_name datatype(size), column_name datatype(size)
--- );
• Example
CREATE TABLE Student
(std_name varchar2(20), father_name varchar2(20), DOB
number(10), address varchar2(20));
Insert Command
• This command is used to insert rows into the table
containing already defined columns.
• Syntax
INSERT INTO table_name
(column_name, column_name, --- )
VALUES (expression, expression, ---);
• Example
INSERT INTO Student
(std_name, father_name, DOB, address)
VALUES ('Abid', 'Ibrahim', 1984, 'KARAK');
Select Command
• This command is used for viewing/retrieving of data from
table.
• Syntax
SELECT column_name, column_name, ---
FROM table_name;
• Examples
• Retrieving student names and father names from student
SELECT std_name, father_name
FROM Student;
• Retrieving all records from student
SELECT * FROM Student;
• Selected columns and all rows
SELECT std_name, father_name
FROM Student;
• Selected rows and all columns
SELECT * FROM Student where DOB=1984;
• Selected columns and selected rows
SELECT std_name, DOB FROM Student where
address='KARAK';
• Elimination of Duplicates from Select
SELECT DISTINCT * FROM Student;
Delete Command

• This is used to delete specified rows from a table.

• Syntax
DELETE FROM table_name [WHERE condition];
• Examples
• Removing all rows from table Student
DELETE FROM Student;
• Removing selected rows from table Student
DELETE FROM Student where address='KARAK';
Alter Command
• This is used to change the description of a column or add an
extra column.
• Syntax
ALTER TABLE table_name
ADD (new_column_name datatype(size), new_column_name
datatype(size) ---);
• Example
– ALTER TABLE Student
ADD (marks number(3), gender varchar2(2));
– ALTER TABLE Student
MODIFY (address char(20));
Update Command
• This is used for changing one or more values in the row of
the table.
• Syntax
UPDATE table_name SET column_name = expression,
column_name = expression --- [WHERE condition];
• Example
UPDATE Student SET marks = marks+5;

UPDATE Student SET marks = marks+5 where
std_name='Abid';
Rename Command
• This is used to rename an existing table.

• Syntax
RENAME old_table_name TO new_table_name;

• Example
RENAME Student TO Personal_Data;
Drop Command

• This is used to remove a table from the database.

• Syntax
DROP TABLE table_name;

• Example
DROP TABLE Student;
Describe Command

• This command is used to display the column names, data
types and attributes of a table in the database.

• Syntax
DESCRIBE table_name;

• Example
DESCRIBE Student;
• SQL Components: DDL/DML/DCL/TCL

• create table emp
• (
• rno number(5),
• name varchar2(15),
• marks number(5));
• …………………………

• update emp set marks=null where marks=55;
• ………………………….

• update emp set marks=null;
• ……………………………….
• select rowid from emp;
• --------------------------------------

• delete from emp where rowid='AAADVTAABAAAKS6AAC';
• ……………………………………
Integrity constraints
• Primary key
• Foreign key
• Check
• Not null
• Unique
• Default
• NOT NULL Constraint: Ensures that a column cannot have a
NULL value.
• DEFAULT Constraint: Provides a default value for a column
when none is specified.
• UNIQUE Constraint: Ensures that all values in a column are
different.
• PRIMARY Key: Uniquely identifies each row/record in a
database table.
• FOREIGN Key: References a uniquely identified row/record
in another database table.
• CHECK Constraint: The CHECK constraint ensures that all
values in a column satisfy certain conditions.
Primary key
• alter table emp
• add primary key(rno);

Default
• alter table emp
• modify name default 'aaa';
Check
• alter table emp
• add check(rno>5);

Foreign key
• Alter table emp add foreign key (dno)
references dept(depno);
Unique
• ALTER TABLE Persons
ADD UNIQUE (P_Id);

NOT NULL
• Alter table emp
modify esal number(5) not null;
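
• These constraints can also be declared when a table is
created. A minimal sketch, assuming a hypothetical emp
table (column names and sizes are illustrative):
CREATE TABLE emp
(rno number(5) PRIMARY KEY,
name varchar2(15) DEFAULT 'aaa' NOT NULL,
esal number(7) CHECK (esal > 0),
email varchar2(30) UNIQUE,
dno number(3) REFERENCES dept(depno));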
Joins
• 1. The purpose of a join is to combine data
across tables.
• 2. A join is actually performed by the where
clause, which combines the specified rows of
the tables.
• 3. If a join involves more than two tables,
Oracle joins the first two tables based on the
join condition, then compares the result
with the next table, and so on (a three-table
sketch follows below).
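• A minimal three-table sketch, assuming hypothetical emp,
dept and loc tables where emp.deptno matches dept.deptno
and dept.locno matches loc.locno:
SQL> select e.ename, d.dname, l.city
from emp e, dept d, loc l
where e.deptno = d.deptno
and d.locno = l.locno;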
Types of Joins
• Natural join
• Inner join
• Outer join
✓ left outer join
✓ right outer join
✓ full outer join
• Self join
• Cross join (Cartesian product)
• Equi & Non Equi joins
Assume that we have emp and dept tables that share a deptno column.
• 1. EQUI JOIN
• A join which contains an equal to '=' operator
in the join condition.
• Ex: SQL> select eno,ename,esal,dname from
emp ,dept where emp.deptno=dept.deptno;
• 2. NON-EQUI JOIN
• A join which contains an operator other than
equal to '=' in the join condition.
• Ex: SQL> select eno,ename,esal,dname from
emp ,dept where emp.deptno>dept.deptno;
• 3. SELF JOIN
Joining a table to itself is called a self join.

• Select a.name "teacher", c.name "hod" from
teacher a, teacher c where a.hod = c.id;

ID  NAME  HOD
1   M     2
2   N
3   O     4
4   P
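• With this data the query pairs each teacher with his or
her HOD: M works under N, and O works under P. Teachers N
and P have no HOD value, so the self join drops them.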
• 4. NATURAL JOIN
• Natural join compares all the common
columns.
• Ex: SQL> select eno,ename,dname,loc from
emp natural join dept;
• 5. CROSS JOIN
• This gives the cross product.
• Ex: SQL> select empno,ename,esal,dname,loc
from emp cross join dept;
• 6. OUTER JOIN
• An outer join returns the non-matching records along with
the matching records.

• LEFT OUTER JOIN
• This displays all matching records, plus the rows of the
left-hand table that have no match in the
right-hand table.
• Ex: SQL> select eno,ename,job,dname,loc from emp
left outer join dept on(emp.depno=dept.dno);
• Or
• SQL> select eno,ename,job,dname,loc from emp ,dept
where emp.depno=dept.dno(+);
• RIGHT OUTER JOIN
• This displays all matching records, plus the rows of the
right-hand table that have no match in the left-hand
table.
• Ex:
• SQL> select empno,ename,job,dname,loc from emp right
outer join dept on(emp.depno=dept.dno);
• Or
• SQL> select empno,ename,job,dname,loc from emp ,dept
where emp.depno(+) = dept.dno;
• FULL OUTER JOIN
• This displays all matching records plus the
non-matching records from both tables.
• Ex:
• SQL> select empno,ename,job,dname,loc from emp full
outer join dept on(emp.depno=dept.dno);
• 7. INNER JOIN
• This displays only the records that match in
both tables.
• Ex: SQL> select empno,ename,job,dname,loc
from emp inner join dept using(deptno);
Operators and clauses
• IN
• OR
• AND
• Between
• Like
• Distinct
• Rowid
• Order by
• The LIKE operator is used for string or pattern
matching.
• The % character matches any string of any
length.
• The _ character matches a single character.
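• For example, assuming the Student table created earlier,
the first query below finds names starting with 'A' and
the second finds names whose second letter is 'b':
SQL> select std_name from Student where std_name like 'A%';
SQL> select std_name from Student where std_name like '_b%';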
Aggregate functions

• These functions operate on the multiset of values
of a column of a relation, and return a value.

• avg: average value
• min: minimum value
• max: maximum value
• sum: sum of values
• count: number of values
Find the average account balance at the Perryridge branch.
select avg(balance)
from account
where branch-name = 'Perryridge'

Find the number of depositors in the bank.
select count (distinct customer-name)
from depositor

Find the number of tuples in the customer relation.
select count (*)
from customer
Group By and Having

Find the names of all branches where the average account
balance is more than $1,200.
select branch-name, avg(balance)
from account
group by branch-name having avg(balance) > 1200

Note: predicates in the having clause are applied after the
formation of groups, whereas predicates in the where
clause are applied before forming groups.
Problems
1) select count(empno), dname from emp111, dep111 where
emp111.deptno=dep111.deptno group by dname having
dname in ('cse','ece');

2) select count(empno), dname from emp111, dep111 where
emp111.deptno=dep111.deptno and sal between 15000
and 70000 group by dname having dname in ('cse','me');

3) select empno, ename, sal, dname from emp111 left join
dep111 on emp111.deptno=dep111.deptno;

4) select e.ename, 'works-under', m.ename from emp111 e,
emp111 m where e.mgr = m.empno;
Who is BI for?
• BI for management
• Operational BI
• BI for process improvement
• BI for performance improvement
• BI to improve customer experience
Scenario 1

ABC Pvt Ltd is a company with branches at
Mumbai, Delhi, Chennai and Bangalore. The
Sales Manager wants a quarterly sales report.
Each branch has a separate operational
system.
Scenario 1 : ABC Pvt Ltd.
(Diagram: the four branch systems, Mumbai, Delhi, Chennai
and Bangalore, each feed sales per item type per branch for
the first quarter to the Sales Manager.)
Solution 1: ABC Pvt Ltd.

• Extract sales information from each database.
• Store the information in a common repository at
a single site.
Solution 1: ABC Pvt Ltd.
(Diagram: the Mumbai, Delhi, Chennai and Bangalore systems
feed a central data warehouse; query and analysis tools run
on the warehouse and deliver reports to the Sales Manager.)
Scenario 2

One Stop Shopping Super Market has a huge
operational database. Whenever executives want
some report, the OLTP system becomes
slow and data entry operators have to wait for
some time.
Scenario 2 : One Stop Shopping
(Diagram: data entry operators and management both work
against the single operational database; report requests
from management make the data entry operators wait.)
Solution 2
• Extract data needed for analysis from the operational
database.
• Store it in the warehouse.
• Refresh the warehouse at regular intervals so that it
contains up-to-date information for analysis.
• The warehouse will contain data with a historical
perspective.
Solution 2
(Diagram: transaction data from the operational database is
extracted into a data warehouse; data entry operators keep
working against the operational database while the manager
runs reports against the warehouse.)
Scenario 3

Cakes & Cookies is a small, new company. The
president of the company wants his company to grow. He
needs information so that he can make correct
decisions.
Solution 3
• Improve the quality of data before loading it
into the warehouse.
• Perform data cleaning and transformation
before loading the data.
Solution 3
(Diagram: a data warehouse feeds a query and analysis tool
that shows the president sales over time, supporting
decisions about expansion and improvement.)
Data warehousing
• It is the process which prepares the basic
repository of data that becomes the data
source from which we extract information.
• A data warehouse is a subject-oriented,
integrated, time-variant and nonvolatile
collection of data in support of
management's decision-making process. Let's
explore this definition of data warehouse.
• A data warehouse is built by extracting data
from multiple heterogeneous and external
sources, cleansing the data to detect errors
and rectify them wherever possible,
integrating and transforming the data from legacy
format to warehouse format, and then loading
the data after sorting and summarizing.
What is a Data Warehouse?
• Defined in many different ways, but not rigorously.
– A decision support database that is maintained separately from the
organization’s operational database
– Support information processing by providing a solid platform of
consolidated, historical data for analysis.
• “A data warehouse is a subject-oriented, integrated, time-variant, and
nonvolatile collection of data in support of management’s decision-
making process.”—W. H. Inmon
• Data warehousing:
– The process of constructing and using data warehouses

Data Warehouse—Subject-Oriented
• Organized around major subjects, such as customer, product,
sales
• The data warehouse is subject oriented because it provides
information around a subject rather than around the
organization's ongoing operations.
• These subjects can be product, customers, suppliers, sales,
revenue etc.
• The data warehouse does not focus on the ongoing operations
rather it focuses on modelling and analysis of data for decision
making.

Data Warehouse—Integrated
• Constructed by integrating multiple, heterogeneous data
sources such as relational databases, flat files, on-line
transaction records. This integration enhances the effective
analysis of data.
• Data cleaning and data integration techniques are applied.
– Ensure consistency in naming conventions, encoding
structures, attribute measures, etc. among different data
sources
• E.g., Hotel price: currency, tax, breakfast covered, etc.
– When data is moved to the warehouse, it is converted.

Data Warehouse—Time Variant
• The Data in Data Warehouse is identified with a particular time
period. The data in data warehouse provide information from
historical point of view.
• The time horizon for the data warehouse is significantly longer than
that of operational systems
– Operational database: current value data
– Data warehouse data: provide information from a historical
perspective (e.g., past 5-10 years)
• Every key structure in the data warehouse
– Contains an element of time, explicitly or implicitly
– But the key of operational data may or may not contain “time
element”

Data Warehouse—Nonvolatile
• Nonvolatile means that the previous data is not removed when
new data is added to it. The data warehouse is kept separate from
the operational database therefore frequent changes in operational
database are not reflected in data warehouse.
• A physically separate store of data transformed from the
operational environment
• Operational update of data does not occur in the data warehouse
environment
– Does not require transaction processing, recovery, and
concurrency control mechanisms
– Requires only two operations in data accessing:
• initial loading of data and access of data

• Metadata - Metadata is simply defined as data about data.
The data that are used to represent other data is known as
metadata. For example, the index of a book serves as
metadata for the contents of the book. In other words, we can
say that metadata is summarized data that leads us to the
detailed data.
A Data Warehouse Is A Process
(Diagram: source OLTP systems feed a central repository,
the data warehouse, which in turn feeds architected data
marts and end-user workstations. Data characteristics
change along the way: raw detail with no or minimal history
in the sources; integrated, scrubbed data with history in
the warehouse; targeted, specialized (OLAP) summaries in
the data marts. Supporting processes include design and
mapping; extract, scrub and transform; load, index and
aggregation; replication and data set distribution; access
and analysis with resource scheduling and distribution; all
governed by metadata and system monitoring.)
There Are Many Options
(Diagram: extraction systems can feed operational source
data into an operational data store, into a data warehouse
with architected data marts, or into independent data
marts, all serving user workstations.)
OLTP vs. OLAP

                    OLTP                        OLAP
users               clerk, IT professional      knowledge worker
function            day-to-day operations       decision support
DB design           application-oriented        subject-oriented
data                current, up-to-date,        historical, summarized,
                    detailed, flat relational,  multidimensional,
                    isolated                    integrated, consolidated
usage               repetitive                  ad-hoc
access              read/write,                 lots of scans
                    index/hash on prim. key
unit of work        short, simple transaction   complex query
# records accessed  tens                        millions
# users             thousands                   hundreds
DB size             100MB-GB                    100GB-TB
metric              transaction throughput      query throughput, response
Why a Separate Data Warehouse?
 High performance for both systems
 DBMS— tuned for OLTP: access methods, indexing, concurrency control,
recovery
 Warehouse—tuned for OLAP: complex OLAP queries, multidimensional
view, consolidation
 Different functions and different data:
 missing data: Decision support requires historical data which
operational DBs do not typically maintain
 data consolidation: DS requires consolidation (aggregation,
summarization) of data from heterogeneous sources
 data quality: different sources typically use inconsistent data
representations, codes and formats which have to be reconciled
 Note: There are more and more systems which perform OLAP analysis
directly on relational databases

Overview of ETL
• In computing, Extract, Transform and Load (ETL)
refers to:
– Extracting data from an outside source
– Transforming it to fit business needs
– Loading it into the end target

ETL systems are commonly used to integrate data
from multiple applications, typically developed and
supported by different vendors or hosted on
separate computer hardware.
Overview of ETL
• Extract
The first part of an ETL process involves extracting the data from the source
systems. In many cases this is the most challenging aspect of ETL, since extracting
data correctly sets the stage for how subsequent processes go further.
Most analytics projects consolidate data from different source systems. Each
separate system may also use a different data organization and/or format.
Common data source formats are relational databases and flat files, but may
include non-relational database structures such as Information Management
System (IMS) or even fetching from outside sources such as through web spidering
or screen-scraping. The streaming of the extracted data source and load on-the-fly
to the destination database is another way of performing ETL when no
intermediate data storage is required. In general, the goal of the extraction phase
is to convert the data into a single format appropriate for transformation
processing.
Overview of ETL
• Transform
The transform stage applies a series of rules or functions to the extracted data from the
source to derive the data for loading into the end target. Some data sources require
very little or even no manipulation of data, whereas others require transformation as
per their business requirements. Some of the transformations are:
– Translating coded values
– Encoding free-form values
– Sorting
– Joining
– Aggregation
– Transposing or Pivoting

• Load
The load phase loads the data into the end target, usually the data warehouse (DW),
but it can be any other format such as a flat file or a relational database (a sketch
follows below).
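
A minimal SQL sketch of the transform-and-load step,
assuming hypothetical staging_sales and warehouse_sales
tables (all names and columns are illustrative):
INSERT INTO warehouse_sales (branch, item_type, quarter, total_amount)
SELECT branch, UPPER(item_type), 'Q1', SUM(amount)
FROM staging_sales
WHERE sale_date BETWEEN TO_DATE('01-JAN-2024','DD-MON-YYYY')
AND TO_DATE('31-MAR-2024','DD-MON-YYYY')
GROUP BY branch, UPPER(item_type);
Here translating coded values (UPPER) and aggregation (SUM
with GROUP BY) stand in for the transformations listed above.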
Overview of ETL
• Performance
ETL vendors benchmark their record-systems at multiple TB (terabytes) per hour
(or ~1 GB per second) using powerful servers with multiple CPUs, multiple hard
drives, multiple gigabit-network connections, and lots of memory. The fastest ETL
record is currently held by Syncsort, Vertica and HP at 5.4TB in under an hour,
which is more than twice as fast as the earlier record held by Microsoft and Unisys.
Overview of ETL
▪ Parallel Processing
ETL software can exploit parallel processing, which has enabled a number of methods to
improve the overall performance of ETL processes when dealing with large volumes of
data.
ETL applications implement three main types of parallelism:
– Data: By splitting a single sequential file into smaller data files to provide
parallel access.
– Pipeline: Allowing the simultaneous running of several components on the
same data stream. For example: looking up a value on record 1 at the same
time as adding two fields on record 2.
– Component: The simultaneous running of multiple processes on different data
streams in the same job, for example, sorting one input file while removing
duplicates on another file.
Extraction, Transformation, and Loading (ETL)
 Data extraction
get data from multiple, heterogeneous, and external sources
 Data cleaning
detect errors in the data and rectify them when possible
 Data transformation
convert data from legacy or host format to warehouse format
 Load
sort, summarize, consolidate, compute views, check integrity,
and build indices and partitions
 Refresh
propagate the updates from the data sources to the
warehouse

Metadata Repository
 Meta data is the data defining warehouse objects. It stores:
 Description of the structure of the data warehouse
 schema, view, dimensions, hierarchies, derived data defn, data mart
locations and contents
 Operational meta-data
 data lineage (history of migrated data and transformation path), currency
of data (active, archived, or purged), monitoring information (warehouse
usage statistics, error reports, audit trails)
 The algorithms used for summarization
 The mapping from operational environment to the data warehouse
 Data related to system performance
 warehouse schema, view and derived data definitions
 Business data
 business terms and definitions, ownership of data, charging policies
Data Warehouse: A Multi-Tiered Architecture
(Diagram: operational DBs and other sources pass through
extract, transform, load and refresh steps, coordinated by
a monitor and integrator with a metadata repository, into
the data warehouse and data marts; OLAP servers sit on the
data storage layer and serve front-end tools for analysis,
querying, reporting and data mining.)
OLAP
• OLTP (On-line Transaction Processing) : is
characterized by a large number of short on-line
transactions (INSERT, UPDATE, DELETE).
• The main emphasis for OLTP systems is put on
very fast query processing, maintaining data
integrity in multi-access environments and an
effectiveness measured by number of
transactions per second.
• An OLTP database holds detailed and current
data, and the schema used to store transactional
databases is the entity (ER) model.
• OLAP (On-line Analytical Processing) : is characterized
by relatively low volume of transactions.
• Queries are often very complex and involve
aggregations.
• For OLAP systems a response time is an effectiveness
measure.
• OLAP applications are widely used in Data Mining.
• An OLAP database holds aggregated, historical data,
stored in multi-dimensional schemas (usually the star
schema).
Data models for OLTP and OLAP
For OLTP: ER model
For OLAP: star or snowflake schema
(Figures: snowflake model and ER diagram omitted.)
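
A minimal star-schema sketch in SQL, assuming a
hypothetical sales fact table with item and branch
dimensions (all names are illustrative):
CREATE TABLE dim_item
(item_id number(6) PRIMARY KEY,
item_name varchar2(30),
item_type varchar2(20));
CREATE TABLE dim_branch
(branch_id number(4) PRIMARY KEY,
branch_name varchar2(20),
city varchar2(20));
CREATE TABLE fact_sales
(item_id number(6) REFERENCES dim_item(item_id),
branch_id number(4) REFERENCES dim_branch(branch_id),
sale_date date,
amount number(10,2));
The fact table holds the measures (amount) and foreign keys
to the dimension tables, which hold descriptive attributes.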
Information services
• It is not just the process of producing information;
it also involves ensuring that the
information produced is aligned with business
requirements and can be acted upon to
produce value for the company.
• Information is delivered in the form of
reports, charts and dashboards.
• Data mining is a practice used to increase the
body of knowledge.
• Applied analytics is generally used to drive
action and produce outcomes.
Why Data Mining?

• The Explosive Growth of Data: from terabytes to yottabytes
– Data collection and data availability
• Automated data collection tools, database systems, Web,
computerized society
– Major sources of abundant data
• Business: Web, e-commerce, transactions, stocks, …
• Science: Remote sensing, bioinformatics, scientific simulation, …
• Society and everyone: news, digital cameras, YouTube
• We are drowning in data, but starving for knowledge!
What Is Data Mining?

• Data mining (knowledge discovery from data)
– Extraction of interesting (non-trivial, implicit, previously unknown and
potentially useful) patterns or knowledge from huge amounts of data

• Alternative names
– Knowledge discovery (mining) in databases (KDD), knowledge
extraction, data/pattern analysis, data archeology, data dredging,
information harvesting, business intelligence, etc.
Knowledge Discovery (KDD) Process
• This is a view from typical database
systems and data warehousing
communities
• Data mining plays an essential role in
the knowledge discovery process
(Diagram: Databases → Data Integration → Data Cleaning →
Data Warehouse → Selection → Task-relevant Data → Data
Mining → Pattern Evaluation.)
• 1. Data cleaning (to remove noise and inconsistent
data)
• 2. Data integration (where multiple data sources
may be combined)
• 3. Data selection (where data relevant to the analysis
task are retrieved from the database)
• 4. Data transformation (where data are transformed
or consolidated into forms appropriate for mining by
performing summary or aggregation operations, for
instance)
• 5. Data mining (an essential process where
intelligent methods are applied in order to extract
data patterns)
• 6. Pattern evaluation (to identify the truly interesting
patterns representing knowledge based on some
interestingness measures)
• 7. Knowledge presentation (where visualization and
knowledge representation techniques are used to
present the mined knowledge to the user)
ARCHITECTURE OF DATA MINING
(Figure omitted.)
REPRESENTATION FOR VISUALIZING
THE DISCOVERED PATTERNS

• This refers to the form in which discovered
patterns are to be displayed. These
representations may include the following:
• Rules
• Tables
• Charts
• Graphs
• Decision Trees
• Cubes
Why Data Preprocessing?

• Data in the real world is dirty
– incomplete: lacking attribute values, lacking certain
attributes of interest, or containing only aggregate data
– noisy: containing errors or outliers
– inconsistent: containing discrepancies in codes or names
• No quality data, no quality mining results!
– Quality decisions must be based on quality data
– Data warehouse needs consistent integration of quality
data
Major Tasks in Data Preprocessing

 Data cleaning
 Fill in missing values, smooth noisy data, identify or remove outliers,
and resolve inconsistencies
 Data integration
 Integration of multiple databases, data cubes, or files
 Data transformation
 Normalization and aggregation (a normalization sketch follows this list)
 Data reduction
 Obtains reduced representation in volume but produces the same or
similar analytical results
 Data discretization
 Part of data reduction but with particular importance, especially for
numerical data
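
A minimal SQL sketch of min-max normalization (one common
transformation), assuming a hypothetical emp table whose
sal column is rescaled to the range [0,1]:
SELECT ename,
(sal - MIN(sal) OVER ()) /
(MAX(sal) OVER () - MIN(sal) OVER ()) AS sal_norm
FROM emp;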
Forms of data preprocessing
(Figure omitted.)
Data Cleaning

• Data cleaning tasks
– Fill in missing values
– Identify outliers and smooth out noisy data
– Correct inconsistent data
Missing Data
• Data is not always available
– E.g., many tuples have no recorded value for several attributes, such
as customer income in sales data
• Missing data may be due to
– equipment malfunction
– inconsistent with other recorded data and thus deleted
– data not entered due to misunderstanding
– certain data may not be considered important at the time of entry
– not register history or changes of the data
• Missing data may need to be inferred.
How to Handle Missing Data?
• Ignore the tuple: usually done when the class label is missing (assuming the
task is classification); not effective when the percentage of missing values
per attribute varies considerably.
• Fill in the missing value manually: tedious + infeasible!
• Use a global constant to fill in the missing value: e.g., "unknown", a new
class?!
• Use the attribute mean to fill in the missing value
• Use the attribute mean for all samples belonging to the same class to fill in
the missing value: smarter (a sketch of both mean strategies follows below)
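
A minimal SQL sketch of the two alternative attribute-mean
strategies, assuming a hypothetical customer table with
income and cust_class columns (names are illustrative; AVG
ignores NULLs):
UPDATE customer
SET income = (SELECT AVG(income) FROM customer)
WHERE income IS NULL;

UPDATE customer c
SET income = (SELECT AVG(income) FROM customer x
WHERE x.cust_class = c.cust_class)
WHERE income IS NULL;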
Noisy Data

 Noise: random error or variance in a measured variable
 Incorrect attribute values may be due to
 faulty data collection instruments
 data entry problems
 data transmission problems
 technology limitation
 inconsistency in naming convention
 Other data problems which requires data cleaning
 duplicate records
 incomplete data
 inconsistent data
THANKS
