Data Warehousing and Data Mining - Unit2

The document discusses data warehousing and data mining. It defines key concepts like data, databases, and data warehouses. It explains that a data warehouse organizes all available organizational data to support analysis and reporting. It is integrated, time-variant, and nonvolatile. The data warehousing process extracts and transforms operational data into a central data store to support management decision making. Data warehouses are constructed by integrating heterogeneous data sources and applying data cleaning. They are subject-oriented, provide a historical perspective, and do not allow updates. Data warehouse usage includes querying, reporting, OLAP, and data mining. The general architecture includes data acquisition, storage, and extraction components.

Data Warehousing and Data Mining
Unit 2

Data Warehousing
• Data
– A raw piece of information that can be moved and stored.
• Database
– An organized collection of such data, managed in tabular form
with relationships between the tables.
• Data Warehouse
– A system that organizes all the data available in an organization,
makes it accessible and usable for all kinds of data analysis,
and allows many reports to be created using mining tools.

Data Warehouse
– “A data warehouse is a subject-oriented,
integrated, time-variant, and nonvolatile
collection of data in support of management’s
decision-making process.”
• Data warehousing:
– The process of constructing and using data
warehouses.
– The process of extracting and transforming
operational data into informational data and loading
it into a central data store (the warehouse).
Data Warehouse—Integrated
• Constructed by integrating multiple, heterogeneous data sources
– relational databases, flat files, on-line transaction records
• Data cleaning and data integration techniques are applied.
– Ensure consistency in naming conventions, encoding structures,
attribute measures, etc. among different data sources
• E.g., Hotel price: currency, tax, breakfast covered, etc.
[Figure labels: Sales system, Payroll system, Purchasing system; Customer data]
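
The hotel price example can be made concrete with a minimal sketch. It assumes two hypothetical sources that quote the same room differently: one in EUR with tax excluded and breakfast included, the other in USD with tax included and no breakfast. The exchange rate, tax rate, and breakfast cost are made-up illustration values, not figures from the slides.

```python
from dataclasses import dataclass

# Illustrative constants (assumptions, not real rates).
USD_PER_EUR = 1.10
TAX_RATE = 0.10
BREAKFAST_USD = 15.0  # assumed tax-inclusive cost of breakfast


@dataclass
class HotelPrice:
    hotel: str
    amount: float
    currency: str            # "USD" or "EUR"
    tax_included: bool
    breakfast_included: bool


def to_warehouse_price(p: HotelPrice) -> float:
    """Convert a source price to one convention:
    USD, tax included, breakfast excluded."""
    amount = p.amount * USD_PER_EUR if p.currency == "EUR" else p.amount
    if not p.tax_included:
        amount *= 1 + TAX_RATE
    if p.breakfast_included:
        amount -= BREAKFAST_USD
    return round(amount, 2)


source_a = HotelPrice("Grand", 100.0, "EUR", tax_included=False, breakfast_included=True)
source_b = HotelPrice("Grand", 106.0, "USD", tax_included=True, breakfast_included=False)

price_a = to_warehouse_price(source_a)
price_b = to_warehouse_price(source_b)
```

Once both records follow the same convention, the two prices can be compared or aggregated directly, which is the point of applying integration before loading.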

Data Warehouse—Subject-Oriented
• Organized around major subjects, such as
customer, product, sales.
• Focusing on the modeling and analysis of
data for decision makers, not on daily
operations or transaction processing.
• Provide a simple and concise view around
particular subject issues by excluding data
that are not useful in the decision support
process.
[Figure labels: Operational data (Sales system, Payroll system, Purchasing system); DW (Employee data, Customer data, Vendor data)]

Data Warehouse—Time Variant

• The time horizon for the data warehouse is significantly
longer than that of operational systems.
– Operational database: current value data.
– Data warehouse data: provide information from a historical
perspective (e.g., past 5-10 years)
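
The contrast between current-value data and historical data can be sketched as follows. This is an illustration under assumed names (`operational`, `warehouse`, `update_customer`, `city_as_of` are all hypothetical): the operational store overwrites a customer's city, while the warehouse keeps every dated value.

```python
from datetime import date

operational = {}  # customer_id -> current city (overwritten on update)
warehouse = []    # (recorded_on, customer_id, city) rows, never overwritten


def update_customer(customer_id, city, on):
    """Apply a change: overwrite operationally, append historically."""
    operational[customer_id] = city
    warehouse.append((on, customer_id, city))


def city_as_of(customer_id, when):
    """Latest warehouse value recorded on or before `when`, or None."""
    rows = [(d, c) for d, cid, c in warehouse if cid == customer_id and d <= when]
    return max(rows, key=lambda r: r[0])[1] if rows else None


update_customer(1, "London", date(2015, 1, 1))
update_customer(1, "Glasgow", date(2020, 6, 1))

current = operational[1]
in_2018 = city_as_of(1, date(2018, 1, 1))
before_history = city_as_of(1, date(2014, 1, 1))
```

Only the warehouse can answer a question about 2018; the operational store knows the present value and nothing else.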

Data Warehouse—Non-Volatile
• A physically separate store of data transformed
from the operational environment.
• Operational update of data does not occur in
the data warehouse environment.
– Does not require transaction processing,
recovery, and concurrency control mechanisms
– Requires only two operations in data accessing:
• initial loading of data and access of data.
[Figure labels: DBMS (Sales system): create, update, delete, insert; DW (Customer data): load, access]
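
A minimal sketch of the two-operation idea, assuming a hypothetical `WarehouseTable` class: it offers `load` and `access` and deliberately has no update or delete method. Real warehouse products enforce this differently; the class only illustrates the principle.

```python
class WarehouseTable:
    """Append-only table: rows can be loaded and read, never changed."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        # Store immutable copies so later edits at the source do not leak in.
        self._rows.extend(tuple(r) for r in rows)

    def access(self, predicate=lambda row: True):
        # Return a new list, so callers cannot alter the stored rows.
        return [r for r in self._rows if predicate(r)]


table = WarehouseTable()
source_rows = [["2020", "London", 100], ["2020", "Glasgow", 80]]
table.load(source_rows)

# An operational update after loading does not reach the warehouse.
source_rows[0][2] = 999
london_rows = table.access(lambda r: r[1] == "London")
```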

Data Warehouse Usage
• Three kinds of data warehouse applications
– Information processing
• supports querying, basic statistical analysis, and reporting
using crosstabs, tables, charts and graphs
– Analytical processing
• multidimensional analysis of data warehouse data
• supports basic OLAP operations, slice-dice, drilling, pivoting
– Data mining
• knowledge discovery from hidden patterns
• supports associations, constructing analytical models,
performing classification and prediction, and presenting the
mining results using visualization tools.
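
The basic OLAP operations named above can be sketched in plain Python over a tiny fact table. The rows, the dimension names (`year`, `city`, `product`), and the function names are illustrative assumptions; a real OLAP server works on precomputed cubes rather than on lists.

```python
from collections import defaultdict

# Hypothetical fact rows: three dimensions and one measure ("amount").
sales = [
    {"year": 2023, "city": "London", "product": "tea", "amount": 10},
    {"year": 2023, "city": "Glasgow", "product": "tea", "amount": 7},
    {"year": 2024, "city": "London", "product": "tea", "amount": 12},
    {"year": 2024, "city": "London", "product": "coffee", "amount": 5},
]


def slice_(rows, dim, value):
    """Slice: fix one dimension to a single value."""
    return [r for r in rows if r[dim] == value]


def dice(rows, **allowed):
    """Dice: keep rows whose dimensions fall in the allowed value sets."""
    return [r for r in rows if all(r[d] in vals for d, vals in allowed.items())]


def roll_up(rows, *dims):
    """Roll-up: sum the measure, keeping only the listed dimensions."""
    totals = defaultdict(int)
    for r in rows:
        totals[tuple(r[d] for d in dims)] += r["amount"]
    return dict(totals)


def pivot(rows, row_dim, col_dim):
    """Pivot: cross-tabulate the measure by two dimensions."""
    table = defaultdict(lambda: defaultdict(int))
    for r in rows:
        table[r[row_dim]][r[col_dim]] += r["amount"]
    return {k: dict(v) for k, v in table.items()}


by_year = roll_up(sales, "year")
by_year_city = roll_up(sales, "year", "city")  # drill-down: one more dimension
london_by_product = roll_up(slice_(sales, "city", "London"), "product")
tea_2024 = dice(sales, year={2024}, product={"tea"})
city_by_year = pivot(sales, "city", "year")
```

Drilling down is the inverse of rolling up: calling `roll_up` with an extra dimension returns the finer-grained totals.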

General Architecture
[Figure: External Sources and Internal Sources feed the Data Integration Component (data acquisition), which fills the Data Warehouse. The warehouse is accompanied by Metadata and by Monitoring and Administration (construction & maintenance). The Query and Analysis Component (data extraction) serves queries/reports, OLAP through an OLAP Server, and data mining.]
3 main phases
• Data acquisition
– relevant data collection
– Recovering: transformation into the data warehouse model from
existing models
– Loading: cleaning and loading in the DWH
• Storage
• Data extraction
– Tool examples: query reports, SQL, multidimensional analysis (OLAP
tools), data mining
• Maintenance
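
The acquisition, storage, and extraction phases can be walked through end to end with a small sketch. It uses Python's built-in `sqlite3` module as a stand-in for the warehouse; the source rows, the `transform` helper, and the `sales` table layout are assumptions made for illustration.

```python
import sqlite3

# Acquisition, step 1: collect relevant data (hypothetical extracts
# from two sales systems with inconsistent product naming).
london = [("2024-01-05", "tea ", 10.0), ("2024-01-06", "Coffee", 5.0)]
glasgow = [("2024-01-05", "TEA", 7.0)]


def transform(rows, city):
    """Acquisition, step 2: clean names and map rows to the warehouse model."""
    return [(day, city, product.strip().lower(), amount)
            for day, product, amount in rows]


# Storage: an in-memory SQLite database plays the role of the warehouse.
dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE sales (day TEXT, city TEXT, product TEXT, amount REAL)")

# Acquisition, step 3: load the cleaned rows.
for city, rows in (("London", london), ("Glasgow", glasgow)):
    dw.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", transform(rows, city))
dw.commit()

# Extraction: a simple SQL report over the integrated data.
report = dw.execute(
    "SELECT product, SUM(amount) FROM sales GROUP BY product ORDER BY product"
).fetchall()
```

Without the cleaning step, "tea ", "TEA", and "tea" would be reported as three different products.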

DATA WAREHOUSING
THE USE OF A DATA WAREHOUSE

STEP 1: Load the Data Warehouse
STEP 2: Question the Data Warehouse
STEP 3: Do something with what you learn from the Data Warehouse

[Figure: INVENTORY DATABASE, PERSONNEL DATABASE, NEWCASTLE SALES DB, LONDON SALES DB, and GLASGOW SALES DB feed the DATA WAREHOUSE, which leads to DECISIONS and ACTIONS!]
Partitioning
• To improve performance and flexibility without
giving up on the details

DW → Data marts

• By date, business type, geography, …
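
Partitioning can be sketched as grouping warehouse rows by a key while keeping every detail row. The `partition` helper and the sample rows are hypothetical; actual data marts are usually separate tables or databases rather than in-memory lists.

```python
from collections import defaultdict

# Hypothetical detail rows held in the warehouse.
warehouse_rows = [
    {"day": "2023-11-02", "city": "London", "amount": 10},
    {"day": "2024-01-05", "city": "London", "amount": 12},
    {"day": "2024-01-05", "city": "Glasgow", "amount": 7},
    {"day": "2024-02-10", "city": "Newcastle", "amount": 4},
]


def partition(rows, key):
    """Split rows into data marts, one per distinct key value."""
    marts = defaultdict(list)
    for row in rows:
        marts[key(row)].append(row)
    return dict(marts)


marts_by_city = partition(warehouse_rows, lambda r: r["city"])
marts_by_year = partition(warehouse_rows, lambda r: r["day"][:4])
```

Each mart holds complete rows, so a query against the London mart scans fewer rows without losing any detail.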


Creating a Data Warehouse

Why Separate Data Warehouse?
• High performance for both systems
– DBMS— tuned for OLTP: access methods, indexing,
concurrency control, recovery
– Warehouse—tuned for OLAP: complex OLAP queries,
multidimensional view, consolidation (aggregation).
• Different functions and different data:
– missing data: Decision support requires historical data
which operational DBs do not typically maintain
– data consolidation: Decision Support requires consolidation
(aggregation, summarization) of data from heterogeneous
sources
– data quality: different sources typically use inconsistent
data representations, codes and formats
