DW Chapter 3
Three-tier architecture of a data warehouse
[Figure: Three-tier data warehouse architecture. Information sources (operational DBs, semistructured and other sources) feed the Data Warehouse Server (Tier 1) through extract, transform, load, and refresh. The warehouse and its data marts serve the OLAP Servers (Tier 2), e.g., ROLAP and MOLAP, which serve the Clients (Tier 3): query/reporting, analysis, and data mining tools.]
MDBMS
A multidimensional database management system
(MDBMS) is a database management system that
uses a data cube to represent multiple dimensions
of data to users. This kind of database is optimized
for data warehouse and online analytical processing
(OLAP) applications.
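To make the cube idea concrete, here is a minimal in-memory sketch in Python. It is not how any particular MDBMS stores data: cells are simply keyed by a tuple of dimension values, and `slice_cube` and `roll_up` are hypothetical helper names illustrating two typical cube operations on hypothetical sales figures.

```python
from collections import defaultdict

# Hypothetical sales facts: (item, city, quarter) -> units sold.
facts = {
    ("laptop", "Toronto", "Q1"): 120,
    ("laptop", "Vancouver", "Q1"): 80,
    ("phone", "Toronto", "Q1"): 200,
    ("phone", "Toronto", "Q2"): 150,
    ("laptop", "Toronto", "Q2"): 90,
}
DIMENSIONS = ("item", "city", "quarter")

def slice_cube(cube, dim, value):
    """Keep only the cells whose coordinate on `dim` equals `value`."""
    i = DIMENSIONS.index(dim)
    return {key: v for key, v in cube.items() if key[i] == value}

def roll_up(cube, keep):
    """Sum the measure over every dimension not listed in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for key, units in cube.items():
        totals[tuple(key[i] for i in idx)] += units
    return dict(totals)

print(roll_up(facts, ["item"]))            # {('laptop',): 290, ('phone',): 350}
print(slice_cube(facts, "quarter", "Q2"))  # the two Q2 cells
```

A real MDBMS would typically precompute and index such aggregates rather than scan every cell on each request.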
A. Data Warehouse Models
From the perspective of data warehouse architecture, we have the
following data warehouse models:
• Virtual Warehouse
• Data mart
• Enterprise Warehouse
1. Virtual Warehouse
A set of views over operational databases is
known as a virtual warehouse. A virtual warehouse
is easy to build, but it requires excess capacity
on the operational database servers.
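A minimal sketch of the idea, using Python's built-in `sqlite3` module as a stand-in for an operational database (the `orders` table and `sales_by_region` view are hypothetical). The "warehouse" is only a view, so every query against it is evaluated on the operational tables.

```python
import sqlite3

# In-memory database standing in for an operational (OLTP) server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        region   TEXT,
        amount   REAL
    );
    INSERT INTO orders (region, amount) VALUES
        ('East', 100.0), ('East', 250.0), ('West', 75.0);

    -- The virtual warehouse is just a view: no data is copied.
    CREATE VIEW sales_by_region AS
        SELECT region, SUM(amount) AS total_sales
        FROM orders
        GROUP BY region;
""")

# Each query on the view is evaluated against the operational table,
# which is why the operational server needs spare capacity.
rows = conn.execute(
    "SELECT region, total_sales FROM sales_by_region ORDER BY region"
).fetchall()
print(rows)  # [('East', 350.0), ('West', 75.0)]
```

Because the view is not materialized, a row inserted into `orders` appears in the next query against `sales_by_region` with no load step.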
2. Data Mart
• A data mart contains a subset of organization-wide data. This
subset of data is valuable to specific groups within an
organization.
• In other words, data marts contain data specific to a
particular group. For example, the marketing data mart may
contain data related to items, customers, and sales. Data
marts are confined to subjects.
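As a rough illustration, the sketch below builds a marketing data mart as a subject-restricted subset of organization-wide data, again using `sqlite3`. The `enterprise_facts` table, its `subject` column, and the `marketing_mart` name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical organization-wide table holding several subjects.
    CREATE TABLE enterprise_facts (
        subject  TEXT,   -- e.g. 'sales', 'payroll'
        item     TEXT,
        customer TEXT,
        amount   REAL
    );
    INSERT INTO enterprise_facts VALUES
        ('sales',   'pen',  'alice',    3.0),
        ('sales',   'book', 'bob',     12.0),
        ('payroll', NULL,   NULL,    5000.0);

    -- Marketing data mart: only the sales subject, only the columns marketing uses.
    CREATE TABLE marketing_mart AS
        SELECT item, customer, amount
        FROM enterprise_facts
        WHERE subject = 'sales';
""")

mart_rows = conn.execute(
    "SELECT item, customer, amount FROM marketing_mart ORDER BY item"
).fetchall()
print(mart_rows)  # [('book', 'bob', 12.0), ('pen', 'alice', 3.0)]
```

Unlike the virtual warehouse, the mart here is a physical copy, so it must be refreshed when the source data changes.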
3. Enterprise Warehouse
• An enterprise warehouse collects all of the information about
subjects spanning the entire organization.
• It provides enterprise-wide data integration.
• The data is integrated from operational systems and
external information providers.
• Its size can range from a few gigabytes to hundreds of
gigabytes, terabytes, or beyond.
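Integration usually means reconciling sources that describe the same entities in different ways. The sketch below is a hypothetical example, not a prescribed method: customer records from an operational system and from an external provider use differently formatted IDs, and `integrate` conforms the keys and merges the attributes into one record per customer.

```python
# Hypothetical customer records from an operational system.
operational_customers = [
    {"cust_id": "C001", "name": "Alice Smith"},
    {"cust_id": "C002", "name": "Bob Jones"},
]
# Hypothetical records from an external information provider,
# which uses lower-case IDs and knows a customer we do not.
external_provider = [
    {"customer": "c001", "credit_score": 710},
    {"customer": "c003", "credit_score": 650},
]

def integrate(internal, external):
    """Merge both sources into one record per conformed customer ID."""
    merged = {}
    for rec in internal:
        merged[rec["cust_id"].upper()] = {"name": rec["name"], "credit_score": None}
    for rec in external:
        key = rec["customer"].upper()  # conform the key format
        entry = merged.setdefault(key, {"name": None, "credit_score": None})
        entry["credit_score"] = rec["credit_score"]
    return merged

warehouse_customers = integrate(operational_customers, external_provider)
print(warehouse_customers["C001"])  # {'name': 'Alice Smith', 'credit_score': 710}
```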
B. Load Manager
• This component performs the operations required for the extract
and load process.
• The size and complexity of the load manager vary from one
data warehouse solution to another.
Load Manager Architecture
• The load manager performs the following functions:
• Extract the data from the source system.
• Fast-load the extracted data into a temporary data store.
• Perform simple transformations into a structure similar to the
one in the data warehouse.
[Figure: Load manager architecture]
1. Extract Data from Source
➢The data is extracted from the operational databases or
the external information providers.
➢Gateways are the application programs that are used to
extract data.
➢A gateway is supported by the underlying DBMS and allows a
client program to generate SQL to be executed at a server.
➢Open Database Connectivity (ODBC) and Java Database
Connectivity (JDBC) are examples of gateways.
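ODBC and JDBC are not part of Python's standard library, but Python's DB-API 2.0 interface (PEP 249) plays the analogous role, and `sqlite3` implements it. In this sketch the client program generates SQL and executes it through the connection; note that SQLite is an embedded engine, so the "server" is only simulated, and the `sales` table and `extract` function are hypothetical.

```python
import sqlite3

def extract(conn, table, columns, since_id=0):
    """Generate a SELECT for the requested columns and execute it through
    the DB-API connection (the gateway), returning rows newer than since_id."""
    # Identifiers cannot be bound as parameters, so table and column names
    # must come from trusted configuration; the value is bound safely with '?'.
    sql = f"SELECT {', '.join(columns)} FROM {table} WHERE id > ? ORDER BY id"
    return conn.execute(sql, (since_id,)).fetchall()

source = sqlite3.connect(":memory:")  # stand-in for an operational DB server
source.executescript("""
    CREATE TABLE sales (id INTEGER PRIMARY KEY, item TEXT, qty INTEGER, clerk TEXT);
    INSERT INTO sales (item, qty, clerk) VALUES ('pen', 2, 'ann'), ('ink', 1, 'joe');
""")

rows = extract(source, "sales", ["id", "item", "qty"], since_id=0)
print(rows)  # [(1, 'pen', 2), (2, 'ink', 1)]
```

Passing a larger `since_id` gives a simple incremental extract of only the newer rows.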
2. Fast Load
• In order to minimize the total load window, the data needs
to be loaded into the warehouse in the fastest possible
time.
• Transformations affect the speed of data processing.
• It is more effective to load the data into a relational
database before applying transformations and checks.
• Gateway technology is not suitable here, since gateways
tend to perform poorly when large data volumes are
involved.
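The sketch below illustrates "load first, transform later" with `sqlite3`: raw rows go into an unconstrained staging table in a single bulk transaction, and the type conversion happens afterwards as set-based SQL inside the database. The table names and row counts are hypothetical; a production warehouse would normally use its RDBMS's dedicated bulk-load utility instead.

```python
import sqlite3

warehouse = sqlite3.connect(":memory:")
# Staging table: plain TEXT columns, no constraints or indexes,
# so each insert does as little work as possible.
warehouse.execute("CREATE TABLE stage_sales (sale_date TEXT, item TEXT, qty TEXT)")

def fast_load(conn, rows):
    """Bulk-insert raw extracted rows in one transaction; transformations
    and checks are deferred until the data is inside the database."""
    with conn:  # commit once at the end, roll back on error
        conn.executemany("INSERT INTO stage_sales VALUES (?, ?, ?)", rows)

raw = [("2024-01-05", "pen", "2"), ("2024-01-05", "ink", "1")] * 5000
fast_load(warehouse, raw)
print(warehouse.execute("SELECT COUNT(*) FROM stage_sales").fetchone()[0])  # 10000

# Afterwards, transformations run as set-based SQL inside the database.
warehouse.execute(
    "CREATE TABLE sales AS "
    "SELECT sale_date, item, CAST(qty AS INTEGER) AS qty FROM stage_sales"
)
```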
3. Simple Transformations
• While loading, it may be necessary to perform simple transformations.
Once these are complete, we are in a position to do the complex
checks. Suppose we are loading EPOS (electronic point of sale) sales
transactions; we need to perform the following checks:
• Strip out all the columns that are not required within the
warehouse.
• Convert all the values to the required data types.
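The two checks above can be sketched in a few lines of Python. The EPOS record layout and the `REQUIRED` target schema are hypothetical: every field arrives as a string, some columns are not needed, and each required column is paired with a function that converts it to the required type.

```python
from datetime import date
from decimal import Decimal

# Hypothetical EPOS export: every field is a string, and some columns
# (till_operator, receipt_text) are not needed in the warehouse.
epos_rows = [
    {"txn_id": "1001", "sale_date": "2024-03-01", "sku": "A12",
     "qty": "3", "price": "2.50", "till_operator": "ann",
     "receipt_text": "THANK YOU"},
]

# Hypothetical target schema: required column -> conversion to its data type.
REQUIRED = {
    "txn_id": int,
    "sale_date": date.fromisoformat,
    "sku": str,
    "qty": int,
    "price": Decimal,  # Decimal avoids binary floating-point rounding of money
}

def simple_transform(row):
    """Drop columns not in REQUIRED and convert the rest to the required types."""
    return {col: convert(row[col]) for col, convert in REQUIRED.items()}

clean = [simple_transform(r) for r in epos_rows]
print(clean[0]["sale_date"], clean[0]["price"])  # 2024-03-01 2.50
```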
C. Warehouse Manager
• A warehouse manager is responsible for the warehouse
management process. It consists of third-party system
software, C programs, and shell scripts.
• The size and complexity of warehouse managers vary
from one solution to another.
Warehouse Manager Architecture
• A warehouse manager includes the following:
• The controlling process
• Stored procedures or C with SQL
• Backup/Recovery tool
• SQL Scripts
[Figure: Warehouse manager architecture]
Operations Performed by Warehouse Manager
A warehouse manager analyzes the data to perform consistency and
referential integrity checks.
1. Creates indexes, business views, and partition views against the base
data.
2. Generates new aggregations and updates existing aggregations.
Generates normalizations.
3. Transforms and merges the source data into the published data
warehouse.
4. Backs up the data in the data warehouse.
5. Archives the data that has reached the end of its captured life.
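Several of these operations can be sketched as SQL run from a small controlling script, again using `sqlite3` as a stand-in warehouse. The schema, the 2020 cut-off year, and the use of an in-database `archive_sales` table (in place of tape) are all hypothetical; the comment numbers refer to the list above.

```python
import sqlite3

dw = sqlite3.connect(":memory:")
dw.executescript("""
    CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE fact_sales (store_id INTEGER, sale_year INTEGER, amount REAL);
    CREATE TABLE archive_sales (store_id INTEGER, sale_year INTEGER, amount REAL);
    INSERT INTO dim_store VALUES (1, 'Leeds'), (2, 'York');
    INSERT INTO fact_sales VALUES
        (1, 2015, 10.0), (1, 2024, 40.0), (2, 2024, 25.0), (9, 2024, 5.0);
""")

# Referential integrity check: fact rows whose store_id has no dimension row.
orphans = dw.execute("""
    SELECT f.rowid FROM fact_sales AS f
    LEFT JOIN dim_store AS d ON d.store_id = f.store_id
    WHERE d.store_id IS NULL
""").fetchall()

# 1. Create an index against the base data.
dw.execute("CREATE INDEX idx_sales_store ON fact_sales (store_id)")

# 2. Regenerate an aggregation (drop and rebuild it from the base data).
dw.executescript("""
    DROP TABLE IF EXISTS agg_sales_by_city;
    CREATE TABLE agg_sales_by_city AS
        SELECT d.city, SUM(f.amount) AS total
        FROM fact_sales AS f JOIN dim_store AS d ON d.store_id = f.store_id
        GROUP BY d.city;
""")

# 4. Back up the warehouse (here into a second in-memory database).
backup = sqlite3.connect(":memory:")
dw.backup(backup)

# 5. Archive rows older than a hypothetical cut-off year, then remove them online.
with dw:
    dw.execute("INSERT INTO archive_sales SELECT * FROM fact_sales WHERE sale_year < 2020")
    dw.execute("DELETE FROM fact_sales WHERE sale_year < 2020")

print(orphans)  # [(4,)]  (the row referring to the unknown store 9)
```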
Query Manager Architecture
The following diagram shows the architecture of a
query manager. It includes the following:
• Query redirection via C tool or RDBMS
• Stored procedures
• Query management tool
• Query scheduling via C tool or RDBMS
• Query scheduling via third-party software
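Query redirection can be illustrated with a tiny sketch: if a predefined aggregation already answers the requested grouping, the query is sent to that summary table; otherwise it falls back to the detailed fact table. The `SUMMARIES` catalogue and `run_total_query` are hypothetical names, and real query managers use far richer matching rules.

```python
import sqlite3

dw = sqlite3.connect(":memory:")
dw.executescript("""
    CREATE TABLE fact_sales (city TEXT, item TEXT, amount REAL);
    INSERT INTO fact_sales VALUES
        ('Leeds', 'pen', 2.0), ('Leeds', 'ink', 3.0), ('York', 'pen', 4.0);
    -- Predefined aggregation maintained by the warehouse manager.
    CREATE TABLE agg_sales_by_city AS
        SELECT city, SUM(amount) AS total FROM fact_sales GROUP BY city;
""")

# Hypothetical catalogue: grouping columns -> summary table that answers them.
SUMMARIES = {("city",): "agg_sales_by_city"}

def run_total_query(conn, group_by):
    """Redirect a SUM(amount) query to a summary table when one exists,
    otherwise scan the detailed fact table. Returns (sql, rows)."""
    cols = ", ".join(group_by)
    summary = SUMMARIES.get(tuple(group_by))
    if summary:
        sql = f"SELECT {cols}, total FROM {summary} ORDER BY {cols}"
    else:
        sql = f"SELECT {cols}, SUM(amount) FROM fact_sales GROUP BY {cols} ORDER BY {cols}"
    return sql, conn.execute(sql).fetchall()

sql, rows = run_total_query(dw, ["city"])
print(rows)  # [('Leeds', 5.0), ('York', 4.0)], answered from agg_sales_by_city
```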
[Figure: Query manager architecture]
1. Detailed Information
• Detailed information is not kept online; rather, it is
aggregated to the next level of detail and then archived
to tape. The detailed information part of the data
warehouse keeps the detailed information in the
snowflake schema. Detailed information is loaded into
the data warehouse to supplement the aggregated data.
The following diagram shows where detailed information
is stored and how it is used.
Note: If detailed information is held offline to minimize disk storage, we should make sure that the data has been
extracted, cleaned up, and transformed into a starflake schema before it is archived.
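The aggregate-then-archive pattern described in this section can be sketched with the standard library alone. The daily rows, the month-level grouping, and the in-memory CSV buffer standing in for tape are all hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical daily detail rows: (date, store, amount).
detail = [
    ("2024-01-03", "S1", 10.0),
    ("2024-01-17", "S1", 5.0),
    ("2024-02-02", "S1", 7.5),
]

# Aggregate to the next level of detail (month) for online use.
monthly = defaultdict(float)
for day, store, amount in detail:
    monthly[(day[:7], store)] += amount  # "YYYY-MM" prefix of the ISO date

# Write the detail rows to an archive (a CSV buffer standing in for tape).
archive = io.StringIO()
csv.writer(archive).writerows(detail)
detail.clear()  # the detail is no longer kept online

print(dict(monthly))  # {('2024-01', 'S1'): 15.0, ('2024-02', 'S1'): 7.5}
```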
2. Summary Information
Summary information is the part of the data warehouse that stores predefined
aggregations. These aggregations are generated by the warehouse manager.
Summary information must be treated as transient: it changes on the fly in
response to changing query profiles.
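One way to picture "changing in response to query profiles" is a small advisor that counts which groupings users query and keeps only the most popular ones as summary tables. `SummaryAdvisor` and its `capacity` parameter are hypothetical; this is a simple frequency heuristic, not a method prescribed by any particular product.

```python
from collections import Counter

class SummaryAdvisor:
    """Hypothetical helper: tracks the query profile and decides which
    predefined aggregations are currently worth keeping."""

    def __init__(self, capacity):
        self.capacity = capacity   # how many summary tables we can afford
        self.profile = Counter()   # grouping -> number of queries seen

    def record(self, grouping):
        # Sort so ["city", "item"] and ["item", "city"] count as one grouping.
        self.profile[tuple(sorted(grouping))] += 1

    def summaries_to_keep(self):
        return {g for g, _ in self.profile.most_common(self.capacity)}

advisor = SummaryAdvisor(capacity=1)
for _ in range(3):
    advisor.record(["city"])
advisor.record(["item"])
print(advisor.summaries_to_keep())  # {('city',)}

# The query profile shifts, so the set of summaries to keep shifts too.
for _ in range(5):
    advisor.record(["item"])
print(advisor.summaries_to_keep())  # {('item',)}
```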