0% found this document useful (0 votes)
133 views33 pages

Lecture 03

The document discusses data warehousing storage structures. It explains that data extracted from operational systems can be stored in relational or multidimensional databases. Relational databases store data in tables while multidimensional databases optimize for efficient storage and retrieval of large volumes of data from different perspectives called dimensions. Multidimensional databases have performance advantages for data warehousing queries compared to relational databases.

Uploaded by

hassan499
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
133 views33 pages

Lecture 03

The document discusses data warehousing storage structures. It explains that data extracted from operational systems can be stored in relational or multidimensional databases. Relational databases store data in tables while multidimensional databases optimize for efficient storage and retrieval of large volumes of data from different perspectives called dimensions. Multidimensional databases have performance advantages for data warehousing queries compared to relational databases.

Uploaded by

hassan499
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 33

Lecture 03

Storage Structure
Summary – last week
• Last week:
– DWH Lifecycle
– Basic Structure

• This week:
– DW Architecture
– Storage Architecture
– Tier Architecture

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 2
Storage Structure
• Storage structure
– After extraction from the operational data,in DW
information is stored in databases
– The databases are operated by a DBMS
– Different database structures can be used for a DW:
• Rela tiona l m odel (RDB ) operated by a RDB MS
• MultiDim ensiona l m odel (M DB ) operated by a
MDB MS

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 3
Storage Structure (cont’d.)
• RDB and MDB are com plem enta r y and do not
have to exclude each other
– In the stag ing a rea some RDBMS can be used,
however it must be off-limits to user queries because
of performance reasons
– By default,normalized databases are excluded from
the presenta tion a rea ,which should be strictly
multi-dimensionally (MDBMS)

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 4
Relational DB
• DB in rela tiona l m odel
– A database is seen as a collection of predica tes over
a finite set of variables
– The content of the DB is modeled as a set of
rela tions in which all predicates are satisfied

Publisher
Books
Name
Title
ID (PK)
ISBN (PK)
Price
Publisher (FK)
Category (FK) BookCategory
Cat_ID (PK)
Description

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 5
Relational DB (cont’d.)
• A rela tion is defined as a set of tuples that have
the same attributes
– It is usually described as a table
Attribute

Tuple

Relation

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 6
Multidimensional DB
• Multidim ensiona l DB (MDB ) are optimized
for DW and OLAP applications
– They are created using input from the stag ing a rea
– Designed for efficient and convenient storage and
retrieval of large volumes of data
– Stored,viewed and analyzed from different
perspectives called dim ensions

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 7
Multidimensional DB (cont’d.)
• Example:an automobile manufacturer wants to
increase sale volumes
– Evaluation requires to view historica l sa le volume
figures from multiple dimensions
– S a les volume by m odel,
by color,by dea ler, over
tim e

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 8
Multidimensional DB (cont’d.)
• A relational structure of the given evaluation
would be

Model Color Sales volume


Mini VAN Blue 324
Mini VAN Black 113
Mini VAN Red 18
Sedan Black 160
Sedan Blue 115
Sedan Red 6
Sports coupe Red 16
Sports coupe Black 16
Sports coupe Blue 12

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 9
Multidimensional DB (cont’d.)
• Structure
* 289 451 40 1560

Mini VAN 113 324 18 455

Sedan 160 115 6 281

Coupe 16 12 16 44

Black Blue Red *

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 10
Multidimensional DB (cont’d.)
• The com plexity grows quickly with the number
of dimensions and the number of positions
– Example:3 dimensions with 10 values each and no
indexes
– If we consider viewing information
in a RDB it would result in a worst
case of 103=1000
records view

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 11
Multidimensional DB (cont’d.)
• Now,if we consider perform a nce
– For responding to a query when car type = Sedan,
color = Blue,and dealer = Berg
• RDBMS has to search through 1000 records to find the
right record
• MDB has more knowledge about where data lies
• The maximum of searches in the case of MDB is of 30
positions

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 12
Multidimensional DB (cont’d.)
• If the query is more relaxed
– Total sales across all dealers for all colors when car
type = sedan
• RDBMS still has to go through the 1000 records
• MDB,however,goes only through a slice of 10x10

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 13
Multidimensional DB (cont’d.)

• Performance advantages
– MDBs are an order of magnitude faster than RDBMSs
– Performance benefits are more for queries that
generate cross-tab views of data (the case of DW)
• Conclusion
– The performance advantages offered by
MDBs facilitates the development of
interactive decision support applications
like OLAP that can be impractical in a
relational environment

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 14
RDB vs. MDB

• Any database manipulation is possible


with both technolog ies
• MDBs however offer some
advantages in the context of DW:
– Ease of data presenta tion
– Ease of m a intena nce
– Perform a nce

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 15
RDB vs. MDB (cont’d.)

• Ease of data presentation


– Data views are natural output of the MDBs
– Obtaining the same views in RDB requires a complex query
• Example withWalmart and Sybase:
– select sum(sales.quantity_sold) from sales,products,product_categories,
manufacturers,stores,cities where manufacturer_name =‘Colgate’
and product_category_name =‘toothpaste’
and cities.population < 40 000
and trunc(sales.date_time_of_sale) = trunc(sysdate-1)
and sales.product_id = products.product_id
and sales.store_id = stores.store_id
and products.product_category_id = product_categories.product_category_id
and products.manufacturer_id = manufacturers.manufacturer_id
and stores.city_id = cities.city_id

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 16
RDB vs. MDB (cont’d.)
• Ease of data presenta tion
– Top k queries cannot be expressed well in SQL
• Find the five cheapest hotels in Auckland
– SELECT * FROM hotels h WHERE h.city = Auckland AND 5 >
(SELECT count(*) FROM hotels h1 WHERE h1.city = Auckland
AND h1.price < h.price);
• Some RDBMS extended the functionality of SQL with STOP
AFTER functionality
– SELECT * FROM hotelsWHERE city = Auckland Order By price
STOPAFTER 5;

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 17
RDB vs. MDB (cont’d.)
• Ease of m a intena nce
– No additional overhead to translate user queries into
requests for data
• Data is stored as it is viewed
– RDBs use indexes and sophistica ted joins which
require sig nifica nt m a intena nce and storage to
provide same intuitiveness

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 18
RDB vs. MDB (cont’d.)
• Perform a nce
• – Performance of MDBs can be matched by
RDBs through database tuning
• – Not possible to tune the database for a ll
possible ad- hoc queries
• – A g g reg a te navigators are helping RDBs to
catch up with MDBs as far as aggregation queries
are concerned

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 19
When MDBs are In-appropriate?
• When MDBs are in-appropriate?
– If the dataset types are not highly related,using a MDB
results in a sparse representation

Mini VAN 113 324 18 Smith 34

34
Sedan 160 115 6 James 25

Coupe 16 12 16 Fox 45

Black Blue Red 6 115 3

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 20
When MDBs are Appropriate?

• When MDBs are appropriate?


– In the case of hig hly interrela ted dataset types
MDBs are recommended for greatest ease of access
and analysis
– Examples of applications
• FinancialAnalysis and Reporting
• Budgeting
• PromotionTracking
• QualityAssurance and Quality Control
• Product Profitability

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 21
Tier Architecture
• Popular DW architectures
– GenericTwo-TierArchitecture
– Independent Data Mart
– Dependent Data Mart and Operational Data Store
– Logical Data Mart andActiveWarehouse
– Three-TierArchitecture
• Other
– One-TierArchitecture
– N-TierArchitecture
– Web-basedArchitecture

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 22
Layered Architecture
• Data a na lysis comes in two flavors
– Depending on the execution place of the analysis
• Thin Client
– Analytics are executed on the server
– Client just displays
– This architecture fits well for Internet/Intranet DW access

Client HTTP, IIOP Server

Analysis
Data storage

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 23
Layered Architecture (cont’d.)
• Fat Client
– The server just delivers the data
– Analytics are executed on the client
– Communication between client and server must be able to sustain
large data transfers

ODBC, JDBC, NFS


Server
Client

Analysis
Data storage

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 24
Layered Architecture (cont’d.)
• GenericTwo-T ier A rchitecture
– Data is not completely current in the DW
– Periodic extraction

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 25
Layered Architecture (cont’d.)
• Independent Da ta Ma rt
– Mini warehouses – limited in scope
– Separate ETL for each independent Data Mart
– High Data Marts access complexity

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 26
Layered Architecture (cont’d.)

• Dependent Da ta Ma rt and Operational


Data Store
–Single ETL for the DW
– Data Marts are loaded from the DW
–More simple
data access than
in the previous
case

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 27
Layered Architecture (cont’d.)
• L og ica l Da ta Ma rt andActiveWarehouse
– The ETL is near real-time
– Data Marts are not separate databases,but logical views
of the DW

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 28
DW vs. Data Marts
Scope Subjects
DW Data Marts DW Data Marts
Application Specific DSS application Multiple subjects One central subject
independent
Sources
Centralized, Decentralized by user area
Planned Organic, possibly not planned DW Data Marts
Many internal and Few internal and
Data external sources external sources

DW Data Marts
Historical, Some history, detailed,
Other characteristics
detailed, summarized
summarized DW Data Marts
Lightly Highly denormalized Flexible Restrictive
denormalized Data-oriented Project oriented
Long life Short life
Large Start small, becomes
large
Single complex Multiple, semi-complex
structure structure, together
complex

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 29
Layered Architecture (cont’d.)
• GenericT hree-T ier A rchitecture OLAP queries

– Derived data
•Data that had been selected, Data mart
Derived data metadata
formatted,and aggregated for
Data Mart
DSS support
– Reconciled data DW
Reconciled data
• Detailed,current data metadata

DW and ODS
intended to be the single,
authoritative source for all ETL with staging area

decision support Operational


Operational
metadata
data
Operational data

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 30
Layered Architecture (cont’d.)

• One-TierArchitecture
– Theoretically possible
– Might be interesting for mobile applications
• N-TierArchitecture
– Higher tier architecture is also possible
• But the complexity grows with the number of tier-interfaces
• Web-basedArchitecture
– Advantages:
• Usage of existing software,reduction of costs,platform
independence
– Disadvantages:
• Security issues:data encryption/user access and identification

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 31
Summary
• DW architectures:
– Storage architecture
– Relational DB
– Multidimensional DB
– Tier Architecture

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 32
Next lecture

• DW architecture (cont’d.)
– Distributed DW
– DW data modelling
– Conceptual model

Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 33

You might also like