Lecture 03
Lecture 03
Storage Structure
Summary – last week
• Last week:
– DWH Lifecycle
– Basic Structure
• This week:
– DW Architecture
– Storage Architecture
– Tier Architecture
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 2
Storage Structure
• Storage structure
– After extraction from the operational data,in DW
information is stored in databases
– The databases are operated by a DBMS
– Different database structures can be used for a DW:
• Rela tiona l m odel (RDB ) operated by a RDB MS
• MultiDim ensiona l m odel (M DB ) operated by a
MDB MS
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 3
Storage Structure (cont’d.)
• RDB and MDB are com plem enta r y and do not
have to exclude each other
– In the stag ing a rea some RDBMS can be used,
however it must be off-limits to user queries because
of performance reasons
– By default,normalized databases are excluded from
the presenta tion a rea ,which should be strictly
multi-dimensionally (MDBMS)
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 4
Relational DB
• DB in rela tiona l m odel
– A database is seen as a collection of predica tes over
a finite set of variables
– The content of the DB is modeled as a set of
rela tions in which all predicates are satisfied
Publisher
Books
Name
Title
ID (PK)
ISBN (PK)
Price
Publisher (FK)
Category (FK) BookCategory
Cat_ID (PK)
Description
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 5
Relational DB (cont’d.)
• A rela tion is defined as a set of tuples that have
the same attributes
– It is usually described as a table
Attribute
Tuple
Relation
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 6
Multidimensional DB
• Multidim ensiona l DB (MDB ) are optimized
for DW and OLAP applications
– They are created using input from the stag ing a rea
– Designed for efficient and convenient storage and
retrieval of large volumes of data
– Stored,viewed and analyzed from different
perspectives called dim ensions
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 7
Multidimensional DB (cont’d.)
• Example:an automobile manufacturer wants to
increase sale volumes
– Evaluation requires to view historica l sa le volume
figures from multiple dimensions
– S a les volume by m odel,
by color,by dea ler, over
tim e
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 8
Multidimensional DB (cont’d.)
• A relational structure of the given evaluation
would be
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 9
Multidimensional DB (cont’d.)
• Structure
* 289 451 40 1560
Coupe 16 12 16 44
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 10
Multidimensional DB (cont’d.)
• The com plexity grows quickly with the number
of dimensions and the number of positions
– Example:3 dimensions with 10 values each and no
indexes
– If we consider viewing information
in a RDB it would result in a worst
case of 103=1000
records view
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 11
Multidimensional DB (cont’d.)
• Now,if we consider perform a nce
– For responding to a query when car type = Sedan,
color = Blue,and dealer = Berg
• RDBMS has to search through 1000 records to find the
right record
• MDB has more knowledge about where data lies
• The maximum of searches in the case of MDB is of 30
positions
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 12
Multidimensional DB (cont’d.)
• If the query is more relaxed
– Total sales across all dealers for all colors when car
type = sedan
• RDBMS still has to go through the 1000 records
• MDB,however,goes only through a slice of 10x10
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 13
Multidimensional DB (cont’d.)
• Performance advantages
– MDBs are an order of magnitude faster than RDBMSs
– Performance benefits are more for queries that
generate cross-tab views of data (the case of DW)
• Conclusion
– The performance advantages offered by
MDBs facilitates the development of
interactive decision support applications
like OLAP that can be impractical in a
relational environment
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 14
RDB vs. MDB
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 15
RDB vs. MDB (cont’d.)
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 16
RDB vs. MDB (cont’d.)
• Ease of data presenta tion
– Top k queries cannot be expressed well in SQL
• Find the five cheapest hotels in Auckland
– SELECT * FROM hotels h WHERE h.city = Auckland AND 5 >
(SELECT count(*) FROM hotels h1 WHERE h1.city = Auckland
AND h1.price < h.price);
• Some RDBMS extended the functionality of SQL with STOP
AFTER functionality
– SELECT * FROM hotelsWHERE city = Auckland Order By price
STOPAFTER 5;
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 17
RDB vs. MDB (cont’d.)
• Ease of m a intena nce
– No additional overhead to translate user queries into
requests for data
• Data is stored as it is viewed
– RDBs use indexes and sophistica ted joins which
require sig nifica nt m a intena nce and storage to
provide same intuitiveness
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 18
RDB vs. MDB (cont’d.)
• Perform a nce
• – Performance of MDBs can be matched by
RDBs through database tuning
• – Not possible to tune the database for a ll
possible ad- hoc queries
• – A g g reg a te navigators are helping RDBs to
catch up with MDBs as far as aggregation queries
are concerned
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 19
When MDBs are In-appropriate?
• When MDBs are in-appropriate?
– If the dataset types are not highly related,using a MDB
results in a sparse representation
34
Sedan 160 115 6 James 25
Coupe 16 12 16 Fox 45
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 20
When MDBs are Appropriate?
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 21
Tier Architecture
• Popular DW architectures
– GenericTwo-TierArchitecture
– Independent Data Mart
– Dependent Data Mart and Operational Data Store
– Logical Data Mart andActiveWarehouse
– Three-TierArchitecture
• Other
– One-TierArchitecture
– N-TierArchitecture
– Web-basedArchitecture
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 22
Layered Architecture
• Data a na lysis comes in two flavors
– Depending on the execution place of the analysis
• Thin Client
– Analytics are executed on the server
– Client just displays
– This architecture fits well for Internet/Intranet DW access
Analysis
Data storage
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 23
Layered Architecture (cont’d.)
• Fat Client
– The server just delivers the data
– Analytics are executed on the client
– Communication between client and server must be able to sustain
large data transfers
Analysis
Data storage
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 24
Layered Architecture (cont’d.)
• GenericTwo-T ier A rchitecture
– Data is not completely current in the DW
– Periodic extraction
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 25
Layered Architecture (cont’d.)
• Independent Da ta Ma rt
– Mini warehouses – limited in scope
– Separate ETL for each independent Data Mart
– High Data Marts access complexity
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 26
Layered Architecture (cont’d.)
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 27
Layered Architecture (cont’d.)
• L og ica l Da ta Ma rt andActiveWarehouse
– The ETL is near real-time
– Data Marts are not separate databases,but logical views
of the DW
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 28
DW vs. Data Marts
Scope Subjects
DW Data Marts DW Data Marts
Application Specific DSS application Multiple subjects One central subject
independent
Sources
Centralized, Decentralized by user area
Planned Organic, possibly not planned DW Data Marts
Many internal and Few internal and
Data external sources external sources
DW Data Marts
Historical, Some history, detailed,
Other characteristics
detailed, summarized
summarized DW Data Marts
Lightly Highly denormalized Flexible Restrictive
denormalized Data-oriented Project oriented
Long life Short life
Large Start small, becomes
large
Single complex Multiple, semi-complex
structure structure, together
complex
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 29
Layered Architecture (cont’d.)
• GenericT hree-T ier A rchitecture OLAP queries
– Derived data
•Data that had been selected, Data mart
Derived data metadata
formatted,and aggregated for
Data Mart
DSS support
– Reconciled data DW
Reconciled data
• Detailed,current data metadata
DW and ODS
intended to be the single,
authoritative source for all ETL with staging area
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 30
Layered Architecture (cont’d.)
• One-TierArchitecture
– Theoretically possible
– Might be interesting for mobile applications
• N-TierArchitecture
– Higher tier architecture is also possible
• But the complexity grows with the number of tier-interfaces
• Web-basedArchitecture
– Advantages:
• Usage of existing software,reduction of costs,platform
independence
– Disadvantages:
• Security issues:data encryption/user access and identification
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 31
Summary
• DW architectures:
– Storage architecture
– Relational DB
– Multidimensional DB
– Tier Architecture
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 32
Next lecture
• DW architecture (cont’d.)
– Distributed DW
– DW data modelling
– Conceptual model
Acknowledgment - Thanks to Wolf-Tilo Balke and Silviu Homoceanu - TU Braunschweig for the slides 33