MAC - Conceptual Data Modeling for OLAP
MAC - Conceptual Data Modeling for OLAP
Timos Sellis
Knowledge and Database Systems Laboratory
Department of Electrical and Computer Engineering
National Technical University of Athens (NTUA)
Zografou 15773, Athens, Greece
[email protected]
the proposed MAC model allows OLAP scenarios to be 5HVLGHQFH 5HVLGHQFH 5HVLGHQFH
3 &XVWRPHU
modeled in a natural and straightforward way. $UHD &LW\ 5HJLRQ
and hierarchies and the broad definition of cubes makes it 3 &XVWRPHU 3URIHVVLRQ
For example, one possible analysis path is to drill-down Q1: Give me the sum of sales for the year 2000 per
from Warehouse Area to Store level. This drill-down Month, Product Group, Product Class, Store City and
operation could reveal the stores supplied by warehouses Store Region.
of a particular area. The designer of the OLAP application
has defined this path (P5) but has decided to leave Note that the above question requires aggregation on
Cashier level out of this path. This decision can mean that several levels of the same path. For example it requires
when doing roll-up and drill-down operations on this path the sum of sales for each product group and also the sum
it is not meaningful to drill-down to the Cashier level. of sales for each product class. The query result
represented in a grid fashion could look like Figure 2.
From the above discussion one can realize that a
conceptual data model suitable for multidimensional -DQXDU\ )HE
&LW\ &LW\
6WRUH 5HJLRQ
&LW\
1. dimension levels,
*URXS *URXS
levels) and
3. analysis paths.
According to the first path (P1) of our example scenario, Figure 2: Example result for query Q1
for each customer we have the residence address in the
form of area, city, and region. Assume now that for some Now consider the case where the analyst uses two paths
customers the city attribute is not set (perhaps because of the same dimension for the classification and grouping
their residence area does not belong to any city). In this of sales data.
case the residence area of the customer will be linked by
the grouping/classification relationship directly to a Q2: Give me the sum of sales for year 2000 per
residence region and not to a residence city as it happens customer Profession and Residence Region.
with most of the other residence areas. Generally
speaking a grouping/classification relationship may The two levels used for grouping, Profession and
involve more than two levels since a members of a level Residence Region are in fact independent although they
may drill-down to members of different levels. both belong to the same dimension. So, grouping on both
of them at the same time defines a two-dimensional space.
There are also some other aspects of the If the two levels where related, like Store City and Store
grouping/classification relationship that we found to be Region are in the previous query, we would have a one-
important. In some cases a member of a level is linked to dimensional space.
more than one members of the next level. For example a
member of the Product level will probably be linked to The next query shows the necessity of supporting multiple
more than one members of the Advertising Method level measures defined over the same set of dimensions.
In most of the cases some of the measures will be The MAC model uses concepts that are close to the way
functionally dependent on other more primitive measures. OLAP users perceive the information. The model tries to
Generally speaking, a set of primitive and derived be expressive providing the means to model complicated
measures can simultaneously be required for an analysis real-world scenarios while using a minimal set of
process. Finally, we present a query with a somehow concepts that remain as simple as possible. The MAC
more complicated selection condition: model describes data as dimension levels, drilling
relationships, dimension paths, dimensions, cubes and
Q4: Give me the sum of sales per Month, Store Area attributes.
and Brand selecting only the store areas that have
increased their total sales for the year 2000 by more Dimension levels represent classes of dimension members.
than 10 percent from the previous year. Each dimension member represents some instance of a
real-world property that an OLAP measure may have.
This query requires calculation of the total sales value per Distinct dimension levels can be related by means of a
store area for the years 2000 and 1999 and then the drilling relationship. A drilling relationship indicates that
selection of the areas whose sum of sales value for the there is a semantic relationship among the involved levels
year 2000 greater than 110% of their sum of sales value and describes how the dimension members of the children
for the year 1999. For those store areas the query requires levels can be grouped into sets that correspond to
the sum of sales calculated by Month and Brand. This is a dimension members of the parent level.
typical query that performs selection based on aggregated
data at a different level than the data required for its A set of drilling relationships can form a dimension path
output. if several structural requirements are met. A dimension
path defines a meaningful composition of drilling
Our experience shows that all the above queries are relationships and is used to model a valid sequence of
common OLAP queries. We believe that a conceptual abstraction operations (drill-down/roll-up). One or more
model suitable for multidimensional analysis should dimension paths that share common levels can form a
accommodate queries like the above. This means that the dimension.
structure of the query result as well as the structure of any
other information involved in the query definition must be Finally, we define multidimensional aggregation cubes
easily represented by the concepts of the model. The (MACs) as a relationship among the domains of one or
above requirement comes as a result of our intention of more dimensions. A MAC can have one or more
having a conceptual model that can efficiently represent measures. Each one of those can be considered as a
all kinds of information handled by OLAP applications simple and atomic attribute of the relationship represented
and not only the raw (source - detailed) data. Since we are by the MAC. An instance of a MAC is called a MAC cell
only talking about a conceptual data model we do not or simply a cell. We now give the complete definition of
require the model to represent the functional aspects the above terms and provide examples on how they are
(operators and functions) of the queries but we limit our used. In the following we will use the simple term cube to
requirements to the static data involved in those refer to a multidimensional aggregation cube
computations.
3.1 Dimension Levels
Based on our example queries we derive the following
requirements: a good conceptual model should be able to A dimension level is a set of dimension members. The
define aggregations on arbitrary combination of levels (of dimension members are the most detailed modeling
different paths) even if those levels belong to the same concepts of our model and represent instances of real-
dimension as well as aggregation on a set of levels world properties that OLAP measures may have. In our
belonging to a particular path. Furthermore, the model example scenario, the sale price is one measure of our
should allow multiple measures to be defined for a given multidimensional analysis. A property of this measure is
set of dimensions and in some cases represent them in one the location where the sale took place – the cashier where
concept, reflecting the fact that those measures are the sale was recorded. For our example scenario we
semantically linked. would define the dimension level Cashier in order to
represent the cashiers where the sales transactions take
place and each particular cashier of each store would be
3 The Multidimensional Aggregation Cube modeled as a dimension member of this level.
(MAC) Data Model
In this section we present the Multidimensional Dimension levels can have one or more attributes. A
Aggregation Cube data model. MAC is a user-centric subset of those attributes always form a key for the
conceptual data model that attempts to cover the dimension level. In most of the cases a single attribute
requirements described in the previous section in order to acts as the key but multi-attribute keys can also exist. This
The above example reveals the key property of the MAC Time Location Item Client SumOfS
model: the instances of a cube (the cells of a cube) can P.Month P.Store_Area._x P.Brand ALL P.
represent measure values of different granularities even if Year.2000 Store_Area._x ALL ALL >_s*1.1
there is a functional dependency among them. In our Year.1999 Store_Area._x ALL ALL _s
example, cell_A refers to sales measured per Product and
Cashier while cell_C refers to sales measured per Brand Table 2: The query Q4
and Cashier. Those cells, although defined at different
levels of granularity, can be part of the same cube. This is An additional advantage of our approach is that the most
due to the definition of the dimension domain, which typical OLAP operations: drill-down, roll-up, slice and
states that all members of all participating levels are valid dice are translated to simple selection queries on the cube.
dimension values. For example, a drill-down on the results of Q4 to the
Product level can be expressed by simply replacing Brand
We believe that the above property of our cubes is crucial with Product at the Item coordinate of Table 2.
for the compact and intuitive representation of
multidimensional data. As explained in the section 2, the Nevertheless, there is an important argument against this
OLAP users usually handle data defined over various modeling approach. The cube can represent cells with
levels of granularity. Queries may impose selection measures that are functionally dependent but it cannot
conditions on various dimension levels and may require guaranty their consistency. The measure of cell_C could
[Codd93] E. F. Codd: Providing OLAP to user-analysts: [RR91] M. Rafanelli, F.L. Ricci: A functional model
An IT mandate. E.F. Codd and Associates, 1993. for macro-databases. SIGMOD Record, 20(1), March
1991.
[DaTh97] A. Datta, H. Thomas: A conceptual Model and
an algebra for On-Line Analytical Processing in Data [SBH99] C. Sapia, M. Blaschka, G. Höfling: An
Warehouse. Proc. of the WITS 1997. Overview of Multidimensional Data Models for
OLAP. Technical Report 1999.
[G++96] J. Gray et al.: Data Cube: A Relational http://www.forwiss.tu-muenchen.de/
Aggregation Operator Generalizing Group-By, Cross-
Tab and Sub-Totals. Proc. of the ICDE 1996. [S++98] C. Sapia, M. Blaschka, G. Höfling, B. Dinter:
Extending the E/R model for the Multidimensional
[GoRi98] M. Golfarelli, S. Rizzi: A Methodological Paradigm. Proc. of the DWDM 1998.
Framework for Data Warehouse Design. Proc. of the
DOLAP 1998 [Shos97] A. Shoshani: OLAP and statistical databases:
Similarities and differences. Proc. of the PODS 1997.
[GyLa97] M. Gyssens, L.V.S. Lakshmanan: A
Foundation for Multi-Dimensional Databases. Proc. [TBC99] N. Tryfona, F. Busborg, J. G. B. Chistiansen:
of the VLDB 1997. starER: A Conceptual Model for Data Warehouse
Design. Proc. of the DOLAP 1999.
[Inmo96] W.H. Inmon: Building the Data Warehouse.
John Wiley & Sons, 1996. [TPC99] TPC: TPC Benchmark H and TPC Benchmark
R. Transaction Processing Council. June 1999.
[Kimb96] R. Kimball: The Data Warehouse Toolkit. John http://www.tpc.org/
Wiley & Sons, 1996.
[Truj99] J. Trujillo: The GOLD model: An Object
[Kimb97] R. Kimball: A Dimensional Modeling Oriented multidimensional data model for
Manifesto. DBMS, August 1997. multidimensional databases. Proc. of the ECOOP
1999.
[Lehn98] W. Lehner: Modeling Large Scale OLAP
Scenarios. Proc. of the EDBT 1998 [VaSe99] P. Vassiliadis, T. Sellis: A Survey of Logical
Models for OLAP Databases. SIGMOD Record 28(4),
[LeSh97] H. Lenz, A. Shoshani: Summarizability in Dec. 1999.
OLAP and Statistical Databases. In Proc. of the
SSDBM 1997. [Vass98] P. Vassiliadis: Modeling Multidimensional
Databases, cube and cube operations. Proc. of the
[LiWa96] C. Li, X. S. Wang: A Data Model for SSDBM 1998.
Supporting On-Line Analytical Processing. In Proc. of
the CIKM 1996.