An OLAP server can store multidimensional data in three different ways:
• MOLAP: it stores data in a multidimensional structure (a "physical" data cube). It is used when extra storage space is available on the server and the best query performance is desired;
• ROLAP: it uses a relational data model to represent multidimensional data. It is used when there is limited space on the server and the query performance is not so important;
• HOLAP: it is a combination of the two. It does not necessarily create a copy of the source data, but data aggregations are stored in a multidimensional structure on the server. This provides space saving and faster query processing.
OLAP provides the following operations to manipulate multidimensional data (a small sketch follows the list):
• Roll-up: aggregates data at a higher level;
• Drill-down: de-aggregates data at a lower level;
• Slice and dice: applies selections and projections which reduce data dimensionality;
• Pivoting: selects two dimensions to re-aggregate data and performs a re-orientation of the cube;
• Ranking: sorts data according to predefined criteria;
• all the traditional OLTP operations are supported (select, join, project...).
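As a concrete illustration, here is a minimal Python/pandas sketch of these operations on an invented sales table; the column names and figures are made up and not part of the course material.

import pandas as pd

# Invented sales data: dimensions (year, month, city, product) and one measure (amount).
sales = pd.DataFrame({
    "year":    [2023, 2023, 2023, 2024, 2024, 2024],
    "month":   ["Jan", "Jan", "Feb", "Jan", "Feb", "Feb"],
    "city":    ["Rome", "Milan", "Rome", "Milan", "Rome", "Milan"],
    "product": ["A", "B", "A", "A", "B", "B"],
    "amount":  [100, 150, 120, 130, 170, 160],
})

# Roll-up: aggregate from (year, month, city) up to (year, city).
rollup = sales.groupby(["year", "city"])["amount"].sum()

# Drill-down: go back to the finer (year, month, city) granularity.
drilldown = sales.groupby(["year", "month", "city"])["amount"].sum()

# Slice and dice: selections and projections that reduce dimensionality.
slice_2023 = sales[sales["year"] == 2023]
dice = sales[(sales["city"] == "Rome") & (sales["product"] == "A")]

# Pivoting: choose two dimensions for rows and columns to re-orient the cube.
pivot = sales.pivot_table(index="city", columns="year",
                          values="amount", aggfunc="sum")

# Ranking: sort data according to a criterion.
ranking = sales.sort_values("amount", ascending=False)

print(rollup, pivot, sep="\n\n")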
4.2 Data Warehouse Design
When the design process of a data warehouse starts, the relevant data sources must be selected. Since DWs are based on a multidimensional model, E-R models cannot be used directly in the DW conceptual design; this is why one or more fact schemata are derived from them. A fact schema is characterized by a fact name, measures and dimensions. The dimensions of the same fact must have distinct names.
A dimensional attribute must assume discrete values and can be organized into a hierarchy. In detail, a dimensional hierarchy is a directed tree whose nodes are dimensional attributes, whose edges describe n:1 associations between pairs of dimensional attributes and whose root is the considered dimension. Moreover, two dimensional attributes can be connected by two or more distinct edges. Hierarchy sharing is used to avoid duplicating portions of hierarchies; in this case, we have to give a different name to each arc.
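As an illustration, the following Python sketch represents a dimensional hierarchy as a directed tree whose root is a hypothetical "date" dimension; all attribute names are invented.

# Each dimensional attribute maps to its children, i.e. the coarser
# attributes it rolls up to through an n:1 association.
hierarchy = {
    "date":        ["month", "day_of_week"],   # root = the dimension
    "month":       ["quarter"],
    "quarter":     ["year"],
    "year":        [],
    "day_of_week": [],
}

def coarser_levels(attribute, tree=hierarchy):
    # All attributes reachable from `attribute`: the aggregation levels
    # it can be rolled up to.
    levels = []
    for child in tree[attribute]:
        levels.append(child)
        levels.extend(coarser_levels(child, tree))
    return levels

print(coarser_levels("date"))   # ['month', 'quarter', 'year', 'day_of_week']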
It is possible to identify three different categories of measures: flow measures are related to a time period (e.g. number of sales per day), level measures are evaluated at a particular time instant (e.g. number of products in stock) and unitary measures are level measures that are also relative measures (e.g. interest rate, currency exchange rate...).
A primary event is an occurrence of a fact and is represented by means of a tuple of values. A hierarchy describes how primary events can be grouped and selected, and its root represents the finest aggregation granularity.
A descriptive attribute contains additional information about a dimensional attribute. It is not used to aggregate data.
A cross-dimensional attribute is a dimensional or descriptive attribute whose value is obtained by combining the values of some dimensional attributes (e.g. VAT). Some attributes or dimensions may be related by a many-to-many relationship.
4.2.1 Conceptual Design
We start from the logical or conceptual schema of the source and apply a top-down methodology. First of all, we identify the facts; for each of them an attribute tree is defined, a fact schema is produced and a glossary is written.
A fact can be either an entity or a relationship of the source E-R schema and it corresponds to an event that dynamically happens in the organization. The fact becomes the root of a new fact schema. Furthermore, the attribute tree is composed of a root and nodes and can be obtained through a semi-automatic procedure. After that, the attribute tree is edited to rule out everything that is irrelevant to us by means of two techniques (sketched below): pruning, where the subtree rooted in node n is deleted, and grafting, where the children of node n are directly connected to the father of n.
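A minimal sketch of the two editing operations, assuming a toy attribute tree stored as a dictionary from each node to its children; all node names are invented.

tree = {
    "sale":             ["date", "product", "customer"],   # root = the fact
    "date":             ["month"],
    "month":            ["year"],
    "year":             [],
    "product":          ["type", "supplier"],
    "type":             [],
    "supplier":         ["supplier_address"],
    "supplier_address": [],
    "customer":         [],
}

def prune(tree, node):
    # Delete the whole subtree rooted in `node`.
    for child in list(tree.get(node, [])):
        prune(tree, child)
    tree.pop(node, None)
    for children in tree.values():
        if node in children:
            children.remove(node)

def graft(tree, node):
    # Connect the children of `node` directly to the father of `node`
    # and drop `node` itself; the children keep their own subtrees.
    children = tree.pop(node, [])
    for kids in tree.values():
        if node in kids:
            kids.remove(node)
            kids.extend(children)

prune(tree, "supplier")   # the supplier subtree is irrelevant: delete it
graft(tree, "month")      # keep "year", but drop the "month" level
print(tree)               # "date" is now directly connected to "year"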
Pay attention when there are cycles in the E-R schema.
Dimensions are chosen among the children of the root, and time is always a good candidate. Numerical attributes that are children of the root are usually measures. Further measures are defined by applying aggregate functions (sum, avg, min, max...) to numerical attributes. It is possible that a fact has no measures: in that case it is an empty fact.
In the glossary, an expression is associated with each measure; it describes how the measure itself is obtained, for each primary event and at the different levels of aggregation, starting from the attributes of the source schema.
Summing up, the steps of the conceptual design are:
1. fact definition;
2. for each fact:
• attribute tree generation;
• editing of the tree;
• dimensions definition;
• measures definition;
• fact schema creation.
4.2.2 Logical Design
There exists a great variety of data warehouse schemata. Starting from the conceptual schema, we want to obtain the logical schema for a specific data mart.
ROLAP is based on the star schema: a set of dimension tables, each characterized by a primary key and a set of attributes, plus a fact table that imports all the primary keys of the dimension tables. The denormalization of the dimension tables introduces redundancy, but guarantees fewer joins. Cross-dimensional attributes require the creation of a new dimension table having as keys the associated dimensional attributes. Shared hierarchies and convergences should use the same dimension table without duplicating it.
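As a sketch of the star schema just described, the following Python snippet builds one in an in-memory SQLite database; all table and column names are invented, and the type/category pair in the product dimension shows the denormalization mentioned above.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: a surrogate primary key plus denormalized attributes.
CREATE TABLE dim_date (
    date_id INTEGER PRIMARY KEY,
    day TEXT, month TEXT, quarter TEXT, year INTEGER
);
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,
    name TEXT, type TEXT, category TEXT   -- type -> category kept denormalized
);
CREATE TABLE dim_store (
    store_id INTEGER PRIMARY KEY,
    name TEXT, city TEXT, country TEXT
);

-- Fact table: it imports all the dimension keys and holds the measures.
CREATE TABLE fact_sales (
    date_id    INTEGER REFERENCES dim_date(date_id),
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id   INTEGER REFERENCES dim_store(store_id),
    quantity   INTEGER,
    revenue    REAL,
    PRIMARY KEY (date_id, product_id, store_id)
);
""")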
The snowflake schema reduces this denormalization and, with it, the required memory space. The advantage is enhanced query execution.
An alternative to the star and the snowflake schemata is the fact constellation schema: multiple fact tables share dimension tables and the schema is viewed as a collection of stars. It is used in sophisticated applications.
Since aggregation computation is very expensive, we can use materialized views to speed up frequent queries. Primary views correspond to primary aggregation levels, while secondary views are related to secondary events obtained by aggregating primary ones. It is useful to materialize a view when it directly solves a frequent query and reduces the costs of some of them.
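A minimal Python/pandas sketch of the idea: the aggregate is computed once ("materialized") and frequent queries read it instead of re-aggregating the fact data; the table and its contents are invented.

import pandas as pd

fact_sales = pd.DataFrame({
    "year":    [2023, 2023, 2023, 2024, 2024],
    "month":   ["Jan", "Jan", "Feb", "Jan", "Jan"],
    "product": ["A", "B", "A", "A", "B"],
    "revenue": [100.0, 150.0, 120.0, 130.0, 170.0],
})

# "Materialize" the secondary view once...
mv_sales_by_month = (fact_sales
                     .groupby(["year", "month"], as_index=False)["revenue"]
                     .sum())

# ...then answer the frequent monthly query from the small aggregate table
# instead of scanning and re-aggregating the whole fact table.
jan_2024 = mv_sales_by_month.query("year == 2024 and month == 'Jan'")
print(jan_2024)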
Sometimes it is useful to introduce new measures in order to manage aggregation correctly. Derived measures are obtained by applying mathematical operators to two or more values of the same tuple. Aggregate operators are of various types: distributive operators aggregate data starting from partially aggregated data (sum, max, min), algebraic operators require further information to aggregate data (average), and holistic operators cannot aggregate data starting from partially aggregated ones.
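The difference between the three operator types can be checked on two partitions of an invented data set; the median is used here as a typical holistic operator (the example is illustrative, not taken from the notes).

import statistics

part1 = [10, 20, 30]
part2 = [40, 50]
full  = part1 + part2

# Distributive: the total aggregate is built from the partial aggregates alone.
assert sum(full) == sum([sum(part1), sum(part2)])

# Algebraic: the overall average needs extra information (the counts)
# besides the partial averages.
avg = (statistics.mean(part1) * len(part1) +
       statistics.mean(part2) * len(part2)) / (len(part1) + len(part2))
assert avg == statistics.mean(full)

# Holistic: the partition medians (20 and 45) do not determine the overall
# median (30); no fixed-size summary of the partitions is enough.
print(statistics.median(part1), statistics.median(part2), statistics.median(full))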
Summing up, the steps of logical modeling are:
1. choice of the logical schema (star/snowflake);
2. conceptual schema translation;
3. choice of the materialized views;
4. optimizations.
5 Temporal Databases
Temporal databases are still a research area; they take time into account by offering temporal data types and by storing data related to time instants, keeping information about the past, the present and the future. They are in contrast to current databases, which store only facts that are true at the current time.
Possible applications deal with data warehousing, finance, law, medical records,
project scheduling, science...
Applications that manage temporal data would benefit from built-in, knowledge-independent temporal support: their development would be more efficient and their performance would improve.
Temporal databases are very useful when we want to keep the history of something. There are multiple alternatives to do that: it can be left to the user to determine the history by inspecting the data, we can use SQL as much as possible, or we can take advantage of embedded SQL.
Some special constructs are required, for example the temporal join (a sketch follows). Temporal DBMSs have not reached a satisfactory performance level and therefore remain an open research problem.
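As an illustration of what a temporal join has to do, the following Python sketch joins two invented valid-time relations on a key and timestamps each result tuple with the intersection of the two validity intervals.

from datetime import date

# (employee, department, valid_from, valid_to) -- invented data
works_in = [("Ann", "Sales", date(2020, 1, 1), date(2022, 1, 1)),
            ("Ann", "R&D",   date(2022, 1, 1), date(9999, 12, 31))]

# (department, manager, valid_from, valid_to)
managed_by = [("Sales", "Bob",   date(2019, 1, 1), date(2021, 6, 1)),
              ("Sales", "Carol", date(2021, 6, 1), date(9999, 12, 31))]

result = []
for emp, dept1, f1, t1 in works_in:
    for dept2, mgr, f2, t2 in managed_by:
        if dept1 == dept2 and f1 < t2 and f2 < t1:   # validity intervals overlap
            result.append((emp, dept1, mgr, max(f1, f2), min(t1, t2)))

for row in result:
    print(row)
# Ann worked in Sales under Bob in [2020-01-01, 2021-06-01)
# and under Carol in [2021-06-01, 2022-01-01).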
Software changes are part of the software operational life: the application or the database schema may be modified during their lifetime. A modification may be caused by a change in the reality of interest or by an improved perception of the reality itself. Schema evolution and versioning deal with the need to retain current data and system software functionality when the database structure is changed.
In particular, schema evolution permits modifications of the schema without
loss of extensional data, while schema versioning allows querying all data through
appropriate version-based interfaces.
Basically, what we want from temporal databases is:
• capturing the semantics of time-varying information;
• retaining the simplicity of the relational model;
• presenting information concerning an object in a coherent fashion;
• ensuring ease of implementation and high performance.
5.1 Timestamps
A timestamp is a seven-part value (year, month, day, hour, minute, second and microsecond) that designates a date and time. The internal representation is a string of 10 bytes: 4 for the date, 3 for the time and 3 for the microseconds.
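The seven components map directly onto, for instance, Python's datetime value; this only illustrates the logical parts, not the DBMS-internal 10-byte encoding.

from datetime import datetime

ts = datetime(2024, 3, 15, 14, 30, 59, 123456)
print(ts.year, ts.month, ts.day, ts.hour, ts.minute, ts.second, ts.microsecond)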
Time is already present in commercial DBMSs, but temporal DBMSs are able to manage time-referenced data: timestamps are associated with database entities, for example individual attributes or groups of attributes, individual tuples or sets of tuples, objects and schema items.
As far as the semantics of a timestamp is concerned, database facts have at least
two relevant aspects:
• Valid time: it is the time when the fact is true in the modeled reality. It captures the time-varying states of the real world. It can be either in the past or in the future and can be changed frequently. All facts have a valid time, but it is not necessarily recorded in the DB. If a database models different worlds, each fact might have more than one valid time, one for each world.
A valid-time table can be updated and supports historical queries.
• Transaction time: it specifies when a fact has been recorded in the database. It captures the time-varying states of the database. It cannot extend beyond the current time and cannot be changed. From the transaction time viewpoint, an entity has a duration, from insertion to deletion; the deletion is only logical, because the entity is not physically removed from the DB. Transaction time may be associated not only with real-world facts, but also with other DB concepts, such as attribute values that are updated at a given time.
A transaction-time table is append-only: it keeps the whole history of updates and supports rollback queries.
Valid time is controlled by the user, while transaction time is controlled by the database. We can have four different kinds of tables: snapshot, valid time, transaction time and bitemporal.
Bitemporal tables are append-only and support both rollback and historical queries (a sketch follows).
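A minimal sketch of a bitemporal table in SQLite, written in Python: the valid-time columns are set by the user, the transaction-time columns by the code that writes the rows (deletion would only be logical). All names and data are invented.

import sqlite3
from datetime import datetime

END = "9999-12-31"   # conventional "until further notice" / "still current"

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE salary (
    employee   TEXT,
    amount     REAL,
    valid_from TEXT,   -- valid time: when the salary holds in reality
    valid_to   TEXT,
    tx_start   TEXT,   -- transaction time: when the row was recorded
    tx_end     TEXT    -- set on logical deletion; rows are never removed
)""")

def record(employee, amount, valid_from, valid_to=END):
    now = datetime.now().isoformat()
    conn.execute("INSERT INTO salary VALUES (?,?,?,?,?,?)",
                 (employee, amount, valid_from, valid_to, now, END))

record("Ann", 30000, "2023-01-01", "2024-01-01")
record("Ann", 32000, "2024-01-01")

# Historical query (valid time): what was Ann's salary on 2023-06-01?
print(conn.execute("""
    SELECT amount FROM salary
    WHERE employee = 'Ann'
      AND valid_from <= '2023-06-01' AND '2023-06-01' < valid_to
      AND tx_end = ?""", (END,)).fetchall())

# Rollback query (transaction time): what did the table contain at time T?
T = datetime.now().isoformat()
print(conn.execute("""
    SELECT employee, amount, valid_from, valid_to FROM salary
    WHERE tx_start <= ? AND ? < tx_end""", (T, T)).fetchall())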
5.2 Time
The time structure can be of various types:
• Linear: