Riassunto esame Decision Support Database, prof. Ruggieri, libro consigliato Decision Support Databases Essentials, Albano

Riassunto per l'esame di Decision Support Database e del prof. Ruggieri, basato su appunti personali del publisher e studio autonomo del libro consigliato dal docente Decision Support Databases …

Esame Decision Support Database

Facoltà Scienze matematiche fisiche e naturali

Dal corso del Prof. Ruggieri Salvatore

Università Università degli Studi di Pisa

Publisher macchia17

A.A. 2017-2018

24 pagine

4 download

Appunto

Vota 5,0 / 5 (1)

Scarica

Estratto del documento

Ex. SUM, MIN, MAX, COUNT

2. Algebraic

If it can be computed from a ﬁnite algebraic expression deﬁned over distributive functions.

For example, the functions average (AVG), variance (VAR), and standard deviation (STDEV) are

algebraic aggregate functions.

3. Holistic

If it cannot be computed from other aggregate functions. For example, MEDIAN, MODE, RANK.

Chapter 3- Data warehouse design

The purpose of a data warehouse (DW) is not just to store data but rather to facilitate decision making.

The DW design process has 4 main steps:

1. Requirements Analysis. The goal is to produce a description of the business processes, the typical

information analysis activities with which users are involved, and the measures and dimensions of

interest. Typically, requirements at this stage are documented rather informally.

2. Conceptual Design. The goal is to produce a formal description of the data to be analysed in high-level-

term using a conceptual data model.

We will use the Dimensional Fact Model, DFM, to describe facts, dimensions, dimensional attributes

and attribute hierarchies.

3. Logical Design. The goal is to transform the conceptual design into the logical structures used for storing

the DW in a relational DBMS.

4. Physical Design. The goal is to deﬁne the data structures needed for storing the database tables created

by the logical design. The main issues are what indexes and materialized views to deﬁne to optimize the

overall performance of the system.

In general, the data to load in the DW are processed with two important and complex kinds of operations:

1. Transform. When the data come from different sources, their formats are revised to align them by

eliminating syntactic and semantic differences.

(a) Syntactic transformation. The same data can have both attributes with different names, and the

names are not those to be used in the DW, and different types. For example, a code is deﬁned in some

cases of a type string, and in others a type integer; a gender is deﬁned as (M, F), (m, f), (0, 1) or (male,

female); a value is deﬁned with different units of measurement, and so on.

(b) Semantic transformation. The data in source databases may have been used with a different

meaning. For example, sales can be daily or weekly.

2. Cleaning. The data are analysed in order to eliminate errors of representation or to complete missing

information. For example, in the case of addresses the zip code can be wrong or the name of the town

can be written in different ways (Busto Arsizio also written as BustoArsizio or BArsizio)

The information generated during the design and implementation of a DW is organized and stored as metadata

using appropriate specialized tools or taking advantage of the capabilities of DW systems that provide. In the

case of DW metadata are about other aspects of the data and can be classiﬁed into the following main

categories:

– Business metadata. Concern the meaning of the terms used to deﬁne the logical structure of data in corporate

terminology. This type of metadata is usually used by users to understand the nature of the data available.

– Structural metadata. Concern the logical structure of facts and dimensions, types of attribute values,

hierarchies, dimensions and meaningful aggregations. This type of metadata is usually used by users to

understand what types of analysis can be performed on the data.

– Technical metadata. Are concerned with the physical data property, such as storage structures, data sources,

date of loading, transformations applied etc. This type of metadata is usually used by technicians for the

maintenance and development of a DW.

– Operational metadata. Concerns the types of predeﬁned analysis reports and what parameters should be

used.

– Design metadata. Concern the results of the DW design phases.

3.2 DATA WAREHOUSE DESIGN APPROACHES

1. DATA-DRIVEN (top-down)

The goal is to design ﬁrst an enterprise DW based on the data available in

the operational information system, then the data marts are created from

the DW. This is done by analysing a conceptual model of data, if one is

available, or the actual logical record layouts and selecting data elements

deemed to be of interest. This approach is the only possible when the

demand for information from a DW does not exist until the DW is

available.

An initial DW design based on the data available can help both users to discover new ways in which to use the

available data, and the designer to identify areas on which to focus data warehouse development efforts. The

disadvantage of this approach is that without user involvement there is the risk of producing a non-interesting

result.

2. ANALYSIS-DRIVEN (bottom-up)

The goal is to design ﬁrst the data marts based on the data analysis that the users want to perform, and then

the data marts are integrated to build the DW. The major advantage to this approach is that the focus is on

providing what is really needed, rather than what is available. In general, this approach has a smaller scope than

the data-driven approach. Therefore, it generally produces useful data marts in a shorter time span. The

disadvantage of this approach is the risk that some of the data that the analysis needs are not available.

Moreover, if a user is too tightly focused, it is possible to miss useful data that is available in the operational

information system.

Design phases for each data mart of interest:

1. Requirements analysis

2. Initial analysis-driven data mart conceptual design

3. Candidate data-driven data mart conceptual design

4. Final data mart conceptual design

5. Data mart and DW logical design

REQUIREMENTS ANALYSIS

The requirements analysis phase is divided into two main sub-phases, characterized by the different language

used for the preparation of the documents they produce.

The ﬁrst sub-phase, Requirements gathering, produces a natural language speciﬁcation of requirements.

(a) Analysis of the problem domain for which the modeling will be done.

(b) Analysis of the business processes to select, with end-user interviews, those more interesting to consider for

designing the data warehouse.

Examples of frequently encountered categories of business questions are:

– Existence checking analysis, such as “A given product has been sold to a particular customer.”

– Item comparison analysis, such as “Compare the value of purchases of two customers over the last six months,”

or “Compare the number of items sold for a given product category, by store, and by week.”

– Trend analysis, such as “The growth in item sales for a given set of products, over the last 12 months.”

– Queries to analyse ratios, rankings, and clusters, such as “Rank our best customers in terms of dollar sales

over the last year.”

– Statistical analysis, such as “The average item sales by product category, by sales region.”

Note that the business questions must usually be “interpreted” in order to express them in a form more useful

for designing the data warehouse. That is, an interpreted business question should be expressed with the types

of reports to be produced, or phrases that reveal the following information: the constraints on data to analyse,

the requested dimensional attributes and the metrics (aggregation operation) to compute, the coordinates

(dimensions) against which the fact must be analysed, the result sorting criteria and if metrics’ partial value are

needed. When business questions are expressed by means of phrases, the use of the following form is

suggested: “For a data subset to use, the metrics to compute, by dimension, ..., by dimension. How the result

should be presented”. For example, “For the year 2010 in Italy, the total of sale revenue, by region, by month,

by customer name. The result must be sorted by region, month, and customer name, and must include partial

totals for all regions.” It is important also to check that the information analysis requirements need data that

are currently available or can be obtained as external data that exist outside the enterprise. If there are multiple

data sources, the analysis is complicated by the need to reconcile the likely differences in representation of

information. In the following we will not consider this aspect.

The second sub-phase, Requirements speciﬁcation, produces a description of the requirements for data analysis

outlining the salient features to be modeled then with the conceptual design.

Business Process Requirements: 12

Fact Description:

Dimensions:

Dimensional Attributes:

Dimensional Hierarchies:

Dimensional Attributes Changes: . It is important to understand how

the business wants to deal with the dimension attributes that can change over time. Consequently, for each of

them, besides the name, the type of strategy is speciﬁed to deal with them in the logical design phase and data

loading. The strategy to be speciﬁed depends on the analysis’ objectives, and for this reason it should be

documented in the requirements speciﬁcation. We consider four options, of which the ﬁrst three are considered

for slowly changing dimensions:

Type 1 (overwriting the history) If a dimensional attribute changes its value, only the latest value is

required to be held in the data mart. This means that there is no need to preserve the previous value.

For example, if a customer changes address, the new one replaces the present value of the dimension

Customer. It is the easiest solution, but the history of customer addresses is lost.

Type 2 (preserving the history) If a dimensional attribute changes its value, both the old and the new

value are required to be held in the data mart, but the structure of the dimension must not be changed.

It is the solution commonly used.

Type 3 (preserving one or more versions of history) If a dimensional attribute changes its value, the

structure of the dimension is extended with additional attributes to keep the tracking history with both

old and new values.

Type 4 (fast changing) The dimensional attributes change frequently, and must not be treated with one

of the previous solutions

Measures:

Descriptive attributes of the facts:

Summary of dimensions and Measures:

INITIAL ANALYSIS-DRIVEN DATA MART CONCEPTUAL DESIGN

An initial data mart conceptual design is deﬁned from the analysis that the users perform (the design from what

the users want), without any claim to completeness but useful as a formal description of requirements. In the

conceptual design the dimensional hierarchies are modelled together with their type (balanced, incomplete,

recursive), degenerate dimensions and descriptive attributes of the facts.

CANDIDATE DATA-DRIVEN DATA MART CONCEPTUAL DESIGN

A method is described for developing a candidate data mart conceptual design from the operational database

relational schema (the design from what is availa

Anteprima

Vedrai una selezione di 6 pagine su 24

Riassunto esame Decision Support Database, prof. Ruggieri, libro consigliato Decision Support Databases Essentials, Albano Pag. 1

Riassunto esame Decision Support Database, prof. Ruggieri, libro consigliato Decision Support Databases Essentials, Albano Pag. 2

Anteprima di 6 pagg. su 24.
Scarica il documento per vederlo tutto.

Scarica

Riassunto esame Decision Support Database, prof. Ruggieri, libro consigliato Decision Support Databases Essentials, Albano Pag. 6

Anteprima di 6 pagg. su 24.
Scarica il documento per vederlo tutto.

Scarica

Riassunto esame Decision Support Database, prof. Ruggieri, libro consigliato Decision Support Databases Essentials, Albano Pag. 11

Anteprima di 6 pagg. su 24.
Scarica il documento per vederlo tutto.

Scarica

Riassunto esame Decision Support Database, prof. Ruggieri, libro consigliato Decision Support Databases Essentials, Albano Pag. 16

Anteprima di 6 pagg. su 24.
Scarica il documento per vederlo tutto.

Scarica

Riassunto esame Decision Support Database, prof. Ruggieri, libro consigliato Decision Support Databases Essentials, Albano Pag. 21

Acquista con carta o PayPal

Scarica i documenti tutte le volte che vuoi

Dettagli

SSD

Scienze matematiche e informatiche INF/01 Informatica

I contenuti di questa pagina costituiscono rielaborazioni personali del Publisher macchia17 di informazioni apprese con la frequenza delle lezioni di Decision Support Database e studio autonomo di eventuali libri di riferimento in preparazione dell'esame finale o della tesi. Non devono intendersi come materiale ufficiale dell'università Università degli Studi di Pisa o del prof Ruggieri Salvatore.

Appunti correlati

Invia appunti e guadagna

Recensioni

5/5

1 recensione

5 stelle

4 stelle

3 stelle

2 stelle

1 stella

Ti è piaciuto questo appunto?

Menny.melody90

9 Gennaio 2023

Riassunto esame Decision Support Database, prof. Ruggieri, libro consigliato Decision Support Databases Essentials, Albano

Ex. SUM, MIN, MAX, COUNT

3.2 DATA WAREHOUSE DESIGN APPROACHES

REQUIREMENTS ANALYSIS

INITIAL ANALYSIS-DRIVEN DATA MART CONCEPTUAL DESIGN

CANDIDATE DATA-DRIVEN DATA MART CONCEPTUAL DESIGN

Recensioni

Domande e risposte

I migliori insegnanti di Basi di dati

Matteo S.

Marco L.

Simone C.