Data, Information, and Knowledge
Data: A representation of certain facts without context, which can be processed by computers.
Information: Data become information when they are interpreted in a certain context.
Knowledge: Information becomes knowledge when it expands the recipient's capability of understanding reality and allows them to make new predictions, informed decisions, and proper actions. It is a cycle, not just a flow.
Chapter 1 - Decision Support System
This chapter provides a general overview of the purpose of decision support systems and of the concepts of data warehouse and of the data warehousing process.
Definitions
Prof. definition: An information system is a system whose purpose is to store, process, and communicate information.
Book definition: An information system is an organized collection of resources, people, and procedures finalized to collect, store, process, and communicate the information needed to support ongoing activities.
Information systems can be classified in several ways, but for our purposes, it is useful to classify them in the following categories based on the business activities that are required to support.
A Taxonomy of Information Systems
- Operational: To perform the business operational activities. It will consist of an operational database and a collection of application programs (transactions) which are used to access and update the data quickly and efficiently. The main goal is to maintain the correspondence between the database and the real-world situation it is modeling, as events occur in the real world. The data are under the control of a Data Base Management System (DBMS).
- Decision support: Decision support information systems can be classified into Management Information Systems and Decision Support Systems.
- MIS: It is used by managers in monitoring and controlling their units to correct problems by making decisions based on comparing the actual performance and the planned performance.
- DSS: To provide the information that managers need for analyzing the business and to help them make decisions of three types:
- Structured: When a well-defined decision-making procedure exists. Ex. Make a new order (Model-driven).
- Semi-structured: When the decision-making procedure is partly defined and requires the manager's creative intervention. Ex. Where to open a new store?
- Unstructured: When a well-defined decision-making procedure does not exist. The decision depends only on the manager’s experience. Ex. Should we launch a new product or not? (Data-driven).
- Web-based for E-commerce: The focus is the use of the internet web to allow new routes to the market.
Decision Support Categories
There isn’t a strict correlation between kinds of decisions and structure levels, but we can say that in the operational level decisions are more structured, in the tactical level decisions are semi-structured whereas in the strategical level are typically unstructured.
Three categories of decision support can be provided. Specifically:
- Reports: Reporting is considered the lowest level of decision support. A reporting facility capable of generating informative reports for managers in time to be useful is of the utmost importance for the successful operation of any business. (Slide to find out what happened in an event or in a period).
- Multidimensional data analysis: Sometimes called Online Analytic Processing (OLAP). Data analysis is usually accomplished interactively with data analysis tools. The goal of data analysis is to get useful information from the data. (Slide to explore data interactively to look for useful information).
- Exploratory data analysis: It uses what is called a discovery technique of useful data models with data mining algorithms. Unlike reports and multidimensional analysis, where the user must create and execute queries based on hypotheses, data mining algorithms search for answers. Predict the future.
Data Warehouse
A data warehouse is a subject-oriented (to analyze subjects of interest), integrated (from different data sources), non-volatile (the data in a data warehouse is primarily for query and analysis, and it is never changed interactively), time-varying (an operational system contains current data, while a data warehouse contains historical data over a long time for analysis and decision support, therefore a time dimension is explicitly included in data so that trends and changes over time can be analyzed) collection of data in support of management’s decisions.
Operational System vs. Decision Support System (Prof. Comparison)
- Operational System: Data are organized in a DB. Data are managed by a traditional DBMS. The applications are used to perform structured business operational activities. Typically, a relational data model. The focus of an operational information system is the execution of business processes.
- Decision Support System: Data are organized in a separate specialized DB. Data are managed by a specialized DBMS. The BI applications are used to analyze data. Typically, a mate data model. The focus of a decision support information system is the evaluation of the processes.
Book Comparison
A data warehouse is usually separated from an operational database for the following reasons:
- Performance: Special data organization, access, and implementation methods are needed to support multidimensional views and data analysis, which usually requires complex queries that would degrade the performance of operational transactions. Moreover, concurrency control and recovery DBMS modes are not compatible with data analysis.
- Function: Decision support requires:
- Historical data, which operational databases do not typically maintain.
- Consolidation (aggregation, summarization) of data from heterogeneous sources, such as operational databases and external sources.
- Different sources typically use inconsistent data representations, codes, and formats, which must be reconciled to enforce data quality.
Characteristics of OLAP – FASMI (Fast Analysis of Shared Multidimensional Information)
- Fast (OLAP)
- Analytic (OLAP)
- Shared
- Multidimensional: It is the primary requirement
- Information: Condense large amounts of data
Data Warehousing Architecture
The term data warehousing is used to refer to the process used to organize data from operational (OLTP) sources into a data warehouse and then allow end users to analyze them (OLAP) with business intelligence applications.
In practice, three types of solutions are adopted, depending on the number of data layers employed.
- One-layer architecture: It is usually used as the first low-cost solution for small organizations, with no separation between operational and analytical applications. Virtual data warehouse. The word “One” means that there is only a database.
- Two-layer architecture: It is more general than the previous one because a data warehouse exists separately from the operational database and is managed by a specific system. The data warehouse is loaded with data extracted with Extract, Transform, and Load (ETL) applications from the operational database, and any other structured data sources, to bring them to a consistent form. While the data sources are updated continuously by operational applications, the data warehouse is updated periodically with the ETL applications. This solution separates:
- The system for operational database management from the system for data warehouse management and decision support.
- The operational applications from the business intelligence applications, so that business analysis would not interfere with and degrade the performance of operational applications.
- Three-layer architecture: This solution is the most general with three data layers: the data sources, the data staging, and the data warehouse. The data staging contains data obtained from the integration of different data sources and prepared for loading into the data warehouse. The data staging may just be a set of files or, at the other extreme, a fully developed relational database. The complexity of the data staging layer depends on the quality of the data sources. This solution separates the process of extraction and integration of data sources from the process of data reorganization and loading into the data warehouse.
Data warehouses and operational databases provide different functions and require different kinds of data; therefore, they need to be maintained separately.
What to Model
Managers are interested in analyzing collections of facts about the performance of a key business process, measurable and worthy of improvement. A fact is represented by a set of numerical attributes by which the process performance is tracked and measured to maintain or improve their efficiency. Ex. A sale. In data warehousing terminology, the interval at which we take measurements is called the grain.
Managers think in terms of business dimensions, which give facts their context and are used to analyze them to evaluate their effects. Ex. For sales data, the dimensions could include Product, Date, and Store. Dimensions contain the descriptions of the subject being measured.
Managers analyze measurable business process performance using summary data (called metrics) obtained by grouping facts by different dimensions and combinations of dimensions, and then aggregating measures into useful forms.
Key Performance Indicators (KPI): Managers are interested in analyzing metrics in various levels of detail, by exploiting the fact that some dimensions have a set of associated attributes that can be structured as a hierarchy.
Chapter 2 - Data Warehouse Modeling
This chapter presents the fundamental concepts about a conceptual model for designing data warehouses, and the logical data model to implement them.
The purpose of a data warehouse is not just to store data but rather to facilitate decision-making. It is very important to understand the structure and the contents of the data to best support the needs of the business users. Ex. Number of product ordered by product, by customer, by month.
A symbolic model is a subjective formal representation of ideas and knowledge about some aspects of the real world (domain of discourse), designed to serve an explicit purpose.
Conceptual Data Model
To analyze a problem, given user requirements. Ex. E-R or Object Data Model.
Logical Model
To design a solution independently of actual DBMS. Ex. Relational Data Model.
Physical Model
To realize a project on a specific DBMS.
Conceptual Multidimensional Model
It is useful to reason about the characteristics of data at a conceptual level, independent of implementation concerns, as it happens with the Entity-Relationship model for databases.
Facts
Facts are modeled by a rectangle divided into two parts, which contain the facts' name and the set of measures. A measure is a numerical property of a fact that describes one of its quantitative aspects of interest for analysis. Sometimes, facts are without measures, and are usually called fact-less facts, but in accordance with our terminology, we call them measureless facts. This happens when facts represent events that only need to be counted.
Ex. Facts=OrderLines, that is to say, a single line of an order is an event to be modeled. So, if an order has 10 lines, we model it with 10 facts. So, a fact is a single sale.
Dimensions
Dimensions give facts their context and are used to analyze them. Dimensions are represented by lines emanating from the rectangle of facts and ending with a circle. In general, a dimension is described by a set of attributes used to qualify, categorize, or summarize facts in reports.
Dimensional Hierarchies
A hierarchical relationship between attributes values. The presence of a hierarchy between the dimensional attributes increases the possibilities of data analysis from different perspectives (Multidimensional Analysis). For example, once the sales of products have been analyzed by year, we can have a deeper analysis at a different level of detail to analyze product sales by quarter.
We can think to write the attributes by pointing out the functional dependencies between the attributes using arrows.
- Descriptive attributes: Dimensions and dimensional attributes are usually represented with arcs ending with a circle to model that their values may be used in data analysis for selecting or grouping facts data. However, if attributes are considered descriptive in the sense that in the data analysis they are used only for selecting data, or to show their values in the report result, but not for grouping or aggregating data, they are represented with an arc without a circle.
- Degenerate dimensions: Dimensions without any attributes. Usually, these are transaction-based numbers which describe the fact but are not measures because it is meaningless to aggregate them. Ex. Bill number.
- Optional attributes or dimensions: When the value of an attribute or a dimension may be undefined, the corresponding arcs are “cut”.
- Types of hierarchy:
- Balanced: When the possible levels are a predefined number and the attribute values are always defined. For example, the Date attributes Month, Quarter, and Year belong to a balanced hierarchy with three levels.
- Ragged: When the values of one or more attributes may be undefined. A ragged hierarchy is graphically denoted by marking with a dash the attributes whose values may be undefined. For example, a location dimension with attributes Country, State, and City is balanced in the US, but it is ragged for most European countries where State is non-used.
- Recursive (unbalanced): When the possible levels are a variable number. For example, in the dimension Agent, there is the attribute Supervisor representing a recursive hierarchy among agents. In the conceptual schema, a recursive hierarchy is represented with a loop.
- Shared hierarchy: The dimensions can share some hierarchy attributes, such as City and Customer. To avoid ambiguity, the circle is doubled and the arcs are oriented.
- Multivalued dimension or attribute: A fact may be associated with more than one value of a dimension. In this case, the outgoing arc from the fact ends with a double arrow. Besides dimensions, dimensional attributes also may be multivalued, and represented in the same way.
Considerations on the Conceptual Modeling of a Data Mart
In a data mart project, the focus is on the collection of facts, their measures, their dimensions, attributes, and hierarchies.
Instead, in a database project, the focus is on collections of entities, their properties, associations, and hierarchies between collections.
Here are the key steps in the conceptual design of a data mart:
- Identify the Granularity.
- Identify the Measures.
- Identify the Dimensions.
- Identify the Dimensional Attributes.
Step 1: Identify the Granularity of the Fact of Detail of the Event
When modeling a data mart, the first fundamental decision to be taken is the meaning of the fact because from this choice derives the measures that characterize the fact and the dimensions for its analysis. Grain is the precision with which the measurements are taken.
You must consider the nature of the fact which may be of the following types:
- A Transaction Fact: Represents the information on a specific event that occurred at a specific point in time during the execution of a business process. Ex. A fact is a transaction (withdrawal or deposit) on a bank account.
- A Periodic Snapshot Fact: Represents the information on a series of events that have occurred over a period of time. Ex. A fact is the monthly summary of all transactions on a bank account.
- An Accumulating Snapshot Fact: Represents the information on the lifetime of an evolving event that has a duration and changes over time. Ex. A mortgage application.
The grain of the fact determines the size of the data warehouse.
Scarica il documento per vederlo tutto.
Scarica il documento per vederlo tutto.
Scarica il documento per vederlo tutto.
Scarica il documento per vederlo tutto.
-
Riassunto esame Psicologia del pensiero, decision making e comunicazione, Prof. Macchi Laura, libro consigliato Pen…
-
Riassunto esame Statistica, Prof. Likavec Silvia, libro consigliato Statistica per la ricerca sociale, Albano, Test…
-
Riassunto esame Psicologia dell'Orientamento Scolastico e Professionale: Insegnare il decision-making agli studenti…
-
Riassunto esame organizzazione e amministrazione del servizio sociale, prof. Ilari, libro consigliato Organizzare i…