Analysis of variance, logistic regression, cluster analysis

Name: Analysis of variance, logistic regression, cluster analysis
Rating: 4.0 (2 reviews)
Author: diegoianni

Revised on 31/05/2026

Content verified and approved by the Skuola.net Expert Team

English-language notes on statistics applied to economics, with examples. Topics covered: analysis of variance, logistic regression, cluster analysis, factor analysis; based on notes …

Exam: Applied Statistics

Faculty: Economics

From the course of Prof. Recla Alessandro

University: Università degli Studi dell'Insubria

Academic year: 2020-2021

7 pages

Document excerpt

Applied statistics: multivariate analysis

Types of multivariate analysis

Multivariate analysis can be divided into two types with different goals:

Analysis of dependence: simple and multiple linear regression, logistic regression.
Analysis of interdependence: factor analysis, cluster analysis (hierarchical and non-hierarchical).

Factor analysis

Definition: It is a multivariate technique for interdependence analysis among quantitative variables.

Main objective: Reduce the number of variables to provide more aggregate information, creating new quantitative variables (factors) with optimal properties. The input variables may suffer from multicollinearity, which factor analysis helps to address because the extracted factors summarize the correlated variables.

Extraction method: Principal components analysis (PCA). This method assumes that the specific (unique) information carried by each input variable is very low while the information shared among the variables is very high, so the variables can be explained through k principal common factors.

How many factors:

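Rule of thumb: the number of factors k should be about 30% of the number of initial variables p (k/p ≈ 0.3).
Scree plot: keep the factors before the point where the line flattens out (the "elbow").
Percentage of total variance explained: the retained factors should account for between 60% and 75% of the total variance. If they explain more, reduce the number of factors.
Latent roots (eigenvalues): keep the factors with an eigenvalue > 1 (the usual default criterion).
Communalities (sum of squared component loadings): > 0.5 for each input variable. The communality is the share of a single input variable's variance explained by the factor solution.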
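
The criteria above can be checked numerically. The following is a minimal sketch, assuming NumPy is available and using synthetic data (the sample size, the two latent factors and the weights are illustrative assumptions, not values from the course). It extracts principal components from the correlation matrix, applies the eigenvalue > 1 rule, and reports the share of variance explained and the communalities.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumption): 200 units, 6 variables driven by 2 latent factors.
n, p = 200, 6
latent = rng.normal(size=(n, 2))
weights = np.array([[0.9, 0.0], [0.8, 0.1], [0.7, 0.2],
                    [0.1, 0.9], [0.0, 0.8], [0.2, 0.7]])
X = latent @ weights.T + 0.4 * rng.normal(size=(n, p))

# PCA on standardized variables = eigen-decomposition of the correlation matrix.
R = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]          # sort eigenvalues in decreasing order
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = int(np.sum(eigvals > 1.0))             # latent-root criterion: eigenvalue > 1
explained = eigvals[:k].sum() / p          # share of total variance explained
loadings = eigvecs[:, :k] * np.sqrt(eigvals[:k])   # variable-component correlations
communalities = (loadings ** 2).sum(axis=1)        # sum of squared loadings per variable

print("factors retained:", k, "ratio k/p:", round(k / p, 2))
print("variance explained:", round(float(explained), 3))
print("communalities:", np.round(communalities, 3))
```

The eigenvalues of a correlation matrix sum to p, which is why dividing the retained eigenvalues by p gives the share of total variance explained.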

How to interpret the components:

The component matrix: each initial variable shows the correlation with the new factors.
The rotated component matrix, with different methods:
- Varimax: minimizes the number of variables with high loadings (correlations) on each factor, which makes the factors easier to interpret.
- Quartimax: minimizes the number of factors strongly correlated with each variable, which makes the variables easier to interpret.
- Equimax: a compromise between Varimax and Quartimax.
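
The three rotations belong to the same orthomax family and differ only in a weight, here called gamma. The sketch below is an illustrative NumPy implementation of the widely used SVD-based iteration, not the routine of any specific statistical package. The function name `orthomax` and the unrotated loading matrix are hypothetical.

```python
import numpy as np

def orthomax(loadings, gamma=1.0, max_iter=100, tol=1e-8):
    """Orthogonal rotation of a loading matrix (illustrative sketch).

    gamma = 1 gives Varimax, gamma = 0 gives Quartimax,
    gamma = k / 2 (k = number of factors) gives Equimax.
    """
    p, k = loadings.shape
    rotation = np.eye(k)
    objective = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient-like matrix of the orthomax criterion.
        target = rotated ** 3 - (gamma / p) * rotated @ np.diag((rotated ** 2).sum(axis=0))
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt                  # closest orthogonal matrix
        new_objective = s.sum()
        if new_objective < objective * (1 + tol):
            break                          # no meaningful improvement
        objective = new_objective
    return loadings @ rotation, rotation

# Hypothetical unrotated component matrix: 6 variables, 2 components.
unrotated = np.array([[0.70, 0.50], [0.75, 0.45], [0.65, 0.55],
                      [0.60, -0.55], [0.70, -0.50], [0.65, -0.45]])

rotated, rotation = orthomax(unrotated, gamma=1.0)   # Varimax
print(np.round(rotated, 3))
```

Because the rotation is orthogonal, it redistributes the loadings across factors without changing the communality of any variable.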

Factor scores: Once an adequate solution is found, it's possible to use the obtained factors as new macro variables called factor scores. They are standardized variables with a mean of 0 and variance of 1. They can be used as explanatory variables in a regression model or as segmentation variables in cluster analysis.
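
As a minimal sketch of how such scores can be obtained with the principal components method, the code below standardizes the inputs, projects them onto the first k components, and rescales each component by the square root of its eigenvalue. The data are random and the choice k = 2 is an assumption made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic correlated data (assumption): 150 units, 4 variables.
n, p, k = 150, 4, 2
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))

# Standardize each input variable (mean 0, sample variance 1).
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Correlation matrix and its k largest eigenvalues / eigenvectors.
R = (Z.T @ Z) / (n - 1)
eigvals, eigvecs = np.linalg.eigh(R)
top = np.argsort(eigvals)[::-1][:k]

# Component scores rescaled to unit variance: the factor scores.
scores = Z @ eigvecs[:, top] / np.sqrt(eigvals[top])

print("means:", np.round(scores.mean(axis=0), 6))
print("variances:", np.round(scores.var(axis=0, ddof=1), 6))
```

The resulting columns are also uncorrelated with each other, which is what makes them convenient as regressors when the original variables are multicollinear.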

Cluster analysis: non-hierarchical algorithm

Definition: Cluster analysis (CA) is an automatic classification technique that classifies statistical units into groups or clusters, which are internally homogeneous but externally very heterogeneous.

Main objective: Create groups that are internally homogeneous but differ strongly from one another (high between-group variability); this is also known as segmentation. The factor scores obtained from a factor analysis can be used as segmentation variables.

Different types of segmentation: Based on types of data, segmentation can be behavioral, need-based, demographic, or value-based.

Two main types of algorithms:

Direct classification algorithms (e.g. the k-means algorithm): the number of clusters is specified in advance.
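
A minimal sketch of the k-means idea, assuming NumPy and two artificial, well-separated groups of points. The function name `k_means`, the random initialization and the stopping rule are illustrative choices; statistical packages typically use more refined variants.

```python
import numpy as np

def k_means(X, k, n_iter=100, seed=0):
    """Basic k-means (Lloyd's algorithm); k is fixed by the analyst."""
    rng = np.random.default_rng(seed)
    # Start from k distinct units chosen at random as initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each unit to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its units.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break                          # assignments are stable
        centroids = new_centroids
    return labels, centroids

# Synthetic data (assumption): two groups of 50 units around (0, 0) and (5, 5).
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
               rng.normal(5.0, 0.5, size=(50, 2))])

labels, centroids = k_means(X, k=2)
print("cluster sizes:", np.bincount(labels))
print("centroids:", np.round(centroids, 2))
```

Each iteration reduces (or leaves unchanged) the within-cluster sum of squared distances, which is the sense in which the resulting groups are internally homogeneous.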

Details

SSD (scientific-disciplinary sector)

Economic and statistical sciences, SECS-S/01 Statistics

The contents of this page are the Publisher diegoianni's personal reworking of information learned by attending the Applied Statistics lectures and through independent study of any reference books, in preparation for the final exam or thesis. They should not be regarded as official material of the Università degli Studi dell'Insubria or of Prof. Recla Alessandro.
