Notes on Statistical Learning Theory (STALT)

1 - REVIEW

  • Gaussian distribution
  • Mean
  • Variance
  • Standardization

2 - STATISTICAL LEARNING

  • Regression function
  • MSE decomposition
  • Nearest neighbor averaging
  • Linear model
  • Model accuracy
  • Training/Test MSE
  • Bias-variance trade-off
  • Bias
  • Classification problems
  • Conditional class probability
  • Misclassification error rate
  • K-nearest neighbors

2B - REVIEW 2

  • Covariance
  • Correlation
  • Sample moments
  • Mean
  • Variance

3 - LINEAR REGRESSION

  • Estimation with least squares
  • Residual
  • RSS
  • Accuracy of LS
  • Standard error
  • Confidence interval
  • Hypothesis testing
  • Regression basics review
  • LS criterion
  • Matrix formulation
  • Confidence intervals
  • σ² is known
  • General case
  • Comparing nested models
  • Fisher's F
  • RSE
  • Multiple linear regression
  • Forward selection
  • Backward selection
  • Qualitative predictors
  • Interactions

4 - CLASSIFICATION

  • Using Linear Regression
  • Logistic Regression
  • Probability
  • Logit
  • Maximum likelihood
  • Confounding
  • Case-control sampling
  • Multinomial Regression
  • Discriminant analysis
  • Probability
  • LDA with R
  • Discriminant score
  • Estimated parameters
  • QDA with R
  • Discriminant score
  • Fisher's discriminant plot
  • From g(x) to p
  • Probabilities
  • Types of errors
  • ROC plot
  • Quadratic DA
  • Naive Bayes

5 - RESAMPLING

  • Validation set approach
  • K-fold cross validation
  • CV
  • LOOCV
  • Classification
  • Loss of predictors
  • Bootstrap
  • Estimating prediction error

6 - MODEL SELECTION

  • Linear model selection
  • Feature selection
  • Subset selection
  • Best subset selection
  • Stepwise selection
  • Forward
  • Backward
  • Estimating best error
  • Cp
  • AIC
  • BIC
  • Adjusted R²
  • Validation/CV
  • One-standard-error rule
  • Shrinkage methods
  • Ridge regression
  • Lasso
  • Dimension reduction methods
  • Principal Components Analysis
  • Partial Least Squares

7 - NONLINEAR MODELS

  • Polynomial Regression
  • Step functions
  • Piecewise polynomials
  • Linear splines
  • Cubic splines
  • Natural cubic splines
  • Knot placement
  • Smoothing splines
  • Local Regression
  • Generalized Additive Models

8 - TREES

  • Regression problems
  • Tree building
  • Recursive binary splitting
  • Pruning
  • Cost complexity pruning
  • Classification problems
  • Gini index
  • Deviance
  • Bagging
  • Out-of-bag error estimation
  • Boosting
  • Tuning parameters: B, λ, d

9 - SUPPORT VECTOR MACHINES

  • Maximal margin classifier
  • Non-separable data
  • Feature expansion
  • Kernels

10 - UNSUPERVISED LEARNING

  • Principal Component Analysis
  • Proportion of Variance Explained
  • K-means clustering

Events

Probability is defined on events. An event is a set, so the usual set operations (union, intersection, complement) apply to events.

Probability

  • Kolmogorov axioms:
  • 0 ≤ P(A) ≤ 1, with P(Ω) = 1
  • AB = ∅ ⇒ P(A + B) = P(A) + P(B) (A + B denotes the union, AB the intersection)
  • In general, P(A + B) = P(A) + P(B) − P(AB)

Conditional probability

If P(M) ≠ 0, P(A|M) = P(AM) / P(M)

Total probability theorem

M₁, …, Mₙ disjoint events with Σᵢ₌₁ⁿ Mᵢ ⊇ A and P(Mᵢ) ≠ 0. Then,

P(A) = Σᵢ₌₁ⁿ P(A|Mᵢ) P(Mᵢ)

Bayes theorem

P(M|A) = P(A|M)P(M) / P(A)

Using the total probability theorem:

P(Mᵢ|A) = P(A|Mᵢ)P(Mᵢ) / Σⱼ₌₁ⁿ P(A|Mⱼ)P(Mⱼ)
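As a quick numeric illustration (a made-up two-hypothesis example, not taken from the notes):

    # Hypothetical example: two disjoint hypotheses M1, M2 whose union covers A
    prior <- c(0.7, 0.3)        # P(M1), P(M2)
    likelihood <- c(0.1, 0.8)   # P(A|M1), P(A|M2)

    # Total probability theorem: P(A) = sum_i P(A|Mi) P(Mi)
    p_a <- sum(likelihood * prior)

    # Bayes' theorem: P(Mi|A) = P(A|Mi) P(Mi) / P(A)
    posterior <- likelihood * prior / p_a
    p_a        # 0.31
    posterior  # 0.2258... 0.7741...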

Independence

A and B are independent if P(AB) = P(A)P(B) ⇔ P(A|B) = P(A) (assuming P(B) ≠ 0)

Mean

mₓ = m₁ = ∫_{−∞}^{+∞} x fₓ(x) dx

It is the barycenter of fₓ(x).

Y = g(X) ⇒ m_Y = ∫_{−∞}^{+∞} g(x) fₓ(x) dx

All the moments can be interpreted as a first-order moment of the corresponding power of X:

mₖ = E[Xᵏ] = ∫_{−∞}^{+∞} xᵏ fₓ(x) dx

Y = aX + b ⇒ E[Y] = a E[X] + b, so E[·] is a linear operator.
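A quick Monte Carlo check of E[g(X)] and of linearity (a minimal base-R sketch; the distribution, constants, and seed are arbitrary choices):

    set.seed(1)
    x <- rnorm(1e6, mean = 2, sd = 1)   # X ~ N(2, 1)

    # E[g(X)]: average g over draws from f_X; here g(x) = x^2, so E[X^2] = 1 + 2^2 = 5
    mean(x^2)
    # Linearity: E[aX + b] = a E[X] + b = 3*2 - 1 = 5
    mean(3 * x - 1)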

Variance

Var[X] = σₓ² = ∫_{−∞}^{+∞} (x − mₓ)² fₓ(x) dx

The variance measures how concentrated the pdf is around its mean.

Standard deviation: σₓ = √Var[X]

Var[X] = m₂ − m₁²

Var[X] = E[(X − E[X])²]

Y = aX + b ⇒ Var[Y] = Var[aX + b] = a² Var[X], i.e. the variance is invariant to translation.
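These identities can be verified numerically; a minimal base-R sketch (the Exponential distribution, constants, and seed are arbitrary choices):

    set.seed(2)
    x <- rexp(1e6, rate = 0.5)   # arbitrary non-Gaussian X, Var[X] = 1/rate^2 = 4

    m1 <- mean(x); m2 <- mean(x^2)
    m2 - m1^2                     # ~4, matches Var[X] = m2 - m1^2
    var(3 * x + 5); 9 * var(x)    # translation drops out, the scale enters squared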

Standardization

Y = (X − E[X]) / σₓ ⇒ E[Y] = 0, σ_Y = 1; the standardized variable is denoted Z.

X ~ N(mₓ, σₓ²) ⇒ Z ~ N(0, 1)

F_X(x) = F_Z((x − mₓ) / σₓ)
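A quick empirical check of standardization and of the cdf identity above (a base-R sketch; the Gaussian parameters and the evaluation point x0 are arbitrary):

    set.seed(3)
    x <- rnorm(1e6, mean = 10, sd = 3)   # X ~ N(10, 9)

    z <- (x - mean(x)) / sd(x)           # standardization
    c(mean(z), sd(z))                    # ~0, ~1

    # F_X(x0) = F_Z((x0 - m)/sigma): compare the two empirical cdfs at x0 = 13
    x0 <- 13
    c(mean(x <= x0), mean(z <= (x0 - 10) / 3))   # both ~ pnorm(1) = 0.841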

K-Nearest Neighbors

[Figure: example of KNN with K = 3]

The resulting decision regions need not be consistent with the training set: individual training points can fall inside the region of the other class.

We take the K observations nearest to a given point and assign the class with the highest estimated probability.

The choice of the number of neighbors controls the complexity of the model. Large K → low flexibility; small K → high flexibility, possibly noisy. Using test data we can find the optimal K. A minimal sketch is given below.
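As referenced above, a minimal from-scratch sketch in R (the function knn_predict and the toy data are hypothetical, not from the notes):

    # Minimal KNN classifier: majority vote among the k nearest training points
    knn_predict <- function(X_train, y_train, x, k = 3) {
      # Euclidean distance from x to every training point
      d <- sqrt(rowSums((X_train - matrix(x, nrow(X_train), length(x), byrow = TRUE))^2))
      nearest <- order(d)[1:k]                      # indices of the k nearest neighbors
      names(which.max(table(y_train[nearest])))     # most frequent class among them
    }

    # Tiny made-up training set: two classes in the plane
    X_train <- rbind(c(0, 0), c(0.1, 0.2), c(0.2, 0), c(1, 1), c(0.9, 1.1), c(1.1, 0.9))
    y_train <- c("A", "A", "A", "B", "B", "B")
    knn_predict(X_train, y_train, c(0.8, 0.8), k = 3)   # "B"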

Statistics

Sample moments

Xᵢ, i = 1, …, n i.i.d. (independent and identically distributed).
Problem: estimating E[Xᵢᵏ].
Solution: sample moments Mₖ = (1/n) Σᵢ₌₁ⁿ Xᵢᵏ

Sample mean: M₁ = X̄ₙ = (1/n) Σᵢ₌₁ⁿ Xᵢ

Sample variance: S² = (1/(n−1)) Σᵢ₌₁ⁿ (Xᵢ − X̄ₙ)²
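A quick check of these estimators on simulated data (a minimal base-R sketch; the distribution and seed are arbitrary):

    set.seed(4)
    x <- rnorm(1e4, mean = 2, sd = 1)

    mean(x)                                   # M1 = sample mean, estimates E[X] = 2
    mean(x^2)                                 # second sample moment, estimates E[X^2] = 5
    sum((x - mean(x))^2) / (length(x) - 1)    # sample variance; same as var(x)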

Law of large numbers (proof)

Xᵢ, i = 1, …, n i.i.d., E[Xᵢ] = m, Var[Xᵢ] = σ² < ∞. Then E[(X̄ₙ − m)²] → 0 as n → ∞ (the mean squared error tends to zero).

Proof: E[X̄ₙ] = m and Var[X̄ₙ] = σ²/n, so E[(X̄ₙ − m)²] = Var[X̄ₙ] = σ²/n → 0 as n → ∞.
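The σ²/n rate can be checked by simulation; a minimal base-R sketch (sample sizes, constants, and seed are arbitrary choices):

    set.seed(5)
    m <- 1; sigma <- 2
    for (n in c(10, 100, 1000)) {
      # 10000 replications of the sample mean at sample size n
      xbar <- replicate(10000, mean(rnorm(n, m, sigma)))
      cat(n, mean((xbar - m)^2), sigma^2 / n, "\n")   # empirical MSE ~ sigma^2/n -> 0
    }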

Mean square convergence

Let θ₀ be deterministic. Then E[(Θₙ − θ₀)²] → 0 as n → ∞ if and only if both:

  • E[Θₙ] → θ₀
  • Var[Θₙ] → 0

E[(Θₙ − θ₀)²] → 0 (mean square convergence) ⇒ P(|Θₙ − θ₀| > ε) → 0 for all ε > 0 (convergence in probability).

Central limit theorem

Xᵢ, i = 1, …, n i.i.d., E[Xᵢ] = m, Var[Xᵢ] = σ² < ∞. Let X̄ₙ = (1/n) Σᵢ₌₁ⁿ Xᵢ and

Sₙ = (X̄ₙ − E[X̄ₙ]) / √Var[X̄ₙ] = (X̄ₙ − m) / (σ/√n)

Then the cdf of Sₙ converges to the cdf of N(0, 1).

Consequence: asymptotically, X̄ₙ ~ N(m, σ²/n). Regardless of the starting distribution, the standardized sample mean converges to a Gaussian.
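A small simulation sketch of this convergence, starting from a clearly non-Gaussian distribution (base R; n, the number of replications, and the Exponential choice are arbitrary):

    set.seed(6)
    n <- 500; reps <- 20000
    # Exponential(1): skewed, with m = 1 and sigma = 1
    xbar <- replicate(reps, mean(rexp(n, rate = 1)))
    s <- (xbar - 1) / (1 / sqrt(n))      # standardized sample means S_n

    c(mean(s), sd(s))                    # ~0, ~1
    mean(s > qnorm(0.975))               # ~0.025, the N(0,1) tail probability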
