1 - REVIEW
- Gaussian distribution
- Mean
- Variance
- Standardization
2 - STATISTICAL LEARNING
- Regression function
- MSE decomposition
- Nearest neighbor averaging
- Linear model
- Model accuracy
- Training/Test MSE
- Bias-variance trade-off
- Bias
- Classification problems
- Conditional class probability
- Misclassification error rate
- K-nearest neighbors
2B - REVIEW 2
- Covariance
- Correlation
- Sample moments
- Mean
- Variance
3 - LINEAR REGRESSION
- Estimation with least squares
- Residual
- RSS
- Accuracy of LS
- Standard error
- Confidence interval
- Hypothesis testing
- Regression basics review
- LS criterion
- Matrix formulation
- Confidence intervals
- σ² is known
- General case
- Comparing nested models
- Fisher's F
- RSE
- R²
- Multiple linear regression
- Forward selection
- Backward selection
- Qualitative predictors
- Interactions
4 - CLASSIFICATION
- Using Linear Regression
- Logistic Regression
- Probability
- Logit
- Maximum likelihood
- Confounding
- Case-control sampling
- Multinomial Regression
- Discriminant analysis
- Probability
- LDA with R
- Discriminant score
- Estimated parameters
- QDA with R
- Discriminant score
- Fisher's discriminant plot
- From g(x) to p
- Probabilities
- Types of errors
- ROC plot
- Quadratic DA
- Naive Bayes
5 - RESAMPLING
- Validation set approach
- K-fold cross validation
- CV
- LOOCV
- Classification
- Loss of predictors
- Bootstrap
- Estimating prediction error
6 - MODEL SELECTION
- Linear model selection
- Feature selection
- Subset selection
- Best subset selection
- Stepwise selection
- Forward
- Backward
- Estimating best error
- Cp
- AIC
- BIC
- Adjusted R²
- Validation/CV
- One-standard-error rule
- Shrinkage methods
- Ridge regression
- Lasso
- Dimension reduction methods
- Principal Components Analysis
- Partial Least Squares
7 - NONLINEAR MODELS
- Polynomial Regression
- Step functions
- Piecewise polynomials
- Linear splines
- Cubic splines
- Natural cubic splines
- Knot placement
- Smoothing splines
- Local Regression
- Generalized Additive Models
8 - TREES
- Regression problems
- Tree building
- Recursive binary splitting
- Pruning
- Cost complexity pruning
- Classification problems
- Gini index
- Deviance
- Bagging
- Out-of-bag error estimation
- Boosting
- B
- λ
- X
9 - SUPPORT VECTOR MACHINES
- Maximal margin classifier
- Non-separable data
- Feature expansion
- Kernels
10 - UNSUPERVISED LEARNING
- Principal Component Analysis
- Proportion of Variance Explained
- K-means clustering
Events
Probability is defined on events. An event is a set, so the usual set operations (union, intersection, complement) apply to events.
Probability
- Kolmogorov axioms:
- 0 ≤ P(A) ≤ 1, and P(Ω) = 1 (Ω is the sure event)
- If AB = ∅ (A and B disjoint), then P(A + B) = P(A) + P(B) (A + B denotes the union A ∪ B)
- In general, P(A + B) = P(A) + P(B) - P(AB) (AB denotes the intersection A ∩ B); a numeric sketch follows.
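A quick numeric illustration of the last rule (a made-up fair-die example, not from the notes), enumerating the events directly in R:

```r
# Made-up fair-die example: A = "even outcome", B = "outcome >= 5"
omega <- 1:6
A <- omega %% 2 == 0
B <- omega >= 5

mean(A | B)                        # P(A + B) computed directly
mean(A) + mean(B) - mean(A & B)    # P(A) + P(B) - P(AB), same value: 2/3
```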
Conditional probability
If P(M) ≠ 0, P(A|M) = P(AM) / P(M)
Total probability theorem
M_1, ..., M_n disjoint events with ∪_{i=1}^{n} M_i ⊇ A and P(M_i) ≠ 0 for every i. Then,
P(A) = Σ_{i=1}^{n} P(A|M_i) P(M_i)
Bayes theorem
P(M|A) = P(A|M)P(M) / P(A)
Using the total probability theorem:
P(M_i|A) = P(A|M_i) P(M_i) / [Σ_{j=1}^{n} P(A|M_j) P(M_j)]
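A minimal numeric sketch of the two theorems together (the prior and the test accuracies below are invented purely for illustration):

```r
# Hypothetical diagnostic-test numbers, for illustration only.
# Partition: M1 = "disease", M2 = "no disease"; A = "positive test".
p_M   <- c(0.01, 0.99)    # priors P(M1), P(M2)
p_A_M <- c(0.95, 0.05)    # likelihoods P(A | M1), P(A | M2)

p_A   <- sum(p_A_M * p_M)     # total probability theorem
p_M_A <- p_A_M * p_M / p_A    # Bayes theorem: posteriors P(Mi | A)
p_M_A                         # p_M_A[1] is P(disease | positive test)
```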
Independence
A and B are independent if P(AB) = P(A)P(B) <=> P(A|B) = P(A)
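With the same made-up die, independence can be checked numerically (a sketch, not from the notes):

```r
# Same made-up die: A = "even outcome", B = "outcome <= 2"
omega <- 1:6
A <- omega %% 2 == 0
B <- omega <= 2

mean(A & B)          # P(AB) = 1/6
mean(A) * mean(B)    # P(A) P(B) = 1/2 * 1/3 = 1/6, so A and B are independent
```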
Mean
m_X = m_1 = ∫_{-∞}^{+∞} x f_X(x) dx
It is the barycenter of f_X(x).
Y = g(X) => m_Y = ∫_{-∞}^{+∞} g(x) f_X(x) dx
All the moments can be interpreted as a first-order moment of the corresponding power of X:
m_k = E[X^k] = ∫_{-∞}^{+∞} x^k f_X(x) dx
Y = aX + b => E[Y] = a E[X] + b => E[·] is a linear operator.
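A quick Monte Carlo check of the linearity of E[·] (a sketch; the Gaussian sample and the constants a, b are arbitrary choices):

```r
set.seed(1)
x <- rnorm(1e5, mean = 2, sd = 3)   # any distribution would do; Gaussian chosen arbitrarily
a <- 4; b <- -1

mean(a * x + b)    # empirical E[aX + b]
a * mean(x) + b    # a E[X] + b: the two agree up to sampling noise
```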
Variance
Var[X] = σ_X^2 = ∫_{-∞}^{+∞} (x - m_X)^2 f_X(x) dx
The variance measures how concentrated the pdf is around its mean.
Standard deviation: σ_X = √Var[X]
Var[X] = m_2 - m_1^2
Var[X] = E[(X - E[X])^2]
Y = aX + b => Var[Y] = Var[aX + b] = ... = a^2 Var[X]: the variance is invariant to translation.
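The same kind of simulation illustrates Var[X] = m_2 - m_1^2 and the translation invariance (again with arbitrary constants):

```r
set.seed(1)
x <- rnorm(1e5, mean = 2, sd = 3)   # arbitrary Gaussian sample, true variance 9
a <- 4; b <- -1

mean(x^2) - mean(x)^2   # m2 - m1^2, close to the true variance
var(a * x + b)          # ~ a^2 * Var[X]: the shift b does not change the spread
a^2 * var(x)
```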
Standardization
Y = (X - E[X]) / σ_X => E[Y] = 0, σ_Y = 1 => Y = Z
X ~ N(m_X, σ_X^2), Z ~ N(0, 1)
F_X(x) = F_Z((x - m_X) / σ_X)
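A minimal sketch of standardization in R (the Gaussian parameters and the check point x0 are arbitrary):

```r
set.seed(1)
x <- rnorm(1e4, mean = 5, sd = 2)   # arbitrary Gaussian sample
z <- (x - mean(x)) / sd(x)          # standardization (same as scale(x))
c(mean(z), sd(z))                   # approximately (0, 1)

# F_X(x) = F_Z((x - m_X) / sigma_X), checked at an arbitrary point x0
x0 <- 6
pnorm(x0, mean = 5, sd = 2)
pnorm((x0 - 5) / 2)                 # same value via the standard normal cdf
```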
K-Nearest Neighbors
Example: KNN with K = 3.
We take the neighborhood formed by the K observations nearest to a given point and assign the class with the highest estimated probability, i.e. the majority class among the neighbors.
The final decision regions need not be consistent with the training set: individual training points can end up misclassified by their own neighborhood.
The choice of the number of neighbors K controls the complexity of the model:
- Large K → low flexibility.
- Small K → high flexibility, but the fit might be noisy.
Using test data we can find the optimal K.
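A minimal sketch of KNN classification with the class package (the two Gaussian classes and K = 3 are illustrative choices):

```r
library(class)   # provides knn()
set.seed(1)

# Two illustrative Gaussian classes in 2D
train <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
               matrix(rnorm(100, mean = 2), ncol = 2))
cl    <- factor(rep(c("A", "B"), each = 50))
test  <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
               matrix(rnorm(40, mean = 2), ncol = 2))

pred <- knn(train, test, cl, k = 3)   # majority vote among the 3 nearest neighbors
table(pred, rep(c("A", "B"), each = 20))
```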
Statistics
Sample moments
X_i, i = 1, ..., n i.i.d. (independent and identically distributed).
Problem: estimating E[X_i^k]. Solution: sample moments M_k = (1/n) Σ_{i=1}^{n} X_i^k
Sample mean: M_1 = X̄_n = (1/n) Σ_{i=1}^{n} X_i
Sample variance: S^2 = 1/(n-1) Σ_{i=1}^{n} (X_i - X̄_n)^2
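In R, mean() and var() compute these quantities (var() already uses the n - 1 denominator); a minimal sketch with an arbitrary Gaussian sample:

```r
set.seed(1)
x <- rnorm(50, mean = 3, sd = 2)   # illustrative i.i.d. sample
n <- length(x)

mean(x)                            # sample mean M1 = X-bar_n
sum(x^2) / n                       # second sample moment M2
sum((x - mean(x))^2) / (n - 1)     # sample variance S^2
var(x)                             # same value, built in
```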
Law of large numbers (proof)
X_i, i = 1, ..., n i.i.d. with E[X_i] = m and Var[X_i] = σ^2 < ∞. Then lim_{n→∞} E[(X̄_n - m)^2] = 0 (the mean squared error of the sample mean tends to zero).
Proof sketch: E[X̄_n] = m and, by independence, Var[X̄_n] = σ^2/n, so E[(X̄_n - m)^2] = Var[X̄_n] = σ^2/n → 0.
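A small simulation illustrating the statement (the exponential distribution and the grid of sample sizes are arbitrary choices):

```r
set.seed(1)
m <- 1                                   # true mean of Exp(1)
sizes <- c(10, 100, 1000, 10000)
mse <- sapply(sizes, function(n) {
  xbar <- replicate(500, mean(rexp(n, rate = 1)))
  mean((xbar - m)^2)                     # Monte Carlo estimate of E[(X-bar_n - m)^2]
})
rbind(n = sizes, mse = mse)              # the MSE shrinks roughly like sigma^2 / n
```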
Mean square convergence
Let θ_0 be deterministic. Then lim_{n→∞} E[(Θ̂_n - θ_0)^2] = 0 iff:
- lim_{n→∞} E[Θ̂_n] = θ_0
- lim_{n→∞} Var[Θ̂_n] = 0
(This follows from the decomposition E[(Θ̂_n - θ_0)^2] = (E[Θ̂_n] - θ_0)^2 + Var[Θ̂_n].)
lim_{n→∞} E[(Θ̂_n - θ_0)^2] = 0 (mean square convergence) => lim_{n→∞} P(|Θ̂_n - θ_0| > ε) = 0 for every ε > 0 (convergence in probability).
Central limit theorem
X_i, i = 1, ..., n i.i.d. with E[X_i] = m and Var[X_i] = σ^2 < ∞. Let X̄_n = (1/n) Σ_{i=1}^{n} X_i and S_n = (X̄_n - E[X̄_n]) / √Var[X̄_n]. Then the cdf of S_n converges to that of N(0, 1).
Consequence: asymptotically, X̄_n ~ N(m, σ^2/n). Regardless of the starting distribution of the X_i, the distribution of the sample mean converges to a Gaussian.
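A minimal sketch of the CLT in action: standardized sample means of exponential data (an arbitrary non-Gaussian choice) already look close to N(0, 1) for moderate n:

```r
set.seed(1)
n <- 30
xbar <- replicate(5000, mean(rexp(n, rate = 1)))   # 5000 sample means of Exp(1) data
s_n  <- (xbar - 1) / sqrt(1 / n)                   # standardize: E[X-bar_n] = 1, Var = 1/n

hist(s_n, breaks = 40, freq = FALSE, main = "Standardized sample means")
curve(dnorm(x), add = TRUE)                        # N(0, 1) density for comparison
```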