MACHINE LEARNING 2018/2019

DETAILED PROGRAM

Prof. F. Vandin

Last Update: January 20th, 2019

This document describes, for each topic, the level of detail required for the material presented during the lectures (see the slides). The level of detail refers to what was presented in class: "all details" means that all details presented during the lectures (see the slides) are required, while "main idea" means that only an understanding of the concepts presented is required, and the details (e.g., details of propositions, proofs, formulas) are not. Note that the background material (probability, linear algebra) is assumed to be known at the level of detail used during the presentation of the topics below. For some propositions, the corresponding proposition in the book is referenced for clarity.

  1. Learning Model

    • All details presented in class required, including: definitions, propositions’ statements, proof of Corollary 2.3 [UML], definition of agnostic PAC learnability for general loss functions
  2. Uniform Convergence

    • Lemma 4.2 [UML]: statement and proof with all details;
    • Definition of uniform convergence property: main idea (details of definition not required)
    • Corollary 4.6 [UML]: only main idea and bound on the number of samples
  3. Basics of Statistics

    • definitions of confidence interval and rejection region: all details
    • hypothesis testing: main idea; hypothesis testing rejection rule in detail
    • everything else: main idea
  4. Linear Models

    • linear predictors/models: definitions with all details
    • linear classification, perceptron: definitions and algorithm in detail
    • proposition on perceptron convergence: only main idea (as in slide “Perceptron: Notes”)
    • linear regression: definitions, matrix form, derivation of the best predictor, use of the generalized inverse: in detail (derivation of the generalized inverse: not required)
    • logistic regression: definition, loss function, equivalence of the MLE and ERM solutions: in detail
  5. Bias-Complexity

    • No Free Lunch (NFL) theorem, NFL and prior knowledge: only main idea
    • approximation error + estimation error, complexity and error decomposition: all details

6. VC-dimension

  • Restrictions, shattering, VC-dimension: definitions in detail
  • Fundamental Theorems of Statistical Learning and bound on generalization: in detail

7. Model Selection and Validation

  • validation: main idea
  • validation for model selection: main idea
  • model-selection curve, train-validation-test split, k-fold cross validation: all details
  • what if learning fails: main idea
  • model selection with SRM: main idea

8. Regularization and Feature Selection

  • Regularized Loss Minimization, Tikhonov Regularization: all details
  • Ridge Regression, derivation of optimal solution: all details
  • Stability, stability rules do not overfit, Tikhonov Regularization as a stabilizer: main idea
  • Fitting-stability tradeoff definition and considerations: all details; guarantee results for the choice of lambda: only idea
  • l1 regularization, LASSO: all details
  • subset selection, forward selection, backward selection, without and with validation data: all details; pseudocode: main idea and structure required (details not required)

10. SVM

  • hard-SVM optimization problem and quadratic formulation: all details (no proof of equivalence between the two formulations)
  • soft-SVM optimization problem: all details
  • gradient descent (GD): all details; GD guarantees: main idea
  • stochastic gradient descent (SGD): main idea
  • SGD for solving soft-SVM (algorithm): all details
  • Hard-SVM dual formulation: main idea; final optimization problem: all details (no derivation required)
  • Definition of Kernel and commonly used kernels: all details
  • SVM for regression: all details for the optimization problem and the definition of support vectors only

11. Neural Networks

  • Neuron, activation function, network architecture, point of view of one node, hypothesis set: all details
  • Expressiveness: main idea of each statement
  • Sample complexity, runtime of learning NNs: main idea
  • Forward propagation algorithm: all details
  • Backpropagation algorithm: main idea (pseudocode: only main structure)
  • Regularized NNs: main idea

PAC learning

Definition

A hypothesis class H is said to be PAC learnable if there exist a function m_H: (0,1)² → ℕ and a learning algorithm such that for every

  • δ ∈ (0,1) and ε ∈ (0,1)
  • D distribution over X
  • labeling function F: X → {0,1}

If the realizability assumption holds w.r.t. H, D, F, then, when running the learning algorithm on a number m ≥ m_H(ε, δ) of i.i.d. examples (generated according to D and labeled according to F), it produces a hypothesis h such that

L_{D,F}(h) ≤ ε with probability ≥ 1 − δ

Corollary: every finite hypothesis class H (|H| < ∞) is PAC learnable with sample complexity

m_H(ε, δ) ≤ ⌈ log(|H| / δ) / ε ⌉
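
As a quick numerical illustration of the corollary (a sketch, not part of the notes; the class size and parameters below are made up), the bound can be evaluated directly:

    import math

    def sample_complexity_finite(h_size, eps, delta):
        # Bound for a finite class in the realizable PAC setting:
        # m_H(eps, delta) <= ceil( log(|H| / delta) / eps )
        return math.ceil(math.log(h_size / delta) / eps)

    # Example: |H| = 1000, accuracy eps = 0.1, confidence 1 - delta = 0.95
    print(sample_complexity_finite(1000, eps=0.1, delta=0.05))   # 100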

The realizability assumption is too strong in many applications. Given x, the label y is obtained according to a conditional probability P[y | x].

Bayes Optimal Predictor

Given a probability distribution D over X × {0,1}, the best classifier is

f_D(x) = { 1   if P[y = 1 | x] ≥ 1/2
           0   otherwise }
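
A minimal sketch of this rule (assumed, not from the notes), for a known conditional probability η(x) = P[y = 1 | x]:

    import math

    def bayes_optimal(eta):
        # Return the Bayes optimal classifier f_D for a given eta(x) = P[y=1|x]
        return lambda x: 1 if eta(x) >= 0.5 else 0

    # Hypothetical eta: a logistic curve in x
    f = bayes_optimal(lambda x: 1 / (1 + math.exp(-x)))
    print(f(2.0), f(-2.0))   # 1 0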

Hoeffding's Inequality
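
The statement is not included in this excerpt; for reference, the standard form used in [UML] is: let θ_1, ..., θ_m be i.i.d. random variables with E[θ_i] = μ and P[a ≤ θ_i ≤ b] = 1 for all i; then for every ε > 0,

P[ | (1/m) Σ_{i=1}^m θ_i − μ | > ε ] ≤ 2 exp( −2 m ε² / (b − a)² )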

Uniform Convergence | 6 November

Proof. [Corollary 4.6]

Fix ε, δ ∈ (0,1). We need to find the sample size m such that for any D, with probability ≥ 1 − δ over the choice of S = {z_1, ..., z_m}, where z_i = (x_i, y_i) are sampled i.i.d. from D, we have:

∀h ∈ H, | L_S(h) − L_D(h) | ≤ ε

That is:

  • D^m ( S : ∀h ∈ H, | L_S(h) − L_D(h) | ≤ ε ) ≥ 1 − δ

Equivalently:

  • D^m ( S : ∃h ∈ H, | L_S(h) − L_D(h) | > ε ) < δ

We have:

D^m ( { S : ∃h ∈ H, | L_S(h) − L_D(h) | > ε } ) =

= D^m ( ⋃_{h ∈ H} { S : | L_S(h) − L_D(h) | > ε } ) ≤

≤ Σ_{h ∈ H} D^m ( { S : | L_S(h) − L_D(h) | > ε } )   (by the union bound)

Recall:

  • L_D(h) = E_{z∼D} [ ℓ(h, z) ]
  • L_S(h) = (1/m) Σ_{i=1}^m ℓ(h, z_i)

Therefore: E[ L_S(h) ] = (1/m) Σ_{i=1}^m E[ ℓ(h, z_i) ] = L_D(h)
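
As an illustration of this last identity (a hypothetical numerical sketch, not part of the notes): for a fixed predictor h, the empirical risk L_S(h) averaged over many i.i.d. samples S approaches the true risk L_D(h).

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical D: x ~ Uniform(-1, 1), y = 1[x > 0] with 10% label noise
    def sample(m):
        x = rng.uniform(-1, 1, size=m)
        y = (x > 0).astype(int)
        flip = rng.random(m) < 0.1
        y[flip] = 1 - y[flip]
        return x, y

    # Fixed hypothesis h(x) = 1[x > 0], evaluated with the 0-1 loss
    def empirical_risk(x, y):
        return np.mean((x > 0).astype(int) != y)

    # True risk L_D(h) is 0.1 by construction
    risks = [empirical_risk(*sample(50)) for _ in range(2000)]
    print(np.mean(risks))   # close to 0.1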

Linear Regression

(X = ℝ^d, Y = ℝ, squared loss)

H_reg = L_d

Empirical Risk (ER) function: Mean Squared Error

L_S(h) = (1/m) Σ_{i=1}^m ( h(x_i) − y_i )²

How do we find a good hypothesis (using ERM)? ⇒ Least Squares algorithm

Best hypothesis:

argmin_w L_S(h_w)

Since 1/m is a constant, we can drop it from the expression of L_S(h).

We now look for w minimizing the Residual Sum of Squares:

argmin_w Σ_{i=1}^m ( ⟨w, x_i⟩ − y_i )²

Let X be the design matrix whose i-th row is x_i

[ x_1
  ...
  x_m ]

and y the vector of labels

[ y_1
  ...
  y_m ]

we rewrite the function to be minimized as

Σ_{i=1}^m ( ⟨w, x_i⟩ − y_i )² = (y − Xw)ᵀ (y − Xw) ≜ RSS(w)

→ argmin_w RSS(w)

  1. ∂RSS/∂w = −2 Xᵀ (y − Xw)
  2. Setting ∂RSS/∂w = 0 ⇔ −2 Xᵀ (y − Xw) = 0 → w = (Xᵀ X)⁻¹ Xᵀ y

assuming Xᵀ X is invertible

otherwise: see the generalized inverse (pseudo-inverse)
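
A minimal numerical sketch of the two routes (hypothetical data, not from the notes): the normal-equations solution when Xᵀ X is invertible, and np.linalg.pinv for the generalized-inverse case.

    import numpy as np

    rng = np.random.default_rng(1)

    # Made-up data: y = <w_true, x> + noise
    m, d = 100, 3
    X = rng.normal(size=(m, d))          # design matrix, one example per row
    w_true = np.array([2.0, -1.0, 0.5])
    y = X @ w_true + 0.1 * rng.normal(size=m)

    # Normal equations (assumes X^T X invertible)
    w_hat = np.linalg.inv(X.T @ X) @ X.T @ y

    # Generalized (pseudo-)inverse also covers the singular case
    w_pinv = np.linalg.pinv(X) @ y

    print(w_hat, w_pinv)   # both close to w_true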
