MACHINE LEARNING 2018/2019
DETAILED PROGRAM
Prof. F. Vandin
Last Update: January 20th, 2019
This document describes, for each topic, the level of detail required for the material presented during the lectures (see the slides). The level of detail refers to what was presented during the lectures: "all details" means that all the details presented during the lectures (see the slides) are required, while "main idea" means that only an understanding of the concepts presented is required, and the details (e.g., details of propositions, proofs, formulas) are not. Note that the background material (probability, linear algebra) is assumed to be known at the level of detail used during the presentation of the topics below. For some propositions, the corresponding proposition in the book is referenced for clarity.
1. Learning Model
- All details presented in class required, including: definitions, propositions’ statements, proof of Corollary 2.3 [UML], definition of agnostic PAC learnability for general loss functions
2. Uniform Convergence
- Lemma 4.2 [UML]: statement and proof with all details;
- Definition of uniform convergence property: main idea (details of definition not required)
- Corollary 4.6 [UML]: only main idea and bound on the number of samples
3. Basics of Statistics
- definition confidence interval, definition rejection region: details
- hypothesis testing: main idea; hypothesis testing rejection rule in detail
- everything else: main idea
4. Linear Models
- linear predictors/models: definitions with all details
- linear classification, perceptron: definitions and algorithm in detail
- proposition on perceptron convergence: only main idea (as in slide “Perceptron: Notes”)
- linear regression: definitions, matrix form, derivation best predictor, use of generalized inverse in detail (derivation generalized inverse: not required)
- logistic regression: definition, loss function, equivalence MLE solution and ERM solution in detail
5. Bias-Complexity
- No Free Lunch (NFL) theorem, NFL and prior knowledge: only main idea
- approximation error + estimation error, complexity and error decomposition: all details
6. VC-dimension
- Restrictions, shattering, VC-dimension: definitions in detail
- Fundamental Theorems of Statistical Learning and bound on generalization: in detail
7. Model Selection and Validation
- validation: main idea
- validation for model selection: main idea
- model-selection curve, train-validation-test split, k-fold cross validation: all details
- what if learning fails: main idea
- model selection with SRM: main idea
8. Regularization and Feature Selection
- Regularized Loss Minimization, Tikhonov Regularization: all details
- Ridge Regression, derivation of optimal solution: all details
- Stability, stability rules do not overfit, Tikhonov Regularization as a stabilizer: main idea
- Fitting-stability tradeoff definition and considerations: all details; guarantee results for the choice of lambda: only idea
- l1 regularization, LASSO: all details
- subset selection, forward selection, backward selection, without and with validation data: all details; pseudocode: main idea and structure required (details not required)
10. SVM
- hard-SVM optimization problem and quadratic formulation: all details (no proof of equivalence between the two formulations)
- soft-SVM optimization problem: all details
- gradient descent (GD): all details; GD guarantees: main idea
- stochastic gradient descent (SGD): main idea
- SGD for solving soft-SVM (algorithm): all details
- Hard-SVM dual formulation: main idea; final optimization problem: all details (no derivation required)
- Definition of Kernel and commonly used kernels: all details
- SVM for regression: all details only for optimization problem and support vectors definition
11. Neural Networks
- Neuron, activation function, network architecture, point of view of one node, hypothesis set: all details
- Expressiveness: main idea of each statement
- Sample complexity, runtime of learning NNs: main idea
- Forward propagation algorithm: all details
- Backpropagation algorithm: main idea (pseudocode: only main structure)
- Regularized NNs: main idea
PAC Learning
Definition
A hypothesis class H is said to be PAC learnable if there exist a function m_H : (0,1)² → ℕ and a learning algorithm such that for every
- δ ∈ (0,1) and ε ∈ (0,1),
- distribution D over X,
- labeling function F : X → {0,1},
if the realizability assumption holds w.r.t. H, D, F, then, when running the learning algorithm on a number m ≥ m_H(ε, δ) of i.i.d. examples (generated according to D and labeled according to F), it produces a hypothesis h such that
L_{D,F}(h) ≤ ε with probability ≥ 1 − δ
Corollary: every finite hypothesis class H (|H| < ∞) is PAC learnable with sample complexity
m_H(ε, δ) ≤ ⌈ log(|H|/δ) / ε ⌉
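As a quick numeric check of this bound, a minimal Python sketch (the helper sample_complexity_bound and the example numbers are ours, purely illustrative):

import math

def sample_complexity_bound(h_size, eps, delta):
    # m_H(eps, delta) <= ceil( log(|H| / delta) / eps )  for a finite class, realizable case
    return math.ceil(math.log(h_size / delta) / eps)

# e.g. |H| = 10**6, eps = 0.1, delta = 0.05
print(sample_complexity_bound(10**6, 0.1, 0.05))  # -> 169 examples suffice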
The realizability assumption is too strong in many applications. Given x, the label y is obtained according to a conditional probability P[y | x].
Bayes Optimal Predictor
Given a probability distribution D over X × {0,1}, the best classifier is
F_D(x) = 1 if P[y = 1 | x] ≥ 1/2, and F_D(x) = 0 otherwise
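A minimal Python sketch of this rule on a toy discrete domain (the values of η(x) = P[y = 1 | x] below are made up for illustration):

# toy conditional probabilities eta(x) = P[y = 1 | x] on a small discrete domain (illustrative)
eta = {0: 0.9, 1: 0.3, 2: 0.5, 3: 0.1}

def bayes_predictor(x):
    # F_D(x) = 1 iff P[y = 1 | x] >= 1/2
    return 1 if eta[x] >= 0.5 else 0

for x, p in eta.items():
    # on each x the Bayes predictor errs with probability min(eta(x), 1 - eta(x))
    print(x, bayes_predictor(x), min(p, 1 - p))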
Hoeffding's Inequality
Let θ_1, ..., θ_m be i.i.d. random variables with E[θ_i] = μ and P[a ≤ θ_i ≤ b] = 1 for every i. Then, for every ε > 0:
P[ | (1/m) Σ_{i=1}^m θ_i − μ | > ε ] ≤ 2 exp( −2mε² / (b − a)² )
Uniform Convergence | 6 November
Proof. [Corollary 4.6]
Fix ε, δ ∈ (0,1). We need to find a sample size m such that, for any D, with probability ≥ 1 − δ over the choice of S = {z_1, ..., z_m}, with z_i = (x_i, y_i) sampled i.i.d. from D, we have:
∀h ∈ H, | L_S(h) − L_D(h) | ≤ ε
That is:
P^m( { S : ∀h ∈ H, | L_S(h) − L_D(h) | ≤ ε } ) ≥ 1 − δ
Equivalently:
P^m( { S : ∃h ∈ H, | L_S(h) − L_D(h) | > ε } ) < δ
We have:
{ S : ∃h ∈ H, | L_S(h) − L_D(h) | > ε } = ⋃_{h ∈ H} { S : | L_S(h) − L_D(h) | > ε }
so, by the union bound:
P^m( { S : ∃h ∈ H, | L_S(h) − L_D(h) | > ε } ) ≤ Σ_{h ∈ H} P^m( { S : | L_S(h) − L_D(h) | > ε } )
Recall:
- L_D(h) = E_{z∼D}[ ℓ(h, z) ]
- L_S(h) = (1/m) Σ_{i=1}^m ℓ(h, z_i)
therefore E[ L_S(h) ] = (1/m) Σ_{i=1}^m E[ ℓ(h, z_i) ] = L_D(h)
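The proof then proceeds, as in [UML], by applying Hoeffding's inequality to θ_i = ℓ(h, z_i) and the union bound over h ∈ H, which yields the sample size bound of Corollary 4.6. A minimal Monte Carlo sketch of the underlying concentration of L_S(h) around L_D(h) for one fixed h (toy Bernoulli 0-1 losses, all numbers illustrative):

import random

# loss of a fixed h on a random example is Bernoulli with mean L_D(h) = 0.3 (toy assumption)
L_D, m, eps, trials = 0.3, 500, 0.05, 10000
random.seed(0)

violations = 0
for _ in range(trials):
    L_S = sum(random.random() < L_D for _ in range(m)) / m   # empirical risk on a sample of size m
    violations += abs(L_S - L_D) > eps
# observed frequency of |L_S - L_D| > eps; well below Hoeffding's bound 2*exp(-2*m*eps**2) ≈ 0.16
print(violations / trials)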
Linear Regression
(X = ℝ^d, Y = ℝ, squared loss)
H_reg = L_d
Empirical risk function: Mean Squared Error
L_S(h) = (1/m) Σ_{i=1}^m ( h(x_i) − y_i )²
How do we find a good hypothesis (via ERM)? ⇒ Least Squares algorithm
Best hypothesis:
argmin_w L_S(h_w)
Since 1/m is a constant, we can drop it from the expression of L_S(h_w).
We then look for the w minimizing the Residual Sum of Squares:
argmin_w Σ_{i=1}^m ( ⟨w, x_i⟩ − y_i )²
Let X be the m × d design matrix whose rows are x_1^T, ..., x_m^T, and let y = (y_1, ..., y_m)^T be the vector of labels. We can then rewrite the function to be minimized as
Σ_{i=1}^m ( ⟨w, x_i⟩ − y_i )² = (y − Xw)^T (y − Xw) ≜ RSS(w)
→ argmin_w RSS(w)
∂RSS/∂w = −2 X^T (y − Xw)
Setting ∂RSS/∂w = 0 ⇔ −2 X^T (y − Xw) = 0 ⇒ w = (X^T X)^{−1} X^T y
(assuming X^T X is invertible; otherwise, see the generalized inverse)
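A minimal NumPy sketch of this closed-form solution on synthetic data (all numbers illustrative; np.linalg.pinv covers the non-invertible case via the generalized, i.e. Moore-Penrose, inverse):

import numpy as np

rng = np.random.default_rng(0)
m, d = 100, 3
X = rng.normal(size=(m, d))                  # design matrix, one example per row
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=m)    # labels with a little noise

w_ls = np.linalg.solve(X.T @ X, X.T @ y)     # w = (X^T X)^{-1} X^T y, assuming X^T X invertible
w_pinv = np.linalg.pinv(X) @ y               # same solution via the generalized inverse

print(w_ls)
print(w_pinv)   # both close to w_true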