OMML - Course notes

Notes for Optimization Methods for Machine Learning, taught by Prof. Palagi, La Sapienza.

This collection contains all the notes I took during the lectures. They are very useful for passing the course, because they include worked exercises and theory that is not in the professor's handouts.
I still recommend studying them together with the material provided by the lecturer.

Exam: Optimization Methods for Machine Learning

Faculty: Information Engineering

From the course of Prof. Laura Palagi

University: Università degli Studi di Roma La Sapienza

Publisher: andrea22x

Academic year 2018-2019

43 pages


Document excerpt

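Constrained problem:

$$\min_{x \in S} f(x), \qquad S \subseteq \mathbb{R}^n$$

A direction $d$ is a descent direction at $x$ when $\nabla f(x)^T d < 0$. In a constrained problem $d$ must also be feasible: $x + t d \in S$ for every sufficiently small $t > 0$.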

Only the active constraints (those satisfied with equality) play a role in defining a feasible direction.

Conditional gradient method (Frank-Wolfe): at the iterate $x^k$ the descent direction is found by solving

$$\min_{d} \nabla f(x^k)^T d \quad \text{s.t. } d \text{ feasible}$$
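A minimal Python sketch of the conditional gradient idea may help here. Everything specific in it is an assumption made for illustration and is not taken from the notes: the objective $f(x) = \frac{1}{2}\|x - c\|^2$, the box $[0, 1]^n$ as feasible set, the function name `frank_wolfe_box`, and the exact line search used for the step.

```python
def frank_wolfe_box(c, iters=5000, tol=1e-10):
    """Minimise f(x) = 0.5 * ||x - c||^2 over the box [0, 1]^n (toy problem)."""
    n = len(c)
    x = [0.5] * n                                   # feasible starting point
    for _ in range(iters):
        g = [x[i] - c[i] for i in range(n)]         # gradient of f at x
        # Linear subproblem: min_s g^T s over the box is attained at a vertex.
        s = [0.0 if g[i] > 0 else 1.0 for i in range(n)]
        d = [s[i] - x[i] for i in range(n)]         # feasible direction
        slope = sum(g[i] * d[i] for i in range(n))  # g^T d, never positive
        if slope > -tol:                            # no feasible descent left
            break
        dd = sum(di * di for di in d)
        step = min(1.0, -slope / dd)                # exact line search, capped at 1
        x = [x[i] + step * d[i] for i in range(n)]
    return x


x_star = frank_wolfe_box([0.3, 1.5, -0.4])
print([round(v, 2) for v in x_star])
```

Because the step never exceeds 1, every iterate is a convex combination of feasible points, so feasibility is kept without any projection.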

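KKT conditions

For $\min f(x)$ s.t. $a_i^T x \le b_i$, $i = 1, \dots, m$ (in matrix form $Ax \le b$), the KKT conditions are:

$$\nabla f(x) + \sum_{i=1}^{m} \lambda_i a_i = 0 \qquad \text{(stationarity)}$$

$$\lambda_i (a_i^T x - b_i) = 0, \quad i = 1, \dots, m \qquad \text{(complementarity)}$$

$$Ax \le b \qquad \text{(feasibility)}$$

$$\lambda \ge 0$$

N.B.: with linear constraints every local minimizer is a KKT point; if the problem is convex, every KKT point is a global minimizer.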

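So for the hard-SVM problem, with $f(w, b) = \frac{1}{2}\|w\|^2$:

$$\nabla_w f = w, \qquad \nabla_b f = 0, \qquad \nabla^2 f = \begin{pmatrix} I & 0 \\ 0 & 0 \end{pmatrix} \succeq 0$$

so $f$ is convex, because it's quadratic with a positive semidefinite Hessian.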

So in this problem we know that all local minimizers are global, but does at least one exist? Yes, thanks to the following theorem.

Theorem: the (primal) hard-SVM problem has a unique solution.

So for this problem, which has a global solution, the empirical risk is zero, because every feasible point classifies all the training samples correctly.

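The hard-SVM primal problem is

$$\min_{w, b} \frac{1}{2}\|w\|^2 \quad \text{s.t. } y_i (w^T x^i + b) \ge 1, \quad i = 1, \dots, P$$

With multipliers $\alpha_i \ge 0$ the Lagrangian is

$$L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{P} \alpha_i \left[ y_i (w^T x^i + b) - 1 \right]$$

and the KKT conditions are

$$\nabla_w L = w - \sum_{i=1}^{P} \alpha_i y_i x^i = 0$$

$$\nabla_b L = -\sum_{i=1}^{P} \alpha_i y_i = 0$$

$$\alpha_i \left[ y_i (w^T x^i + b) - 1 \right] = 0, \quad i = 1, \dots, P$$

$$\alpha_i \ge 0, \qquad y_i (w^T x^i + b) - 1 \ge 0, \quad i = 1, \dots, P$$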

Note that the points that are not on the margin play no role in the minimization: in machine learning, all the samples that are not on the margin (we will see how to tell which ones they are) can be removed from the optimization, because they don't influence $w$. The samples that do matter are called support vectors.

Duality property

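First of all, let's see what a dual problem is. Given a minimization problem, called the primal problem,

$$(P) \qquad \min_{x \in S} f(x) = \frac{1}{2} x^T Q x + c^T x, \qquad S = \{ x \in \mathbb{R}^n : Ax \le b \}$$

with $Q \in \mathbb{R}^{n \times n}$, $Q \succeq 0$ (so $f$ is convex) and $c \in \mathbb{R}^n$,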

we can define another problem, called the dual problem,

$$(D) \qquad \max_{\lambda \in \Gamma} q(\lambda)$$

and the two are strictly linked through the duality gap (the difference between the optimal values). There are two important theorems:

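Weak duality: $f(x) \ge q(\lambda)$ for every $x$ feasible for (P) and every $\lambda$ feasible for (D).

Strong duality: $f(x^*) = q(\lambda^*)$; it holds for sure for LP.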

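Thanks to the duality theorems (strong and weak), we know that:

$$q(\lambda) \le \max_{\lambda \in \Gamma} q(\lambda) \le \min_{x \in S} f(x) \le f(x) \qquad \forall x \in S, \ \forall \lambda \in \Gamma$$

So every feasible point of the dual problem defines a lower bound on the primal problem.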
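The lower-bound property can be checked numerically on a one-dimensional toy problem. This is only a sketch under assumptions that are not in the notes: the primal is $\min \frac{1}{2}x^2$ s.t. $x \ge 1$, and the dual objective $q(\lambda) = \lambda - \frac{1}{2}\lambda^2$ comes from the Lagrangian $\frac{1}{2}x^2 + \lambda(1 - x)$ with the stationarity condition $x = \lambda$.

```python
def f(x):
    """Primal objective of the toy problem: min 0.5 * x^2  s.t.  x >= 1."""
    return 0.5 * x * x


def q(lam):
    """Dual objective: L(x, lam) = f(x) + lam * (1 - x) evaluated at x = lam."""
    return lam - 0.5 * lam * lam


primal_points = [1.0 + 0.1 * k for k in range(31)]   # feasible for the primal: x >= 1
dual_points = [0.1 * k for k in range(31)]           # feasible for the dual: lam >= 0

# Weak duality: every dual value is a lower bound for every primal value.
weak_duality_holds = all(q(l) <= f(x) for l in dual_points for x in primal_points)

best_dual = max(q(l) for l in dual_points)       # best lower bound on the grid
best_primal = min(f(x) for x in primal_points)   # best primal value on the grid
print(weak_duality_holds, best_dual, best_primal)
```

On this toy problem the best lower bound and the best primal value coincide, which is what a zero duality gap means.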

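Let's see how to find a dual problem through the Lagrangian function and the KKT conditions:

$$L(x, \lambda) = f(x) + \sum_{i=1}^{m} \lambda_i (a_i^T x - b_i) = f(x) + \lambda^T (Ax - b)$$

$$\nabla_x L(x, \lambda) = \nabla f(x) + A^T \lambda$$

KKT:

$$\nabla f(x) + A^T \lambda = 0$$

$$\lambda_i (a_i^T x - b_i) = 0 \quad \forall i$$

$$Ax \le b$$

$$\lambda \ge 0$$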

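The Wolfe Dual Problem is:

$$(D) \qquad \max_{x, \lambda} L(x, \lambda) \quad \text{s.t. } \nabla_x L(x, \lambda) = 0, \quad \lambda \ge 0$$

Note that the complementarity conditions are not among its constraints.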

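So we can state the following theorem: let (P) be convex and let $x^*$ be an optimal solution of (P) with multipliers $\lambda^*$; then $(x^*, \lambda^*)$ is optimal for (D) and the duality gap is zero.

So the assumptions are:

$$x^* \in S, \qquad \nabla f(x^*) + A^T \lambda^* = 0, \qquad \lambda_i^* (a_i^T x^* - b_i) = 0 \ \ \forall i, \qquad \lambda^* \ge 0$$

Before giving the proof, we have to prove the weak duality theorem: let $f$ be convex and the constraints linear; then for every $x$ feasible for (P) and every $(\bar{x}, \bar{\lambda})$ feasible for (D),

$$L(\bar{x}, \bar{\lambda}) \le f(x)$$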

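Proof of the weak duality theorem: since $f$ is convex,

$$f(x) \ge f(\bar{x}) + \nabla f(\bar{x})^T (x - \bar{x})$$

By hypothesis $x$ is feasible for (P), so $Ax \le b$, and $(\bar{x}, \bar{\lambda})$ is feasible for (D), so $\nabla f(\bar{x}) = -A^T \bar{\lambda}$ and $\bar{\lambda} \ge 0$. So it is

$$f(x) \ge f(\bar{x}) - \bar{\lambda}^T A (x - \bar{x}) = f(\bar{x}) + \bar{\lambda}^T (A\bar{x} - b) - \bar{\lambda}^T (Ax - b) \ge f(\bar{x}) + \bar{\lambda}^T (A\bar{x} - b) = L(\bar{x}, \bar{\lambda})$$

where the last inequality holds because $\bar{\lambda} \ge 0$ and $Ax - b \le 0$.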

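Now let's see the proof of the first theorem: $(x^*, \lambda^*)$ is feasible for (D) by the assumptions. The point is also an optimum of (D): for every $(x, \lambda)$ feasible for (D),

$$L(x^*, \lambda^*) = f(x^*) + \lambda^{*T} (Ax^* - b) = f(x^*) \ge L(x, \lambda)$$

where the second equality follows from complementarity and the inequality from weak duality.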

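Now let's see what the dual problem of the hard-SVM problem is:

$$\max_{w, b, \alpha} L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{P} \alpha_i \left[ y_i (w^T x^i + b) - 1 \right]$$

$$\text{s.t. } \nabla_w L = w - \sum_{i=1}^{P} \alpha_i y_i x^i = 0, \qquad \nabla_b L = -\sum_{i=1}^{P} \alpha_i y_i = 0, \qquad \alpha \ge 0$$

But note that, using $w = \sum_{i} \alpha_i y_i x^i$ and $\sum_{i} \alpha_i y_i = 0$,

$$L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - w^T \sum_{i=1}^{P} \alpha_i y_i x^i - b \sum_{i=1}^{P} \alpha_i y_i + \sum_{i=1}^{P} \alpha_i = -\frac{1}{2} \Big\| \sum_{i=1}^{P} \alpha_i y_i x^i \Big\|^2 + \sum_{i=1}^{P} \alpha_i$$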

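So the dual problem of the hard-SVM problem is:

$$\min_{\alpha \in \mathbb{R}^P} \frac{1}{2} \sum_{i=1}^{P} \sum_{j=1}^{P} \alpha_i \alpha_j y_i y_j (x^i)^T x^j - \sum_{i=1}^{P} \alpha_i \qquad \text{(convex quadratic)}$$

$$\text{s.t. } \sum_{i=1}^{P} \alpha_i y_i = 0 \quad \text{(linear constraint)}, \qquad \alpha_i \ge 0, \ i = 1, \dots, P \quad \text{(box constraints)}$$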

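So we started from the primal problem, a convex quadratic problem with $P$ general linear inequality constraints, and ended up with a convex quadratic problem with one linear equality constraint and simple bounds.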

Soft-SVM Problem

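We saw that, supposing a separating hyperplane exists, the hard-SVM dual problem is:

$$\min_{\alpha \in \mathbb{R}^P} \frac{1}{2} \sum_{i=1}^{P} \sum_{j=1}^{P} \alpha_i \alpha_j y_i y_j (x^i)^T x^j - \sum_{i=1}^{P} \alpha_i \quad \text{s.t. } \sum_{i=1}^{P} \alpha_i y_i = 0, \quad \alpha \ge 0$$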

And thanks to a theorem, we have that:

Primal hard SVM: convex quadratic; a solution $(w^*, b^*)$ exists and it is a KKT point with multipliers $\alpha^*$.

Dual hard SVM: convex quadratic; a solution exists, and $\alpha^*$ is feasible for the dual and is a solution of it.

To use it on real problems we have to relax some constraints: first we will admit some errors, and afterwards we will use a separating surface that isn't linear.

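To make things easier, note a new notation:

$$\alpha \in \mathbb{R}^P, \qquad y \in \{-1, 1\}^P, \qquad K = [k_{ij}], \quad k_{ij} = y_i y_j (x^i)^T x^j \quad \text{(scalar product)}$$

Also, the decision function is:

$$f(x) = \mathrm{sign}(w^T x + b) = \mathrm{sign}\Big( \sum_{i=1}^{P} \alpha_i y_i (x^i)^T x + b \Big) \quad \text{(scalar product)}$$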
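As a rough illustration of this notation, the sketch below builds the matrix with entries $y_i y_j (x^i)^T x^j$, evaluates the dual objective $\frac{1}{2}\alpha^T K \alpha - e^T \alpha$, and applies the decision function using scalar products only. The two-point data set, the multipliers and the helper names (`dot`, `dual_objective`, `decide`) are assumptions made up for the example.

```python
def dot(u, v):
    """Scalar product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def dual_objective(X, y, alpha):
    """0.5 * alpha^T K alpha - e^T alpha with K_ij = y_i y_j (x^i)^T x^j."""
    P = len(X)
    K = [[y[i] * y[j] * dot(X[i], X[j]) for j in range(P)] for i in range(P)]
    quad = sum(alpha[i] * K[i][j] * alpha[j] for i in range(P) for j in range(P))
    return 0.5 * quad - sum(alpha)


def decide(X, y, alpha, b, x_new):
    """sign(sum_i alpha_i y_i (x^i)^T x_new + b): only scalar products are needed."""
    score = sum(alpha[i] * y[i] * dot(X[i], x_new) for i in range(len(X))) + b
    return 1 if score >= 0 else -1


# Toy data: one sample per class, with hand-picked multipliers and bias.
X = [[1.0, 1.0], [-1.0, -1.0]]
y = [1, -1]
alpha = [0.25, 0.25]
b = 0.0

value = dual_objective(X, y, alpha)
label_a = decide(X, y, alpha, b, [2.0, 0.0])
label_b = decide(X, y, alpha, b, [-3.0, 1.0])
print(value, label_a, label_b)
```

Note that `decide` never forms $w$ explicitly, which is the property that later allows the scalar product to be swapped for a kernel.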

So SVMs are so important because, through the scalar product, they use the relative position of the points and not their absolute position.

But if we solve the dual problem, how can we find the variables of the primal ($w$ and $b$)?

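$w^*$ is computed from the constraint $\nabla_w L = 0$:

$$w^* = \sum_{i=1}^{P} \alpha_i^* y_i x^i$$

$b^*$, on the other hand, is obtained from the complementarity conditions, using only one support vector: for any $i$ with $\alpha_i^* > 0$,

$$y_i (w^{*T} x^i + b^*) = 1 \quad \Rightarrow \quad b^* = y_i - w^{*T} x^i$$

because $1 / y_i = y_i$.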
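The recovery of the primal variables from the dual ones can be sketched in a few lines of Python. The data set, the multipliers and the function name `recover_primal` are assumptions chosen for illustration; the multipliers satisfy the KKT conditions of this toy problem, with the third sample off the margin.

```python
def recover_primal(X, y, alpha, eps=1e-12):
    """Return (w, b) from the dual multipliers alpha of a hard-margin SVM."""
    n = len(X[0])
    # w = sum_i alpha_i * y_i * x^i  (stationarity of the Lagrangian in w)
    w = [sum(alpha[i] * y[i] * X[i][k] for i in range(len(X))) for k in range(n)]
    # Pick one support vector (alpha_i > 0). Complementarity gives
    # y_i * (w^T x^i + b) = 1, hence b = y_i - w^T x^i because y_i is +1 or -1.
    sv = next(i for i in range(len(X)) if alpha[i] > eps)
    b = y[sv] - sum(w[k] * X[sv][k] for k in range(n))
    return w, b


X = [[1.0, 1.0], [-1.0, -1.0], [3.0, 3.0]]
y = [1, -1, 1]
alpha = [0.25, 0.25, 0.0]     # third point is off the margin: zero multiplier
w, b = recover_primal(X, y, alpha)
print(w, b)
```

In floating-point practice it is common to average the value of $b$ over all support vectors instead of using a single one; the sketch keeps the single-vector version described in the notes.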

Indeed, by the complementarity condition of the KKT, the points that aren't on the margins certainly have a null Lagrange multiplier. Only some (not necessarily all) of the points on the margins have a positive multiplier, and these are called support vectors.

Calling $e$ a vector of all ones, the hard-SVM dual is:

$$\min_{\alpha} \frac{1}{2} \alpha^T K \alpha - e^T \alpha \quad \text{s.t. } y^T \alpha = 0, \quad \alpha \ge 0$$

So let's relax it: the points are no longer linearly separable because of the presence of outliers. To do that we have to introduce a measure (non-negative slack variables $\xi_i$) of how far a point is from its own region (N.B. the boundaries of the regions are the margins).

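[Figure: a point on the wrong side of the separating hyperplane is a classification error; a point inside the margin is not feasible for the hard constraints but is still no error (soft classification).]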

The relaxed constraint is:

$$y_i (w^T x^i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0$$

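To have a good model we have to penalize the mistakes, introducing a new term in the objective function:

$$\min_{w, b, \xi} \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{P} \xi_i \quad \text{s.t. } y_i (w^T x^i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, P$$

This is the soft margin SVM primal problem: convex quadratic, with linear constraints,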

where the sum of the slack variables is an upper bound on the number of classification errors and $C$ is a constant that balances the two terms.

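Since at the optimum $\xi_i = \max\{0, \ 1 - y_i (w^T x^i + b)\}$, the problem can be written without constraints:

$$\min_{w, b} \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{P} \max\{0, \ 1 - y_i (w^T x^i + b)\} \qquad \text{(linear soft SVM)}$$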
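The role of the slack variables can be illustrated with a short sketch that, for a given $(w, b)$, computes the smallest feasible slack of each sample and the resulting soft-margin objective. The data, the values of $w$, $b$ and $C$, and the function name `soft_margin_objective` are assumptions picked for the example, not values from the notes.

```python
def soft_margin_objective(w, b, X, y, C):
    """Return (objective, slacks) for 0.5 * ||w||^2 + C * sum_i xi_i."""
    slacks = []
    for x_i, y_i in zip(X, y):
        margin = y_i * (sum(wk * xk for wk, xk in zip(w, x_i)) + b)
        slacks.append(max(0.0, 1.0 - margin))     # smallest feasible xi_i
    objective = 0.5 * sum(wk * wk for wk in w) + C * sum(slacks)
    return objective, slacks


X = [[2.0, 2.0], [0.5, 0.5], [-1.0, 0.0], [-2.0, -2.0]]
y = [1, 1, 1, -1]
obj, xi = soft_margin_objective([0.5, 0.5], 0.0, X, y, C=1.0)

# A sample is misclassified only when its slack exceeds 1, so the sum of
# the slacks is an upper bound on the number of classification errors.
errors = sum(1 for s in xi if s > 1.0)
print(obj, xi, errors)
```

The second sample has a slack between 0 and 1: it violates the margin but is still classified correctly, while the third sample has a slack above 1 and is a genuine error.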

To use a non-linear separator we will use a transformation of the points that makes them linearly separable, and afterwards come back (as we did for the RBF).

So now let's see what the soft margin SVM's dual is:

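Starting from the primal

$$\min_{w, b, \xi} \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{P} \xi_i \quad \text{s.t. } y_i (w^T x^i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0$$

with multipliers $\alpha_i \ge 0$ and $\mu_i \ge 0$, the Lagrangian is

$$L(w, b, \xi, \alpha, \mu) = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{P} \xi_i - \sum_{i=1}^{P} \alpha_i \left[ y_i (w^T x^i + b) - 1 + \xi_i \right] - \sum_{i=1}^{P} \mu_i \xi_i$$

and the Wolfe dual is

$$\max L(w, b, \xi, \alpha, \mu)$$

$$\text{s.t. } \nabla_w L = w - \sum_{i=1}^{P} \alpha_i y_i x^i = 0, \qquad \nabla_b L = -\sum_{i=1}^{P} \alpha_i y_i = 0, \qquad \frac{\partial L}{\partial \xi_i} = C - \alpha_i - \mu_i = 0, \qquad \alpha \ge 0, \quad \mu \ge 0$$

From $\mu_i = C - \alpha_i \ge 0$ we get $0 \le \alpha_i \le C$.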

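So in the soft SVM we add only an upper bound on the $\alpha_i$:

$$\min_{\alpha} \frac{1}{2} \sum_{i=1}^{P} \sum_{j=1}^{P} \alpha_i \alpha_j y_i y_j (x^i)^T x^j - \sum_{i=1}^{P} \alpha_i \quad \text{s.t. } \sum_{i=1}^{P} \alpha_i y_i = 0, \quad 0 \le \alpha_i \le C, \ i = 1, \dots, P$$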

Optimality conditions of SVM

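We saw that the dual formulation of the linear soft-SVM problem is:

$$\min_{\alpha} \frac{1}{2} \alpha^T K \alpha - e^T \alpha \quad \text{s.t. } y^T \alpha = 0, \quad 0 \le \alpha \le C e$$

where $K = [k_{ij}]$, $k_{ij} = y_i y_j (x^i)^T x^j$, $i, j = 1, \dots, P$.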

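Also we saw that the support vectors are those that have $\alpha_i > 0$:

$$0 < \alpha_i < C \ \Rightarrow \ \xi_i = 0 \quad \text{(real support vector)}, \qquad \alpha_i = C \ \Rightarrow \ \xi_i \ge 0 \quad \text{(bounded support vector)}$$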
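A small sketch of how the multipliers split the samples into groups, assuming a tolerance `eps` to decide when a value sits on a bound. The function name `classify_multipliers`, the group label "free" (for the support vectors strictly between the bounds) and the sample values are assumptions made for the example.

```python
def classify_multipliers(alpha, C, eps=1e-8):
    """Split sample indices into non-support, free and bounded support vectors."""
    groups = {"non_support": [], "free": [], "bounded": []}
    for i, a in enumerate(alpha):
        if a <= eps:                 # alpha_i = 0: not a support vector
            groups["non_support"].append(i)
        elif a >= C - eps:           # alpha_i = C: bounded support vector
            groups["bounded"].append(i)
        else:                        # 0 < alpha_i < C: exactly on the margin
            groups["free"].append(i)
    return groups


groups = classify_multipliers([0.0, 0.3, 1.0, 0.7, 0.0], C=1.0)
print(groups)
```

The tolerance matters in practice because a numerical solver rarely returns multipliers that are exactly 0 or exactly $C$.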

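To calculate the value of $b$ we have to use only the real support vectors. From the complementarity condition

$$\alpha_i \left[ y_i (w^{*T} x^i + b^*) - 1 + \xi_i \right] = 0$$

if $0 < \alpha_i < C$ then $\xi_i = 0$ and

$$y_i (w^{*T} x^i + b^*) = 1$$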

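To pass to a non-linear separator, we have to transform the input points into a new space through a map $\phi(x^i)$; the scalar product in the new space is given by a function called kernel:

$$k(x^i, x^j) = \phi(x^i)^T \phi(x^j)$$

and the decision function will be:

$$f(x) = \mathrm{sign}\Big( \sum_{i=1}^{P} \alpha_i y_i k(x^i, x) + b \Big)$$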

For this we need to solve the same dual problem, without computing the transformation explicitly but changing $k_{ij} = y_i y_j k(x^i, x^j)$.
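The kernel substitution can be sketched as follows. The Gaussian (RBF) kernel with parameter `gamma` is one common choice assumed here for illustration; the data, the multipliers (made-up values, not the solution of a dual problem) and the function names are hypothetical.

```python
import math


def rbf_kernel(u, v, gamma=0.5):
    """k(u, v) = exp(-gamma * ||u - v||^2), one common kernel choice."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def kernel_matrix(X, y, kernel=rbf_kernel):
    """Matrix of the dual problem with the scalar product replaced by the kernel."""
    P = len(X)
    return [[y[i] * y[j] * kernel(X[i], X[j]) for j in range(P)] for i in range(P)]


def kernel_decision(X, y, alpha, b, x_new, kernel=rbf_kernel):
    """sign(sum_i alpha_i y_i k(x^i, x_new) + b)."""
    score = sum(a * yi * kernel(xi, x_new) for a, yi, xi in zip(alpha, y, X)) + b
    return 1 if score >= 0 else -1


X = [[0.0, 0.0], [2.0, 2.0]]
y = [1, -1]
alpha = [1.0, 1.0]            # illustrative multipliers only

K = kernel_matrix(X, y)
near_first = kernel_decision(X, y, alpha, 0.0, [0.1, 0.0])
near_second = kernel_decision(X, y, alpha, 0.0, [2.0, 1.9])
print(K, near_first, near_second)
```

The map into the new space never appears in the code: both the matrix of the dual and the decision function only ever call the kernel.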

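For the dual problem

$$\min_{\alpha} f(\alpha) = \frac{1}{2} \alpha^T K \alpha - e^T \alpha \quad \text{s.t. } y^T \alpha = 0, \quad 0 \le \alpha \le C e$$

with multipliers $\lambda$ for the equality constraint, $\nu_i$ for $\alpha_i \ge 0$ and $\eta_i$ for $\alpha_i \le C$, the KKT conditions are

$$\nabla f(\alpha)_i + \lambda y_i - \nu_i + \eta_i = 0, \qquad \nu_i \alpha_i = 0, \qquad \eta_i (\alpha_i - C) = 0, \qquad \nu_i \ge 0, \quad \eta_i \ge 0$$

where $\nabla f(\alpha) = K \alpha - e$.

There can be three different cases:

$$\alpha_i = 0: \quad \eta_i = 0, \quad \nabla f(\alpha)_i + \lambda y_i = \nu_i \ge 0$$

$$0 < \alpha_i < C: \quad \nu_i = \eta_i = 0, \quad \nabla f(\alpha)_i + \lambda y_i = 0$$

$$\alpha_i = C: \quad \nu_i = 0, \quad \nabla f(\alpha)_i + \lambda y_i = -\eta_i \le 0$$

So we can use a problem only with lambda, but first call:

$$L(\alpha) = \{ i : \alpha_i = 0 \}, \qquad U(\alpha) = \{ i : \alpha_i = C \}$$

so that the conditions become

$$\nabla f(\alpha)_i + \lambda y_i \ge 0, \quad i \in L(\alpha)$$

$$\nabla f(\alpha)_i + \lambda y_i \le 0, \quad i \in U(\alpha)$$

$$\nabla f(\alpha)_i + \lambda y_i = 0, \quad i \notin L(\alpha) \cup U(\alpha)$$

So consider these conditions as a function of the value that $y_i$ can assume ($y_i = \pm 1$, so $1 / y_i = y_i$):

$$\lambda \ge -y_i \nabla f(\alpha)_i \quad \text{if } i \in L(\alpha), \ y_i = 1 \ \text{ or } \ i \in U(\alpha), \ y_i = -1$$

$$\lambda \le -y_i \nabla f(\alpha)_i \quad \text{if } i \in L(\alpha), \ y_i = -1 \ \text{ or } \ i \in U(\alpha), \ y_i = 1$$

$$\lambda = -y_i \nabla f(\alpha)_i \quad \text{if } i \notin L(\alpha) \cup U(\alpha)$$

So call:

$$L^+(\alpha) = \{ i \in L(\alpha) : y_i = 1 \}, \qquad L^-(\alpha) = \{ i \in L(\alpha) : y_i = -1 \}$$

$$U^+(\alpha) = \{ i \in U(\alpha) : y_i = 1 \}, \qquad U^-(\alpha) = \{ i \in U(\alpha) : y_i = -1 \}$$

and the conditions read

$$\max_{i \in L^+(\alpha) \cup U^-(\alpha)} -y_i \nabla f(\alpha)_i \ \le \ \lambda \ \le \ \min_{i \in L^-(\alpha) \cup U^+(\alpha)} -y_i \nabla f(\alpha)_i, \qquad \lambda = -y_i \nabla f(\alpha)_i \ \text{ for } 0 < \alpha_i < C$$

Let's create new sets:

$$R(\alpha) = L^+(\alpha) \cup U^-(\alpha) \cup \{ i : 0 < \alpha_i < C \}$$

$$S(\alpha) = L^-(\alpha) \cup U^+(\alpha) \cup \{ i : 0 < \alpha_i < C \}$$

Then a feasible $\alpha$ is a KKT point if and only if

$$\max_{i \in R(\alpha)} -y_i \nabla f(\alpha)_i \ \le \ \min_{j \in S(\alpha)} -y_j \nabla f(\alpha)_j$$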
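An optimality test of this kind can be sketched in Python. The sketch assumes the usual construction for the dual $\min \frac{1}{2}\alpha^T K \alpha - e^T \alpha$, $y^T \alpha = 0$, $0 \le \alpha \le Ce$: the set `R` collects the indices with $\alpha_i = 0, y_i = 1$ or $\alpha_i = C, y_i = -1$ or $0 < \alpha_i < C$, the set `S` collects the indices with $\alpha_i = 0, y_i = -1$ or $\alpha_i = C, y_i = 1$ or $0 < \alpha_i < C$, and a feasible $\alpha$ is optimal when the largest value of $-y_i \nabla f(\alpha)_i$ over `R` does not exceed the smallest over `S`. The function names, the tolerance and the toy data are assumptions.

```python
def dual_gradient(X, y, alpha):
    """Gradient K alpha - e of the dual objective, with K_ij = y_i y_j (x^i)^T x^j."""
    P = len(X)
    grad = []
    for i in range(P):
        k_alpha = sum(
            y[i] * y[j] * sum(a * b for a, b in zip(X[i], X[j])) * alpha[j]
            for j in range(P)
        )
        grad.append(k_alpha - 1.0)
    return grad


def kkt_gap(alpha, y, grad, C, eps=1e-8):
    """Return (m, M); a feasible alpha is a KKT point of the dual when m <= M."""
    R, S = [], []
    for i, (a, yi) in enumerate(zip(alpha, y)):
        at_lower, at_upper = a <= eps, a >= C - eps
        free = not at_lower and not at_upper
        if free or (at_lower and yi > 0) or (at_upper and yi < 0):
            R.append(i)
        if free or (at_lower and yi < 0) or (at_upper and yi > 0):
            S.append(i)
    m = max((-y[i] * grad[i] for i in R), default=float("-inf"))
    M = min((-y[i] * grad[i] for i in S), default=float("inf"))
    return m, M


X = [[1.0, 1.0], [-1.0, -1.0]]
y = [1, -1]
C = 10.0

alpha_opt = [0.25, 0.25]
m_opt, M_opt = kkt_gap(alpha_opt, y, dual_gradient(X, y, alpha_opt), C)

alpha_zero = [0.0, 0.0]
m_zero, M_zero = kkt_gap(alpha_zero, y, dual_gradient(X, y, alpha_zero), C)
print(m_opt <= M_opt, m_zero <= M_zero)
```

When the test fails, the pair of indices attaining the maximum and the minimum is a natural choice of variables to update in a decomposition method; that use is mentioned here only as context, not as something stated in the notes.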

Note that the equality is f
