Constrained problems
$$\min f(x) \quad \text{s.t.} \quad Ax \le b$$
In a constrained problem a descent direction $d$ at the current point must also be feasible: a small step along $d$ must not leave the feasible set. Only the active constraints (those which satisfy the equalities $a_i^T x = b_i$) play a role in defining a feasible direction.
The conditional gradient (Frank-Wolfe) method is a feasible-direction descent method: at each iterate $x_k$ it finds the direction by solving the linearized problem
$$\min_d \; \nabla f(x_k)^T d \quad \text{s.t.} \quad d \text{ feasible}$$
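To make the step concrete, here is a minimal sketch of one Frank-Wolfe iteration under the assumptions above (convex $f$, bounded polyhedral feasible set, feasible starting point). The use of `scipy.optimize.linprog` for the linearized subproblem and the classic step size $2/(k+2)$ are my additions, not something stated in the notes.

```python
# Minimal Frank-Wolfe (conditional gradient) sketch for
#   min f(x)  s.t.  A x <= b
# Assumes: f convex with gradient grad_f, bounded feasible set,
# and a feasible starting point x0.
import numpy as np
from scipy.optimize import linprog

def frank_wolfe(grad_f, A, b, x0, iters=100):
    x = x0.astype(float)
    for k in range(iters):
        g = grad_f(x)
        # Linearized subproblem: min_s g^T s  s.t.  A s <= b
        # (its solution is a vertex of the polyhedron)
        res = linprog(c=g, A_ub=A, b_ub=b,
                      bounds=[(None, None)] * len(x))
        d = res.x - x                 # feasible descent direction
        gamma = 2.0 / (k + 2.0)       # classic diminishing step size
        x = x + gamma * d
    return x
```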
KKT conditions
For the problem $\min f(x)$ s.t. $Ax \le b$, with Lagrangian $L(x,\lambda) = f(x) + \sum_i \lambda_i (a_i^T x - b_i)$, the KKT conditions at a point $\bar{x}$ with multipliers $\bar{\lambda}$ are:
$$\nabla f(\bar{x}) + A^T \bar{\lambda} = 0 \quad \text{(stationarity)}$$
$$\bar{\lambda}_i \, (a_i^T \bar{x} - b_i) = 0 \;\; \forall i \quad \text{(complementarity)}$$
$$A\bar{x} \le b \quad \text{(feasibility)}, \qquad \bar{\lambda} \ge 0$$
N.B. a KKT point is a candidate local minimum (a solution point must satisfy it); if the problem is convex, a KKT point is a global minimum.
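For instance (a toy example of mine, not from the notes), take $\min x^2$ s.t. $x \ge 1$, i.e. $-x \le -1$:
$$L(x,\lambda) = x^2 + \lambda(1 - x), \qquad 2x - \lambda = 0, \qquad \lambda(1 - x) = 0, \qquad x \ge 1, \qquad \lambda \ge 0$$
The only KKT point is $\bar{x} = 1$, $\bar{\lambda} = 2$, and since the problem is convex it is the global minimum.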
So for the hard-SVM problem
$$\min_{w,b} \; f(w,b) = \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i(w^T x_i + b) \ge 1, \;\; i = 1, \dots, P$$
we have
$$\nabla f = \begin{pmatrix} w \\ 0 \end{pmatrix}, \qquad \nabla^2 f = \begin{pmatrix} I & 0 \\ 0 & 0 \end{pmatrix} \succeq 0$$
so $f$ is convex (the Hessian is positive semidefinite); in fact it's quadratic, and the constraints are linear.
So in this problem we know that all local minimizers are global, but does at least one exist? Yes, thanks to the
Theorem: the (primal) hard-SVM problem has a unique solution.
So for this problem, which has a global solution, the empirical risk is zero!
The Lagrangian of the hard-SVM, with multipliers $\lambda \in \mathbb{R}^P$, $\lambda \ge 0$, is
$$L(w,b,\lambda) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{P} \lambda_i \left[ y_i(w^T x_i + b) - 1 \right]$$
and the KKT conditions are:
$$\nabla_w L = w - \sum_{i=1}^{P} \lambda_i y_i x_i = 0, \qquad \nabla_b L = -\sum_{i=1}^{P} \lambda_i y_i = 0$$
$$\lambda_i \ge 0, \qquad \lambda_i \left[ y_i(w^T x_i + b) - 1 \right] = 0 \;\; \forall i, \qquad y_i(w^T x_i + b) \ge 1 \;\; \forall i$$
But note that the points that are not on the margin don't play any role in the minimization: all the samples that are not on the margin (we will see how to understand which ones they are) can be removed from the optimization, because they don't influence the solution $w$. The remaining points are called support vectors.
Duality
First of all, let's look at what a dual problem is. Given a minimization problem, called the primal problem,
$$\min \; f(x) = \frac{1}{2} x^T Q x + c^T x \quad \text{s.t.} \quad x \in S$$
$$S = \{ x \in \mathbb{R}^n \mid Ax \le b \}, \qquad Q \in \mathbb{R}^{n \times n}, \; Q \succeq 0, \; f \text{ convex}$$
we can define another problem, called the dual problem:
$$\max \; q(\lambda) \quad \text{s.t.} \quad \lambda \in D$$
These two problems are strictly linked through the duality gap (the difference between the optimal values). There are two important theorems:
Weak duality: $q(\lambda) \le f(x)$ for every dual-feasible $\lambda$ and primal-feasible $x$.
Strong duality: for LP (linear programming) the duality gap is zero for sure.
Thanks to the duality theorems (strong and weak), we know that:
$$\max_{\lambda \in D} q(\lambda) \;\le\; \min_{x \in S} f(x), \qquad q(\lambda) \le f(x) \;\; \forall \lambda \in D, \; \forall x \in S$$
So every feasible point of the dual problem defines a lower bound for the optimal value of the primal problem.
Let's see how to find a dual problem through the Lagrangian function and the KKT conditions:
$$L(x,\lambda) = f(x) + \sum_{i=1}^{m} \lambda_i (a_i^T x - b_i) = f(x) + \lambda^T (Ax - b)$$
KKT conditions:
$$\nabla f(x) + A^T \lambda = 0, \qquad \lambda_i (a_i^T x - b_i) = 0 \;\; \forall i, \qquad Ax \le b, \qquad \lambda \ge 0$$
The Wolfe dual problem is:
$$\max_{x, \lambda} \; L(x,\lambda) \quad \text{s.t.} \quad \nabla_x L(x,\lambda) = 0, \;\; \lambda \ge 0$$
(note that the complementarity conditions aren't there).
So we have the following theorem: if P is convex and $(\bar{x}, \bar{\lambda})$ is a KKT point of P (where $\bar{\lambda}$ are the Lagrange multipliers), then $(\bar{x}, \bar{\lambda})$ is an optimal solution of the dual D.
The assumptions are: $f$ convex, linear constraints $a_i^T x \le b_i$, and $\lambda \ge 0$.
Before giving the proof of the weak duality theorem, we have to show that, since the constraints are linear and $f$ is convex, the Lagrangian
$$L(x,\lambda) = f(x) + \lambda^T (Ax - b)$$
is convex in $x$ for every fixed $\lambda \ge 0$: it is the sum of a convex function and a function linear in $x$.
Proof of the weak duality theorem: let $\bar{x}$ be feasible for P (so $A\bar{x} \le b$) and let $(x, \lambda)$ be feasible for D (so $\nabla_x L(x,\lambda) = 0$ and $\lambda \ge 0$). Since $L(\cdot, \lambda)$ is convex in $x$,
$$L(\bar{x}, \lambda) \ge L(x, \lambda) + \nabla_x L(x, \lambda)^T (\bar{x} - x) = L(x, \lambda)$$
Therefore
$$L(x, \lambda) \le L(\bar{x}, \lambda) = f(\bar{x}) + \lambda^T (A\bar{x} - b) \le f(\bar{x})$$
where the last inequality holds because $\lambda \ge 0$ and $A\bar{x} - b \le 0$. $\square$
So now look at the proof of the first theorem: $(\bar{x}, \bar{\lambda})$ is feasible for D by the assumptions, since the KKT conditions give exactly $\nabla_x L(\bar{x}, \bar{\lambda}) = 0$ and $\bar{\lambda} \ge 0$. Let's see that the point is also an optimum: by complementarity,
$$L(\bar{x}, \bar{\lambda}) = f(\bar{x}) + \bar{\lambda}^T (A\bar{x} - b) = f(\bar{x})$$
and by weak duality $L(x, \lambda) \le f(\bar{x}) = L(\bar{x}, \bar{\lambda})$ for every $(x, \lambda)$ feasible for D, so $(\bar{x}, \bar{\lambda})$ is optimal. $\square$
Now let's look at what the dual problem of the hard-SVM problem is. The Lagrangian is
$$L(w,b,\lambda) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{P} \lambda_i \left[ y_i(w^T x_i + b) - 1 \right]$$
and the Wolfe dual is $\max L$ s.t. $\nabla_{w,b} L = 0$, $\lambda \ge 0$. Setting the gradients to zero:
$$\nabla_w L = w - \sum_{i=1}^{P} \lambda_i y_i x_i = 0 \;\Rightarrow\; w = \sum_{i=1}^{P} \lambda_i y_i x_i$$
$$\nabla_b L = -\sum_{i=1}^{P} \lambda_i y_i = 0 \;\Rightarrow\; \sum_{i=1}^{P} \lambda_i y_i = 0$$
Substituting $w$ back into $L$ (the $b$ term vanishes):
$$L = \frac{1}{2}\Big\|\sum_i \lambda_i y_i x_i\Big\|^2 - \sum_i \sum_j \lambda_i \lambda_j y_i y_j \, x_i^T x_j + \sum_i \lambda_i = -\frac{1}{2}\sum_{i=1}^{P}\sum_{j=1}^{P} \lambda_i \lambda_j y_i y_j \, x_i^T x_j + \sum_{i=1}^{P} \lambda_i$$
So the dual problem of the hard-SVM problem is:
$$\min_{\lambda \in \mathbb{R}^P} \; \frac{1}{2}\sum_{i=1}^{P}\sum_{j=1}^{P} \lambda_i \lambda_j y_i y_j \, x_i^T x_j - \sum_{i=1}^{P} \lambda_i \quad \text{s.t.} \quad \sum_{i=1}^{P} \lambda_i y_i = 0, \;\; \lambda \ge 0$$
(a convex quadratic objective, one linear equality constraint, and box/sign constraints).
So we started with a primal quadratic problem with $P$ general linear inequality constraints and ended up with a quadratic problem whose constraints are much simpler (one equality plus sign constraints).
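As an illustration, here is a minimal sketch that feeds this dual to a generic QP solver; cvxpy is an assumed dependency (any QP solver would do), not something the notes use.

```python
# Hard-SVM dual as a QP (a sketch, assuming cvxpy is installed).
# X: (P, d) data matrix, y: array of labels in {-1, +1}.
import cvxpy as cp
import numpy as np

def hard_svm_dual(X, y):
    P = X.shape[0]
    Z = y[:, None] * X                 # rows z_i = y_i x_i
    K_bar = Z @ Z.T                    # K_bar[i, j] = y_i y_j x_i^T x_j
    lam = cp.Variable(P)
    # Tiny ridge keeps the Gram matrix numerically PSD for quad_form.
    obj = 0.5 * cp.quad_form(lam, K_bar + 1e-9 * np.eye(P)) - cp.sum(lam)
    cons = [y @ lam == 0, lam >= 0]    # the two dual constraints
    cp.Problem(cp.Minimize(obj), cons).solve()
    return lam.value                   # optimal multipliers
```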
Soft-SVM Problem
We saw that, supposing a separating hyperplane exists, the hard-SVM dual problem is:
$$\min_{\lambda \in \mathbb{R}^P} \; \frac{1}{2}\sum_{i=1}^{P}\sum_{j=1}^{P} \lambda_i \lambda_j y_i y_j \, x_i^T x_j - \sum_{i=1}^{P} \lambda_i \quad \text{s.t.} \quad \sum_{i=1}^{P} \lambda_i y_i = 0, \;\; \lambda \ge 0$$
And thanks to a theorem, we have that the primal hard-SVM (convex quadratic) and the dual hard-SVM (convex quadratic) are linked through the KKT conditions: if there exists a feasible point for P, then D has a solution $\bar{\lambda}$, and from it we recover a solution of P with
$$\bar{w} = \sum_{i=1}^{P} \bar{\lambda}_i y_i x_i$$
(the resulting pair satisfies the KKT conditions of P).
Now, to use it on real problems, we have to relax some assumptions: first of all we will admit some errors, and afterwards we will use a separator that isn't linear.
Note: a new notation to make things easier:
$$x_i \in \mathbb{R}^d, \qquad y_i \in \{-1, +1\}, \qquad i = 1, \dots, P, \qquad K_{ij} = x_i^T x_j \;\; (\text{scalar product})$$
Also, the decision function is:
$$f(x) = \text{sign}\Big( \sum_{i=1}^{P} \lambda_i y_i \, x_i^T x + b \Big) = \text{sign}(w^T x + b)$$
(it uses only scalar products).
So the SVMs are so important because they use the relative position of the points and not the absolute position (through the scalar product).
But if we solve the dual problem, how can we find the variables of the primal ($w$ and $b$)?
$$w = \sum_{i=1}^{P} \lambda_i y_i x_i$$
is computed directly from the dual solution. For $b$, take any index $i$ with $\lambda_i > 0$: by complementarity $y_i(w^T x_i + b) = 1$, so
$$b = \frac{1}{y_i} - w^T x_i = y_i - w^T x_i$$
(only one support vector is needed, since $1/y_i = y_i$ for $y_i \in \{-1, +1\}$).
Indeed, the points that are surely not on the margins have (by the complementarity of the KKT) a null Lagrange multiplier. Only some (not necessarily all) points on the margins have a positive multiplier, and these are called support vectors.
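A small sketch of this recovery step (function and variable names are mine; `lam` is the dual solution from any QP solver):

```python
import numpy as np

def recover_primal(lam, X, y, tol=1e-8):
    # w = sum_i lam_i y_i x_i  (KKT stationarity in w)
    w = (lam * y) @ X
    # Support vectors: lam_i > 0. Each satisfies y_i (w^T x_i + b) = 1,
    # so b = y_i - w^T x_i; averaging over all of them is more stable.
    sv = lam > tol
    b = np.mean(y[sv] - X[sv] @ w)
    return w, b
```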
Calling $e$ the vector of all ones and $\bar{K}_{ij} = y_i y_j \, x_i^T x_j$, the hard-SVM dual in matrix form is:
$$\min_{\lambda} \; \frac{1}{2} \lambda^T \bar{K} \lambda - e^T \lambda \quad \text{s.t.} \quad y^T \lambda = 0, \;\; \lambda \ge 0$$
So let's relax it: the points are no longer linearly separable, due to the presence of outliers. To do that, we introduce a measure (non-negative slack variables $\xi_i$) of how far a point is from its own region (N.B. the boundaries of the regions are the margins): $\xi_i$ measures the classification error of point $i$.
The relaxed constraint (soft classification: a point inside the margin is still feasible, but no longer error-free) is:
$$y_i(w^T x_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \qquad i = 1, \dots, P$$
To have a good model we have to penalize the mistakes, introducing a new term in the objective function:
$$\min_{w,b,\xi} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{P} \xi_i \quad \text{s.t.} \quad y_i(w^T x_i + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0, \;\; i = 1, \dots, P$$
This is the soft-margin primal problem (convex quadratic objective, linear constraints), where the sum of the slack variables is an upper bound on the classification error and $C$ is a constant that balances the two terms.
Since the constraints are $\xi_i \ge 1 - y_i(w^T x_i + b)$ and $\xi_i \ge 0$, at the optimum
$$\xi_i = \max\{0, \; 1 - y_i(w^T x_i + b)\}$$
so the soft-margin problem can be rewritten as the unconstrained (hinge-loss) problem:
$$\min_{w,b} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{P} \max\{0, \; 1 - y_i(w^T x_i + b)\}$$
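A quick sketch of this equivalence (function names are mine): for a fixed $(w, b)$ the best slacks are exactly the hinge terms, so evaluating either form gives the same value.

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    margins = y * (X @ w + b)
    xi = np.maximum(0.0, 1.0 - margins)    # optimal slack for each point
    # 0.5*||w||^2 + C*sum(xi) equals the hinge-loss form of the objective
    return 0.5 * np.dot(w, w) + C * xi.sum()
```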
To use a non-linear separator we will use a transformation of the points to make them linearly separable, and then come back (as we did for the RBF). So now let's look at what the soft-margin SVM's dual is:
$$L(w,b,\xi,\lambda,\mu) = \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{P}\xi_i - \sum_{i=1}^{P} \lambda_i\left[ y_i(w^T x_i + b) - 1 + \xi_i \right] - \sum_{i=1}^{P} \mu_i \xi_i, \qquad \lambda \ge 0, \; \mu \ge 0$$
Setting the gradients to zero:
$$\nabla_w L = 0 \;\Rightarrow\; w = \sum_{i=1}^{P} \lambda_i y_i x_i, \qquad \nabla_b L = 0 \;\Rightarrow\; \sum_{i=1}^{P} \lambda_i y_i = 0$$
$$\nabla_{\xi_i} L = 0 \;\Rightarrow\; C - \lambda_i - \mu_i = 0 \;\Rightarrow\; \mu_i = C - \lambda_i \ge 0 \;\Rightarrow\; 0 \le \lambda_i \le C$$
Substituting back, the $\xi$ terms cancel ($C\xi_i - \lambda_i\xi_i - \mu_i\xi_i = 0$) and we obtain the same objective as in the hard case.
So in the soft-SVM dual we only add an upper bound on the multipliers:
$$\min_{\lambda} \; \frac{1}{2}\sum_{i=1}^{P}\sum_{j=1}^{P} \lambda_i \lambda_j y_i y_j \, x_i^T x_j - \sum_{i=1}^{P} \lambda_i \quad \text{s.t.} \quad \sum_{i=1}^{P} \lambda_i y_i = 0, \;\; 0 \le \lambda_i \le C \;\; \forall i$$
Optimality conditions of SVM
We saw that the dual formulation of the linear soft-SVM problem is:
$$\min_{\lambda} \; \frac{1}{2}\lambda^T \bar{K} \lambda - e^T \lambda \quad \text{s.t.} \quad y^T \lambda = 0, \;\; 0 \le \lambda \le C e$$
where $\bar{K}_{ij} = y_i y_j \, x_i^T x_j$, $i, j = 1, \dots, P$.
We also saw that the support vectors are those with $\lambda_i > 0$; they can be free or bounded:
$$0 < \lambda_i < C \;\Rightarrow\; \xi_i = 0 \quad \text{(free support vector, exactly on the margin)}$$
$$\lambda_i = C \;\Rightarrow\; \xi_i \ge 0 \quad \text{(bounded support vector, possibly inside the margin or misclassified)}$$
To calculate the value of $b$ we have to use only the real (free) support vectors: if $0 < \lambda_i < C$ then $\mu_i > 0$, so $\xi_i = 0$ and by complementarity $y_i(w^T x_i + b) = 1$, hence
$$b = y_i - w^T x_i$$
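A sketch of this computation (`tol` is a hypothetical numerical tolerance, and averaging over the free support vectors is a stability choice of mine):

```python
import numpy as np

def compute_b(lam, X, y, w, C, tol=1e-8):
    # Free support vectors: 0 < lam_i < C, hence xi_i = 0 and
    # y_i (w^T x_i + b) = 1 exactly; average for numerical stability.
    free = (lam > tol) & (lam < C - tol)
    return np.mean(y[free] - X[free] @ w)
```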
To pass to a non-linear separator, we have to transform the input points into a new space through a mapping $\varphi(x_i)$; the scalar products in the new space are given by a function called kernel, $K(x_i, x_j) = \varphi(x_i)^T \varphi(x_j)$, and the decision function will be:
$$f(x) = \text{sign}\Big( \sum_{i=1}^{P} \lambda_i y_i \, K(x_i, x) + b \Big)$$
So we have all the stuff we need to solve the same dual problem, without the transformation, just changing $K_{ij}$.
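For example, with the RBF kernel (just one possible choice; `gamma` is a hypothetical bandwidth parameter), the decision function can be sketched as:

```python
import numpy as np

def rbf_kernel(u, v, gamma=1.0):
    # K(u, v) = exp(-gamma * ||u - v||^2)
    return np.exp(-gamma * np.sum((u - v) ** 2))

def decision(x, lam, X, y, b, kernel=rbf_kernel):
    # f(x) = sign( sum_i lam_i y_i K(x_i, x) + b )
    s = sum(li * yi * kernel(xi, x) for li, yi, xi in zip(lam, y, X))
    return np.sign(s + b)
```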
$$\min_{\lambda} \; q(\lambda) = \frac{1}{2}\lambda^T \bar{K} \lambda - e^T \lambda \quad \text{s.t.} \quad y^T \lambda = 0, \;\; 0 \le \lambda \le C e, \qquad \bar{K}_{ij} = y_i y_j \, K(x_i, x_j)$$
Writing the KKT conditions of this dual problem, with multipliers $\hat{b}$ (for the equality), $\mu \ge 0$ (for $\lambda \ge 0$) and $\rho \ge 0$ (for $\lambda \le Ce$):
$$\nabla q(\lambda) + \hat{b}\, y - \mu + \rho = 0, \qquad \mu_i \lambda_i = 0, \qquad \rho_i (\lambda_i - C) = 0, \qquad \mu \ge 0, \;\; \rho \ge 0$$
There can be three different cases:
$$\lambda_i = 0 \;\Rightarrow\; \rho_i = 0, \; \mu_i \ge 0 \;\Rightarrow\; \nabla q(\lambda)_i + \hat{b}\, y_i \ge 0$$
$$0 < \lambda_i < C \;\Rightarrow\; \mu_i = \rho_i = 0 \;\Rightarrow\; \nabla q(\lambda)_i + \hat{b}\, y_i = 0$$
$$\lambda_i = C \;\Rightarrow\; \mu_i = 0, \; \rho_i \ge 0 \;\Rightarrow\; \nabla q(\lambda)_i + \hat{b}\, y_i \le 0$$
So we can state the optimality conditions using only $\lambda$; but first define:
$$L(\lambda) = \{ i : \lambda_i = 0 \}, \qquad U(\lambda) = \{ i : \lambda_i = C \}, \qquad F(\lambda) = \{ i : 0 < \lambda_i < C \}$$
The conditions become:
$$\forall i \in L(\lambda): \; \nabla q(\lambda)_i + \hat{b}\, y_i \ge 0, \qquad \forall i \in U(\lambda): \; \nabla q(\lambda)_i + \hat{b}\, y_i \le 0, \qquad \forall i \in F(\lambda): \; \nabla q(\lambda)_i + \hat{b}\, y_i = 0$$
Now consider these conditions as a function of the value $y_i$ can assume ($y_i = \pm 1$):
$$y_i = +1: \qquad \forall i \in L(\lambda): \; \hat{b} \ge -y_i \nabla q(\lambda)_i, \qquad \forall i \in U(\lambda): \; \hat{b} \le -y_i \nabla q(\lambda)_i$$
$$y_i = -1: \qquad \forall i \in L(\lambda): \; \hat{b} \le -y_i \nabla q(\lambda)_i, \qquad \forall i \in U(\lambda): \; \hat{b} \ge -y_i \nabla q(\lambda)_i$$
$$\forall i \in F(\lambda): \; \hat{b} = -y_i \nabla q(\lambda)_i$$
So call ($L^+$ are the indices in $L$ with $y_i = +1$, and so on):
$$L^+(\lambda) = \{ i \in L(\lambda) : y_i = +1 \}, \qquad L^-(\lambda) = \{ i \in L(\lambda) : y_i = -1 \}$$
$$U^+(\lambda) = \{ i \in U(\lambda) : y_i = +1 \}, \qquad U^-(\lambda) = \{ i \in U(\lambda) : y_i = -1 \}$$
Then the conditions say that $\hat{b}$ must satisfy:
$$\max_{i \in L^+ \cup \, U^- \cup \, F} \; -y_i \nabla q(\lambda)_i \;\le\; \hat{b} \;\le\; \min_{i \in L^- \cup \, U^+ \cup \, F} \; -y_i \nabla q(\lambda)_i$$
Let's create new sets:
$$R(\lambda) = L^+(\lambda) \cup U^-(\lambda) \cup F(\lambda), \qquad S(\lambda) = L^-(\lambda) \cup U^+(\lambda) \cup F(\lambda)$$
Then $\lambda$ is optimal if and only if such a $\hat{b}$ exists, i.e.
$$m(\lambda) = \max_{i \in R(\lambda)} \; -y_i \nabla q(\lambda)_i \;\le\; M(\lambda) = \min_{i \in S(\lambda)} \; -y_i \nabla q(\lambda)_i$$
Note that the equality $\hat{b} = -y_i \nabla q(\lambda)_i$ holds on $F(\lambda)$, which belongs to both sets.
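As a sketch, this condition can be checked numerically as follows (`grad` is $\nabla q(\lambda)$; the names and the tolerance are mine, and both index sets are assumed non-empty):

```python
import numpy as np

def is_optimal(lam, grad, y, C, tol=1e-6):
    # R(lam): i with (lam_i < C and y_i = +1) or (lam_i > 0 and y_i = -1)
    R = ((lam < C - tol) & (y > 0)) | ((lam > tol) & (y < 0))
    # S(lam): i with (lam_i < C and y_i = -1) or (lam_i > 0 and y_i = +1)
    S = ((lam < C - tol) & (y < 0)) | ((lam > tol) & (y > 0))
    m = np.max(-y[R] * grad[R])        # m(lam)
    M = np.min(-y[S] * grad[S])        # M(lam)
    return m <= M + tol                # holds (up to tol) at an optimum
```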