
Advanced Control Techniques

These notes are intended as a didactic account of the Advanced Control Techniques course, taught by Prof. Giuseppe Notarstefano, in the Computer Engineering degree programme at Unisalento. The syllabus covers three macro-topics: Stability of Nonlinear Systems, Distributed Systems, and (distributed) Optimization Algorithms, plus a small chapter of Exercises.


The sys. is already linear. Find if the sys. is I/O linearizable.

    ẏ = ẋ₁ = x₁,  ÿ = ẍ₁ = ẋ₁ = x₁, ...

    ẏ = L_f h(x) + L_g h(x)·u = x₁ + 0·u
    ÿ = L_f²h(x) + (L_g L_f h(x) = 0)·u

So L_g L_f^i h(x) = 0 ∀i (ATYPIC CASE) ⟹ THE SYS. IS NOT I/O LINEARIZABLE.

exercise 2

    ẋ₁ = x₁
    ẋ₂ = x₂ + u
    y = x₂ = h(x)

⟹

    [ẋ₁; ẋ₂] = ([x₁; x₂] = f(x)) + ([0; 1] = g(x))·u

    ẏ = x₂ + (1 ≠ 0)·u

⟹ ρ = 1 ⇐⇒ THE SYS. IS I/O LINEARIZABLE, BUT IT IS NOT FEEDBACK LINEARIZABLE BECAUSE ρ = 1 AND THE STATE DIMENSION IS n = 2 ⟹ ρ ≠ n.

Check if the sys. is minimum phase: η̇ = η is the ZERO DYNAMICS (ξ = 0; there is no ξ to set to 0 in it), and ξ̇ = ξ + u. The system is NON-MINIMUM PHASE, so we can't use ξ = 0 to asymptotically stabilize the η dynamics ⇐⇒ ∃! λ_η, with λ_η > 0.

NB: φ(x) must satisfy (∂φ(x)/∂x)·g(x) = 0:

    [∂φ/∂x₁  ∂φ/∂x₂]·[0; 1] = 0 ⟹ ∂φ(x)/∂x₂ = 0

⟹ φ(x) = x₁, but x₂ := ξ, hence η := x₁.

exercise 3

• (i)

    ẋ₁ = x₂
    ẋ₂ = −x₁ + (1 − x₁²)x₂ + u
    y = x₁

  – (ρ = 1) ẏ = ẋ₁ = x₂;
  – (ρ = 2) ÿ = ẋ₂ = −x₁ + (1 − x₁²)x₂ + u;
  ⟹ I/O LINEARIZABLE with ρ = n = 2 ⇐⇒ FEEDBACK LINEARIZABLE ON ℝ².

    u = x₁ − (1 − x₁²)x₂ + w

• (ii) y = x₂, (ρ = 1) ẏ = ẋ₂ = −x₁ + (1 − x₁²)x₂ + u ⟹ I/O LINEARIZABLE ON ℝ², ρ = 1 < n = 2 ⟹ not FEEDBACK LINEARIZABLE. Let's check! Put the sys. in NORMAL FORM:

    ξ = x₂ ⟹ T(x) = [φ(x); h(x)]. Let's find it:

    (∂φ(x)/∂x)·g(x) = 0 ⟹ [∂φ/∂x₁  ∂φ/∂x₂]·[0; 1] = 0

  η = φ(x) = x₁ (NOT CONSTANT), BECAUSE we are changing coordinates ⟹

    η̇ = ξ
    ξ̇ = w

  NORMAL FORM. Is the sys. min. phase? (ξ = 0) ⟹ η̇ = 0 ⟹ not A.S., but surely LOCALLY STABLE; hence it is not MINIMUM PHASE.

  Proof. V(η) = ½η² ⟹ V̇ = η·0 = 0 ≤ 0 ⇐⇒ NEGATIVE SEMI-DEFINITE. Moreover, Krasowskii's criterion does not succeed.

exercise 4

    ẋ₁ = −x₁ + ((2 + x₃²)/(1 + x₃²))·u
    ẋ₂ = x₃
    ẋ₃ = x₁x₃ + u
    y = x₂

⟹

    [ẋ₁; ẋ₂; ẋ₃] = ([−x₁; x₃; x₁x₃] = f(x)) + ([(2 + x₃²)/(1 + x₃²); 0; 1] = g(x))·u

    ẏ = [0 1 0]·f(x) + ([0 1 0]·g(x) = 0)·u = x₃
    ÿ = [0 0 1]·f(x) + ([0 0 1]·g(x) = 1)·u = x₁x₃ + u

So, ρ = 2 < n = 3 ⟹ INPUT/OUTPUT LINEARIZABLE. NOTE: if I find an output for which ρ = n, then the system is FL; but if ρ < n I cannot conclude anything a priori — that might not be the right output!

    ξ := [x₂; x₃],  η := φ(x) ∈ ℝ.

    (∂φ(x)/∂x)·g(x) = 0 ⟹ [∂φ/∂x₁  ∂φ/∂x₂  ∂φ/∂x₃]·[(2 + x₃²)/(1 + x₃²); 0; 1] = 0 ⟹

⟹  (∂φ/∂x₁)·(2 + x₃²)/(1 + x₃²) + ∂φ/∂x₃ = 0 ⇐⇒ φ(x) = −x₁ + x₃ + arctan(x₃)

(a PARTIAL DIFFERENTIAL EQUATION). We could have chosen φ(x) = x₂, but we would not have obtained an independent set of coordinates.

[Even without solving the equation above, there is a method to find out whether the sys. is MINIMUM PHASE.]

    ÿ = x₁x₃ + u ⟹ u = −x₁x₃ + w

    Z* = {x ∈ (D = ℝ³) | h(x) = y = 0} ∩ {x ∈ ℝ³ | L_f h(x) = ẏ = 0} = {x | x₂ = x₃ = 0}

To check the ZERO DYNAMICS (w = 0 ⟹ u = −x₁x₃): but u = [−x₁x₃]|_{x∈Z*} = 0 ⟹ u = 0.

So the ZERO DYNAMICS IN THE ORIGINAL COORDINATES IS: ẋ₁ = −x₁. A linear system with eigenvalue λ = −1, so x₁ = 0 is an A.S. equilibrium; thus the system is MINIMUM PHASE.

Proof. V(x) = ½x₁² ∧ V̇(x) = x₁(−x₁) = −x₁² < 0 ⇐⇒ x₁ ≠ 0 ⟹ x₁ = 0 is A.S.
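As a sanity check on the choice of φ, the following short sketch (SymPy assumed; names are illustrative) verifies symbolically that φ(x) = −x₁ + x₃ + arctan(x₃) annihilates g, i.e. solves the partial differential equation above.

```python
# Symbolic check that phi solves (d phi / dx) g(x) = 0 in exercise 4.
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
g = sp.Matrix([(2 + x3**2) / (1 + x3**2), 0, 1])
phi = -x1 + x3 + sp.atan(x3)

grad_phi = sp.Matrix([[sp.diff(phi, v) for v in (x1, x2, x3)]])
print(sp.simplify((grad_phi * g)[0, 0]))   # -> 0, so phi is a valid eta
```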

exercise 5

    ẋ₁ = x₂ + x₃
    ẋ₂ = −x₁ + x₂(x₂² + x₃² − 4) + u
    ẋ₃ = −x₁ + x₃(x₂² + x₃² − 4) + u
    y = x₁

Let's study the internal stability of the equilibrium x = [x₁; x₂; x₃] = [0; 0; 0]. x = 0 is an equilibrium. The system is time-invariant.

    V(x) = ½(x₁² + x₂² + x₃²)
    V̇(x) = x₁(x₂ + x₃) + x₂(−x₁ + x₂(x₂² + x₃² − 4)) + x₃(−x₁ + x₃(x₂² + x₃² − 4))
         = x₁(x₂ + x₃) − x₁(x₂ + x₃) + (x₂² + x₃²)(x₂² + x₃² − 4)
         = (x₂² + x₃²)(x₂² + x₃² − 4)

We look for a neighborhood of the origin such that V̇(x) ≤ 0: x₂² + x₃² − 4 < 0 ⟹ x₂² + x₃² < 4 ⟹ V̇(x) is NEGATIVE SEMI-DEFINITE ON B̊(0, 2) (since x₁ does not appear) ⟹ the equilibrium x = 0 is locally stable. Look at LaSalle's principle to check the possible a.s. of x = 0.

    S = {x ∈ D | V̇(x) = 0} = {x ∈ D | x₂ = x₃ = 0}

We are in the ball B̊(0, 2) of radius 2, so the points with x₂² + x₃² = 4 do not concern us. The trajectories that start in S and remain in S satisfy:

    x₂(t) := 0 ⟹ ẋ₂(t) := 0
    x₃(t) := 0 ⟹ ẋ₃(t) := 0
    ẋ₁ = 0
    0 = −x₁
    0 = −x₁

⟹ x₁ = 0 ⟹ thus, by Krasowskii's corollary, x = 0 is a.s. (locally). If I did not start from x₁ = 0, even with x₂ = x₃ = 0, I would leave S. x₂ = x₃ = 0 → the x₁ axis. If I start from x₁ = 0 I remain in S, otherwise the trajectory leaves S.

Check if the sys. is I/O LINEARIZABLE:

    ẏ = ẋ₁ = x₂ + x₃
    ÿ = ẋ₂ + ẋ₃ = −2x₁ + (x₂ + x₃)(x₂² + x₃² − 4) + 2u

⟹ ρ = 2 ⟹ I/O LINEARIZABLE ON ℝ³. There is no way for the input coefficient to vanish: γ(x) = 0 never holds ⇐⇒ γ(x) = 2.

    ξ = [x₁; x₂ + x₃],  η := φ(x) ∈ ℝ

Impose: (∂φ(x)/∂x)·g(x) = 0, φ(0) = 0 ⟹

    [∂φ/∂x₁  ∂φ/∂x₂  ∂φ/∂x₃]·[0; 1; 1] = 0 ⟹ ∂φ/∂x₂ + ∂φ/∂x₃ = 0

φ(x) = x₁ would satisfy this, but ξ₁ = x₁, so it is not a good change of coordinates. φ(x) must be a function of x₂ and x₃.

φ(x) = x₂ − x₃. It is a good change of coordinates because it is independent from x₁ and x₂ + x₃. In fact we can't choose x₂ + x₃, because then the change of coordinates would not be independent.

NORMAL FORM

    ξ̇₁ = ξ₂
    ξ̇₂ = −2x₁ + (x₂ + x₃)(x₂² + x₃² − 4) + 2u
⟹  ξ̇ = A_c ξ + B_c (...)

with:

    A_c = [0 1; 0 0] ∧ B_c = [0; 1]

    u = ½(2x₁ − (x₂ + x₃)(x₂² + x₃² − 4) + w).  η = x₂ − x₃ ⟹ η̇ = ẋ₂ − ẋ₃ = (x₂ − x₃)(x₂² + x₃² − 4).

    {η = x₂ − x₃ ∧ ξ₂ = x₂ + x₃} ⟹ {x₂ = (η + ξ₂)/2 ∧ x₃ = (ξ₂ − η)/2} ⟹

⟹  η̇ = η[(η + ξ₂)²/4 + (ξ₂ − η)²/4 − 4]

ZERO DYNAMICS (ξ = 0):

    η̇ = η[η²/4 + η²/4 − 4] = η[η²/2 − 4]

η = 0 is the eq. of the ZERO DYNAMICS. I can find a Lyap. function: if |η| < 2√2 ⟹ the eq. is (locally) a.s., so the sys. is MINIMUM PHASE.

    V(η) = ½η² ⟹ V̇(η) = η(η[η²/2 − 4]) = η²[η²/2 − 4]

If −2√2 < η < 2√2 ⟹ V̇(η) < 0 ⟹ ∃ B̊(0, 2√2) | V̇(η) < 0 on this ball.
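A quick numeric check (NumPy assumed; the integrator and step size are illustrative choices) of the zero-dynamics conclusion: trajectories of η̇ = η(η²/2 − 4) started inside |η| < 2√2 should decay to 0, consistently with V̇(η) < 0 on that ball.

```python
# Forward-Euler simulation of the zero dynamics of exercise 5.
import numpy as np

def zero_dyn(eta):
    return eta * (eta**2 / 2 - 4)

dt, T = 1e-3, 5.0
for eta0 in (2.5, -1.0, 0.1):           # all inside the ball of radius 2*sqrt(2)
    eta = eta0
    for _ in range(int(T / dt)):
        eta += dt * zero_dyn(eta)       # Euler step
    print(eta0, '->', eta)              # each end value is ~0
```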

Chapter 3

Distributed Systems

3.1 Starting Point

3.1.1 SCENARIO

A new paradigm for control systems. Pervasive technology. Devices spread throughout the environment, each having a physical part (sensing, mobile-robot dynamics, energy generator) that describes the evolution of its physical variables, and a cyber part (computation capability, memory and communication). A single dynamical system. Communication among the systems. Early 2000–2003. A new theory of control and of optimization. Each system has its own control law. It cannot know all the states of the other subsystems — only those of some neighbors. Solve distributed optimization problems. Local communication.

New class of systems. A new realistic framework. New mathematical framework. We'll consider DISTRIBUTED CONTROL SYSTEMS (MULTI-AGENT SYSTEMS, CYBER-PHYSICAL NETWORKS) and other similar names. We have N systems; each one (i ∈ {1, ..., N}) has its own "dynamics", and we'll use this kind of notation:

    ẋ^[i](t) = f_i(x^[i](t), u^[i](t))        (continuous time)
    x^[i](t + 1) = h_i(x^[i](t), u^[i](t))    (discrete time)

The dynamics could be, for example, a RANSAC computation. A discrete algorithm. Each dynamics depends only on the state of the system. It could be some transition function modeling an algorithm. Other assumptions: each system has {-) some LOCAL MEMORY; -) LOCAL COMPUTATION CAPABILITY; -) LOCAL COMMUNICATION CAPABILITY}; the first two are quite immediate. Each system has a memory and a processor. Possibility to communicate. Local — why? Each system will be able to communicate just with a subset of all the systems. Too many messages to exchange; limit them. The number of systems N will be a big number (N ≫ 1 ∨ N → +∞) from a mathematical point of view, so it doesn't make sense to allow global communication capability. Number of neighbors = agents I can communicate with; one-to-one communication. The typical scheme we will have for system i is: SYSTEM i ⊇ {CYBER LAYER (memory, computation, communication), PHYSICAL LAYER (sensing/actuation, dynamics of robot, quadrotor, etc.)};

A system with some dynamics modeled by an ODE or transfer function. But we are missing the model of communication. How can we model communication, the interaction among the systems, from a mathematical point of view? Binary matrices, graphs. A graph (COMMUNICATION GRAPH) is a tuple: the first element is a set of elements called vertices; edges ⟹ pairs of vertices. Let's formally introduce it: G = (I, E); I = {1, ..., N}. We'll associate vertices to identifiers — system identifiers. E = {set of edges} ⊂ I × I = {pairs of vertices}. We say that a node i can communicate with node j. Some notions about graphs: analysis and design. We'll combine graph theory and the algebraic elements associated with graphs to develop distributed control laws. The generic element of E says whether, given two vertices, there is a link between them. We will combine graph theory with matrix theory and systems theory.

I = {1, ..., N} identifiers of our systems. E tells us ⟹ (i, j) ∈ E means that i sends information to j. A directed graph: the communications are not in general bidirectional. Try to recall graph theory. DIRECTED GRAPH (DIGRAPH). A digraph G = (I, E) is a set I of N elements called VERTICES and a set E ⊂ I × I of pairs of vertices, called EDGES.

UNDIRECTED GRAPH. A graph G is undirected if ∀(i, j) ∈ E ⇐⇒ (j, i) ∈ E. Simply, if I have the edge (i, j) we also have the (j, i) one. Equivalently, E is a set of unordered pairs of vertices. The usual way to represent an undirected graph is the following: use an unarrowed arc instead of an arc with an arrow (the arrow, used in directed graphs, represents the direction in which the communication is valid). Clearly this corresponds to a digraph in which the communications over the edges are bidirectional. Let's introduce some notions:

Notion of IN-NEIGHBOR and OUT-NEIGHBOR. If I'm a node, an IN-NEIGHBOR is another node that can send information to me. Given an edge (i, j) ∈ E, we say that i is an IN-NEIGHBOR of j and j is an OUT-NEIGHBOR of i. We'll denote by N_i^IN the set of IN-NEIGHBORS of i ⟹ it means that:

    ∀j ∈ N_i^IN, (j, i) ∈ E

Similarly, we'll denote by N_i^OUT the set of OUT-NEIGHBORS of i ⟹ it means that:

    ∀j ∈ N_i^OUT, (i, j) ∈ E

Given these sets,

    d_i^IN = |N_i^IN| = "IN-DEGREE" = #{neighbors that can communicate with me}
    d_i^OUT = |N_i^OUT| = "OUT-DEGREE" = #{neighbors I can communicate with}

For an undirected graph, we say that j is a neighbor of i if (i, j) ∈ E (we don't need to specify (i, j) ∨ (j, i)). N_i is the set of neighbors, and d_i = |N_i| = degree of node i = #{neighbors}.

Regarding in/out-degrees, a digraph is said to be BALANCED if

    d_i^IN = d_i^OUT ∀i ∈ {1, ..., N}

Some special digraphs. An undirected graph is a balanced one. A directed CYCLE is a balanced graph: for example, d_i^IN = d_i^OUT = 1 ∀i. These are by now well-established notions of graph theory. Let's introduce another notion, that of DIRECTED PATH.

A directed path is a sequence of vertices such that any two consecutive vertices form an edge of the graph. Connectivity. STRONG CONNECTIVITY. If we have a distributed algorithm, the information has to propagate as well as possible.

Definition 34. STRONG CONNECTIVITY
A digraph is strongly connected if ∀(i, j) ∈ I × I there exists a directed path from i to j. GLOBALLY REACHABLE VERTEX. A vertex is GR if there is a directed path from any other vertex to it. A graph is STRONGLY CONNECTED iff every vertex i ∈ I is GLOBALLY REACHABLE.

In other communities, graphs are used to model a problem (optimization problems, routing problems). Here we're using graphs to model communications. It doesn't make sense to have a fixed, static graph.

Definition 35. TIME-DEPENDENT GRAPHS
A time-dependent graph is a mapping t ↦ G(t) = (I, E(t)), t ∈ ℤ (integer variable), such that G(t) is a digraph ∀t.

We can allow different schemes of communication. Some assumptions — we need less: not for all t does G(t) have to be strongly connected.

Definition 36. JOINT STRONG CONNECTIVITY
A TD digraph is jointly strongly connected if

    ∀t ∈ ℤ, ∪_{τ=t}^{+∞} G(τ)

is strongly connected. It can happen that some edges appear only RARELY in time. PERSISTENCE of ACTIVATION of the EDGES for JSC.

Definition 37. UNIFORM JOINT STRONG CONNECTIVITY (PERIODIC)
A digraph G is UJSC if ∃T ∈ ℕ | ∀t, ∪_{τ=(t−1)T+1}^{tT} G(τ) is strongly connected. A stronger notion: at some point, over each window, I get a strongly connected graph.

What's the difference with the other notion? In the non-uniform case we may have to wait a larger and larger interval each time; e.g., a lost (fluctuating) edge may appear at t = 1, 10, 100, 1000, .... In the uniform case it should instead appear at equally spaced times: t = 1, 10, 20, 30, ....

For an UNDIRECTED graph, we have the corresponding notions: {CONNECTIVITY, JOINT CONNECTIVITY, UNIFORM JOINT CONNECTIVITY}.

3.1.2 Properties and definitions of graphs

Matrices associated with graphs.

MATRICES ASSOCIATED TO GRAPHS

Let's start talking about a WEIGHTED DIGRAPH: it is basically a tuple G = (I, E, A), where (I, E) is a DIGRAPH (I set of vertices, E set of edges) and A ∈ ℝ^{n×n} is a matrix that we call WEIGHTED ADJACENCY MATRIX (WAM), with generic element a_ij:

    a_ij > 0 ⇐⇒ (i, j) ∈ E ∧ a_ij = 0 ⇐⇒ (i, j) ∉ E

It means that we are basically considering a digraph in which we have WEIGHTS on each edge. Our matrix A could be, for example:

    A = [0 a₁₂ 0; 0 0 a₂₃; a₃₁ a₃₂ 0]

(the diagonal has all 0 elements! ⇐⇒ (k, k) ∉ E ∀k ∈ I). This is the matrix A associated with our graph. This weighted matrix also gives us the structure of the graph! REVERSE ENGINEERING: using the matrix I can immediately draw the graph! For a (simple) digraph, we can define an adjacency matrix (non-weighted) → a Boolean matrix: {a_ij = 1 ⇐⇒ (i, j) ∈ E ∧ 0 otherwise}.

For example,

    A ∈ {0, 1}^{3×3} = [0 1 0; 0 0 1; 1 1 0]


By using an adjacency matrix, again, I can construct the digraph. Both in the weighted and in the simple one, I have 0 as diagonal elements. SELF-EDGES are special edges (i, i) (I communicate my state to myself). A digraph with self-edges could have, for example, this WAM:

    A_WAM = [a₁₁ a₁₂ 0; 0 a₂₂ a₂₃; a₃₁ a₃₂ a₃₃]

In designing our distributed algorithms we can have those types of matrices. If the graph is UNDIRECTED, we have SYMMETRIC MATRICES, for example:

    A_WAM = [0 a₁₂ a₁₃; a₁₂ 0 a₂₃; a₁₃ a₂₃ 0],  A = [0 1 1; 1 0 1; 1 1 0]

We'll need another pair of matrices. In this case the matrices are DIAGONAL MATRICES, and they record the in-degrees and the out-degrees. Suppose a three-node sample graph to use for the definitions.

Definition 38. OUT-DEGREE (DIAGONAL) MATRIX

    D^OUT = [d₁^OUT 0 0; 0 d₂^OUT 0; 0 0 d₃^OUT]

Definition 39. IN-DEGREE (DIAGONAL) MATRIX

    D^IN = [d₁^IN 0 0; 0 d₂^IN 0; 0 0 d₃^IN]

Why we’ve introduced this? Let’s write this matrices for our digraph:

(   

1 0 0 1 0 0 )

OU T IN

0 1 0 0 2 0

D = , D =

   

0 0 2 0 0 1

OU T IN IN OU T

Notice that if I change the order of the edge, I’m changing D with D . (D , D

si scambiano, mentre A e A vengono rispettivamente trasposte).

W AM W M

=⇒ By changing the order of the edges,

( IN OU T

{D }

EXCHAN GE , D

{A }

T RAN SP OSE , A

W AM W M

Let’s introduce a new matrix, the LAPLACIAN OF A GRAPH. How it is defined?

Definition 40. LAPLACIAN OF A GRAPH

OU T OU T −

L := L = D A

example:      

−1

1 0 0 0 1 0 1 0

( )

OU T −1

0 1 0 0 0 1 0 1

D = , A = =⇒ L =

     

−1 −1

0 0 2 1 1 0 2

OU T

{diag{d }}| − ∈

How we construct? [(i, j) E] . Where here we’ve used the

×N ×N

N N

i

Iverson’s parenthesis notation for the second matrix term.


Definition 41. IN-DEGREE LAPLACIAN

    L^IN = D^IN − A^⊤

is the LAPLACIAN of the graph with all the edges inverted.

If the edges are all inverted, the STRONG CONNECTIVITY property is well-maintained — it doesn't change anything. Let's keep this remark in mind. L and L^IN would remain unchanged in the presence of self-edges! They would be the same as for the version of the graph with self-edges added. We'll give a theorem for the Laplacian matrix:

Theorem 19. (PROPERTIES OF THE LAPLACIAN)
The following statements hold:

• i) L1 = 0, where 1 := [1; ...; 1] (column vector). (The row sums of A give the elements of D^OUT; analogously, the column sums give D^IN.)
• ii) G undirected ⇐⇒ L = L^⊤ is symmetric;
• iii) G undirected ⟹ L ≥ 0 (pos. semidef.);
• iv) G contains a globally reachable vertex ⇐⇒ rank(L) = n − 1;
• v) G is (weight-)balanced ⇐⇒ 1^⊤L = 0; in that case, moreover, (L + L^⊤)/2 ≥ 0 (pos. semidef.);

Let's comment:

• 1) If I sum all the elements of a row of the Laplacian I get 0 ⟹ 1 is a right eigenvector of L with associated eigenvalue 0 ⇐⇒ 1 ∈ ker(L);
• 4) For a STRONGLY CONNECTED GRAPH every vertex is a globally reachable vertex ⟹ rank(L) = n − 1;

If the graph is balanced (d_i^IN = d_i^OUT) ⟹ 1^⊤L = 0 ⇐⇒ 1 ∈ ker(L^⊤):

    [1 ... 1]·L = [0 ... 0]

3.1.3 NETWORK OF PROCESSORS for a distributed algorithm

We identify a NoP with a time-dependent graph {G(t)}_{t≥0} = {(I, E(t))}_{t≥0}, where I = {1, ..., N} is a set of identifiers of the processors and E(t) is the set of edges at time t, with (i, j) ∈ E(t) if processor i can send a message to processor j. It is, in general, a time-varying graph:

• DISTRIBUTED ALGORITHM SETS:
  – X^[i] of processor states ⇐⇒ x^[i] ∈ X^[i];
  – A, set of processor messages;
• MAPS to identify the algorithm:
  – msg^[i] : X^[i] × I ↦ A, message GENERATION FUNCTION;
  – stf^[i] : X^[i] × A^N ↦ X^[i], STATE TRANSITION FUNCTION (STF);

Typically we have:

    X^[i] = X ∀i ∈ {1, ..., N}
    msg^[i] = msg ∀i ∈ {1, ..., N}
    stf^[i] = stf ∀i ∈ {1, ..., N}

msg : X ↦ A (in general X × I, because it can happen that I may send different messages to different processors). A NETWORK in which these properties hold is called UNIFORM. Each node will send its own update; moreover, the network is given... there is no design parameter to decide there. It is not parallel computation. What is the evolution of our distributed algorithm?

    x^[i](t + 1) = stf(x^[i](t), y^[i](t)),   y_j^[i] = { msg(x^[j]),  (j, i) ∈ E(t);  NULL, otherwise }

NULL = there is no message ⟹ the message is not sent to i ⟹ i cannot use this information. For example, y^[1] = [NULL; NULL; msg(x^[3])]. VERY IMPORTANT! Node i can use information only from its neighbors! What is the goal? Design that state transition function. If we don't have a centralized control part, we'll achieve that goal with a distributed system. X^[i] is the STATE SPACE of processor i! E.g. booleans, reals, strings, etc.

    x^[i](t + 1) = stf(x^[i](t), {msg(x^[j](t))}_{j∈N_i^IN(t)})

where the INPUT part is the set of the states of the IN-neighbors (message composition). This is a well-structured dynamics. Another assumption that we'll often use is that msg(x^[j](t)) = x^[j](t), j ∈ {1, ..., N}.
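To fix ideas, here is a toy sketch (Python/NumPy assumed; the graph, msg and stf are illustrative choices, not prescribed by the course) of the processor-network model: each node applies the same stf to its own state and to the messages received from its in-neighbors, with msg(x) = x. The averaging stf anticipates the consensus algorithms of Section 3.2.

```python
# A minimal network-of-processors simulation: msg/stf on a directed cycle.
import numpy as np

E = [(0, 1), (1, 2), (2, 0)]                   # directed cycle on 3 nodes
in_neigh = {i: [j for (j, k) in E if k == i] for i in range(3)}

def msg(x):                                    # message generation function
    return x

def stf(x_i, msgs):                            # state transition function:
    return np.mean([x_i] + msgs)               # average of self + received msgs

x = [0.0, 5.0, 10.0]                           # initial processor states
for _ in range(50):
    y = {i: [msg(x[j]) for j in in_neigh[i]] for i in range(3)}
    x = [stf(x[i], y[i]) for i in range(3)]
print(x)                                       # states agree (consensus, Sec. 3.2)
```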

3.1.4 DISCRETE-TIME DYNAMICAL SYSTEMS

We have the stacked states:

    x := [x^[1]; ...; x^[N]]

    x(t + 1) = STF(x(t))

keeping SPARSITY into account, where STF(x(t))_i = x^[i](t + 1) = stf(x^[i](t), {x^[j](t)}). Solution algorithms in a dynamic (sequential, etc.) context: can this same problem, which I would solve with a sequential algorithm, be solved in a world in which the data are distributed? The nodes can iterate an action so that, overall, the algorithm is eventually carried out. Develop methodologies that exploit the new technologies. Distributed collection of the data.

STATIC CONTROL NETWORK IN DISCRETE TIME
(How to model a control network?) (special case)

There also exists a (more general) one. What are the ingredients? First of all, the network:

• I = {1, ..., N} as before;
• E := E(t), set of edges such that (j, i) ∈ E(t) if j can send messages to i;
• ∀i, some local dynamics:

    x^[i](t + 1) = f^[i](x^[i](t), u^[i](t)),   f^[i] : X^[i] × U^[i] ↦ X^[i]

(⇐ since we're in discrete time), where:

    X^[i] set of states (state space) of system i
    U^[i] set of inputs (input space) of system i

CONTROL AND COMMUNICATION LAW (CCL) (a.k.a. DISTRIBUTED CONTROL LAW (DCL))

• SETS:
  – X^[i] set of states, X^[i] = X (as before, msg^[i] = msg : X × I ↦ A);
  – A, the set of messages, as before;
• MAPS:
  – msg^[i] : X^[i] × I ↦ A, MESSAGE GENERATION FUNCTION;
  – ctr^[i] : X^[i] × A^N ↦ U^[i];

    x^[i](t + 1) = f^[i](x^[i](t), ctr(x^[i](t), y^[i](t)))

where typically the control function is also the same for every agent: ctr^[i] = ctr, f^[i] = f.

    y_j^[i](t) = { msg(x^[j]),  (j, i) ∈ E(t);  NULL, otherwise }

(As before, we have a discrete-time dynamical system. The difference is that each agent has its own dynamics, imposed by the nature of the system.) ctr is typically a special feedback law, a function only of the neighbors. Really very challenging in a distributed context.

    x^[i](t + 1) = f^[i](x^[i](t), ctr(x^[i](t), {msg(x^[j](t))}_{j∈N_i^IN(t)}))

In the general case we can have a physical state and a virtual state together.

STATIC CONTROL NETWORK IN CONTINUOUS TIME

In continuous time, our time is now a real variable t ∈ ℝ.

    t ↦ G(t) = (I, E(t)), t ∈ ℝ,  G(t) piecewise constant

The idea is that the graph contains a piecewise-constant set function E(t): G(t) is a piecewise-constant set-valued function. What about the network? Exactly the same sets and maps:

    ẋ^[i](t) = f^[i](x^[i](t), ctr(x^[i](t), y^[i](t)))

This is the local dynamics of agent i, where:

    y_j^[i](t) = { msg(x^[j](t)),  (j, i) ∈ E(t);  NULL, otherwise }

According to the previous notation we have:

    ẋ^[i](t) = f^[i](x^[i](t), ctr(x^[i](t), {msg(x^[j](t))}_{j∈N_i^IN(t)}))

There are also other dynamical systems in which we have ODEs where f changes discontinuously: (SWITCHED SYSTEMS) HYBRID SYSTEMS, in which the dynamics can change discontinuously. We've been considering graphs that depend on time ⇐⇒ G = G(t). This is a further challenging class of systems. If the graph is fixed and does not depend on time, we go back to the class of systems we know. (In hybrid systems the dynamics even changes instantaneously.) CLASSICAL STATE SPACE:

    ẋ^[i](t) = f^[i](x^[i](t), ctr(x^[i](t), {msg(x^[j](t))}_{j∈N_i^IN}))

We have a FIXED GRAPH ⇐⇒ N_i^IN(t) ≡ N_i^IN, i.e. G(t) ≡ G. In discrete time things are simpler:

    x^[i](t + 1) = f^[i](x^[i](t), u^[i](t)), i ∈ {1, ..., N}

    x(t + 1) = F(x(t), u(t)) ←  x := [x^[1]; ...; x^[N]], u := [u^[1]; ...; u^[N]]

If we feed back all the states, we get a CENTRALIZED, not distributed, approach. First class of distributed algorithms / distributed control laws: linear diffusion algorithms — algorithms where each agent applies a linear control law on its neighbors' states and on its own. The CONSENSUS/AGREEMENT problem.

3.2 LINEAR DISTRIBUTED ALGORITHMS

A first class of distributed algorithms (control laws). In particular, one of the first problems is that of Agreement/Consensus. CONSENSUS or AGREEMENT. The goal is the following: suppose we have our network system with x^[i], i ∈ {1, ..., N} and communication graph {G(t)}_{t≥0} = {(I, E(t))}_{t≥0}, I = {1, ..., N}. Design a distributed algorithm such that:

    lim_{t→+∞} ‖x^[i](t) − x^[j](t)‖ = 0 ∀i, j ∈ {1, ..., N}

[for all initial states]. All the limits, all the states agree: reaching consensus in finite/asymptotic time. AVERAGE CONSENSUS. The goal is: suppose x₀^[i], i ∈ {1, ..., N} are the initial states of the systems. Goal: reach consensus on the average of the initial states ⟹ (1/N) Σ_{i=1}^N x₀^[i].

GOAL: agree on the average of the initial conditions (CINITs). Let's see how an LDA can solve the consensus problem and also the average consensus problem. Let's recall what a distributed algorithm looks like:

    x^[i](t + 1) = stf(x^[i](t), {x^[j](t)}_{j∈N_i^IN})

Suppose the digraph G is fixed. State communication graph G = (I, E). Goal: design stf = ?. G = (I, E), I = {1, ..., N} ∧ E ⊂ I × I. We want to design stf. We can do it by using an LDA: linear (discrete-time) dynamics.

Try this kind of intuition: why not start by averaging my neighbors' states?

    x^[i](t + 1) = Σ_{j∈N_i^IN∪{i}} f_ij x^[j](t)

where i ∈ {1, ..., N} and the f_ij are positive constants, positive coefficients. A distributed algorithm / (control law for x treated as a physical state). Let's compute a linear combination of the x^[j](t). What should I choose as f_ij? Just average my neighbors' states (#· = d_i^IN + 1). Let's stay, however, with a general combination. All these dynamics are linear ⟹ a linear system.

How can we analyze this dynamics? First of all, let's rewrite it in the following way:

    x^[i](t + 1) = Σ_{j=1}^N f_ij x^[j](t)

(*), where we assume f_ij > 0 ⇐⇒ (j, i) ∈ E ∨ i = j, and f_ij = 0 otherwise. A completely equivalent version of x^[i](t + 1); all these dynamics are linear. A discrete-time linear dynamical system. Let's again stack all states in a single vector. Let's suppose for the moment that:

    x := [x^[1]; ...; x^[N]] ⟹ x(t + 1) = F x(t)

a LINEAR TIME-INVARIANT SYSTEM (discrete-time), where for the state matrix we have F ∈ ℝ^{N×N}, x^[i](t) ∈ ℝ (several other choices would also be possible). F is a constant matrix ⇐⇒ the f_ij are constants. Since the graph is fixed, it is time-INVARIANT. But then it's easy! Construct F, get the graph. Study its eigenvalues and then I get all the information about the system! Why is the problem so challenging? Once we have the graph and F, we're OK! But who gives us the matrix, the f_ij? Our goal is to try to give some general answers: try to connect properties of dynamical systems and properties of the graph. Reach an acceptable level of generality.

Remark
WAM: if we consider our communication graph, F is some sort of WAM. In a WAM we said a_ij > 0 ⇐⇒ (i, j) ∈ E ∧ a_ij = 0 otherwise. Here f_ij > 0 ⇐⇒ (j, i) ∈ E is an edge! A sort of convention — a reversed definition. First difference: f_ij > 0 also for i = j! Second difference: it holds that F = A^⊤, where A is the weighted adjacency matrix (WAM) of G with self-loops added; then we get F = A^⊤. Why the transpose? Because of the difference of indices ⟹ (i, j) → (j, i): changing the orientation of the edges. Just a convention, however. The notational discrepancy is due to the fact that graph theory was not initially conceived for communication. F is, to all effects, the adjacency matrix of a weighted graph.

GOAL: link properties from graph theory to linear systems theory. Mapping GRAPHS ↦ LTIs. Let's read the properties of the graph off F. First of all, F is a non-negative matrix ⇐⇒ no f_ij < 0 (F non-negative matrix ⇐⇒ f_ij ≥ 0 ∀i, j ∈ {1, ..., N}). Then let's introduce some properties of matrices. Introduce the notion of IRREDUCIBLE MATRIX.

Definition 42. Irreducible Matrix
A square matrix A is irreducible ⇐⇒ there exists no permutation matrix Π such that Π^⊤AΠ = Π^{−1}AΠ is (block) upper-triangular.

FACT: if A is the (weighted) adjacency matrix of a (weighted) digraph, then A is irreducible ⇐⇒ the graph is strongly connected. (A connection between the theories.) If you have a SPARSE MATRIX with non-negative values, then you can make this association (the ⇐⇒ holds). Now we'll give the notion of PRIMITIVE MATRIX.

Definition 43. PRIMITIVE MATRIX
A matrix A is primitive ⇐⇒

    ∃(k ∈ ℕ) | A^k > 0 ⇐⇒ [A^k]_{ij} > 0 ∀(i, j) ∈ {1, ..., dim A} × {1, ..., dim A}

where A^k is, in short, a positive matrix. This has nothing to do with graph theory for the moment, but...

Proposition 3. (1.35 BCM) Let G be a weighted digraph, with weighted adjacency matrix A. Then the following statements are equivalent:

• i) A is primitive;
• ii) G is strongly connected and APERIODIC;

(The class of primitive matrices is INCLUDED in that of irreducible matrices.) A graph is APERIODIC if the greatest common divisor of the lengths of its cycles is 1. To make a graph aperiodic it is sufficient to add self-loops. So if G has self-loops, then it is APERIODIC.
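A direct computational reading of Definitions 42–43 and Proposition 3 (NumPy assumed; the helper name is hypothetical): primitivity can be tested by checking whether some power A^k is entrywise positive. A pure directed cycle is strongly connected but periodic; adding self-loops makes it primitive.

```python
# Testing primitivity by checking powers A^k for entrywise positivity.
import numpy as np

def is_primitive(M, kmax=20):
    """True iff some power M^k (k <= kmax) is entrywise positive."""
    P = np.eye(len(M))
    for _ in range(kmax):
        P = P @ M
        if (P > 0).all():
            return True
    return False

C = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]], dtype=float)   # directed 3-cycle: strongly connected
                                         # but periodic (all cycle lengths = 3)
print(is_primitive(C))                   # False
print(is_primitive(C + np.eye(3)))       # True: self-loops make it aperiodic
```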

Let’s make a connection with dynamical systems. t ⇐⇒

Lemma 2. The square matrix A is convergent, lim A = 0

t→∞

{kλ k |

ρ(A) = max λ = eigval(A) = spectral radius < 1}

C

√ 2 2 ka

kλk a + b = + jbk . A is semi-convergent, meaning that:

ex. a + jb =⇒ = C

t

∃ ⇐⇒

lim A

t→∞

• ≤

i) ρ(A) 1;

• ii) λ = 1 is the only eigenvalue on the unit circle;

• iii) algebraic multiplicity of (λ = 1) is equal to the geometric multiplicity;

Except of ii), these are the stability properties for a DLTI system. Con la ii) stiamo

escludendo la possibilità di avere moti oscillatori, altrimenti il limite non sarebbe ben definito.

72

LINEAR DISTRIBUTED ALGORITHMS

Theorem 20. (Perron–Frobenius for primitive matrices)
If the non-negative matrix A is primitive, then ⟹

• i) ρ(A) > 0 (ρ(A) = max{‖λ‖_ℂ | λ is an eigenvalue of A});
• ii) ρ(A) is an eigenvalue, it is simple, and it is strictly larger than the magnitude of the other eigenvalues;
• iii) ρ(A) has an eigenvector with positive components;

Corollary 2. If the non-negative matrix A is primitive, then ⟹

    ∃ lim_{t→∞} [A/ρ(A)]^t

(If I scale all eigenvalues by the largest eigenvalue, then all the other eigenvalues become < 1 in magnitude.)

    x^[i](t + 1) = Σ_{j=1}^N f_ij x^[j](t)

Let's suppose that... of course, most of them will be equal to 0. Assume Σ_{j=1}^N f_ij = 1 (convex combination) ∀i ∈ {1, ..., N}. If I take my matrix F and I sum the elements of a row, the sum is equal to 1 ⟹

    F1 = 1,  1 := [1; ...; 1]

A matrix that is non-negative and satisfies that property is called ROW STOCHASTIC. Notice that F is a SPARSE MATRIX. If instead the elements of each column sum up to 1, respectively: ⟹

    1^⊤F = 1^⊤ ⇐= Σ_{j=1}^N f_ji = 1

⟹ F is COLUMN STOCHASTIC. A (ROW+COLUMN) STOCHASTIC matrix is called DOUBLY STOCHASTIC, or simply STOCHASTIC. We'll show that λ = 1 = ρ(F) (the largest eigenvalue of the matrix F). 1 is an eigenvector of the matrix F with eigenvalue 1. We have to show that it is the largest.

Theorem 21. (Gershgorin)
Let A be an N×N matrix with entries a_ij; then any eigenvalue λ of A satisfies:

    λ ∈ ∪_{i∈{1,...,N}} {z ∈ ℂ | ‖z − a_ii‖_ℂ ≤ Σ_{j≠i} |a_ij| = R_i}

In general this is a very rough result, but in our case it is quite useful. Let's see what happens with our matrix ⟹

    A = [a₁₁ a₁₂ a₁₃; a₂₁ a₂₂ a₂₃; a₃₁ a₃₂ a₃₃]

    R₁ = |a₁₂| + |a₁₃|
    R₂ = |a₂₁| + |a₂₃|
    R₃ = |a₃₁| + |a₃₂|

⟹ f_ij = |f_ij| > 0. Let's take the first row, for example: f₁₁ + f₁₂ + ··· + f₁N = 1 ⟹ f₁₁ = 1 − Σ_{j≠1} f₁j. This holds in general: f_ii = 1 − Σ_{j≠i} f_ij. Why is it interesting? Suppose I take f_ii > 0. Then the radius will be (R_i ≤ 1):

    R_i = Σ_{j≠i} f_ij = 1 − f_ii

The radius R_i will always be less than 1. If λ = 1, it is the only possible largest eigenvalue ⟹ λ = 1 = ρ(F) is the spectral radius. We only know that (λ = 1) is the eigenvalue of MAXIMUM NORM. But nobody tells me that (λ = 1) has multiplicity 1! (With f_ii > 0, all the discs are contained in the unit disc, and they touch 1 only at one point!)
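A small numeric sketch (NumPy assumed; the matrix is an illustrative example) of the Gershgorin argument: for a row-stochastic F with positive diagonal, every disc is centered at f_ii with radius 1 − f_ii, so all discs sit inside the unit disc and touch the value 1 only at z = 1.

```python
# Gershgorin discs of a row-stochastic matrix with positive diagonal.
import numpy as np

F = np.array([[0.6, 0.4, 0.0],
              [0.3, 0.4, 0.3],
              [0.0, 0.5, 0.5]])          # rows sum to 1, f_ii > 0

centers = np.diag(F)
radii = F.sum(axis=1) - np.diag(F)       # = 1 - f_ii for a row-stochastic F
print(centers + radii)                   # each disc's rightmost point is 1
print(np.abs(np.linalg.eigvals(F)))      # spectral radius 1, others < 1
```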

Theorem 22. Consider the consensus system for a linear distributed algorithm:

    x(t + 1) = F x(t)

Assume F is ROW STOCHASTIC (I'm choosing the f_ij so that each update is a convex combination of the neighbors). Let the communication digraph G (with self-loops) be strongly connected, and let v ∈ ℝⁿ be a left eigenvector of F with eigenvalue 1. Then ⟹

• i) consensus is asymptotically reached, i.e.

    lim_{t→∞} x(t) = α1 := [α; ...; α]

for some α ∈ ℝ, and this holds ∀x₀ ∈ ℝⁿ;

• ii) the consensus value is:

    α = v^⊤x(0)/v^⊤1 = (Σ_{i=1}^N v_i x^[i](0))/(Σ_{i=1}^N v_i);

• iii) if F is doubly stochastic, then average consensus is reached:

    α = (1/N) Σ_{i=1}^N x^[i](0);

(the average of the CINITs);

We're assuming that the agents communicate according to a fixed graph. Assume G is strongly connected: it's a requirement to have a complete spreading of information (information can propagate). Every vertex performs a convex combination of its neighbors' states. Agent i can decide its own weights! Σ_j f_ij = 1 ∧ v^⊤F = v^⊤ ⇐⇒ v left eigenvector, with eigenvalue 1.

• 1) The agents reach consensus, x := [x^[1]; ...; x^[N]]; in general, α is a convex combination of the initial states. But if the matrix F is doubly stochastic ⟹ v^⊤ = 1^⊤.
• 2) Taking any strongly connected graph ⟹ if you act with this protocol, then you reach consensus (on the average);
• 3) If G is UNDIRECTED and f_ij = f_ji ⟹ F = F^⊤: F becomes a symmetric matrix and is therefore (doubly) stochastic;

Fixed graph, linear dynamics: a privileged situation. λ = 1 = ρ(A) ⇐⇒ the spectral radius is an eigenvalue, and it is exactly 1. The spectral radius of a stochastic matrix is 1 by Gershgorin.
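The statement can be checked numerically; a minimal sketch (NumPy assumed; F is an illustrative example) iterates x(t+1) = Fx(t) and compares the limit with α = v^⊤x(0)/v^⊤1.

```python
# Consensus via a row-stochastic F, and the predicted consensus value alpha.
import numpy as np

F = np.array([[0.6, 0.4, 0.0],
              [0.3, 0.4, 0.3],
              [0.0, 0.5, 0.5]])           # row stochastic, self-loops, strongly connected

x0 = np.array([1.0, -2.0, 7.0])           # initial states x(0)
x = x0.copy()
for _ in range(200):
    x = F @ x
print(x)                                   # all components equal: consensus

w, V = np.linalg.eig(F.T)                  # left eigenvectors of F
v = np.real(V[:, np.argmin(np.abs(w - 1))])
print(v @ x0 / v.sum())                    # matches the consensus value alpha
```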

LINEAR DISTRIBUTED ALGORITHMS AND CONSENSUS

Theorem 23. Consider the consensus system:

    x^[i](t + 1) = Σ_{j∈N_i^IN∪{i}} f_ij x^[j](t), i ∈ {1, ..., N}

with G = (I, E) the associated fixed digraph, or equivalently: x(t + 1) = F x(t). Assume that F is ROW STOCHASTIC and let v ∈ ℝⁿ be a left eigenvector of F with eigenvalue 1. Assume G is strongly connected. Then:

• Consensus is asymptotically reached, i.e. lim_{t→∞} x(t) = α1, for some α ∈ ℝ (∀x₀ ∈ ℝⁿ);
• The consensus value is:

    α = v^⊤x(0)/v^⊤1 = (Σ_{i=1}^N v_i x^[i](0))/(Σ_{i=1}^N v_i);

• If the matrix F is doubly STOCHASTIC ⟹ average consensus is asymptotically reached;

Proof. Let's start to prove the theorem.

• First of all, since we are considering G as having self-loops, F is primitive, since G is aperiodic and strongly connected. We're assuming positive weights on the diagonal of the matrix F. So, the graph G is strongly connected and, considering the self-loops, aperiodic. Thus, by Proposition 3, F is primitive.

By using Theorem 21 and the assumption that F is ROW STOCHASTIC, (λ = 1) is an eigenvalue and ρ(F) = λ = 1. So, λ is the SPECTRAL RADIUS of the matrix.

By using theorem 1.13 (BCM) and Theorem 20, (λ = 1) is a simple eigenvalue (algebraic multiplicity = geometric multiplicity), and λ = 1 > ‖λ₂‖ ≥ ‖λ₃‖ ≥ ··· ≥ ‖λ_N‖. This is also a consequence of the Gershgorin theorem: the only way for an eigenvalue to have norm equal to 1 is to be exactly the eigenvalue 1! And, by Gershgorin, the others must also be smaller than it in norm. Thus, by Lemma 1.7, F is semi-convergent, i.e.

    ∃ lim_{t→∞} F^t ⟹ ∃ lim_{t→∞} x(t) = lim_{t→∞} F^t x(0)

and this limit exists. Given this, and since F is row stochastic (F1 = 1) ⟹ λ = 1 is an eigenvalue! It can be shown that lim_{t→∞} x(t) = α1. The space spanned by 1 is the largest INVARIANT SET (some sort of discrete-time LaSalle version). Then ⟹ consensus is reached. But what consensus is reached? RECAP/RECALL: if the matrix F is DOUBLY STOCHASTIC ⟹ then average consensus is also asymptotically reached. It is possible to show that lim_{t→∞} F^t = 1w^⊤, and therefore

    lim_{t→∞} x(t) = lim_{t→∞} F^t x(0) = 1w^⊤x(0) = 1α

• v^⊤x(t + 1) = v^⊤F x(t) = v^⊤x(t); thus v^⊤x(t) = v^⊤x₀ ∀t ≥ 0. Since this holds ∀t ≥ 0 ⇐⇒

    lim_{t→∞} v^⊤x(t) = v^⊤x(0), but v^⊤(α1) = αv^⊤1 = v^⊤x(0) ⟹ α = v^⊤x(0)/v^⊤1;

• If F is doubly stochastic, v = 1 ⇐= (1^⊤F = 1^⊤). Thus:

    α = 1^⊤x(0)/1^⊤1 = (1/N) Σ_{i=1}^N x^[i](0)

⟹ and this is exactly the average. In this case both the left and the right eigenvector are a vector of ones.

Recall that we chose a particular non-negative F. F is a design parameter that I decide. Actually, node i chooses the weights (i, j) autonomously, without consulting the others. The design of F is done by us. Having assumed f_ij > 0 on the edges, F may for example be:

    F = [f₁₁ f₁₂ 0 0; f₂₁ f₂₂ f₂₃ 0; 0 f₃₂ f₃₃ f₃₄; 0 0 f₄₃ f₄₄]

If I require it to be row and column stochastic, we can at this point forget about the proofs and require F to be symmetric from the start.

LINEAR DISTRIBUTED ALGORITHMS FOR TIME-DEPENDENT SYSTEMS

We've proven this consensus result for a static graph (fixed, it doesn't change with time). But typically it can happen that the communication depends somehow on time. Consensus algorithm for a time-dependent system:

    x^[i](t + 1) = Σ_{j∈N_i^IN(t)∪{i}} f_ij(t) x^[j](t)

(same structure as in the static case). Therefore it can be written as:

    x^[i](t + 1) = Σ_{j=1}^N f_ij(t) x^[j](t)

and assume that: f_ij(t) > 0 ⇐⇒ (j, i) ∈ E(t) ∨ j = i, and f_ij(t) = 0 otherwise, ∀i ∈ {1, ..., N}. Now we need the same connectivity. ASSUME that:

• i) ∃δ > 0 | f_ii(t) ≥ δ ∀t ≥ 0, ∀i ∈ {1, ..., N}
• ii) f_ij(t) ∈ {0} ∪ [δ, 1] ∀t ≥ 0 ∧ ∀i, j ∈ {1, ..., N}
• iii) Σ_{j=1}^N f_ij(t) = 1 ∀t ≥ 0, ∀i ∈ {1, ..., N}

Theorem 24. Consider the linear distributed algorithm:

    x^[i](t + 1) = Σ_{j=1}^N f_ij(t) x^[j](t)

with time-dependent communication graph {G(t)}_{t≥0}. Assume that ASSUMPTION 1 holds and that {G(t)}_{t≥0} is uniformly jointly strongly connected (let's recall the definition):

Definition 44. RECALL: UNIFORMLY JOINTLY STRONGLY CONNECTED GRAPH

    ∪_{τ=(t−1)T+1}^{tT} G(τ) is strongly connected ∀t ≥ 1, for some T > 0.

Then consensus is asymptotically reached. (This is the counterpart of the static version we were given.)

REMARK: if the graph is balanced ⟹ joint strong connectivity is sufficient. (BALANCED GRAPH) ⟹ F PRIMITIVE ⟹ [F = F^⊤ = A = A^⊤]. If the matrix is doubly stochastic, then ⟹ average consensus is reached. Let's just give an idea of what is included in the proof of this result. Let's write the algorithm in a compact way:

    x = [x^[1]; ...; x^[N]];  x(t + 1) = F(t)x(t), x(0) = x₀

Now the system is time-varying. Let's see what the STM (state transition matrix) is:

    x(t) = φ(t, t₀)x(t₀)
    x(t) = F(t − 1)F(t − 2) ··· F(0)x₀

(φ exists both in continuous and in discrete time). In other words, we want to study:

    lim_{t→∞} x(t) = lim_{t→∞} (F(t − 1) ··· F(0)x₀)

which is basically equivalent to studying the product of these matrices:

    ∃? lim_{t→∞} F(t − 1) ··· F(0)

LEFT PRODUCTS OF (ROW) STOCHASTIC MATRICES

... by the way, both the static version and the time-dependent version of the algorithm are strictly related to MARKOV CHAINS: different states, and somehow these states are connected through a graph. Consider a random walk on the graph. That's also why these matrices are called STOCHASTIC: f_ij can be considered as a probability ⇐⇒ Σ_{j=1}^N f_ij = 1. WEAK/STRONG ERGODICITY. In particular, for the left products of row-stochastic matrices, when WEAK ERGODICITY holds,

    ∃ lim_{t→∞} F(t − 1) ··· F(0) = 1w^⊤

under some properties of the graph. Then you immediately have consensus:

    ⟹ lim_{t→∞} x(t) = lim_{t→∞} F(t − 1) ··· F(0)x₀ = 1w^⊤(x₀ = x(0)) = 1α

These left products of matrices have to do with the theory of MARKOV CHAINS. We have the same dynamical behavior with consensus systems. Obviously the ergodicity property can be proven through properties of the graph (some connectivity properties ⟹ Joint Strong Connectivity, JSC). A small observation: in the results given, to have consensus on the average, we had to assume the graph balanced ⟹ or at least that the matrix F be DOUBLY STOCHASTIC. To have an average we must have a sort of symmetry in the communication(s). Otherwise one resorts to other, nonlinear algorithms. In practice, however, it is very difficult!! UNDIRECTED CASE, BALANCED CASE.
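As an illustration of these left products (NumPy assumed; the two alternating graphs are a hypothetical example), consensus is still reached when each G(t) is disconnected on its own but the union over a period is connected:

```python
# Left products of row-stochastic matrices over a switching (jointly connected) graph.
import numpy as np

F1 = np.array([[0.5, 0.5, 0.0],
               [0.5, 0.5, 0.0],
               [0.0, 0.0, 1.0]])        # nodes {1,2} average, node 3 isolated
F2 = np.array([[1.0, 0.0, 0.0],
               [0.0, 0.5, 0.5],
               [0.0, 0.5, 0.5]])        # nodes {2,3} average, node 1 isolated

x = np.array([0.0, 3.0, 9.0])
for t in range(200):
    x = (F1 if t % 2 == 0 else F2) @ x  # x(t+1) = F(t) x(t)
print(x)                                # -> consensus (here also the average,
                                        #    since each F(t) is doubly stochastic)
```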

LINEAR DISTRIBUTED CONTROL FOR CONTINUOUS-TIME SYSTEMS

Let's make an observation on some of these algorithms. Agents reach consensus: what if we think of the processors as having physical states...? Discrete-time control with "first-order dynamics". Each system has some dynamics of the form:

    x^[i](t + 1) = x^[i](t) + u^[i](t)

SOME SORT OF INTEGRATOR; u^[i](t) is some sort of accumulator. If I look at this system as a control system with an INPUT, we can apply the consensus algorithm theorem and reach consensus ⟹ equivalent to solving the RENDEZVOUS PROBLEM: try to asymptotically reach the same point. This is supposed to be done by chatting only with local neighbors. How can we design u such that I get consensus? DESIGN u such that my dynamics becomes:

    x^[i](t + 1) = f_ii x^[i](t) + Σ_{j∈N_i^IN} f_ij x^[j](t)

(CONSENSUS DYNAMICS). In this case the graph could depend on the state, and hence possibly lose connectivity at some point! Consensus is then reached provided CONNECTIVITY is maintained.

[N_i^IN = N_i^IN(x(t), t)]. Study of social dynamics. Possible self-influence on an opinion — or will we gradually have more and more distant opinions? Graph and dynamics are in feedback with each other! Possible loss of connectivity, and hence of consensus.

(...) N control systems (continuous time). Let's suppose integrators:

    ẋ^[i](t) = u^[i](t), i ∈ {1, ..., N}

Does there exist a feedback control u such that the agents reach consensus? Consensus means:

    lim_{t→∞} ‖x^[i](t) − x^[j](t)‖ = 0

    u^[i](t) = Σ_{j∈N_i^IN} (x^[j](t) − x^[i](t))

It's like applying a (P = Proportional) control (control proportional only to the difference between states). The CLOSED-LOOP DYNAMICS is

    ẋ^[i](t) = u^[i](t) = Σ_{j∈N_i^IN} (x^[j](t) − x^[i](t));

a LINEAR SYSTEM (linear combination). We'll prove that this dynamics allows us to reach consensus. [The consensus problem is strictly related to that of SYNCHRONIZATION.] Design a distributed control law for a network of single integrators.

LINEAR DISTRIBUTED CONTROL FOR CONTINUOUS-TIME SINGLE INTEGRATORS

The goal of this network is to reach consensus. (We could generalize a little bit, since x^[i](t) may be a vector itself.)

    ẋ^[i](t) = u^[i](t), i ∈ {1, ..., N}

    ẋ^[i](t) = Σ_{j∈N_i^IN} (x^[j](t) − x^[i](t));

(x^[i](0) = x₀^[i]. Reach consensus and possibly reach consensus on the average of the CINITs.) Distributed because j ∈ N_i^IN. (Fixed version of the graph) ⟹ then the dynamics of this system is that of a classical LTI system.

    G = ({1, ..., N}, E),  ẋ = ?·x(t),  x := [x^[1]; ...; x^[N]]

Since each ẋ^[i](t) is a linear dynamics, the entire dynamics is LTI ⟹ ẋ = Mx (UNFORCED LTI SYSTEM (no INPUT)). What is that M ≠ M(t)? Are we able to find that special matrix? Let's try to rewrite ẋ^[i](t) in a different way (d_i^IN = #N_i^IN):

    ẋ^[i](t) = Σ_{j∈N_i^IN} x^[j](t) − d_i^IN x^[i](t) = Σ_{j=1}^N a_ji x^[j](t) − d_i^IN x^[i](t)

where a_ji = 1 ⇐⇒ (j, i) ∈ E, and the second term does not depend on j. In other words, let's introduce this notation: given A as the ADJACENCY MATRIX:

    A = [a₁₁ ... a₁ᵢ ... a₁N; ...; a_N1 ... a_Ni ... a_NN]

let Aᵢ be the i-th column of A. Then ⟹

    ẋ^[i] = Aᵢ^⊤x − d_i^IN x^[i]

If I stack all of these dynamics I get:

    [ẋ^[1]; ...; ẋ^[N]] = [A₁^⊤; ...; A_N^⊤]·x − diag(d₁^IN, ..., d_N^IN)·x

where:

    ẋ = (A^⊤ − D^IN)x ⟹ ẋ = −L^IN x

REMARK: if we write the different dynamics

    ẋ^[i](t) = Σ_{j∈N_i^OUT} (x^[j](t) − x^[i](t))

in this case the dynamics becomes ẋ = −Lx, where L is the Laplacian of G.

We adopted by convention the rule that i receives from j. With the other convention, we recover in any case the LAPLACIAN of standard graph theory:

    ẋ = (A^⊤ − D^IN)x = −L^IN x,  or  ẋ = (A − D^OUT)x = −Lx

Theorem 25. (Continuous-time consensus)
Consider a network of N systems,

    ẋ^[i](t) = u^[i](t), i ∈ {1, ..., N}

with communication digraph G = ({1, ..., N}, E); applying the consensus algorithm, we have:

    ẋ^[i](t) = Σ_{j∈N_i^OUT} (x^[j](t) − x^[i](t))

(let's write it with the OUT-neighbors so we can play with the standard LAPLACIAN L). Suppose that G is strongly connected, let L be the LAPLACIAN of G, and let γ = [γ₁ ... γ_N]^⊤ be the left eigenvector satisfying γ^⊤L = 0 (γ left eigenvector, generator of the space ker L^⊤). Then:

• i) consensus is asymptotically reached ∀ initial states (CINITs) ⇐⇒ lim_{t→∞} x(t) = α1, for some α ∈ ℝ;
• ii) the consensus value is:

    α := γ^⊤x(0)/γ^⊤1 = (Σ_{i=1}^N γ_i x^[i](0))/(Σ_{i=1}^N γ_i), (∀x₀ ∈ ℝ^N);

• iii) if the graph is BALANCED, then average consensus is asymptotically reached ⟹

    α = (1/N) Σ_{i=1}^N x^[i](0);

These conclusions are strictly related to those of the discrete-time counterpart.

Proof. i) First of all, we need to prove that we reach consensus. Property of the Laplacian: 0 is an eigenvalue of the Laplacian. By the theorem on the properties of the Laplacian, since G is strongly connected, ⟹ rank(L) = N − 1 (each node is a globally reachable vertex); i.e., λ = 0 is a simple eigenvalue. [L1 = 0]. The kernel is generated by 1, the unique (only) right eigenvector. Take a Jordan basis:

    T = [1 v₂ ... v_N]

How can we construct this Jordan basis? Just complete the basis starting from 1. If I do that, then I can take a change of variables x = Tz. My x(t) can be written as:

    x(t) = z₁(t)1 + z₂(t)v₂ + ··· + z_N(t)v_N

Let's write the dynamics of ż(t):

    ż(t) = T^{−1}ẋ = −T^{−1}Lx = −T^{−1}LTz = (−T^{−1}LT = J)z

where J is a Jordan matrix. Let's again use the Gershgorin theorem to study the eigenvalues of the matrix −L. All the eigenvalues belong to the left half plane (R_i = Σ_{j≠i} |l_ij| = d_i^OUT) ⟹ thus λ₂, ..., λ_N have negative real part. Consider r = #(blocks of J). Now let's write the expression of z(t):

    z(t) = z₁(0)e^{0·t}[1; 0; ...; 0] + Σ_{i=2}^r Σ_{j=1}^{v_i} α_ij t^{j−1} e^{λ_i t} (...)  →  [z₁(0); 0; ...; 0] as t → ∞

Since all of these λ_i have negative real part, as t → ∞, z(t) → z₁(0)e^{0·t} = z₁(0). Thus ⟹ [lim_{t→∞} x(t) = z₁(0)1];

    z₁(t) = z₁(0)e^{0·t} = z₁(0)
    z_i(t) = Σ_{j=1}^{v_i} α_ij t^{j−1} e^{λ_i t}

The other components z_i are driven by eigenvalues with negative real part, so they give us no problem anyway.

• ii) Now we have to see what consensus value we arrive at! (γ^⊤ is the left eigenvector associated with the eigenvalue 0.)

    y(t) = γ^⊤x(t),  ẏ(t) = γ^⊤ẋ(t) = (−γ^⊤L = 0)x(t) = 0;

⟹ y(t) is a constant ⇐⇒ y(t) = y(0) = γ^⊤x(0). As in the discrete-time case, this must hold ∀t, hence also for t → ∞ ⟹

    lim_{t→∞} y(t) = γ^⊤ lim_{t→∞} x(t) = γ^⊤(α1) = αγ^⊤1 = γ^⊤(x(0) = x₀) ⟹

    ⟹ α = γ^⊤x(0)/γ^⊤1

(exactly as in the discrete-time version);

• iii) If G is BALANCED ⇐⇒ 1^⊤L = 0 ⟹ (γ = 1) (the left eigenvector is in this case exactly 1!). Therefore,

    α = 1^⊤x(0)/1^⊤1 = (Σ_{i=1}^N x^[i](0))/N

(AVERAGE CONSENSUS).

(The right eigenvector is in this case exactly 1 as well!)

The path (line) graph is one of the worst cases for this consensus scheme: the convergence rate is tied to the second largest eigenvalue. Reverse network. A problem called CONTAINMENT, or LEADER-FOLLOWER NETWORK.

What is an LFN? A network of N systems ≠ (master-slave = parallel-computing structure). For continuous-time consensus, suppose:

    ẋ^[i] = u^[i], i ∈ {1, ..., N}

    n^l agents called leaders
    n^f agents called followers

The INPUT u is different depending on i. [n^l + n^f := N]. Without loss of generality, let's suppose {1, ..., n^f} is the set of followers. Informally, the leaders can somehow decide their control INPUT u^[i] independently, depending on their own tasks. Suppose the leaders are fixed (fixed positions ∈ ℝ²). Suppose:

    u^[i](t) = 0 ∀i ∈ {n^f + 1, ..., N}

(stationarity of the integrators). Let G = ({1, ..., N}, E) be the UNDIRECTED communication graph. Let's suppose G is fixed (G does not depend on t); we say we have a fixed undirected graph. The followers run the following input:

    u^[i](t) = Σ_{j∈N_i} (x^[j](t) − x^[i](t))

Note that N_i can include both leaders and followers! So, each follower looks at its neighbors and simply applies a consensus protocol, and this holds ∀i ∈ {1, ..., n^f} (only for the followers). The followers communicate according to the fixed G. What is the GOAL? The GOAL is to show that the followers asymptotically converge to the convex hull of the leaders. HULL := convex closure = convex enclosure. How do we show this? Let's write the dynamics of leaders/followers, distinguishing their own dynamics: ẋ_l = 0 (naturally the leaders' dynamics is null, having assumed the stationarity of their input integrators).

leader è nulla, avendo supposto la stazionarietà dei loro input-integratori.

[i] [j] [i] [j] [i]

X X f

− − ∀i ∈ {1, }

[

ẋ = (x (t) x (t)) + (x (t) x (t))], . . . , n

f f f l f

l

f j∈N

j∈N i

i

[i]

Let’s forget (x = 0); suppose all the vertices are running a consensus protocol. Let

l

us pretend that both leaders and followers run the consensus protocol, then I can write the

dynamics: [1]

 

x

f

x .

f  

.

x := , x = .

f  

x l  

f

[n ]

x f

82

−Lx]

then I can write: [ ẋ = (seen it in previous case). But now let’s remember that x

contains both x and x parts:

f l

L l

ẋ x

f f l

f f

ẋ = = >

l L

ẋ x

l

l l

f l

Where the antidiagonal blocks are equal and transposed since

(Just written L into blocks).

the Laplacian is symmetric.

But now let’s remember that leaders don’t run the consensus problem, thus:

( −L −

ẋ = x l x

f f f f l l

ẋ = 0

l

(TRUE DYNAMICS), where the underlined term is driven by an input that are the state of

the leaders! And then Now if we look at the following this is a LINEAR DYNAMIC (LINEAR

DISTRIBUTED) CONTROL PROTOCOL. This constant input (fixed locations of leaders)

refers to the product of l with the positions of the leaders x . We need to understand what are

f l l

the property of L matrix! One thing that we need to be careful is the following! L is NOT

f f

THE LAPLACIAN MATRIX of the followers subgraph! This could be a confusion that may

happen. Cambia il grado sulla diagonale! Tiene conto dei leaders e dei followers, non soltanto

dei followers! Inoltre non somma più a 0.

Lemma 3. If the graph G is connected, then L_f > 0 is positive definite.

Let's see why L_f is positive definite.

Proof. First of all, we know that L is positive semidefinite, because L is the Laplacian of an undirected graph ⟹ x^⊤Lx ≥ 0 ∀x ∧ (x^⊤Lx = 0 ⇐⇒ x = α1, α ∈ ℝ). Let's take x = [x_f; 0]. If I compute x^⊤Lx, this will be equal to:

    x^⊤Lx = x_f^⊤ L_f x_f ≥ 0 ∀x_f

For sure this L_f is positive semidefinite. Is it possible that x_f^⊤ L_f x_f = 0 for some x_f ≠ 0? No, it isn't possible, because x^⊤Lx = 0 requires x = α1, and [x_f; 0] = α1 forces α = 0. Thus ⟹

    [x_f; 0]^⊤ L [x_f; 0] > 0 ∀x_f ≠ 0

and this concludes the proof. L_f > 0 ⟹ λ_i(L_f) > 0 ∀i.

Lemma 4. Given fixed leaders' states x_l, then ⟹

    x_fe = −L_f^{−1} l_fl x_l

is the asymptotically stable equilibrium of the followers' dynamics.

Proof. For the followers' dynamics,

    0 = ẋ_f = −L_f x_f − l_fl x_l

Since we want to compute the equilibrium, and since L_f is positive definite, ∃L_f^{−1} ⟹ we can compute the equilibrium by simply inverting:

    x_fe = −L_f^{−1} l_fl x_l

It is asymptotically stable because −L_f is negative definite, hence Hurwitz, and thus the equilibrium is asymptotically stable. The followers will converge to the equilibrium point. Indeed, setting:

    Δx_f := x_f − x_fe
    Δẋ_f = ẋ_f − (ẋ_fe = 0) = −L_f x_f − l_fl x_l = −L_f Δx_f

Theorem 26. (Containment for a fixed graph)
Given a leader/follower network with connected communication graph G, the followers asymptotically converge to the convex hull of the leaders.

Proof. From the previous Lemmas, the followers' states converge to x_fe asymptotically. What are the components of x_fe? By simply setting to 0 each

    ẋ_fe^[i] = 0 = Σ_{j∈N_i} (x_e^[j] − x_fe^[i])

This means that:

    Σ_{j∈N_i} x_fe^[i] = Σ_{j∈N_i} x_e^[j] ⟹ |N_i|·x_fe^[i] = Σ_{j∈N_i} x_e^[j] ⟹

    ⟹ x_fe^[i] = (1/|N_i|) Σ_{j∈N_i} x_e^[j]

∀i ∈ {1, ..., n^f} (this holds ONLY FOR THE FOLLOWERS!). This condition says that each equilibrium state of the followers is a convex combination of those of its neighbors. Each follower must end up in the convex combination of its neighbors — no matter whether they are leaders or followers! For each follower, its equilibrium must therefore belong to the convex hull of its neighbors. But since the leaders are fixed and do not have to satisfy this condition, the proof holds.
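The containment result can be checked directly (NumPy assumed; the four-node graph and leader positions are a hypothetical example): the follower equilibrium −L_f^{−1} l_fl x_l lands inside the interval spanned by the leaders.

```python
# Containment check: followers {0,1}, leaders {2,3}; edges (0,1), (1,2), (0,3).
import numpy as np

L = np.array([[ 2, -1,  0, -1],
              [-1,  2, -1,  0],
              [ 0, -1,  1,  0],
              [-1,  0,  0,  1]], dtype=float)   # Laplacian of the full graph
Lf   = L[:2, :2]                        # follower-follower block
l_fl = L[:2, 2:]                        # follower-leader block
x_l  = np.array([0.0, 6.0])             # fixed (scalar) leader positions

x_fe = -np.linalg.solve(Lf, l_fl @ x_l)
print(x_fe)                             # [4. 2.] -- inside the hull [0, 6]
```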


Chapter 4

(Distributed) Optimization Algorithms

4.1 Starting Point

Finite-dimensional optimization theory. Optimal Control Theory. Optimality requirements for the control law. The spaces we will work in are subsets of ℝⁿ. In the continuous-time case they will be infinite-dimensional (spaces of continuous functions). -) UNCONSTRAINED optimization. OPTIMIZATION METHODS IN CONTROL. NONLINEAR (FINITE-DIMENSIONAL) OPTIMIZATION. The decision variable lives in a subset of ℝⁿ (A ⊆ ℝⁿ). You can write every vector of ℝⁿ as a linear combination of n vectors. (In general this isn't possible ⟹ infinite-dimensional.) We're considering the finite-dimensional case. Let f ∈ C¹ (continuously differentiable).

• min_{x∈X} f(x), f : ℝⁿ ↦ ℝ (it takes a vector and gives a number as output). Same regularity assumptions on f;
• X ⊆ ℝⁿ ⟹ x is a vector in ℝⁿ. If X = ℝⁿ — globally, locally, or around some point — then we have the UNCONSTRAINED problem. [If f is linear and X = ℝⁿ: a LINEAR PROGRAM];
• LOCAL MINIMUM. We want to find x in order to minimize this cost function f ⟹ f achieves a LOCAL MINIMUM ⟹ x* ∈ X is a LOCAL MINIMUM if:

    ∃ε > 0 | f(x*) ≤ f(x) ∀x ∈ B̄(x*, ε)

(quite natural) (definition of local minimum);
• STRICT LOCAL MINIMUM. x* ∈ X is so-called when:

    f(x*) < f(x) ∀x ≠ x*, x ∈ B̄(x*, ε)

(as opposed to the weak inequality above);
• GLOBAL MINIMUM. x* ∈ X is so-called when:

    f(x*) ≤ f(x) ∀x ∈ X

Clearly we can also talk about a strict global minimum.

x can be any vector ⟹ x ∈ ℝⁿ. A low-dimensional example is the classical 2D one. Now, what is the goal we want to achieve? A special optimization problem within Control Theory — but let's stay more general. How do we calculate (compute) these minima? By using some iterative algorithm. But to develop those, first we want to characterize these minima. There are some necessary conditions:

• The gradient ∇f(·) must be equal to 0;
• The second-order term must have the appropriate sign (≥ 0 for a minimum);

4.1.1 NECESSARY CONDITIONS OF OPTIMALITY

Why are these necessary conditions important to approach those optimization problems? Typically what the algorithms do is the following: construct a list, a sequence of vectors that asymptotically satisfies these properties. Of course we can also define {LOCAL MAXIMA, STRICT LOCAL MAX, GLOBAL MAX, ...} (and so on). The definition of the maxima is completely equivalent:

    f(x*) ≥ f(x) ∀x ∈ B̄(x*, ε)

Just changed the inequality. Quite a straightforward property: suppose you want to find a minimum of some function f; let x* be a min of f(x). Then x* is a local max of −f(x) = g(x). The same holds for the other properties. We just concentrate on minimum problems. Just one more definition: x* is an EXTREMUM if it is either a minimum or a maximum. (Stationary points that are neither maxima nor minima are, for example, SADDLE POINTS.)

NECESSARY CONDITIONS OF OPTIMALITY FOR UNCONSTRAINED PROBLEMS

We mean that either X = ℝⁿ, or in a neighborhood of x*, X can be approximated with ℝⁿ. Why necessary? Conditions which, when violated, make us sure that the point is NOT a minimum or a maximum. It's a condition on the gradient of the function f:

Theorem 27. FIRST ORDER NEC. COND.
If x* is a local minimum ⟹

    ∇f(x*) = 0

Let's see why this is true! We use a rather general method. Let's take a vector d ∈ ℝⁿ (a fixed direction in which we want to move). Consider x* + αd, α ∈ ℝ (some real number). We fix the direction and then we move along it. Clearly, if α is sufficiently small, |α| ≪ 1 ⟹ x* + αd ∈ X (we're assuming that in a neighborhood of x*, X ≈ ℝⁿ). Now, remember x* is a LOC. MIN. In particular, we want to evaluate f(x* + αd). We fix d and we vary α. x* and d can belong to different spaces! d and x* are complicated objects, but α is a simple real number! We define:

    g(α) := f(x* + αd),  (fixed d ∈ ℝⁿ)

Let's try to understand. If we define g := g(α), no matter how complicated f(x* + αd) is, once d is fixed, g : ℝ ↦ ℝ. We're able to play with this function! It is a simple function! g(0) = f(x*) ← (important fact).

    g(0) = f(x*) ≤ f(x* + αd) = g(α) ∀α ∈ B̄(0, ε)

?

Remember we’re assuming x is LOC. MIN for f =⇒ 0 is loc. MIN for the scalar real

function g! (α = 0 is LOC. MIN. for g! Let’s analyze the function g! Taylor expansion of

R)

g: 0

g(α) = g(0) + g (0)α + o(α)

∂g(α)

0 ? | . What is o(α)? It is defined such that:

where g (α ) = ?

α=α

∂α o(α)

| |

[ lim = 0]

α

α→0

0 0

If 0 is local minimum of g(α), then g (0) must be equal to 0 =⇒ [g (0) = 0].

0 6

Dimostrazione. Suppose by contradiction that g (0) = 0. Let’s write explicitly this limit here:

o(α)

∀r ∃δ | ∀|α| | |

> 0, > 0 < δ =⇒ <r

α

|o(α)|

which means that: < r|α|. Let’s write: 0

g(α) < g(0) + g (0)α + r|α|

0 0

− |g

and therefore, g(α) g(0) = g (0)α + o(α). Take r < (0)| =⇒

0 0 0 0

− |g |g

g(α) g(0) < g (0)α + (0)|α = g (0)α + (0)α|

?

∀α ∈

So, this must holds in a neighborhood of x (∀α )). Then we can take:

B̄(0,

0

−sign(g Then:

[sign(α) = (0))].

0 0

|g ∧ − ⇐⇒

(0)α| > 0 g (0) < α =⇒ g(α) g(0) < 0 g(0) > g(α)

6

(CONTRADICTION! Because g(α) < g(0). Nella realtà, g(0) era supposto essere il mini-

mo). 0

The condition we’ve got is: g (0) = 0. So now let’s go back to our f ! Let’s compute what

is this: ?

∂g(α) ∂f (x + αd)

0

g (α) = =

∂α ∂α >

?

∇f

How do we compute this derivative? CHAIN RULE! (. . . ) = (x + αd) . In particular,

0 >

? n

∇f ∈

g (0) = (x ) d = 0 (∀d )! So we get

R >

? ?

∇f ⇐⇒

(x ) d = 0 [∇f (x ) = 0] 0 ?

∇f

Again, a very important step! Recognize g(α) and then compute g (0) = 0 =⇒ (x ) = 0,

Then we’re able to characterize the first order necessary condition for optimality:

? ?

∇f

F.N.C.: (x ) = 0. Why first order? Used only first derivative of f . If x is LOC. MIN,

?

{x }

then this condition must hold. If I take a bunch of points such that FNC holds, then those

?

{x }

points are called STATIONARY POINTS. Clearly, we’ve assumed that f is differentiable

1 ?

⇐⇒ ∈ ∇f

(f C ) to do FNC test: (x ) = 0.
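As a quick numerical illustration of the argument above (a minimal Python sketch, not from the course material: the function f, the point x*, and the directions d are arbitrary choices for the example), one can check that g'(0) = ∇f(x*)^T d vanishes for every direction d at a stationary point:

import numpy as np

# Hypothetical example: f(x) = (x1 - 1)^2 + 2*(x2 + 3)^2 has its minimum at x* = (1, -3).
f = lambda x: (x[0] - 1.0)**2 + 2.0 * (x[1] + 3.0)**2
grad_f = lambda x: np.array([2.0 * (x[0] - 1.0), 4.0 * (x[1] + 3.0)])

x_star = np.array([1.0, -3.0])
rng = np.random.default_rng(0)

for _ in range(3):
    d = rng.standard_normal(2)           # arbitrary fixed direction
    g = lambda a: f(x_star + a * d)      # scalar function g(alpha) = f(x* + alpha d)
    # forward-difference estimate of g'(0), which equals grad_f(x*)^T d
    g_prime_0 = (g(1e-6) - g(0.0)) / 1e-6
    print(g_prime_0, grad_f(x_star) @ d)  # both are approximately 0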

Theorem 28. SECOND ORDER (NEC. CONDITION) (of optimality)
Let's approximate:

g(α) = g(0) + g'(0)α + (1/2) g''(0)α² + o(α²)

with [lim_{α→0} o(α²)/α² = 0]. So now, what is this NEC. COND. of optimality? It is: g''(0) ≥ 0.

Let's see why this is true again:

Proof. We know how to play. We're assuming 0 is a LOC. MIN for g(α) and x* is a LOC. MIN of f(x). It satisfies g'(0) = 0, so:

g(α) − g(0) = (1/2) g''(0)α² + o(α²)

Use basically the same argument. We have:

∀r > 0 ∃δ > 0 | ∀|α| < δ =⇒ |o(α²)| < rα²

In fact, g(α) − g(0) < ((1/2) g''(0) + r)α². Now, assuming by contradiction that [g''(0) < 0] and taking r < (1/2)|g''(0)|, we get a negative right-hand side =⇒

g(α) − g(0) < 0  ∀α ∈ B̄(0, δ)

violating the hypothesis! g(α) < g(0), while g(0) is supposed to be a LOC. MIN for g(α).

Let's go back to:

g'(α) = ∇f(x* + αd)^T d = Σ_{i=1}^n (∂f(x* + αd)/∂x_i) d_i

How do we compute g''(α)?

[g''(α) = Σ_{j=1}^n Σ_{i=1}^n (∂²f(x* + αd)/(∂x_i ∂x_j)) d_i d_j]

and in particular:

g''(0) = d^T ∇²f(x*) d ≥ 0 =⇒ (∇²f(x*) ≥ 0)

What does it mean? That ∇²f(x*) must be POSITIVE SEMI-DEFINITE ⇐⇒ SNC (second-order derivative information). The Hessian quadratic form must be positive semidefinite; f must be at least twice differentiable (f ∈ C²). Now, these are necessary conditions: if we compute all the points that satisfy them, we find the local minima among them.
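A minimal numerical sketch of the FNC/SNC tests just stated (again an assumed toy function, not from the notes): check that the gradient vanishes and that the Hessian eigenvalues are non-negative (SNC) or strictly positive (the sufficient condition of the next theorem):

import numpy as np

# Hypothetical f(x) = x1^2 + x1*x2 + 2*x2^2 (a convex quadratic, minimum at the origin).
def grad_f(x):
    return np.array([2*x[0] + x[1], x[0] + 4*x[1]])

def hess_f(x):
    return np.array([[2.0, 1.0], [1.0, 4.0]])   # constant Hessian for a quadratic

x_cand = np.zeros(2)
fnc_ok = np.allclose(grad_f(x_cand), 0.0)       # FNC: grad f(x*) = 0
eigvals = np.linalg.eigvalsh(hess_f(x_cand))
snc_ok = np.all(eigvals >= 0)                   # SNC: Hessian positive semidefinite
ssc_ok = np.all(eigvals > 0)                    # SSC: Hessian positive definite
print(fnc_ok, snc_ok, ssc_ok, eigvals)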

Theorem 29. SECOND ORDER SUFFICIENT CONDITION OF OPTIMALITY
Let's compute g(α) again, by using f this time:

f(x* + αd) = f(x*) + (∇f(x*)^T dα = 0) + (1/2) d^T ∇²f(x*) d α² + o(α²)

Clearly, if we want to find the SSC, we first have to satisfy the necessary conditions:

∇f(x*)^T dα = 0 =⇒ ∇f(x*)^T d = 0

We want:

f(x* + αd) − f(x*) = (1/2) d^T ∇²f(x*) d α² + o(α²)

to be non-negative. Clearly this holds if:

d^T ∇²f(x*) d > 0  ∀d ≠ 0 ⇐⇒ ∇²f(x*) > 0

(POSITIVE DEFINITE). If the Hessian is positive definite, we want to prove that x* is a LOC. MIN.

Proof. Use (lim_{α→0} o(α²)/α² = 0) and repeat the same argument as before; it's very similar: ∇²f(x*) > 0 =⇒ f(x* + αd) − f(x*) > 0, but notice that for each fixed d this holds ∀α ∈ B̄(0, δ_d), i.e. we can find some δ_d := δ(d) such that the condition holds. The STRICT condition we want is f(x*) < f(x) ∀x ≠ x*, x ∈ B̄(x*, ε), and this is not yet it (δ(d) depends on d). We can take, without loss of generality, ‖d‖ = 1. Suppose we are in R² (n = 2): I can use α to scale the d vector! So by taking ‖d‖ = 1 and scaling with α (restricting to d | ‖d‖ = 1), the set of such d's, {d ∈ R^n | ‖d‖ = 1}, is compact, and then by a continuity argument ∃δ* | δ(d) ≥ δ*. We can find a minimum δ* over the directions, and therefore I can write:

f(x* + αd) − f(x*) > 0  ∀α ∈ B̄(0, δ*)

which is equivalent to saying that the inequality holds ∀x, i.e. ∀d.

This condition may seem obvious (Weierstrass theorem): δ depends continuously on the direction d. The important key point is that we were able to use the Weierstrass theorem. We'll see that in optimal control there can be spaces in which this condition doesn't hold.

For the finite-dimensional case: if [∇f(x*) = 0 ∧ ∇²f(x*) > 0] =⇒ x* is a (STRICT) LOCAL MINIMUM (sufficient condition, SSC). Again we can prove this condition by using the trick of the g(α) function plus the existence of a minimum δ_d with respect to d.

This fact does NOT hold in INFINITE-DIMENSIONAL spaces: there we will have to assume something stronger.

NECESSARY CONDITIONS FOR CONSTRAINED (CONVEX) PROBLEMS

Remember we've given these conditions for the unconstrained problem. This was somehow crucial: in the unconstrained case, g'(0) = 0 for some fixed d ∈ R^n, and then g'(0) = 0 =⇒ ∇f(x*)^T d̄ = 0 for some fixed d̄ ∈ R^n. The trick was to take ANY d ∈ R^n. For some d̄ ∈ R^n this MEANS:

• ∇f(x*)^T d̄ = 0;

• ∇f(x*)^T (−d̄) = 0;

In a constrained problem. . . our X is a closed set and has a boundary. It is NOT true that we can take both d̄ and (−d̄)! It is no longer true that we can take the opposite direction. How do we write the necessary conditions in this case? The point is that we can take ONLY FEASIBLE DIRECTIONS d ∈ R^n. Not all d's! Not all directions are allowed: in general d ∈ R^n, but some of them aren't allowed. Do it as an exercise → show that x* LOC. MIN =⇒ ∇f(x*)^T d ≥ 0 ∀ FEASIBLE d ∈ R^n. I'm not allowed to take every d, only the vectors d that are feasible.

Specifically, d is feasible at x* if

∃ᾱ ∈ R | (x* + αd) ∈ X  ∀α ∈ B̄(0, ᾱ)

The SNC basically looks the same:

[d^T ∇²f(x*) d ≥ 0]  ∀ FEASIBLE d ∈ R^n

Theorem 30. SNC for Constrained Problems
[d^T ∇²f(x*) d ≥ 0]  ∀ FEASIBLE d ∈ R^n | ∇f(x*)^T d = 0

We'll not use these conditions in general; we'll use them in some particular problems. But this characterizes local minima =⇒ iterative algorithms for constrained problems. In fact, the case in which these conditions are really useful is the case of convex problems: if (x = x* + ᾱd) ∈ X, I'll be sure (x* + αd) ∈ X ∀α ∈ B̄(0, ᾱ); therefore this gives me some very useful information: for some small α the point belongs to that set.

Let's go back to our problem:

min_{x∈X} f(x), where: f convex, X convex set

Minimizing a convex function over a convex set. It can be stated: if X is convex, then by using the convexity of the function f we can show that, if it exists, every local minimum is also a global minimum. Moreover, the condition [∇f(x*) = 0], the necessary one, can be shown to be also a sufficient condition of optimality. Convex problems are very special problems. If f is strictly convex, ∃! x* and it is global!

OPTIMIZATION OVER CLOSED (CONVEX) SET

x ∈ R^n, or at least locally. min_{x∈X} f(x) . . . then we can treat the constrained version. We cannot consider both directions. Necessary CONDITION of optimality:

∇f(x*)^T d ≥ 0  ∀ FEASIBLE DIR. d

FEASIBLE DIR. d means:

∃ᾱ | x* + αd ∈ X  ∀α ∈ [0, ᾱ]

We're almost considering x to lie in a convex set, at least locally: in some sense, a convex structure locally around x*. Remark: if X is closed (suppose we are in R², e.g. a surface in R³), it may happen that these conditions are meaningless for such a set =⇒ we'll give some meaningful condition of this form (every direction we choose may fail to correspond to a feasible solution).

g(α) = g(0) + g'(0)α + o(α)

where α ∈ R. In this case, considering ONLY feasible directions means α ≥ 0, and local minimality then gives g'(0) ≥ 0. Even in the unconstrained case, we may claim the following as FNC: g'(0) ≥ 0 ⇐⇒ ∇f(x*)^T d̄ ≥ 0 ∀d̄. Because in the UNCONSTRAINED case, by taking both d̄ and (−d̄), it must hold:

∇f(x*)^T d̄ = 0  ∧  ∇f(x*)^T (−d̄) = 0  ⇐⇒  (∇f(x*) = 0)

So it reduces to ∇f(x*) = 0 only in that case.

∇f(x*)^T d ≥ 0

IS MORE GENERAL! This is an alternative way to re-derive what we did previously. So, without loss of generality, we can restrict to feasible directions. Even if X is (locally) convex and [α ≥ 0], f may not be convex; in that case the problem will not be convex.

CONVEX PROBLEMS

min_{x∈X} f(x), where, as a REMARK, X convex means that:

∀x, y ∈ X:  αx + (1 − α)y ∈ X,  ∀α ∈ [0, 1]

In our case, f is a convex function and X is a convex set. REMARK: f is a convex function ⇐⇒

∀x, y:  [f(αx + (1 − α)y) ≤ αf(x) + (1 − α)f(y)]  ∀α ∈ [0, 1]

But there's a simpler way: a function f is convex ⇐⇒ the epigraph of the function is convex. If the strict version holds, the function is said to be strictly CONVEX. Then with X, f convex the problem is CONVEX =⇒ for a convex problem:

Proposition 4. For a Convex Problem, Local Minima are also global.

Proof. Since the convexity property must hold for any point, we can suppose by contradiction that x* is a local minimum but not a global minimum. Then there ∃ε > 0 | ∀x ∈ B̄(x*, ε), f(x*) ≤ f(x). Suppose x* is not global: there ∃x̄ | f(x̄) < f(x*). If the function is convex, it must hold that:

f(αx* + (1 − α)x̄) ≤ αf(x*) + (1 − α)f(x̄) = (. . .)

but since f(x̄) < f(x*) =⇒

(. . .) < αf(x*) + (1 − α)f(x*) = f(x*)  for every α ∈ [0, 1)

This contradicts local optimality: the points αx* + (1 − α)x̄ are arbitrarily close to x* (for α close to 1) and have f value strictly below f(x*).

That's an appealing property. Another one is that the FNC is also a first-order sufficient condition (FSC).

Theorem 31. It can be shown that if f is convex then we have:

f(z) ≥ f(x) + ∇f(x)^T (z − x)

Thus, suppose that for some x*:

[∇f(x*) = 0] =⇒ f(z) ≥ f(x*)  ∀z ∈ X

So, ∇f(x*) = 0 is not only necessary but also a SUFFICIENT CONDITION OF OPTIMALITY.

If we're able to find stationary points, then those points are global minima. This is a sufficient condition (FOR CONVEX PROBLEMS!).

EXISTENCE OF MINIMA

We've characterized minima. But do minima exist for this problem, min_{x∈X} f(x)? Under which (SUFFICIENT) conditions do those minima exist?

• f is continuous;

• X is COMPACT;

These are the hypotheses of the Weierstrass theorem. Here we're assuming continuity of f, but we also require X to be closed and bounded. Another set of sufficient conditions is: f again continuous and coercive (a radially unbounded function, lim_{‖x‖→∞} f(x) = ∞), and X closed. Of course, if these conditions are not satisfied, a global minimum may still exist! These are only sufficient conditions.

4.2 ITERATIVE ALGORITHMS

Suppose the sufficient conditions hold and at least a minimum exists. Then the question is: how can I compute it? The way to do it is to design an (iterative) algorithm. We want to generate a sequence of points {x_k} | x_k → x*, with x* a (local/global) minimum. A first class of algorithms we can analyze are called:

4.2.1 DESCENT ALGORITHMS

The idea is the following: we need to converge to some local minimum. We may start with a point and construct a new point at which f is less than at the previous one. DESCENT ALGORITHMS: design {x_k} | f(x_{k+1}) < f(x_k) ∀k = 0, 1, . . .. Suppose we are at some point x_k. Let's construct x_{k+1}:

[x_{k+1} = x_k + α d_k]

I'm not saying that a descent algorithm will also be a minimum-convergent algorithm. But let's prove the descent property.

Proof. Suppose the function is at least differentiable:

f(x_{k+1}) = f(x_k) + α ∇f(x_k)^T d_k + o(α)

(as we've already done). Remember that o(α) is a HIGHER-ORDER TERM (HOT):

lim_{α→0} o(α)/α = 0 =⇒ ∀r > 0 ∃δ > 0 | ∀|α| < δ =⇒ |o(α)| < r|α|

Take d_k | ∇f(x_k)^T d_k < 0. If I take d_k satisfying this condition, for α ≪ 1 sufficiently small =⇒ f(x_{k+1}) − f(x_k) < 0. Indeed, take r < |∇f(x_k)^T d_k| =⇒

f(x_{k+1}) − f(x_k) < α(∇f(x_k)^T d_k + r) < 0

So with x_{k+1} = x_k + α d_k and [∇f(x_k)^T d_k < 0], at least I'm descending!

Look at this condition: ∇f(x_k)^T d_k < 0. We can choose d_k so that this quantity is the most negative possible: d_k = −∇f(x_k) =⇒

x_{k+1} = x_k − α ∇f(x_k)

(STEEPEST DESCENT ALGORITHM). Again, we don't know anything yet about convergence. We can do something more general and create a sequence of points:

x_{k+1} = x_k + α_k d_k  ∧  ∇f(x_k)^T d_k < 0

with α_k > 0 and d_k such that, in particular, I can take the STEEPEST DESCENT: x_{k+1} = x_k − α_k ∇f(x_k).
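A minimal Python sketch of the steepest descent update just derived (the test function and the constant stepsize are assumptions made for the example, not part of the course notes):

import numpy as np

def steepest_descent(grad_f, x0, alpha=0.1, iters=200):
    """x_{k+1} = x_k - alpha * grad_f(x_k), with a constant stepsize."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - alpha * grad_f(x)
    return x

# Hypothetical test function f(x) = x1^2 + 3*x2^2, gradient (2*x1, 6*x2).
grad_f = lambda x: np.array([2*x[0], 6*x[1]])
print(steepest_descent(grad_f, [4.0, -2.0]))   # converges to approximately (0, 0)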

GRADIENT METHODS

Let's see a class of algorithms called "Gradient Methods":

x_{k+1} = x_k − α_k D_k ∇f(x_k)

with D_k ∈ R^{n×n}, D_k > 0 (positive definite).

• D_k = I =⇒ steepest descent;

• D_k uniformly bounded ⇐⇒ δ₁ I ≤ D_k ≤ δ₂ I (≤ in the sense of positive semidefiniteness);

• We can choose D_k via a second-order approximation of the function. Let's consider a quadratic approximation of the function:

f(x) ≈ f(x_k) + ∇f(x_k)^T (x − x_k) + (1/2) (x − x_k)^T (α_k D_k)^{-1} (x − x_k)

Let's freeze the situation at x_k and take:

[x_{k+1} = x_k − α_k D_k ∇f(x_k)]

Setting the gradient of the quadratic model to zero,

[∇f(x_k) + (1/α_k) D_k^{-1} (x − x_k) = 0]

shows that x̄ = x_k − α_k D_k ∇f(x_k) minimizes the quadratic approximation. With D_k = (∇²f(x_k))^{-1} we get:

[x_{k+1} = x_k − α_k (∇²f(x_k))^{-1} ∇f(x_k)]

Here we suppose f twice differentiable. If ∇²f(x_k) is NOT invertible, this choice cannot be made.

• Newton method: α_k := 1, so

x_{k+1} = x_k − (∇²f(x_k))^{-1} ∇f(x_k)

If we start sufficiently close to the minimum, it will converge to the minimum very fast! (A small numerical sketch is given after this list.) There are other ways to choose D_k, for example:

• D_k = (∇²f(x_0))^{-1} (Hessian frozen at the initial point);
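A minimal sketch of the pure Newton update above, in Python (the smooth test function is an assumption chosen for the example; the point is the fast local convergence from a nearby start):

import numpy as np

def newton_method(grad_f, hess_f, x0, iters=20):
    """Pure Newton: x_{k+1} = x_k - (hess f(x_k))^{-1} grad f(x_k), stepsize 1."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        # solve hess_f(x) d = -grad_f(x) instead of forming the inverse explicitly
        d = np.linalg.solve(hess_f(x), -grad_f(x))
        x = x + d
    return x

# Hypothetical smooth test function: f(x) = log(exp(x1) + exp(-x1)) + x2^2.
grad_f = lambda x: np.array([np.tanh(x[0]), 2*x[1]])
hess_f = lambda x: np.diag([1.0 / np.cosh(x[0])**2, 2.0])
print(newton_method(grad_f, hess_f, [0.5, 1.0]))   # converges to approximately (0, 0)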

We have our descent algorithm, a gradient algorithm, and we choose D_k in such a way. How do we choose α_k? How do we select the STEPSIZE?

STEPSIZE SELECTION

The first way is the so-called:

• MINIMIZATION RULE. We choose α_k | f(x_k + α_k d_k) := min_{α∈[0,s]} f(x_k + α d_k), (α ≥ 0), where either s = +∞ or s = s̄ > 0 constant. The case (s = s̄ < +∞) =⇒ LIMITED MINIMIZATION RULE. (s = +∞) is somewhat computationally consuming. Really very important: f : R^n → R. For this purpose f could also be a more complex function in an infinite-dimensional space, and even d ∈ R^n. If I fix x_k, d_k, then I can draw a picture of what is called g(α). Get the α that yields the minimum value: I descend as much as possible. Descending:

  – A) choose direction;

  – B) choose stepsize;

  – 1) A → B is a possible workflow;

  – 2) but 1) is not the best from a computational point of view.

Another rule is the so-called:

• ARMIJO RULE. Choose some β ∈ (0, 1), σ ∈ (0, 1), and then choose some s > 0. Three parameters (chosen by experience, heuristically); typically {β = 0.7, σ = 0.5, s = 1}. We choose [α_k = β^{m_k} s], where m_k is the first non-negative integer m such that the following holds:

[f(x_k) − f(x_k + β^m s d_k) ≥ −σ β^m s ∇f(x_k)^T d_k]

This condition is called a SUFFICIENT DECREASE CONDITION. We'll see that the Armijo rule is a way to prove convergence (enough "descent-ness"). For example, let's try to evaluate the function with α = 1 (m = 0, s = 1); f(x_k + α d_k) with m = 3 gives α = β³s = β³. If σ = 1, I'm taking the tangent line. For the line, the slope is fixed! Once x_k, d_k are chosen, I'm left with a two-dimensional graph, g(α) : R → R, useful for debugging the gradient algorithm. In pathological cases, the line might not be well defined. (A small sketch of this rule is given after this list.)

• CONSTANT STEPSIZE: α_k = α > 0. For example, for a gradient method, if |α| < 1 and strictly positive, there is convergence. So we use a constant stepsize. In distributed optimization it is useful;

• DIMINISHING STEPSIZE:

[lim_{k→∞} α_k = 0],  Σ_{k=1}^{+∞} α_k = ∞

Somehow we want a non-summable series. If |α_k| < 1, asymptotically we can get near the minimum.
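A minimal Python sketch of the Armijo rule described in the list above (the test function and the default parameters β = 0.7, σ = 0.5, s = 1 follow the typical values quoted in the notes; everything else is an assumption for the example):

import numpy as np

def armijo_stepsize(f, grad_f, x, d, beta=0.7, sigma=0.5, s=1.0, m_max=50):
    """Return alpha = beta^m * s for the first non-negative integer m with
    f(x) - f(x + alpha d) >= -sigma * alpha * grad_f(x)^T d (sufficient decrease)."""
    slope = grad_f(x) @ d                     # must be < 0 for a descent direction
    for m in range(m_max):
        alpha = (beta**m) * s
        if f(x) - f(x + alpha * d) >= -sigma * alpha * slope:
            return alpha
    return alpha                              # fallback: smallest stepsize tried

# Hypothetical usage inside a gradient loop on f(x) = x1^2 + 10*x2^2:
f = lambda x: x[0]**2 + 10*x[1]**2
grad_f = lambda x: np.array([2*x[0], 20*x[1]])
x = np.array([1.0, 1.0])
for _ in range(50):
    d = -grad_f(x)                            # steepest descent direction
    x = x + armijo_stepsize(f, grad_f, x, d) * d
print(x)                                      # converges to approximately (0, 0)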

When a function has no gradient, α_k is chosen DIMINISHING (SUBGRADIENT METHODS). What happens to min_{x∈X} f(x) in the CONSTRAINED case (x ∈ R^n, or locally X ≈ R^n), where:

• (X ≠ ∅) ⊂ R^n CLOSED CONVEX;

• f ∈ C¹.

How do we choose the d_k? IDEA: generate a sequence of feasible points, where d_k is a feasible direction ⇐⇒ (x_k + α d_k ∈ X), |α| ≪ 1.

x_{k+1} = x_k + α_k d_k

[∇f(x_k)^T d_k < 0]

I need to choose: {α_k > 0 | x_k + α_k d_k ∈ X}. This is our way to iterate over feasible points. There are algorithms that iterate over unfeasible points. Observation:

g(α) = g(0) + g'(0)α + o(α),  with g'(0) ≥ 0 ⇐= α ≥ 0

I can pick a certain direction and always move along it. g'(0) ≥ 0 (it cannot be negative; the sign information is contained in d). If we have a minimization problem min_{x∈X} f(x) with a global minimum x*, then f(x*) ≤ f(x) ∀x ∈ X (⇐⇒ ∀x ∈ R^n in the unconstrained case).

{x_k} | f(x_0) > f(x_1) > . . .

(Descent algorithm.) My sequence {x_k} is such that {f(x_k)} is decreasing and bounded from below, so it will converge. What can go wrong is that it converges not to the minimum but somewhere else. There are non-descent algorithms that converge within a basin around the minimum and otherwise even diverge (pure Newton method); hence a valid descent method canNOT diverge or oscillate, though intrinsic problems may remain. In the UNconstrained case, we have seen that the conditions are not immediately simple, but they give us useful information.

Gradient Method, Descent Method. We can do something similar for constrained problems:

min_{x∈X} f(x),  (X ≠ ∅)

where X is a closed (convex) set, convex at least locally (in some neighborhood of x*). We're somehow generating the descent method. d_k must be a FEASIBLE DIRECTION =⇒ d_k | (x_k + αd_k) ∈ X. For |α| ≪ 1 sufficiently small, ∇f(x_k)^T d_k < 0 =⇒ d_k must be a descent direction. Construct this sequence:

[x_{k+1} = x_k + α_k d_k]

What about the STEPSIZE SELECTION RULE? We can use both the (limited) MINIMIZATION RULE and the CONSTANT STEPSIZE. Depending on the rule we use, there are results for convergence. Choose a possible descent direction; we can do something similar in the constrained case. There are several possibilities. We just focus on a method called PROJECTED GRADIENT. This is the most known version =⇒ (GRADIENT PROJECTION). Let's start by defining a projection:

min_{x∈X} ‖z − x‖²

Notice that the domain variable is (x ∈ X), a convex set. Our function ‖z − x‖² is a strictly convex (quadratic) function, and X is taken to be a closed convex set. So the minimum exists and is unique =⇒ unique global minimum. We call this unique minimum the projection of z over the set X =⇒

P_X(z) = arg min_{x∈X} ‖z − x‖²

We call it the projection of z over the set X. If the set X is a polytope (an intersection of linear inequalities), then it is the orthogonal projection. The important point is that ∃! P_X(z). Define, to get a feasible direction for our method:

x_{k+1} = P_X(x_k − α_k ∇f(x_k))

This is called the PROJECTED GRADIENT METHOD. And we can apply this method to our minimization problem.
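A minimal Python sketch of the projected gradient iteration just defined (the box constraint set and the target point z are assumptions for the example; for a box, the projection is a componentwise clip):

import numpy as np

def projected_gradient(grad_f, proj, x0, alpha=0.1, iters=200):
    """x_{k+1} = P_X(x_k - alpha * grad_f(x_k))."""
    x = proj(np.asarray(x0, dtype=float))
    for _ in range(iters):
        x = proj(x - alpha * grad_f(x))
    return x

# For a box X = [1, 3]^2 the projection is a simple componentwise clip.
proj_box = lambda x: np.clip(x, 1.0, 3.0)

# Hypothetical problem: min ||x - z||^2 over the box, with z = (0, 2) outside it.
z = np.array([0.0, 2.0])
grad_f = lambda x: 2.0 * (x - z)
print(projected_gradient(grad_f, proj_box, [2.0, 2.0]))   # converges to (1, 2)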

CONVERGENCE (UNCONSTRAINED)

We can state: let's give the notion of GRADIENT RELATED DIRECTION. d_k is a GRD if: ‖d_k‖ < +∞ ∧ [lim sup_{k→∞} ∇f(x_k)^T d_k < 0]. For example, this holds in the situation d_k = −D_k ∇f(x_k), where D_k must satisfy [δ₁ I ≤ D_k ≤ δ₂ I] (meant in terms of positive semidefiniteness). (This is the unconstrained case.) The result is the following theorem:

Theorem 32. (UNCONSTRAINED)
Let {x_k} be a sequence generated as [x_{k+1} = x_k + α_k d_k], with {d_k} gradient related directions (GRD). Suppose α_k is generated by the limited minimization rule or by the ARMIJO rule. Then the result is the following: every limit point of {x_k} is stationary.

We're able to prove that this method converges to a stationary point. Good news: we don't need to assume any convexity of the function f; only some regularity is required. Bad news: we're converging to a stationary point. Are we really converging to a global minimum? The method is applicable to non-convex problems; one looks for local minima.

CONVERGENCE (CONSTRAINED CASE)

The difference in the constrained case is that d_k must be a feasible direction.

Theorem 33. (constrained)
Let {x_k} be generated as [x_{k+1} = x_k + α_k d_k], with d_k a GRD and a feasible direction. Suppose α_k is generated by the limited minimization rule or by the Armijo rule. Then {x_k} converges to a stationary point.

(Exactly the same as before.) Rewritten: every limit point of {x_k} is stationary (satisfying the FNC in the constrained case).

This is a class of algorithms applicable in these cases. Next, a special class of constrained problems.

4.2.2 EQUALITY CONSTRAINED OPTIMIZATION

We want to consider:

min_{x∈R^n} f(x),  (X ⊂ R^n) | h₁(x) = 0, . . . , h_m(x) = 0

Our set X is the set defined by these equalities. Since we're considering equality constraints, our set X is closed. But unless the constraint functions h_i(x) are linear, the set X isn't convex: the only way for X to be convex is that the h_i : R^n → R, i ∈ {1, . . . , m}, be affine. In general we have a generic set: at least graphically, there may be no way to find a useful feasible direction, so we need meaningful requirements. UNCONSTRAINED: let's go back to the unconstrained case. x* is a local minimizer; we considered x* + αd for some fixed direction, trying to reach x* with some sort of update. Let's now use a more general approach: general curves that traverse x*,

x := x(α) : R → R^n

Let's consider a function like this, and as before let's consider [g(α) := f(x(α))]. Now, as before, let's write:

[g(α) = g(0) + g'(0)α + o(α)]

(Taylor expansion) and let's see if we can play the same game we played in the unconstrained case. Notice that g(0) = f(x(0)) = f(x*). Suppose x* = x(0) and that x(α) satisfies, ∀α ∈ R:

h₁(x(α)) = 0, . . . , h_m(x(α)) = 0.

Let's write g'(α) by using the CHAIN RULE:

g'(α) = ∇f(x(α))^T x'(α)

Let's assume that (x : R → R^n) ∈ C¹ =⇒ x'(α) exists and is continuous. Then g'(0) = ∇f(x*)^T x'(0), and the condition of optimality (FNC) is:

[∇f(x*)^T x'(0) = 0]

By using this parametrization, once we have x(α), we can choose any (α ≠ 0) ∈ R. There are results showing that, under regularity conditions on the h_i, there is convergence. We can expand x using a Taylor expansion:

[x(α) = x(0) + x'(0)α + o(α)] = x* + x'(0)α + o(α)

a linear approximation of our function at x* ((x ∈ C¹) ⇐⇒ Taylor-expandable). So x* + x'(0)α ≈ x(α) in a neighborhood of x*. It means that if I take my nonlinear function x(α), which passes through x*, then I'm approximating x locally by its linear approximation. The vector x'(0) gives me a direction tangent to my curve, and since this curve belongs to the surface, I'm considering vectors tangent to the surface along my curve:

∇f(x*)^T x'(0) = 0  ∀x : R → R^n, (x ∈ C¹)

Is this characterization useful? We've defined this surface in terms of our equality constraints (x'(0) is tangent to the surface! x* is a vector).

[x* + x'(0)α] ∈ T_{x*}X

I take all the points belonging to that surface: (x'(0) = d) ∈ T_{x*}X. Starting from a local minimum, we are deriving the NC.

Let's go back to the constraints: x(α) ∈ X for α in a neighborhood of 0. It means that:

[h₁(x(α)) = 0, . . . , h_m(x(α)) = 0]

at least locally. Remember (x ∈ C¹), and assume some regularity for the h_i's (h_i continuously differentiable ⇐⇒ h_i ∈ C¹). Differentiate:

[d h_i(x(α))/dα = 0]

⇐⇒ h_i = constant, ∀α, ∀i ∈ {1, . . . , m}. Let's compute this derivative (CHAIN RULE for differentiation) =⇒

∇h_i(x(α))^T x'(α) = 0

and in particular we get that the following holds:

[∇h_i(x(0) = x*)^T x'(0) = 0]

∀i ∈ {1, . . . , m}. We're describing the tangent space. The condition that we have is:

∇f(x*)^T d = 0 for d = x'(0) ∈ T_{x*}X. So this condition can be rewritten as:

∇f(x*)^T d = 0  ∀d | ∇h_i(x*)^T d = 0

RECAP: the condition is now:

∇f(x*)^T x'(0) = 0  ∀x : R → R^n, (x ∈ C¹)

i.e. ∇f(x*) perpendicular to T_{x*}X:

∇f(x*)^T d = 0  ∀(d ∈ T_{x*}X)

This is the condition we've got. How can we rewrite it in a better way?

Definition 45. REGULAR POINT
x* is a regular point if the ∇h_i(x*), i ∈ {1, . . . , m}, are linearly independent.

This is a technical assumption that allows us to avoid some degenerate cases. These gradients at x* span the normal directions to the tangent space. Let's then proceed this way:

Proposition 5. Let x* be a local minimum, with x* regular. Then the gradient of f at x* is a linear combination of the ∇h_i(x*), i ∈ {1, . . . , m} ⇐⇒ ∇f(x*) ∈ span{∇h_i(x*), i ∈ {1, . . . , m}}.

The claim is that if x* is a regular local minimum, ∇f(x*) belongs to that span. Let's prove it, by contradiction:

Proof. Suppose by contradiction that this is NOT true, =⇒

∇f(x*) = LINCOMB{∇h_i} + (some vector ∉ LINCOMB) =⇒

=⇒ ∇f(x*) = d − Σ_{i=1}^m λ*_i ∇h_i(x*)

with ∇h_i(x*)^T d = 0 ∀i ∈ {1, . . . , m}. Here I'm simply writing this because I have a linear combination of those gradients plus another vector [d ⊥ ∇h_i(x*)]. OK. Let's see that this is impossible! Remember what the FNC of optimality was saying:

∇f(x*)^T d = 0  for every d with ∇h_i(x*)^T d = 0

Let's try to compute ∇f(x*)^T d = d^T d − Σ_{i=1}^m λ*_i (∇h_i(x*)^T d = 0), because d ⊥ ∇h_i(x*) =⇒

∇f(x*)^T d = (. . .) = d^T d = ‖d‖² > 0  ∀d ≠ 0

but this is a contradiction because [∇f(x*)^T d = 0] ∀d | ∇h_i(x*)^T d = 0, i ∈ {1, . . . , m}. The only way for that to be true is d = 0, i.e.:

∇f(x*) ∈ span{∇h_i(x*), i ∈ {1, . . . , m}}

In other words,

∇f(x*) + Σ_{i=1}^m λ*_i ∇h_i(x*) = 0

and this is a nice NC of optimality. It requires finding λ*_i ∈ R such that this condition is satisfied. Let's remember that this condition must be combined with the fact that h_i(x*) = 0 ∀i ∈ {1, . . . , m}.

Let's see if these conditions make sense. FNC of optimality.

FIRST ORDER NECESSARY CONDITION OF OPTIMALITY

Those {λ*₁, . . . , λ*_m} are called LAGRANGE MULTIPLIERS. We need to find these m real values, λ*_i ∈ R: they are LM if ∃λ*_i such that the condition holds. We'll see that there's another way to reinterpret this FNC, which will lead to a natural way to compute it (a class of algorithms). Let's introduce the function L(x, λ):

[L(x, λ) = f(x) + Σ_{i=1}^m λ_i h_i(x)]

What are we doing? I have my constrained problem. Suppose that I allow myself to violate the constraints h_i(x) = 0: there's a cost proportional to this violation. If I take feasible points, I get [L(x, λ) = f(x) + 0]. L(x, λ) = (AUGMENTED) COST: the LAGRANGIAN. Let's see basically why they're called LM. Suppose we forget the original problem and want to find stationary points of this L(x, λ), with x ∈ R^n and λ ∈ R^m. Compute the gradient of L(x, λ) and set it equal to 0:

∇_x L(x*, λ*) = ∇f(x*) + Σ_{i=1}^m λ*_i ∇h_i(x*) = 0

∂L/∂λ = 0 =⇒ h₁(x*) = 0, h₂(x*) = 0, . . . , h_m(x*) = 0

Now notice that {n + m} = #EQUATIONS = #VARIABLES: somehow this condition is well posed. The m extra variables are the λ_i, which at the optimal value are called Lagrange multipliers (OPTIMAL LAGRANGE MULTIPLIERS). There are different conventions for their names: the λ_i are also called ADJOINT VARIABLES; at the optimum they are called LM.

Just a remark: notice that here we've shown that if x* is a stationary point for the original problem and the λ*_i are LM, then the implication is quite straightforward. It can be shown that x* is stationary, with associated Lagrange multiplier vector λ* ∈ R^m, ⇐⇒ ∇L(x*, λ*) = 0, where ∇ is meant with respect to the two variables (x, λ). The converse is more complex but it is also true → we can start from finding λ* ∈ R^m.

SECOND ORDER NECESSARY CONDITION OF OPTIMALITY

Let x* be a local minimum of the constrained problem and x* regular. Then let's see how to compute the SNC. If we take the gradient of L(x, λ),

[∇L(x, λ) = ∇f(x) + Σ_{i=1}^m λ_i ∇h_i(x)]

we derived the FNC simply by setting this gradient equal to 0. The temptation now is: let's compute ∇²L(x, λ) and require it to be positive semidefinite.

∇²L(x, λ) = ∇²f(x) + Σ_{i=1}^m λ_i ∇²h_i(x)

And it can be proven that (SNC):

∇²L(x*, λ*) = ∇²L(x, λ)|_{x=x*, λ=λ*} ≥ 0

is positive semidefinite, but not for all vectors: only ∀d | ∇h_i(x*)^T d = 0, i ∈ {1, . . . , m}. In other words, it must be ≥ 0 on the tangent space, only for those d's (d ∈ T_{x*}X).

SNC:

d^T ∇²L(x*, λ*) d ≥ 0  ∀d | ∇h_i(x*)^T d = 0, i ∈ {1, . . . , m}

Similarly we can derive the sufficient conditions:

SECOND-ORDER SUFFICIENT CONDITIONS OF OPTIMALITY

Suppose (x*, λ*) satisfies the FNC of optimality:

[∇f(x*) + Σ_{i=1}^m λ*_i ∇h_i(x*) = 0],  h_i(x*) = 0, i ∈ {1, . . . , m} ⇐⇒ x* is a feasible point.

Here we don't need to assume the regularity of x*. We need:

d^T ∇²L(x*, λ*) d > 0  ∀d ≠ 0 | ∇h_i(x*)^T d = 0, i ∈ {1, . . . , m}

Then =⇒ x* is a strict local constrained minimum of f subj. to h_i(x) = 0, i.e. of min_{x∈R^n} f(x) subj. to: h_i(x) = 0, i ∈ {1, . . . , m}.

If I'm able to verify the FNC together with this second-order sufficient condition, then it holds. This characterizes minima in the constrained case, in a much more explicit and operational way.

UNCONSTRAINED (CONSTANT STEPSIZE)

Let {x_k} be a sequence generated by x_{k+1} = x_k + α_k d_k, with d_k gradient related directions (GRD). Suppose ∇f is Lipschitz (⇐⇒ ‖∇f(x) − ∇f(y)‖ ≤ L‖x − y‖ ∀x, y ∈ R^n), and:

[ε ≤ α_k ≤ (2 − ε) |∇f(x_k)^T d_k| / (L ‖d_k‖²)]

with ε > 0 fixed. Then every limit point of {x_k} is stationary. (With a constant stepsize, or a variable one satisfying the double inequality above.) These theorems are important because we are not asking anything about the convexity of the function, only a certain regularity.

An immediate takeaway: it is possible to write the first-order conditions in terms of the Lagrangian L, setting the gradient of the Lagrangian with respect to x and λ equal to 0. For inequality constraints, at a minimum point the constraints can be active or not: when they are not active we may regard them as nonexistent; the ones that play a role are the active ones. Let's recall what we've done for EQUALITY CONSTRAINED OPTIMIZATION:

min_{x∈R^n} f(x)

subj. to:

h₁(x) = 0, . . . , h_m(x) = 0 ⇐⇒ h(x) = 0

where h(x) is a vector-valued function (a field R^n → R^m).

L(x, λ) = f(x) + Σ_{i=1}^m λ_i h_i(x)

and we can write the NC in terms of the Lagrangian. FNC: if x* is a regular local minimum (minimizer), then ∃{λ*₁, . . . , λ*_m}, called Lagrange Multipliers, such that:

∇_x L(x*, λ*) = 0  =⇒  ∇f(x*) + Σ_{i=1}^m λ*_i ∇h_i(x*) = 0

∇_λ L(x*, λ*) = 0  =⇒  h_i(x*) = 0  ∀i ∈ {1, . . . , m}

These are the FNC of optimality. SNC: suppose f ∈ C² as well as the h_i. Then, if x* is a regular local minimizer, we get:

d^T ∇²_{xx} L(x*, λ*) d ≥ 0

∀d | ∇h_i(x*)^T d = 0, i ∈ {1, . . . , m}. Basically this is:

d^T [∇²f(x*) + Σ_{i=1}^m λ*_i ∇²h_i(x*)] d ≥ 0

(in the positive-semidefinite sense, only for the d's that satisfy that constraint). We can also write it on the tangent space: ∀d ∈ T_{x*}X ⇐⇒ with X = {x ∈ R^n | h_i(x) = 0 ∀i ∈ {1, . . . , m}}.

SSC: if x* satisfies the FNC of optimality and the Hessian quadratic form is positive definite, (∇²f(x*) + Σ_{i=1}^m λ*_i ∇²h_i(x*) > 0) along all d ≠ 0 | ∇h_i(x*)^T d = 0, then x* is a STRICT LOCAL MINIMIZER. Here we don't require regularity of the point. Very powerful theorems. Possible workflow (a small numerical sketch follows below):

• 1) Construct the Lagrangian (with the additional variables λ_i);

• 2) Just compute the gradients;

• 3) Solve those equations (∇(. . .) = 0);

(The algorithms we're going to develop will be founded precisely on these.) (Run a numerical method to solve these equations.)
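As a minimal Python sketch of the workflow above (the equality-constrained toy problem and the initial guess are assumptions chosen for the example): build the Lagrangian, write out the n + m stationarity equations ∇L(x, λ) = 0, and solve them with a few Newton steps:

import numpy as np

# Hypothetical problem: min x1 + x2  subj. to  h(x) = x1^2 + x2^2 - 2 = 0.
# L(x, lam) = f(x) + lam * h(x); solve grad L(x, lam) = 0 (n + m = 3 equations).
def grad_L(z):
    x1, x2, lam = z
    return np.array([1 + 2*lam*x1,        # dL/dx1
                     1 + 2*lam*x2,        # dL/dx2
                     x1**2 + x2**2 - 2])  # dL/dlam = h(x)

def hess_L(z):
    x1, x2, lam = z
    return np.array([[2*lam, 0.0,   2*x1],
                     [0.0,   2*lam, 2*x2],
                     [2*x1,  2*x2,  0.0]])

z = np.array([-0.5, -1.5, 1.0])          # initial guess (x, lam)
for _ in range(20):                      # Newton iterations on grad L = 0
    z = z + np.linalg.solve(hess_L(z), -grad_L(z))
print(z)   # x* = (-1, -1), lam* = 0.5 (constrained minimizer and its multiplier)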

Now we move towards inequality constraints:

4.2.3 (INEQUALITY CONSTRAINTS)

Let's consider the following optimization problem:

min_{x∈R^n} f(x), subj. to:  h₁(x) = 0, . . . , h_m(x) = 0;  g₁(x) ≤ 0, . . . , g_r(x) ≤ 0  =⇒  h(x) = 0,  g(x) ≤ 0

where we have both equality and inequality constraints. The h_i, remember, are [h_i : R^n → R], and g_j : R^n → R; g(x) ≤ 0 is a slightly different notation:

h : R^n → R^m,  g : R^n → R^r

What does it mean that a VECTOR is ≤ 0?

g(x) = (g₁(x), . . . , g_r(x))^T ≤ 0  ⇐⇒  g₁(x) ≤ 0, . . . , g_r(x) ≤ 0

Somehow we can partly think of writing the problem in a similar way as before. . . but not exactly as before. Let's define a set. Let x̄ be a feasible point, meaning h(x̄) = 0 ∧ g(x̄) ≤ 0. Let's define the set

A(x̄) = {j ∈ {1, . . . , r} | g_j(x̄) = 0}

In other words, A(x̄) is the set of the indices of the active constraints among the g's (indices in the inequalities). I'm just considering the indices of the inequalities that are active. From the point of view of the conditions, the ones that count are the active ones (those sitting on the boundary); the significant ones will be on the frontier. This applies only to the inequality constraints: the equality constraints are instead always active. Clearly the indices can be split into two sets:

• j ∈ A(x̄) =⇒ g_j(x̄) = 0 ACTIVE (we're on the boundary);

• j ∉ A(x̄) =⇒ g_j(x̄) < 0 INACTIVE (strict inequality);

Suppose there are only inequality constraints and x* is in the interior of the constraint set: from a local point of view, the inequality constraints then play no role.

NECESSARY CONDITIONS OF OPTIMALITY

L(x, λ, µ) = f(x) + Σ_{i=1}^m λ_i h_i(x) + Σ_{j=1}^r µ_j g_j(x)

If we have inequalities that are INACTIVE, then we can forget those inequalities; focus on the inequalities that are ACTIVE! Let's define again a REGULAR POINT:

Definition 46. REGULAR POINT
x̄ ∈ R^n feasible (h(x̄) = 0 ∧ g(x̄) ≤ 0) is regular if the ∇h_i(x̄), i ∈ {1, . . . , m}, together with the ∇g_j(x̄), j ∈ A(x̄), are linearly independent.

Then we can write the FNC of optimality:

∇f(x*) + Σ_{i=1}^m λ*_i ∇h_i(x*) + Σ_{j∈A(x*)} µ*_j ∇g_j(x*) = 0

Notice that, in an equivalent way,

∇f(x*) + Σ_{i=1}^m λ*_i ∇h_i(x*) + Σ_{j=1}^r µ*_j ∇g_j(x*) = 0

with µ*_j = 0 ∀j ∉ A(x*). But the point is: look at λ* and µ*. In the pure equality constrained case, we needed to find {λ*₁, . . . , λ*_m} such that that condition is satisfied (λ_i ∈ R, i ∈ {1, . . . , m}). Now we've got to this point again, but we need to be careful: we cannot treat the problem in exactly the same way. When we construct the Lagrangian we want to penalize ourselves for violating the constraints: λ_i is an AMPLIFIER that amplifies the fact that we pay something because we're violating some constraint. Suppose we are at an inequality constraint and we treat it at x* as an EQUALITY CONSTRAINT: I have to be careful, there is NO strict analogy! If we treat equality constraints, we must pay a price for a violation such that the term λ_i ∇h_i(x) doesn't depend on the sign; but in the term µ_j ∇g_j(x), the sign matters: µ*_j ≥ 0 ∀j ∈ A(x*).

The theorem we are about to state captures exactly the fact that when the constraints are active, then µ_j ≥ 0.

Theorem 34. (KARUSH-KUHN-TUCKER OPTIMALITY CONDITIONS)
Let x* be a local minimum of the problem:

min_{x∈R^n} f(x) subj. to:  h₁(x) = 0, . . . , h_m(x) = 0;  g₁(x) ≤ 0, . . . , g_r(x) ≤ 0

where f, h_i, g_j ∈ C¹, i ∈ {1, . . . , m}, j ∈ {1, . . . , r} (continuously differentiable), and assume x* is regular. . . then ∃! λ* = (λ*₁, . . . , λ*_m), µ* = (µ*₁, . . . , µ*_r), unique Lagrange multipliers, such that:

∇_x L(x*, λ*, µ*) = 0

µ*_j ≥ 0  ∀j ∈ {1, . . . , r}
µ*_j = 0  ∀j ∉ A(x*)

where A(x*) is the set of ACTIVE constraints at x*. This is the FNC of optimality. If, in addition, f, h_i, g_j ∈ C², then it also holds:

d^T ∇²_{xx} L(x*, λ*, µ*) d ≥ 0

∀d ∈ R^n |  ∇h_i(x*)^T d = 0, i ∈ {1, . . . , m};  ∇g_j(x*)^T d = 0, j ∈ A(x*).

Basically we have to check the feasibility of the point. Moreover, in the theorem the µ_j canNOT be negative. Similarly we have the SNC of optimality. The KKT conditions ask x* to be regular! The theorem can be relaxed, in a version where x* can also be nonregular.

If I take µ*_j g_j(x*), a priori it could have either sign; µ*_j must depend on what g_j(x*) does: µ*_j = 0 ⇐⇒ g_j(x*) ≠ 0. If the constraints are active, µ*_j ≥ 0.

[µ*_j g_j(x*) = 0]  ∧  µ*_j ≥ 0,  ∀j ∈ {1, . . . , r}

(COMPLEMENTARY SLACKNESS). µ*_j g_j(x*) = 0 means:

• µ*_j = 0, g_j(x*) < 0 INACTIVE;

• µ*_j ≥ 0, g_j(x*) = 0 ACTIVE;

These are the ways to make µ*_j g_j(x*) = 0.

Proof. IDEA: µ*_j ≥ 0. We can define the functions g_j⁺(x) = max{0, g_j(x)}, ∀j ∈ {1, . . . , r}. Then we can consider an approximating (penalized) problem:

min f(x) + (K/2)‖h(x)‖² + (α/2)‖x − x*‖²

(EQUALITY CASE). In the INEQUALITY VERSION the penalty term becomes:

(K/2) Σ_{j=1}^r (g_j⁺(x))² ≠ 0

and this is subj. to: x ∈ {x | ‖x − x*‖ ≤ ε (ε > 0)}. One then shows

λ*_i = lim_{k→∞} K_k h_i(x_k)

This is the role of λ*_i and µ*_j: they give us a price for violating my constraints, and tell us what happens when, in particular, we violate g. (To prove the constraint condition we haven't yet established the effective ROLE of (λ*_i, µ*_j).)

f(x) + (K/2)‖h(x)‖² + (K/2) Σ_{j=1}^r (g_j⁺(x))² + (α/2)‖x − x*‖²

is a quadratic penalty, and {x_k} is a sequence that converges to x*: {x_k} → x*.
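A minimal numerical verification of the KKT conditions just stated, in Python (the quadratic toy problem, the candidate point, and the multiplier value are all assumptions for the example; the constraint is active at the solution):

import numpy as np

# Hypothetical problem: min (x1-2)^2 + (x2-1)^2  subj. to  g(x) = x1 + x2 - 2 <= 0.
# The unconstrained minimum (2, 1) is infeasible, so the constraint is ACTIVE at x*.
x_star = np.array([1.5, 0.5])      # candidate minimizer (on the boundary)
mu_star = 1.0                      # candidate multiplier

grad_f = 2.0 * (x_star - np.array([2.0, 1.0]))
grad_g = np.array([1.0, 1.0])
g_val = x_star.sum() - 2.0

stationarity = np.allclose(grad_f + mu_star * grad_g, 0.0)   # grad f + mu grad g = 0
feasibility = g_val <= 1e-12                                 # g(x*) <= 0
dual_feas = mu_star >= 0.0                                   # mu* >= 0
compl_slack = abs(mu_star * g_val) < 1e-12                   # mu* g(x*) = 0
print(stationarity, feasibility, dual_feas, compl_slack)     # all True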

SECOND ORDER SUFFICIENT CONDITION OF OPTIMALITY

Suppose the FNC holds and d^T ∇²_{xx} L(x*, λ*, µ*) d > 0 ∀d ∈ R^n, d ≠ 0, with:

∇h_i(x*)^T d = 0, i ∈ {1, . . . , m}
∇g_j(x*)^T d = 0, j ∈ A(x*);

and suppose µ*_j > 0 ∀j ∈ A(x*). Somehow the fact that the µ_j's of the ACTIVE constraints are strictly positive gives us a condition of strict local optimality. Then x* is a strict local minimum of the INEQUALITY CONSTRAINED PROBLEM.

• -) Hessian positive on the TANGENT SPACE!

• +) µ*_j > 0 for the ACTIVE CONSTRAINTS!

These methods are PRIMAL-DUAL (both x and λ are updated); λ is called the dual variable. Given an optimization problem, under certain hypotheses one can write an optimization problem in λ and µ alone which, under some assumptions, is equivalent to the original problem (the basic idea of the dual transform). Next: numerical methods for solving constrained optimization problems.

4.2.4 COMPUTATION METHODS FOR EQUALITY AND INEQUALITY CONSTRAINED PROBLEMS

BARRIER METHODS

Constrained problem. A set of algorithms called BARRIER METHODS (part of the INTERIOR POINT METHODS). Why Interior Point? Because we basically try to approach a minimum from the interior of the constraint set, starting with a feasible point. Suppose we only specify explicitly inequality constraints:

min_{x∈X} f(x) subj. to:  g_j(x) ≤ 0  ∀j ∈ {1, . . . , r}

with f, g_j continuous and X closed. Clearly an assumption that we need is that:

S = {x ∈ X | g_j(x) < 0, j ∈ {1, . . . , r}} ≠ ∅

Here, in X we take into account that X is a general closed set; for example X can be a subset represented by the equality constraints. The main idea: think about a 3-dimensional picture. Suppose we want to minimize some function over a set S (we want to find some point x that minimizes f, but x must stay inside S). Points x outside S aren't allowed =⇒ we've got a cost function that equals f(x) inside S and +∞ outside. We may equivalently state the problem using this class of functions (INDICATOR FUNCTIONS). With the barrier method we approximate this idea: we introduce a sequence of functions that approaches the indicator function in the limit. Let's state:

min_{x∈X} f(x) + ε_k B(x)

where B(x) is our barrier and ε_k the barrier weight at iteration k ∈ N. We compute the minimizer of this:

x_k = arg min_{x∈X} f(x) + ε_k B(x),  k = 0, 1, . . .

We keep going by decreasing the ε_k. x_k is in any case a sort of update: vary ε_k and update x_k, which is not in general the point that solves the original problem! In this algorithm one proceeds from the interior. One way to work around numerical issues is to approximate the barrier function and define it so that it can be extended by continuity outside. How do we choose the sequence? Take {ε_k} | 0 < ε_{k+1} < ε_k (a decreasing sequence of ε's), with ε_k → 0 (all the ε_k must be positive). What can be shown? The following:

Proposition 6. Every limit point of {x_k} is a global minimum of the constrained problem.

Quite a strong result: generating a sequence {x_k} this way, it will converge to a global minimum. There's a subtlety: clearly the proposition holds if at every iteration the minimizer x_k exists (∀k ∃! x_k). If the barrier problem is convex, then the result is interesting! As ε_k → 0, we may run into numerical problems. We take an approximation of our function; if we stop, moving through interior points, at least the iterate is a feasible point. The two most used B(x)'s are:

• [B(x) = −Σ_{j=1}^r log(−g_j(x))]

The most commonly used function. When g_j(x) < 0 we are inside the set and the log is well defined. If g_j → 0⁻, B(x) → +∞: the logarithmic barrier function does the job! If we approach the boundary, g_j → 0⁻ and B(x) → +∞! B(x) is not defined for g_j ≥ 0. Another possibility is:

• [B(x) = −Σ_{j=1}^r 1/g_j(x)]

As g_j(x) → 0⁻, B(x) → +∞.

Now, if X = R^n, then I can minimize f + ε_k B like an unconstrained problem. If we have only inequality constraints =⇒ they are handled by the barrier and X = R^n. If we also have equality constraints =⇒ they constitute X and we must apply methods for the resulting problem. We can handle inequality constraints in this way.
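A minimal Python sketch of the logarithmic barrier method above (the 1-D toy problem, the damped Newton inner solver, and the schedule of barrier weights are all assumptions made for the example):

import numpy as np

# Hypothetical 1-D problem: min (x-3)^2  subj. to  g(x) = x - 1 <= 0  (solution x* = 1).
def newton_barrier(eps, x, iters=50):
    """Damped Newton on phi(x) = (x-3)^2 - eps*log(1-x), staying strictly feasible."""
    for _ in range(iters):
        d1 = 2.0 * (x - 3.0) + eps / (1.0 - x)        # phi'(x)
        d2 = 2.0 + eps / (1.0 - x)**2                  # phi''(x) > 0
        step = -d1 / d2
        while x + step >= 1.0:                         # backtrack to stay inside g < 0
            step *= 0.5
        x = x + step
    return x

x = 0.0                                   # strictly feasible start (g(0) = -1 < 0)
for eps in [1.0, 0.1, 0.01, 1e-3, 1e-4]:  # decreasing barrier weights eps_k -> 0
    x = newton_barrier(eps, x)            # warm-start at the previous minimizer
    print(eps, x)                         # the iterates approach x* = 1 from inside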

PENALTY METHODS (for EQUALITY CONSTRAINTS)

min_{x∈X} f(x) subj. to: h_i(x) = 0, i ∈ {1, . . . , m}  ⇐⇒  h(x) = 0

As before, we're assuming f, h continuous and X closed!

h(x) := (h₁(x), . . . , h_m(x))^T ∈ R^m, set to 0. What is the idea? It is something similar to the barrier methods for inequality constraints. Since we have h(x) = 0, let's approximate this problem (I violate the constraints and I'll be penalized). Let's make a relaxation of the problem. What is the approximation? It is the following cost:

[L_{c_k}(x, λ) = f(x) + Σ_{i=1}^m λ_i h_i(x) + (c_k/2)‖h(x)‖²]

or:

L_{c_k}(x, λ) = f(x) + Σ_{i=1}^m λ_i h_i(x) + (c_k/2) Σ_{i=1}^m |h_i(x)|² = f(x) + λ^T h(x) + (c_k/2)‖h(x)‖²

in a more compact way. Notice that the first two terms are the Lagrangian! We cannot simply minimize the Lagrangian wrt (x, λ) to solve the original problem; we add (c_k/2)‖h(x)‖² in order to be allowed to minimize L. We fix some λ = λ_k at each k ∈ N and solve

min_{x∈X} L_{c_k}(x, λ_k) = f(x) + λ_k^T h(x) + (c_k/2)‖h(x)‖²

where, as we said, we just assume that {λ_k} is a bounded sequence and {c_k} is such that 0 < c_k < c_{k+1} (an increasing sequence of positive numbers) with c_k → ∞. IDEA: take my constrained problem, relax it by considering the AUGMENTED LAGRANGIAN, and solve an unconstrained problem (taking into account the h(x) = 0 of the original problem), letting

x_k = arg min_{x∈X} L_{c_k}(x, λ_k)

with X UNCONSTRAINED (e.g. X = R^n). As before, we can claim this result:

Proposition 7. Every limit point of the sequence {x_k} is a global minimum of the original (constrained) problem.

This result is true no matter how we choose λ_k, as long as {λ_k} is bounded (‖λ_k‖ < +∞); (λ_k = 0) is also possible (it is, after all, a bounded sequence: {λ_k | λ_k = 0 ∀k}), assuming that we solve the subproblem exactly! In implementing my algorithm, some things are practical and others less so. X can also be bounded, but closed; X is something I know how to handle (we know how to solve the relaxed problem, e.g. X = R^n). The {λ_k} are there because, if chosen in a suitable way, they help us. NB: we must solve the augmented problem, as posed, exactly! Now the point is: we're not able to solve that problem exactly at each iteration k. In MATLAB we'll get an approximation of the relaxed problem. PRACTICAL IMPLEMENTATION: typically one terminates the inner algorithm when:

[‖∇_x L_{c_k}(x_k, λ_k)‖ ≤ ε_k  (ε_k > 0)]

with ε_k > 0 some positive tolerance (something sufficiently small). We can define the tolerance for the algorithm. What happens if we solve the approximation of the problem this way?

Proposition 8. If {x_k} → x*, which is a regular point of the (equality constrained) problem, then =⇒

λ_k + c_k h(x_k) → λ*

and (x*, λ*) are such that they satisfy the FNC of optimality, meaning that:

[∇f(x*) + Σ_{i=1}^m λ*_i ∇h_i(x*) = 0]

and h_i(x*) = 0, i ∈ {1, . . . , m} (FEASIBLE POINT!).

This result says that there are 3 possibilities for this procedure from a practical point of view:

• 1) I'm not able to solve the minimization at some tolerance (we're not able to find x_k respecting the tolerance condition ⇐⇒ TOLERANCE FAULT);

• 2) OK, at each k I find x_k, but either {x_k} doesn't converge or it converges to some x̄ that is not regular!

• 3) The one we hope happens: {x_k} → x* regular; this is the nice situation! Then (x*, λ*) satisfy the FNC of optimality.

We should try to find {λ_k} such that the procedure doesn't get stuck in the 1st or 2nd case. If we're in the 3rd, good situation, then [λ_k + c_k h(x_k) → λ*].

METHOD OF MULTIPLIERS

λ_{k+1} = λ_k + c_k h(x_k)

plus the previous inner solver. If we're in the good situation, then λ_k + c_k h(x_k) → λ* ∧ λ_{k+1} → λ* too!

There are other possibilities for choosing the penalty function. L_{c_k}(x, λ) uses a QUADRATIC PENALTY FUNCTION. With this penalty function we're solving a relaxation of the problem, choosing c_k such that {x_k} → x*. These are [INEXACT PENALTY FUNCTION METHODS].
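A minimal Python sketch of the method of multipliers just described (the equality-constrained toy problem, the Newton inner solver, and the schedule c_k = 4^k are assumptions made for the example):

import numpy as np

# Hypothetical problem: min x1 + x2  subj. to  h(x) = x1^2 + x2^2 - 2 = 0
# (solution x* = (-1, -1), lam* = 0.5).
def h(x):  return x[0]**2 + x[1]**2 - 2.0

def argmin_augmented(lam, c, x, iters=30):
    """Inner solver: Newton steps on grad_x L_c = 0, where
    L_c(x, lam) = f(x) + lam*h(x) + (c/2)*h(x)^2."""
    for _ in range(iters):
        w = lam + c * h(x)                                   # effective multiplier
        grad = np.array([1.0, 1.0]) + 2.0 * w * x
        H = 2.0 * w * np.eye(2) + 4.0 * c * np.outer(x, x)   # Hessian of L_c
        x = x + np.linalg.solve(H, -grad)
    return x

x, lam = np.array([-1.0, -1.2]), 0.0
for k in range(6):
    c = 4.0 ** k                            # increasing penalty parameters c_k
    x = argmin_augmented(lam, c, x)         # x_k = argmin_x L_{c_k}(x, lam_k)
    lam = lam + c * h(x)                    # update lam_{k+1} = lam_k + c_k h(x_k)
    print(k, x, lam)                        # h(x_k) -> 0 and lam_k -> 0.5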

EXACT PENALTY FUNCTION METHOD - SEQUENTIAL QUADRATIC PROGRAMMING

The class of methods we will see is based on the idea of creating penalty functions that solve the problem exactly, not approximately. These are algorithms built with a different logic: solve the FNC of optimality. They are algorithms that solve the FNC, and finding stationary points of the Lagrangian is of no use unless exact penalty functions are used properly. (SQP) methods are among the most used algorithms, e.g. for solving optimal control problems. IDEA: approximate my problem quadratically; quadratic problems are easier to solve. Let's start having a look at EPF methods.

L(x, λ) = f(x) + Σ_{i=1}^m λ_i h_i(x)

Let's consider the (penalty) function P(x, λ):

P(x, λ) = (1/2)‖∇_x L(x, λ)‖² + (1/2)‖h(x)‖²

Let's suppose we minimize:

min_{x,λ} P(x, λ)

This P(x, λ) is called an EXACT PENALTY FUNCTION. How are the minima of this P related to points that satisfy the FNC of the original constrained problem? Consider P(x, λ) ∀x, ∀λ. What can we say? P is a non-negative function: ∀x, λ, P(x, λ) ≥ 0 because it's a sum of squared NORMs (which are ≥ 0; a norm is never < 0). What about the minimum value? [P(x, λ) = 0] is the minimum ⇐⇒ ∇_x L(x, λ) = 0 ∧ h(x) = 0. So, P(x*, λ*) = 0 (minimum value) ⇐⇒

(x*, λ*) | ∇_x L(x*, λ*) = 0 ∧ h(x*) = 0

This is interesting: if we minimize P(x, λ) wrt (x, λ), then we get points that satisfy the FNC of the original constrained problem. But P may be non-convex, and we require a global minimum! Local minima do NOT satisfy these conditions: P(x̄, λ̄) > 0 if (x̄, λ̄) is local but not global, while P(x̄, λ̄) = 0 =⇒ x̄, λ̄ are global minimum points of P (and satisfy the FNC of the original problem).

We can construct algorithms that try to satisfy the FNC of optimality. To prove convergence we have to use algorithms that decrease P(x, λ): P plays the role of a Lyapunov function for our optimization problem (seen as a discrete-time system). In this view, think of the minimum (or stationary point) as an equilibrium of the dynamical system given by the optimization algorithm. PRIMAL-DUAL METHODS are algorithms such that the sequence {(x_k, λ_k)} converges to some (x*, λ*) satisfying the FNC of optimality: {∇f(x*) + Σ_{i=1}^m λ*_i ∇h_i(x*) = 0, h(x*) = 0}, where the latter condition means x* is a feasible point. In general we can design a class of algorithms such that:

x_{k+1} = G(x_k, λ_k)
λ_{k+1} = H(x_k, λ_k)

even a nonlinear algorithm (⇐⇒ (G, H) nonlinear). If we want this sequence to converge to some (x*, λ*) satisfying the FNC, we must construct (G, H) such that (x*, λ*) is an equilibrium for the {x_{k+1}, λ_{k+1}} dynamical system! They must be equilibria for this UPDATE (for the algorithm as a discrete-time dynamical system), and so (x*, λ*) has to be a fixed point. We require that:

x* = G(x*, λ*)
λ* = H(x*, λ*)

This is the FPR equation (Fixed Point Requirement): (x*, λ*) is a fixed point of our map. Just compute (x*, λ*) such that the FNC is satisfied. We must design G, H so that both the FPR equation and the FNC are satisfied, and we want these equilibria to be asymptotically stable. A result we already know applies: the INDIRECT METHOD of Lyapunov. Suppose we start in a neighborhood of (x*, λ*) (local asymptotic stability for a nonlinear system, based on linearization). (x*, λ*) is an equilibrium for the algorithm (x* = G(x*, λ*) ∧ λ* = H(x*, λ*)). If all the eigenvalues of the matrix:

R* = [ ∇_x G(x*, λ*)  ∇_x H(x*, λ*) ;  ∇_λ G(x*, λ*)  ∇_λ H(x*, λ*) ]

lie strictly inside the unit circle, then (x*, λ*) is a point of attraction for the algorithm. Actually, this result tells us something more: when the sequence converges, the rate of convergence is linear, meaning that:

‖x_{k+1} − x*‖ ≤ c‖x_k − x*‖
‖λ_{k+1} − λ*‖ ≤ c‖λ_k − λ*‖

This type of convergence is [locally EXPONENTIAL]. This is a result that holds for a general algorithm. Now let's start looking at specific algorithms. SPECIALIZE: generate a sequence that converges to FNC-feasible points. Let's look at an update: is there a way to solve an instance of the augmented Lagrangian method?

FIRST ORDER (PRIMAL-DUAL) METHOD

x_{k+1} = x_k − α ∇_x L(x_k, λ_k)
λ_{k+1} = λ_k + α h(x_k)

Is (x*, λ*) an equilibrium satisfying the FNC? Yes. If we look at the Lagrangian L(x, λ) = f(x) + Σ_{i=1}^m λ_i h_i(x), this update can be thought of as a gradient method for the Lagrangian:

[h(x_k) = ∇_λ L(x_k, λ_k)]

We're descending in the variable x whilst the sequence of λ's is ascending ⇐⇒ x ↓ ∧ λ ↑. How do we now prove that it converges? Use the EXACT penalty function:

P(x, λ) = (1/2)‖∇_x L(x, λ)‖² + (1/2)‖h(x)‖²

(a decreasing penalty function ⇐⇒ P(x_{k+1}, λ_{k+1}) < P(x_k, λ_k)). With this method we converge to a saddle point of the Lagrangian (Saddle Point).

Proof. IDEA of the proof: use the discrete version of Lyapunov's Direct theorem.

An aside on barriers: f(x) is the cost function. Instead of forcing x to stay inside the set =⇒ f(x) + εB(x). Sending ε to 0, the barrier term flattens more and more until it vanishes (only εB(x)). If ε is too large, I'm moving further away from f(x). We find something approximate depending on how ε behaves. When we send ε ↓ 0, things risk getting more complicated if the overall problem is ill-conditioned (e.g. minimum-time trajectories): a BARRIER-type approach (ε → 0) must avoid numerical conditioning problems. Coming up: second-order PRIMAL-DUAL methods; SEQUENTIAL QUADRATIC PROGRAMMING algorithms (solved iteratively); two ways to implement these algorithms; duality, i.e. building another optimization problem, with application schemes where the dual problem is easier to solve; parallel and distributed optimization algorithms.
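A minimal Python sketch of the first-order primal-dual update above, with the exact penalty P used as a convergence monitor (the equality-constrained toy problem, the starting point, and the stepsize α = 0.05 are assumptions made for the example):

import numpy as np

# Hypothetical problem: min x1 + x2  subj. to  h(x) = x1^2 + x2^2 - 2 = 0.
h      = lambda x: x[0]**2 + x[1]**2 - 2.0
gradxL = lambda x, lam: np.array([1.0, 1.0]) + 2.0 * lam * x   # grad_x L(x, lam)
P      = lambda x, lam: 0.5*np.sum(gradxL(x, lam)**2) + 0.5*h(x)**2  # exact penalty

x, lam, alpha = np.array([-0.8, -1.1]), 0.8, 0.05
for k in range(2000):
    x_new   = x - alpha * gradxL(x, lam)     # descend in the primal variable
    lam_new = lam + alpha * h(x)             # ascend in the dual variable
    x, lam = x_new, lam_new
print(x, lam, P(x, lam))   # x* = (-1, -1), lam* = 0.5, and P(x*, lam*) = 0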

Just briefly recall (PRIMAL-DUAL METHODS), FIRST-ORDER METHODS. The idea is: we construct:

x_{k+1} = G(x_k, λ_k)
λ_{k+1} = H(x_k, λ_k)

an update in (x, λ) that possibly converges to the pair (x*, λ*):

x_{k+1} = x_k − α ∇_x L(x_k, λ_k)
λ_{k+1} = λ_k + α h(x_k)

We've seen a theorem which basically states: if (x*, λ*) is A.S., then in a neighborhood of it the dynamical system (checked through its linearization) converges to (x*, λ*). That's one method we've seen.

Local convergence holds for the Primal-Dual methods; this is something general. If we want to gain some global convergence property, then we have to use some suitable descent function: a slightly modified method, with switchable versions. These PD methods try to satisfy the FNC of optimality; the update gives us something local. The former methods (penalty methods) were global.

EXACT PENALTY FUNCTION

Our possible modified Penalty FUNCTION is the following one:

P_c(x, λ) = L(x, λ) + (1/2)‖W(x) ∇_x L(x, λ)‖² + (c/2)‖h(x)‖²

For this PF: ∃c̄ > 0 | ∀c ≥ c̄ =⇒ any stationary point (x̄, λ̄) of P_c satisfies the FNC of optimality:

∇f(x̄) + Σ_{i=1}^m λ̄_i ∇h_i(x̄) = 0
h(x̄) = 0

where the last equation is a vectorial version of the constraints.

Goal: combine PD methods, which have local convergence, with PF methods; switch between PD and PF, obtaining quadratic convergence. Any suitable penalty function can be used; this is one example, e.g. with W(x) = I. This justifies the local convergence.

NEWTON-LIKE METHODS

Let’s have a look to the NEWTON-LIKE METHODS. Sometimes it is called 2nd ORDER

METHOD. One of the property is that if we’re in a neighborhood of a strict local minimizer,

then we have a local convergence (quadratic convergence, that is something much faster). We

can use other methods to get close to the minimum. What is the idea? We want to solve the

following: m

 X

∇f ∇h

(x̄) + λ̄ (x̄) = 0

 i i

 i=1

h(x̄) = 0

We want to solve this system of equations. As a shorthand, write $[\nabla L(\bar x, \bar\lambda) = 0]$. Remember that $L$ is defined as
$$L(x, \lambda) = f(x) + \sum_{i=1}^{m} \lambda_i h_i(x),$$
where $h(x) := (h_1(x), \ldots, h_m(x))^\top$. How can I solve this system? I can think of applying Newton's method, i.e. a numerical method to find the zeros of an algebraic equation, to $\nabla L(x, \lambda) = 0$. We compute
$$\nabla^2 L(x_k, \lambda_k)\begin{bmatrix}\Delta x_k\\ \Delta\lambda_k\end{bmatrix} = -\nabla L(x_k, \lambda_k)$$
and then update:
$$x_{k+1} = x_k + \Delta x_k, \qquad \lambda_{k+1} = \lambda_k + \Delta\lambda_k.$$
The idea behind Newton's method is: if I want to find the zeros of $\nabla L(x, \lambda) = 0$, I set to 0 the first-order approximation of that equation:
$$\nabla L(x_k, \lambda_k) + \nabla^2 L(x_k, \lambda_k)\begin{bmatrix}\Delta x_k\\ \Delta\lambda_k\end{bmatrix} + o(\cdot) = 0,$$
where the remainder collects the higher-order terms. In general, to find the zero of some $l(z) = 0$ one solves $l(z_k) + l'(z_k)\,\Delta z_k = 0$; that is what we do in general, and in our case the equation is precisely $\nabla L(x, \lambda) = 0$. We have to implement this update rule. Let's compute the Hessian of the Lagrangian. What is its block structure?
$$\nabla^2 L(x, \lambda) = \begin{bmatrix}\nabla^2_{xx}L(x,\lambda) & \nabla^2_{x\lambda}L(x,\lambda)\\[2pt] \nabla^2_{\lambda x}L(x,\lambda) & \nabla^2_{\lambda\lambda}L(x,\lambda)\end{bmatrix} = \begin{bmatrix}\nabla^2 f(x) + \sum_{i=1}^{m}\lambda_i\nabla^2 h_i(x) & \nabla h(x)\\[2pt] \nabla h(x)^\top & 0\end{bmatrix} = \begin{bmatrix}H_k & N_k\\ N_k^\top & 0\end{bmatrix}$$
4 blocks, in which we have this structure of derivatives.

Let's rewrite it using this block notation:
$$\nabla^2_{xx}L(x_k,\lambda_k) := H_k, \qquad \nabla^2_{x\lambda}L(x_k,\lambda_k) := N_k = \nabla h(x_k), \qquad \nabla^2_{\lambda x}L(x_k,\lambda_k) = N_k^\top.$$
The Newton system then reads
$$H_k\,\Delta x_k + N_k\,\Delta\lambda_k = -\nabla_x L(x_k,\lambda_k), \qquad N_k^\top\,\Delta x_k = -h(x_k).$$

We want to solve these equations. The role of $H_k$ will become clear; the second equation, $[N_k^\top\Delta x_k = -h(x_k)]$, can be rewritten as
$$h(x_k) + N_k^\top\Delta x_k = 0 \implies h(x_k) + \nabla h(x_k)^\top\Delta x_k = 0,$$
the first-order approximation of $h(x)$ at $x_k$: instead of imposing $h(x) = 0$, if I'm at some $x_k$, I set to 0 this first-order approximation, staying in the tangent space. We will show two versions of the resolution of the problem.

I have this system with #EQUATIONS = $(n+m)$ and #VARIABLES = $(n+m)$, namely $(\Delta x_k, \Delta\lambda_k)$. Notice that $H_k$ is a fixed matrix, and the same holds for $N_k$: once I know $(H_k, N_k)$, I can evaluate this Jacobian, and $\nabla_x L(x_k,\lambda_k)$ is something we can compute. There are two possibilities. First: manipulate the expression in terms of matrices, where $H_k > 0$ (positive definite); if we approach a strict local minimizer $(x^\star, \lambda^\star)$, then this is true. Let's solve the 1st equation for $\Delta x_k$:
$$\Delta x_k = -H_k^{-1}N_k\,\Delta\lambda_k - H_k^{-1}\nabla_x L(x_k,\lambda_k).$$
Then if we go to the 2nd equation and substitute $\Delta x_k$, we now get
$$\left[-N_k^\top H_k^{-1}N_k\,\Delta\lambda_k - N_k^\top H_k^{-1}\nabla_x L(x_k,\lambda_k) + h(x_k) = 0\right],$$
where $N_k^\top H_k^{-1}N_k$ is invertible! Then:
$$\Delta\lambda_k = (N_k^\top H_k^{-1}N_k)^{-1}\left(-N_k^\top H_k^{-1}\nabla_x L(x_k,\lambda_k) + h(x_k)\right).$$
$\nabla_x L(x_k,\lambda_k)$ is a computable term. Then we get the following, rewritten accordingly:
$$\lambda_{k+1} = \lambda_k + (N_k^\top H_k^{-1}N_k)^{-1}\left(h(x_k) - N_k^\top H_k^{-1}\nabla_x L(x_k,\lambda_k)\right)$$
$$x_{k+1} = x_k - H_k^{-1}N_k\,\Delta\lambda_k - H_k^{-1}\nabla_x L(x_k,\lambda_k)$$
where the second equation can be further manipulated (substituting $\nabla f(x_k)$): $x_{k+1} = x_k + \text{something}$. In compact form:
$$x_{k+1} = x_k + \Delta x_k, \qquad \lambda_{k+1} = \lambda_k + \Delta\lambda_k.$$

This is the CORE! This is what we can (easily) code in MATLAB (a Python sketch follows): solve the linear system of equations built from $\{H_k,\, N_k,\, \nabla_x L(x,\lambda)|_{x_k,\lambda_k}\}$, find $(\Delta x_k, \Delta\lambda_k)$, and once found, plug them into the update law. (We are not using any STEPSIZE.) If instead we need the method to work farther from the minimum, we must take special measures (penalty-function-type methods). Explicit writing of Newton's method; examine the structure of the various pieces.
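A minimal numpy sketch of the single Newton step, i.e. of the linear (KKT) system above; this is a direct translation of the formulas, with no stepsize or globalization:

```python
import numpy as np

def newton_kkt_step(grad_xL, h_val, H, N):
    """One Newton step for grad L(x, lam) = 0: solve
       [H    N] [dx  ]   [-grad_x L]
       [N^T  0] [dlam] = [   -h    ]     (local method, no stepsize)."""
    n, m = H.shape[0], N.shape[1]
    KKT = np.block([[H, N], [N.T, np.zeros((m, m))]])
    rhs = np.concatenate([-grad_xL, -h_val])
    sol = np.linalg.solve(KKT, rhs)
    return sol[:n], sol[n:]            # (dx, dlam)

# usage inside the iteration: x += dx;  lam += dlam
```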

Let's go back. There is another way to interpret this system of equations. Rewrite (with the notation as before), expanding $\nabla_x L(x_k,\lambda_k) = \nabla f(x_k) + \sum_{i=1}^{m}\lambda_{k,i}\nabla h_i(x_k) = \nabla f(x_k) + N_k\lambda_k$:
$$H_k\Delta x_k + N_k\Delta\lambda_k + \nabla f(x_k) + N_k\lambda_k = \nabla f(x_k) + H_k\Delta x_k + N_k\underbrace{(\lambda_k + \Delta\lambda_k)}_{\lambda_{k+1}} = 0$$
$$h(x_k) + N_k^\top\Delta x_k = 0$$
These are the same two matrix equations, suitably manipulated.

So now, let's reorder. Why do I want to write these equations this way? Let's consider this problem:
$$\min_{\Delta x}\ \nabla f(x_k)^\top\Delta x + \frac12\,\Delta x^\top H_k\,\Delta x$$
subj. to: $h(x_k) + N_k^\top\Delta x = 0$. Let's look at this problem. What kind of problem is it? It is an optimization problem: something subject to an equality constraint. The cost function has a linear term $\nabla f(x_k)^\top\Delta x$ plus a quadratic term $\frac12\Delta x^\top H_k\Delta x$; $h(x_k)$ is a vector of numbers and $N_k^\top\Delta x$ is linear wrt the domain variable $\Delta x$. This is a QUADRATIC optimization PROBLEM. What can we say about a quadratic function? IF $H_k > 0$ (positive definite; actually positive semidefinite would suffice), then the cost function is a (quadratic) convex function, with a linear equality constraint, i.e. on a convex set $\{x \mid A^\top x + b = 0\}$ with $A \in \mathbb{R}^{n\times m}$, $b \in \mathbb{R}^m$ (rows $a^\top x + b = 0$: the simplest kind of constraint). An optimization problem with convex cost and a convex constraint is a CONVEX OPTIMIZATION PROBLEM: local minimum = global minimum, so enforcing the FNC is also a sufficient condition. It is a very special convex problem (a quadratic problem): such problems have more structure and are easier to solve (e.g. with quadprog). The general structure of this problem is:
$$\min_z\ q^\top z + \frac12\,z^\top Q z \quad \text{subj. to: } Az - b = 0 \implies \text{QUADRATIC problem.}$$

We are trying to solve this problem, which we have seen to be a quadratic problem, easier to solve. Why did we bring it up? Let's write the first-order conditions of optimality for this problem (which is convex, so they are also sufficient): enforce them and we will find a global minimum. Just to give another name:
$$L_{cp}(\Delta x, l) = \nabla f(x_k)^\top\Delta x + \frac12\,\Delta x^\top H_k\Delta x + l^\top\!\left(h(x_k) + N_k^\top\Delta x\right)$$
Let's write the FNC of optimality ($\nabla(\cdot) = 0$), differentiating wrt $\Delta x$ (written as a vector) and wrt $l$:
$$\nabla f(x_k) + H_k\Delta x + N_k\,l = 0, \qquad h(x_k) + N_k^\top\Delta x = 0.$$

FIRST ORDER NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY FOR THE QUADRATIC PROBLEM (QP). Now let's compare these two sets of equations: if I solve the QP, I automatically solve that set of equations, simply by setting $l = \lambda_{k+1}$ and $\Delta x = \Delta x_k$. The QP can be implemented recursively as an algorithm, with higher speed of convergence; at the very least it is another viable possibility.

Scheme: [SEQUENTIAL QUADRATIC PROGRAMMING]. The LINEAR CONSTRAINT is simply the linearization of my constraint, while the cost is the quadratic approximation of my Lagrangian! Scheme:

• INITIALIZATION: choose $(x_0, \lambda_0)$;

• REPEAT:

  – EVALUATE:
  $$f(x_k),\quad \nabla f(x_k),\quad \nabla^2_{xx}L(x_k,\lambda_k) = H_k,\quad h(x_k),\quad \nabla h(x_k);$$

  – SOLVE:
  $$\min_{\Delta x}\ \nabla f(x_k)^\top\Delta x + \frac12\,\Delta x^\top H_k\Delta x \quad \text{subj. to: } h(x_k) + N_k^\top\Delta x = 0;$$

  – Obtain: $(\Delta x_k, l_k)$;

• UPDATE:
$$x_{k+1} = x_k + \Delta x_k, \qquad \lambda_{k+1} = l_k.$$

Why Sequential Quadratic Programming? We start from a pair $(x_0, \lambda_0)$, compute $\nabla f(x_k)$, compute the Jacobian of the constraint function, and then set up the quadratic problem; we solve it, and $l$ comes out as the Lagrange multiplier for the FNC of that quadratic problem. There are numerical methods to compute $\Delta x$ and $l$ simultaneously. Comparing the two sets of equations, we can simply set $l_k = \lambda_{k+1}$. Hence (SQP) = resolution of a sequence of Sequential Quadratic Programs; it is a Newton method, with local and quadratic convergence (quite fast). The subproblem is a CONSTRAINED quadratic problem, but with equality constraints only it is as if it became UNCONSTRAINED, i.e. minimization of a quadratic function subject to a linear dynamics: minimization on a HYPERPLANE. If desired, we can rewrite this method adding inequality constraints: one adds the multipliers $\mu_k$ and the linearization of the inequality constraint as well, though the problem becomes a bit more complex. Alternative: for the inequality constraints introduce a BARRIER function and keep the equality constraints for the barrier; the inequality constraint must have an associated non-negative Lagrange multiplier! This is a method with local convergence; we can combine it with penalty-type methods. Several alternatives. Variants: one can replace $\nabla^2_{xx}L(x_k,\lambda_k)$ with a positive definite matrix $H_k > 0$ and use a stepsize with a suitable selection rule ($H_k$ positive definite $\iff H_k > 0$). A minimal code sketch of the basic loop follows.
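The following Python sketch shows one plausible implementation of the loop above, for equality constraints only; hess_L stands for $\nabla^2_{xx}L$ (or a positive definite surrogate), and all function names are placeholders:

```python
import numpy as np

def sqp(x, lam, grad_f, h, grad_h, hess_L, iters=20):
    """Sequential Quadratic Programming sketch (equality constraints only):
    at each iteration solve the QP via its KKT system, then set lam = l."""
    for _ in range(iters):
        H = hess_L(x, lam)                 # Hessian of the Lagrangian (or surrogate)
        N = grad_h(x)                      # n x m, columns = gradients of h_i
        n, m = H.shape[0], N.shape[1]
        KKT = np.block([[H, N], [N.T, np.zeros((m, m))]])
        rhs = np.concatenate([-grad_f(x), -h(x)])
        sol = np.linalg.solve(KKT, rhs)
        dx, l = sol[:n], sol[n:]
        x, lam = x + dx, l                 # lam_{k+1} = QP multiplier l_k
    return x, lam
```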

RECAP: starting problem: $\min_{x\in X} f(x)$ subj. to: $h(x) = 0$. $H_k$ serves to enlarge the basin of attraction of the algorithm. It is an algorithm where at each step I solve something simpler (still an optimization problem). Observation: be careful that if we apply these methods, one negative aspect (or at least one to keep in mind) is that in general the update of $x_k$ lands on points that do not lie on the constraint! On the constraint we move in the tangent space; off it, we have something affine. Generally $h(x_k) \ne 0 \implies x_k$ NOT FEASIBLE! So if we want to satisfy the constraint and we are not sufficiently close to the minimum, we are NOT satisfying the constraint. STOP criteria:

• (−) approaching the optimal cost;

• (±) hence, if we were to stop early, we might somehow not be satisfying the CONSTRAINT! In optimal control the set of constraints will be precisely the DYNAMICS of the system!

We may therefore end up with curves that are not trajectories and hence do NOT satisfy the differential equation (thus not admissible)!

Let's retrace the path: we started from unconstrained optimization problems, with 1st and 2nd order necessary and sufficient conditions; then necessary conditions in the CONSTRAINED case; then we specialized the CONSTRAINED case (1st and 2nd order necessary and sufficient conditions). The Lagrangian function serves both to find the NCs and to develop algorithms (PF and PD): we augment the Lagrangian with suitable penalty functions.

4.3 Optimal Control Problems

Discrete-time dynamical systems. Constrained optimization problems with more structure, easy to specialize; they allow handling a more general dynamics.

4.3.1 DISCRETE-TIME OPTIMAL CONTROL

Optimal Control. We're dealing with dynamical systems in discrete time: a dynamical system that evolves with time, driven by an external input, with some objective, and some input that allows you to meet these objectives (follow a path with some speed profile, accomplish some task), the inputs eventually generated by a controller via a feedback law. These are techniques to design feedback laws for more complex systems. So far we have not considered any cost in what we do; let's try to do that. How do I choose the free parameters, e.g. the $K$ matrix for linear systems (Eigenvalue Allocation Problem)? Can I use this degree of freedom to optimize some cost? Go from one point to another in minimum time; do the fastest lap with a car (minimize lap time); send a satellite minimizing the input effort (limited energy of the system). How do we do that? Set up an optimization problem:
$$\min_{\{x(1),\ldots,x(N)\},\ \{u(0),\ldots,u(N-1)\}}\ \sum_{t=0}^{N-1} l(x(t), u(t), t) + m(x(N))$$
subj. to: $x(t+1) = f(x(t), u(t), t)$, $x(0) = x_0$ (the trajectory satisfies a certain dynamics, with fixed initial condition). We may also want to satisfy some constraint in some space:
$$x(t) \in X_t \subset \mathbb{R}^n,\ t = 1, \ldots, N; \qquad u(t) \in U_t \subset \mathbb{R}^m,\ t = 0, \ldots, N-1.$$

We start the input at $t = 0$ and look at the state from time $1$ on. Trajectories are $(x(t), u(t))$: in particular $x$ is the state trajectory and $u$ is the input part of the trajectory. We have to minimize our cost function (a number, typically non-negative). Since we want to penalize the entire trajectory, how can we get a number that takes these elements into account? $m$ is some terminal penalty; it doesn't depend on the INPUT. $t \in [0, N]$ is the time window; $N$ can also go to $+\infty$, but in general we consider $N$ fixed. What are some possibilities for the cost function? We may want to consider:
$$J(x, u) = \sum_{t=0}^{N-1}\frac12\left(\|x(t) - x_d(t)\|_Q^2 + \|u(t) - u_d(t)\|_R^2\right) + \frac12\,\|x(N) - x_d(N)\|_{P_f}^2$$
where $\frac12$ is only a scaling value. The desired trajectory $\{\{x_d(1),\ldots,x_d(N)\},\ \{u_d(0),\ldots,u_d(N-1)\}\}$ is GIVEN. These desired curves need not satisfy the dynamics; we only try to get close to them. Maybe I want to choose for $U_t$ some set such as
$$U_t := \{u \in \mathbb{R}^m \mid \|u\| \le u_{MAX}\},$$
an input constraint set that is simply saying $\|u\| \le u_{MAX}$, where $\|\cdot\|$ can be any valid norm. We want to find a trajectory close to the desired objectives, given some constraints on the actuators (control effort).
$$\|x(t) - x_d(t)\|_Q^2 = (x(t) - x_d(t))^\top Q\,(x(t) - x_d(t)),$$
and the same for the input term with $R$ (the state components can have different weights!). $Q = Q^\top \ge 0$ is typically symmetric positive semidefinite: we can weight components in different ways (very strict on the position, say, but not caring about the angle: choose different weights accordingly). $R = R^\top > 0$ is a symmetric positive definite matrix. These weights penalize the two costs positively and differently: they are quadratic norms (quadratic forms). For a vector $v$, $v^\top W v =: \|v\|_W^2$ is a sort of weighted norm of $v$ (the classical norm if $W = I$); for the state it is enough that the weight be positive semidefinite. $u_d$ could even be $0$: clearly that is not a feasible situation (it would violate the second law of thermodynamics!), but it encodes doing a trajectory without excessive control effort. $x_d$ and $u_d$ cannot be chosen at random if we want feasibility: the pair $(x, u)$ we look for must be realizable, i.e. satisfy the dynamics, while possibly also facing the limited practical capabilities of the actuators, bounds on the states (e.g. positions), and the possible presence of obstacles. $u_d(t)$ is a value coming from some intuition on a very complex physical model; $u_d$ is a design parameter satisfying some SIMPLIFIED DYNAMICS: we take a simplified model, and clearly it will NOT be the input of the physical system. Typically, finding an equilibrium is easier than finding a trajectory. There are several possibilities: an estimate of the control effort $u_d(t)$, different weights, or reaching a position spending the minimum possible control effort.

Let's approach this problem by rewriting it with some different notation. $U_t$ is a fairly generic constraint: typically it could be an inequality constraint, but not necessarily. Set $x(t) := x_t$, $u(t) := u_t$, $f(t) := f_t$:
$$\min_{\{x_1,\ldots,x_N\},\ \{u_0,\ldots,u_{N-1}\}}\ \sum_{t=0}^{N-1} l_t(x_t, u_t) + m(x_N)$$
subj. to:
$$x_{t+1} = f_t(x_t, u_t),\ t = 0,\ldots,N-1; \quad x_t \in X_t \subset \mathbb{R}^n,\ t = 1,\ldots,N; \quad u_t \in U_t \subset \mathbb{R}^m,\ t = 0,\ldots,N-1,$$
where $x_0$ is FIXED (GIVEN initial condition). The domain variable of our optimization problem is
$$z := \begin{bmatrix} x \\ u \end{bmatrix}.$$
The constraints on state and input are typically inequality constraints. Let's consider a general class of optimization problems by specifying these sets. INPUT-CONSTRAINED OPTIMAL CONTROL ($\iff X_t = \mathbb{R}^n$): what happens to the dynamics? It is an equality constraint for the $x$, written however in a fixed way: the two variables aren't really coupled. FIX $u$, and you get $x$. If I have no constraints on the state, then I can write $x_t = \phi_t(u) = \phi_t(u_0, \ldots, u_{N-1})$, some sort of discrete transition function (notice that I need all of $u_0, \ldots, u_{N-1}$). Then my optimal control model can be written in this way:
$$\min\ \sum_{t=0}^{N-1} l_t(\phi_t(u), u_t) + m(\phi_N(u))$$
subj. to: nothing except our INPUT constraint $u_t \in U_t$. I have reduced my equality-constrained problem in $(x, u)$ to an optimization problem with only $u$!

If I assume I don't have input constraints either, I get UNCONSTRAINED OPTIMAL CONTROL ($\iff U_t = \mathbb{R}^m$):
$$\min_{u \in \mathbb{R}^{Nm}}\ J(u) := \sum_{t=0}^{N-1} l_t(\phi_t(u), u_t) + m(\phi_N(u))$$
where
$$u = \begin{bmatrix} u_0 \\ \vdots \\ u_{N-1} \end{bmatrix}$$
has $Nm$ = # components, i.e. $u$ becomes an $Nm$-dimensional vector!

Optimal control problems are often called UNCONSTRAINED exactly for this reason; $\phi_t(u) = x_t$. Numerically one can approach this technique, but there are issues (see SEQUENTIAL QUADRATIC PROGRAMMING below). What about designing control laws that satisfy input constraints, e.g. maintaining the input saturation constraint? We can account for constraints too, but it is computationally demanding. So, if we remove the constraint on the dynamics, the problem becomes UNCONSTRAINED. We now prepare numerical methods to solve this problem and write the necessary conditions for optimal control. This UNconstrained problem is equivalent to the one with the equality constraint. Let's see how to proceed. The necessary condition of optimality, since it is an unconstrained problem, is: set the gradient of $J(u)$ equal to 0, $\nabla J(u) = 0$. Let's go back to the constraints:
$$\left[\min_{u \in \mathbb{R}^{Nm}} J(u) := F(\phi(u), u)\right] \ \vee\ \left[\min_{x,u} F(x, u) \ \text{subj. to: } h(x, u) = 0\right]$$

where $h(x, u) = 0$ is our constraint:
$$h(x, u) = \begin{bmatrix} x_1 - f_0(x_0, u_0) \\ x_2 - f_1(x_1, u_1) \\ \vdots \\ x_N - f_{N-1}(x_{N-1}, u_{N-1}) \end{bmatrix} = 0$$
(we keep track of the equality constraint, the DYNAMICS, i.e. the trajectory of the system!). Notice that, given $u \in \mathbb{R}^{Nm}$, $h(\phi(u), u) \equiv 0$ when $\phi(u) = x$: exactly what we're saying! How do we compute $\nabla J(u)$? Let's proceed by considering both the unconstrained and the constrained version of the problem. Applying the chain rule we should compute $\nabla_x F$, then $\nabla\phi(u)$, and then the gradient wrt $u$. To do that in an easier way, let's start from the equality-constrained version of this problem and treat it as an equality-constrained problem:
$$L(x, u, p) = L(z, p) = F(x, u) + h(x, u)^\top p$$
(written in a compact way), where
$$h(x, u)^\top p = \sum_{t=0}^{N-1} h_t(x, u)^\top p_{t+1}.$$

$(x, u)$ is the decision variable; this is just a matter of notation: it is the Lagrangian of our system. How do we compute the FNC of optimality for the problem? We need to set $\nabla L(x^\star, u^\star, p^\star) = 0$. Let's write it explicitly:
$$\nabla_x L(x^\star, u^\star, p^\star) = 0, \qquad \nabla_u L(x^\star, u^\star, p^\star) = 0, \qquad \nabla_p L(x^\star, u^\star, p^\star) = 0 \iff h(x^\star, u^\star) = 0,$$
where the RHS of the last implication is the FEASIBLE POINT CONDITION, or actually the DYNAMICS (the equality constraint). Let's compute the first gradient:
$$\nabla_x L(x^\star, u^\star, p^\star) = \nabla_x F(x^\star, u^\star) + \nabla_x h(x^\star, u^\star)\,p^\star$$
(each $p_t \in \mathbb{R}^n$). Define $\nabla_x h(x, u) = \left[\nabla_x h_0\ \cdots\ \nabla_x h_{N-1}\right]$ (matrix of gradients). We get:
$$\nabla_x F(x^\star, u^\star) + \nabla_x h(x^\star, u^\star)\,p^\star = 0, \qquad \nabla_u F(x^\star, u^\star) + \nabla_u h(x^\star, u^\star)\,p^\star = 0,$$

together with the feasible-point condition $h(x^\star, u^\star) = 0$. So now let's look at the unconstrained version of the problem and compare: $\min_{u\in\mathbb{R}^{Nm}} F(\phi(u), u)$. The expression $h(\phi(u), u)$ is NOT a constraint: $h(\phi(u), u) = 0$ for all $u$, i.e. for every point of the set $\{\phi(u) \mid \forall u\}$. So
$$J(u) = F(\phi(u), u) + h(\phi(u), u)^\top p$$
where the added term is $0$ for all $p$! (It's like I'm evaluating the Lagrangian.) If I compute:
$$\nabla J(u) = \nabla\phi(u)\left[\nabla_x F(\phi(u), u) + \nabla_x h(\phi(u), u)\,p\right] + \nabla_u F(\phi(u), u) + \nabla_u h(\phi(u), u)\,p$$
For now keep $p$ free. TRICK: since $p$ is a degree of freedom, we can choose $p = p(u)$ such that
$$\nabla_x F(\phi(u), u) + \nabla_x h(\phi(u), u)\,p(u) = 0 \iff \left[\nabla_x L(\phi(u), u, p(u)) = 0\right]$$
Let's force this to be 0 for all $u$; then $\nabla J(u)$ becomes:
$$\nabla J(u) = \nabla_u F(\phi(u), u) + \nabla_u h(\phi(u), u)\,p(u)$$
but for that special $p(u)$. Does this give the FNC of optimality for our unconstrained problem?

FNC (FIRST ORDER NECESSARY CONDITIONS OF OPTIMALITY)
Simply, $\nabla J(u^\star) = 0$:
$$\nabla_u F(\phi(u), u) + \nabla_u h(\phi(u), u)\,p(u) = 0$$
If it holds for all $u$, it holds in particular for $u^\star$. Remember: $x^\star = \phi(u^\star)$, $p^\star = p(u^\star)$, and $h(\phi(u^\star), u^\star) = 0$. Then, going back to the previous equation, we have:
$$\nabla_x F(\phi(u^\star), u^\star) + \nabla_x h(\phi(u^\star), u^\star)\,p(u^\star) = 0$$

This is an optimization problem; let's see how to handle it for an optimal control problem. The constraint is very special: given $u$, $\phi(u)$ automatically returns the $x$, not only at the optimum, but always! (Possible numerical issues for $\phi(u)$ aside.) We substitute $\phi(u)$ in place of $x$ in the UNconstrained problem. Let's write this explicitly:
$$\left[\nabla_x F(\phi(u), u) + \nabla_x h(\phi(u), u)\,p(u) = 0\right] \iff \nabla_x L(\phi(u), u, p(u)) = 0.$$
Let's set $p(u) = p$ and $\phi(u) = x$, just as a notation simplifier.
$$L(x, u, p) = \sum_{t=0}^{N-1} l_t(x_t, u_t) + m(x_N) + \sum_{t=0}^{N-1}\left(f_t(x_t, u_t) - x_{t+1}\right)^\top p_{t+1}$$

Let's compute $\nabla_x L(\phi(u), u, p)$ (the gradient wrt $x$). First of all, this Lagrangian has a very special structure: $l_t$ depends only on $x_t$ (not on $x_{t+1}, x_{t+2}, \ldots$). Stacking the gradients from $t = N$ down to $t = 1$:
$$\begin{bmatrix}\nabla m(x_N)\\ \nabla_{x_{N-1}} l_{N-1}(x_{N-1}, u_{N-1})\\ \vdots\\ \nabla_{x_1} l_1(x_1, u_1)\end{bmatrix} + \begin{bmatrix}-I & 0 & \cdots & 0\\ \nabla_{x_{N-1}} f_{N-1}(x_{N-1}, u_{N-1}) & -I & \cdots & 0\\ 0 & \nabla_{x_{N-2}} f_{N-2}(x_{N-2}, u_{N-2}) & \ddots & \vdots\\ \vdots & \ddots & \ddots & \vdots\\ 0 & \cdots & \nabla_{x_1} f_1(x_1, u_1) & -I\end{bmatrix}\begin{bmatrix}p_N\\ p_{N-1}\\ \vdots\\ p_1\end{bmatrix} = 0$$

Let's now write these equations block by block:
$$p_N = \nabla m(x_N), \qquad p_{N-1} = \nabla_{x_{N-1}} f_{N-1}(x_{N-1}, u_{N-1})\,p_N + \nabla_{x_{N-1}} l_{N-1}(x_{N-1}, u_{N-1}).$$
This is a discrete-time dynamical system! Instead of going forward in time, I go backward: $p_N \to p_{N-1} \to \cdots \to p_1$, obtaining the others:
$$p_t = \nabla_{x_t} f_t(x_t, u_t)\,p_{t+1} + \nabla_{x_t} l_t(x_t, u_t), \qquad p_N = \nabla m(x_N)\ \text{as final condition.}$$
If we know the trajectory, we can compute the gradient. This is basically $p_t = A(t)^\top p_{t+1} + a(t)$ with $p_N = \nabla m(x_N)$: a linear (affine) dynamics driven by $a(t)$. We compute the linearization (it is actually a finite-difference equation). Since we have this recursion, we can write things more explicitly; let's rewrite the Lagrangian in a different way:
$$L(x, u, p) = \sum_{t=0}^{N-1}\left(H_t(x_t, u_t, p_{t+1}) - p_{t+1}^\top x_{t+1}\right) + m(x_N)$$
where
$$H_t(x_t, u_t, p_{t+1}) := l_t(x_t, u_t) + p_{t+1}^\top f_t(x_t, u_t)$$
is the [HAMILTONIAN FUNCTION]. Why is it useful? We had $\nabla_u F(\phi(u), u) + \nabla_u h(\phi(u), u)\,p(u) = \nabla J(u)$, which rewritten component-wise is
$$\nabla_{u_t} J(u) = \nabla_{u_t} l_t(x_t, u_t) + \nabla_{u_t} f_t(x_t, u_t)\,p_{t+1} = \nabla_{u_t} H_t(x_t, u_t, p_{t+1}),$$
and the components of $p_t$ read:
$$p_t = \nabla_{x_t} H_t(x_t, u_t, p_{t+1}), \qquad p_N = \nabla m(x_N).$$

FIRST ORDER NECESSARY CONDITION OF OPTIMALITY (FNC)

Let $u^\star = \left[u^{\star 0}\ \cdots\ u^{\star N-1}\right]^\top$ be a local minimum control trajectory (sometimes called OPTIMAL CONTROL) and $x^\star = \left[x^{\star 1}\ \cdots\ x^{\star N}\right]^\top$ the corresponding state trajectory. Then we need to satisfy $\nabla J(u^\star) = 0$:
$$\nabla_{u_t} H_t(x^{\star t}, u^{\star t}, p^{\star t+1}) = 0$$
where $(p^{\star 1}, \ldots, p^{\star N})$ is the COSTATE VECTOR obtained as:
$$p^{\star t} = \nabla_{x_t} H_t(x^{\star t}, u^{\star t}, p^{\star t+1}), \qquad p^{\star N} = \nabla m(x^{\star N}).$$
TRY THIS:
$$u_{k+1} = u_k - \alpha_k\,\nabla J(u_k)$$
Gradient-type algorithms: generate trajectories or controls that minimize a certain cost function. We can state the UNCONSTRAINED OPTIMAL CONTROL PROBLEM:
$$\min_{\{x_1,\ldots,x_N\},\ \{u_0,\ldots,u_{N-1}\}}\ \sum_{t=0}^{N-1} l_t(x_t, u_t) + m(x_N)$$
subj. to: $x(t+1) := x_{t+1} = f_t(x_t, u_t)$, where $m(x_N)$ is the final cost. FNC:
$$p^{\star t} = \nabla_{x_t} f_t(x^{\star t}, u^{\star t})\,p^{\star t+1} + \nabla_{x_t} l_t(x^{\star t}, u^{\star t}), \qquad p^{\star N} = \nabla m(x^{\star N})$$
$$\nabla_{u_t} f_t(x^{\star t}, u^{\star t})\,p^{\star t+1} + \nabla_{u_t} l_t(x^{\star t}, u^{\star t}) = 0$$
for $t = 0, \ldots, N-1$, with $x_0$ GIVEN (FIXED).

0

Basically write the Lagrangian with x as a function of u: >

L(φ(u), u) = F (φ(u), u) + h(φ(u), u) p

It turns out that: ∇J(u) ∇ ∇

= F (φ(u) := x, u) + h(φ(u) := x, u)p(u)

u u

where basically component-wise rewritten it becomes:

[∇ f (φ (u), u)p + l (φ (u), u)]

u t t t+1 u t t

t t

where p satisfies:

t ∇ ∇m(x

p = H (x , u , p ), p = )

t x t t t t+1 N N

t

⇐⇒ ∇

∇ If we set L(·) = 0, then we get p and u that satisfies this

L(φ(u), u, p(u)) = 0 x

x

expression: ∇

[∇J(u) = L(φ(u), u, p(u))]

u

we’re somehow always satisfying the constraint by finding the φ(u). In order to write these two

equations (FNC i and ii), we can just write them wrt a different function. Let’s introduce the

Hamiltonian: >

H (x , u , p ) = l (x , u ) + p f (x , u )

t t t t+1 t t t t t t

t+1

( ?t ?t ?t ?t+1

p = H(x , u , p )

x t ∇m(x

, p = )

N N

?t ?t ?t+1

∇ H (x , u , p ) = 0

u t

t

Just a way to write these equations. If we look at the hamiltonian, we can write also that:

?t ?t ?t+1 ?t+1 ?t ?t

x = f (x , u ) = H (x , u , p ) =⇒ x = f (x , u )

t+1 t t t p t t

t+1

with x GIVEN; Now let’s remember what we’ve done for standard optimization problem.

0

We developed a class of gradient methods that try to enforce the FNC, i.e. converge to a point where the gradient is 0; let's do the same for our optimal control problem. We can simply apply the most classical gradient method (UNCONSTRAINED PROBLEM):
$$\min_{u\in\mathbb{R}^{Nm}} J(u), \qquad u_{k+1} = u_k - \alpha_k\,\nabla J(u_k),$$
where $u = \left[u_0\ \cdots\ u_{N-1}\right]^\top$ and $\nabla J(u) = \left[\nabla_{u_0} J(u)^\top\ \cdots\ \nabla_{u_{N-1}} J(u)^\top\right]^\top$.
We would like to update each component of $u$. The first (prefix) index counts the iterations of the optimization algorithm, while the second (postfix) index counts the discrete time instant; at each iteration we compute new trajectories, so $u_k = \left[u_{k0}\ \cdots\ u_{k\,N-1}\right]^\top$ ($N$ controls at the $N$ time instants, at the $k$-th iteration), and $u_{kt}$ is the control at time $t$ of iteration $k$ of the optimization algorithm.

Now let's see how we can explicitly calculate the gradient of $H$:
$$u_{k+1\,t} = u_{kt} - \alpha_k\left(\nabla_{u_t} f_t(x_{kt}, u_{kt})\,p_{k\,t+1} + \nabla_{u_t} l_t(x_{kt}, u_{kt})\right)$$
Notice that $x_{kt} = \phi_t(u_k)$ satisfies the dynamics and $p_{kt} = p(u_k)$ (the one satisfying the costate equation). Now let's rewrite the algorithm step by step. If I know $(x_{kt}, u_{kt})$, then I know $\nabla_{u_t} f_t(x_{kt}, u_{kt})$ and also $\nabla_{u_t} l_t(x_{kt}, u_{kt})$: it's just the gradient of a function that we know:
$$F(x, u) = \begin{bmatrix} f_0(x_0, u_0) \\ \vdots \\ f_t(x_t, u_t) \\ \vdots \\ f_{N-1}(x_{N-1}, u_{N-1}) \end{bmatrix} : \mathbb{R}^{Nn}\times\mathbb{R}^{Nm} \to \mathbb{R}^{Nn}$$
where
$$x = \begin{bmatrix} x_1 \\ \vdots \\ x_N \end{bmatrix}, \qquad u = \begin{bmatrix} u_0 \\ \vdots \\ u_{N-1} \end{bmatrix}$$
and each $f_t : \mathbb{R}^n\times\mathbb{R}^m \to \mathbb{R}^n$: it is of this stacked map that we take the gradient. The optimization variable is one big vector; we are trying to make its structure explicit.

STEEPEST DESCENT FOR OPTIMAL CONTROL

INITIALIZE: $u_0 = \left[u_{00}\ \cdots\ u_{0\,N-1}\right]^\top$;

• REPEAT, at $k$, for $k = 0, 1, \ldots$: given $u_k$, get $x_k = \phi(u_k)$ (integrate the dynamics to get the state). The following holds: $x_{k\,t+1} = f_t(x_{kt}, u_{kt})$, with $x_{k0}$ GIVEN (FIXED); $(x_k, u_k)$ will be a trajectory of the system, by construction. Then, given $(u_k, x_k)$, get $p_k$. How do we get $p_k$? From the BACKWARD RECURSION (backward integration):
$$p_{kt} = \nabla_{x_t} f_t(x_{kt}, u_{kt})\,p_{k\,t+1} + \nabla_{x_t} l_t(x_{kt}, u_{kt}), \qquad p_{kN} = \nabla m(x_{kN}).$$
This equation is of the type $p_{kt} = A_t^\top p_{k\,t+1} + q_t$: the COSTATE equation is always linear, even though the dynamics of the system is nonlinear (a LINEAR SYSTEM DRIVEN BY A CERTAIN INPUT);

• UPDATE:
$$u_{k+1\,t} = u_{kt} - \alpha_k\left(\nabla_{u_t} l_t(x_{kt}, u_{kt}) + \nabla_{u_t} f_t(x_{kt}, u_{kt})\,p_{k\,t+1}\right)$$
where the term in parentheses is $\nabla_{u_t} H_t(x_{kt}, u_{kt}, p_{k\,t+1}) = \nabla_{u_t} J(u)$.

I can code this algorithm in MATLAB (a Python sketch is given below): the update takes the form $u_{kt} - \alpha_k(r_t + B_t^\top p_{k\,t+1})$, with $x_{t+1} = f_t(x_t, u_t)$ and $B_t$ the input matrix of the linearization. Indeed, along the linearization we can write
$$\Delta x_{t+1} = A_t\,\Delta x_t + B_t\,\Delta u_t$$
where $B_t \in \mathbb{R}^{n\times m}$ (so $B_t^\top \in \mathbb{R}^{m\times n}$).
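Below is a minimal Python sketch of the steepest-descent (shooting) scheme above. The callables f, fx, fu, lx, lu, grad_m are placeholders for the dynamics, its Jacobians, and the cost gradients; fx and fu return the Jacobians, so that transposing them yields the gradient matrices of the text:

```python
import numpy as np

def steepest_descent_oc(x0, u, f, fx, fu, lx, lu, grad_m, alpha=1e-2, iters=100):
    """Gradient (shooting) method for the unconstrained discrete-time OC problem.
    u: (N, m) input trajectory; callables take (t, x, u) and return arrays."""
    N = u.shape[0]
    for _ in range(iters):
        # forward pass: integrate the dynamics, x_{t+1} = f_t(x_t, u_t)
        x = [x0]
        for t in range(N):
            x.append(f(t, x[t], u[t]))
        # backward pass: costate p_t = fx_t^T p_{t+1} + lx_t,  p_N = grad m(x_N)
        p = [None] * (N + 1)
        p[N] = grad_m(x[N])
        for t in range(N - 1, -1, -1):
            p[t] = fx(t, x[t], u[t]).T @ p[t + 1] + lx(t, x[t], u[t])
        # gradient step on each component: grad_{u_t} J = lu_t + fu_t^T p_{t+1}
        for t in range(N):
            u[t] = u[t] - alpha * (lu(t, x[t], u[t]) + fu(t, x[t], u[t]).T @ p[t + 1])
    return x, u
```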

We can also derive a Newton's method by computing $\nabla^2 J(u)$: if we are in a neighborhood of the minimum, the convergence is very fast (quadratic convergence). $\min_u F(\phi(u), u)$: (CONDENSING METHODS) $\vee$ (SHOOTING METHODS). (Why shooting? Because we're deciding the control and then we get the trajectory associated to that control.)

• ADVANTAGE: at each $k$, $(x_k, u_k)$ is a trajectory of the system. This is interesting: if the algorithm stops at some iteration $k$, we still have a trajectory! (RECURSIVE FEASIBILITY);

• DRAWBACK: if the DYNAMICS is UNSTABLE, the integration DIVERGES! (We cannot really speak of stability, since we are on a finite time horizon; still, in this case the forward integration of $x_{k\,t+1} = f_t(x_{kt}, u_{kt})$ can be numerically ill conditioned!) We are forced to take short time intervals!

LINEAR QUADRATIC OPTIMAL CONTROL

If we have a linear system, then we can solve the optimal control problem:
$$\min_{\{x_1,\ldots,x_N\},\ \{u_0,\ldots,u_{N-1}\}}\ \sum_{t=0}^{N-1}\frac12\left(x_t^\top Q_t x_t + u_t^\top R_t u_t\right) + \frac12\,x_N^\top Q_N x_N$$
subj. to: $x_{t+1} = A_t x_t + B_t u_t$, where $\frac12 x_N^\top Q_N x_N$ is a general version of a final quadratic cost.

We have some dynamics, which is linear. There are two extreme cases: choosing $Q$ and $R$ means focusing the penalty either on the state or on the control, i.e. how fast I want to go to 0 versus how much control effort I want to use:

• $\|Q\|$ large, $\|R\|$ small: I want to go to zero fast;

• $\|Q\|$ small, $\|R\|$ large: I don't want to use so much control;

ASSUMPTIONS: $Q = Q^\top \ge 0$ and $Q_N = Q_N^\top \ge 0$; $R = R^\top > 0$. So we have $Q, Q_N$ symmetric positive semidefinite matrices and $R$ symmetric positive definite ($Q, R$ weight matrices). Here we're not considering constraints on state or input. Let's look at this problem. If $Q_t, Q_N \ge 0$ and $R_t > 0$, then I have a quadratic form (a sum of quadratic terms): the cost function $J(u)$ is quadratic, with non-negative quadratic terms, positive definite and thus convex! $J := J(u)$, convex wrt $u$ (the input). Also $[x_{t+1} = A_t x_t + B_t u_t]$ is a linear equality constraint, so $x = \phi(u)$ with $\phi$ a linear function $\iff x$ is a linear function of $u$. Composed with a linear function, $J(u)$ stays positive definite and convex. It means this is a convex problem, and since it is a convex problem, FNC $\iff$ FSC (the FNC is also SUFFICIENT). The cost $J(u)$ is strictly convex $\implies$ (unique global minimum), and we get it by setting the FNC ($\nabla(\cdot) = 0$).

Conditions are: ( ?t ?t ?t+1

∇ H (x , u , p ) = 0

t

u ?N

t ∇m(x

, p = )

N

?t ?t ?t+1

?t ∇ H (x , u , p )

p = t

x

t

In this case we have a special version of dynamics and cost function.

1 1

> > >

H (x , u , p ) = x Q x + u R u + p (A x + B u )

t t t t+1 t t t t t t t t

t t t+1

2 2

>

where the underlined term is simply: p f .

t+1

Why it’s convenient R > 0? It is invertible!

t >

?t ?t+1 ⇐= ∇u

(R u + B p = 0 H (·) = 0) =⇒

t t t

t

> >

?t ?t+1 ?t ? ? ?

1

−R

=⇒ u = B p =⇒ p = A p + Q x , p = Q x

t t N N

t t t+1 t N

(Costate equation). We could pair this system with the dynamics:

>

( ? ? ? ?

p = A p + Q x , p = Q x

t N N

t t t+1 t N

> ?t+1 ?t

1

x = A x + (−B R B p = B u )

t+1 t t t t t

t

?

with x GIVEN (FIXED); this is the HAMILTONIAN SYSTEM. (double point BOUNDARY

0

conditioned problem (system) both at the beginning and at the end). For this special case this

?N ·(x).

is another explicit way to solve, Start from p = Q x . It means that p = and viceversa

N N

∀t, ∃P ≥ |

is linear. this dependence is linear, Prove that 0 p = P x . We know that this is

t t t t

?t+1 ?t+1 ?

·

true for t = N . Let’s prove it by induction. Suppose: p = P x . Let’s remove the .

t+1

Let’s prove it is true for P , x .

t t >

?t ?t+1

1

−R

u = B P x

t t+1

t

but then let’s substitute the dynamics: >

?t ?t ?t

1

−R

[u = B P (A x + B u )]

t t+1 t t

t

?t

Solve this equation for u =⇒ > >

?t ?t

−B

(R + B P B )u = P A x =⇒

t t+1 t t+1 t

t t

> >

?t ?t

1

−(R

=⇒ u = + B P B ) B P A = K x

t t+1 t t+1 t 0

t t ⇐⇒ ∃(R

If we’re assuming P > 0, then I can consider the inverse of that term (. . . ) +

t+1 t

> −

1

B P B ) .

t+1 t

t 123

?t ?t ?t ?t

u = (some matrix)x . SOME FEEDBACK LAW =⇒ u = K x , where

0

> >

1

−(R

[K = + B P B ) B P A ]

0 t t+1 t t+1 t

t t

We want to prove that this is true for (x , u ). Let’s write the dynamics of x :

t t t

?t ?t ?t

x = A x + B K x = (A + B K )x

t+1 t t 0 t t 0

>

And now let’s multiply each side of equations by A P :

t+1

t

> > >

? ? ?

= A P A x + A P B K x

A P x t+1 t t+1 t 0

t+1 t t t t

t t+1

?t+1

where the underlined term is p . And now let’s use the fact that: [p = P x ]:

t+1 t+1 t+1

> >

? ?

>P

A p = (A P A + A B K )x

t+1 t t t+1 t 0

t t+1 t t

So now, let's simply add $Q_t x^{\star t}$ on both sides:
$$A_t^\top p^{\star t+1} + Q_t x^{\star t} = \left(A_t^\top P_{t+1}A_t + A_t^\top P_{t+1}B_t K_0 + Q_t\right)x^{\star t}.$$
But the COSTATE EQUATION tells us that $A_t^\top p^{\star t+1} + Q_t x^{\star t} = p^{\star t}$, hence
$$p^{\star t} = \left(A_t^\top P_{t+1}A_t + A_t^\top P_{t+1}B_t K_0 + Q_t\right)x^{\star t},$$
where the matrix in parentheses is precisely $P_t$ (a continuous-time counterpart also exists):
$$P_t = A_t^\top P_{t+1}A_t - A_t^\top P_{t+1}B_t\left(R_t + B_t^\top P_{t+1}B_t\right)^{-1}B_t^\top P_{t+1}A_t + Q_t, \qquad P_N = Q_N,$$
the (FINITE-DIFFERENCE) RICCATI EQUATION. Therefore
$$u^{\star t} = -\left(R_t + B_t^\top P_{t+1}B_t\right)^{-1}B_t^\top P_{t+1}A_t\,x^{\star t}$$
is a linear feedback. For $N \to \infty$ it is a stabilizing input (closed-loop system): a way to stabilize the discrete-time time-varying system. A numpy sketch of the backward recursion follows.
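A compact numpy sketch of the backward Riccati recursion and the resulting time-varying gains; matrices are passed as Python lists to allow time variation (a sketch, not a production implementation):

```python
import numpy as np

def lqr_backward(A, B, Q, R, QN):
    """Backward Riccati recursion for the finite-horizon LQ problem above.
    A, B, Q, R: length-N lists of (possibly time-varying) matrices.
    Returns the gains K_t such that u*_t = K_t x_t."""
    N = len(A)
    P = QN
    K = [None] * N
    for t in range(N - 1, -1, -1):
        # K_t = -(R_t + B_t^T P_{t+1} B_t)^{-1} B_t^T P_{t+1} A_t
        K[t] = -np.linalg.solve(R[t] + B[t].T @ P @ B[t], B[t].T @ P @ A[t])
        # P_t = A_t^T P_{t+1} A_t + A_t^T P_{t+1} B_t K_t + Q_t
        P = A[t].T @ P @ A[t] + A[t].T @ P @ B[t] @ K[t] + Q[t]
    return K
```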

OBSERVATION: suppose we have an LTI system, $x_{t+1} = Ax_t + Bu_t$ (time-INVARIANT)... If for some reason we find a constant $P \ne P(t)$, we have a time-constant feedback:
$$P = A^\top P A - A^\top\!\left[P B\left(R + B^\top P B\right)^{-1}B^\top P\right]A + Q.$$
Suppose we also choose a constant $Q \ne Q(t)$: we get a STATIC FEEDBACK (one that doesn't depend on time). If $N \to \infty$, we don't have a final cost and $u^\star$ will be:
$$u^{\star t} = -\left(R + B^\top P B\right)^{-1}B^\top P A\,x^{\star t}.$$
It gives us an optimal way to choose the feedback. If we manage to find a $P$ which is a fixed point of the equation, we are providing here a criterion alternative to arbitrary eigenvalue allocation: $P = P(Q, R)$, and $\|R\|$ large means a non-aggressive control. In continuous time one deals with function spaces (infinite-dimensional spaces): more complex.

SEQUENTIAL QUADRATIC PROGRAMMING FOR O.C. (SQP)

Let's rewrite our optimal control problem (in the SQP approach the dynamics will be treated as an explicit constraint):
$$\min_{\{x_1,\ldots,x_N\},\ \{u_0,\ldots,u_{N-1}\}}\ \sum_{t=0}^{N-1} l_t(x_t, u_t) + m(x_N)$$
subj. to: $\left[f_t(x_t, u_t) - x_{t+1} = 0\right]$.

With this technique we may include other constraints, like INEQUALITY CONSTRAINTS: simple bounds on the control input $\vee$ state constraints, $\{g_t(x_t, u_t) \le 0 \ \wedge\ g_N(x_N) \le 0\}$, i.e. saturation on the input at time $t$ or bounds on the state. Typically, in a CONSTRAINED CONTROL PROBLEM we also find the other constraints cited above, e.g. a FINAL STATE CONTROL PROBLEM (in that case $m(x_N)$ could even be omitted). The quadratic programming could also be extended to those other constraints.

The quadratic programming could also be extended for those other constraints.

min F (x, u), subj. to : h(x, u) = 0 =⇒ min F (w), subj. to : h(w) = 0

x,u w

x x

Now the domain variable is (entire optimization variable). w := .

u u

That’s exactly the same structure we have seen. Take SQP algorithm. We setup at each

iteration a quadratic program. QP at iteration k:

1

> 2

∇F ∇

[min (w ) ∆w + L(w , p )∆w]

k k k

w

2

∆w

>

∇h(w ×

subj. to: [h(w ) + ) ∆w = 0]. Where the underlined term is a N (n + m) N (n + m)

k k

dimensioned matrix. (Here we call p the Lagrange multiplier).

k

∇F

Simply constraint , where F is the cost function. Approssimazione quadratica del costo.

Il constraint sul vincolo è un vincolo affine. h(w ) = f (x , u ) x . La f (x , u ) non è una

t t t t+1 t t t

k

traiettoria, quindi h(w ) non è precisamente 0, mentre vogliamo che lo sia!

k

Let's set up this optimization problem. Write the Lagrangian:
$$L(x, u, p) = \sum_{t=0}^{N-1}\left[l_t(x_t, u_t) + p_{t+1}^\top\left(f_t(x_t, u_t) - x_{t+1}\right)\right] + m(x_N)$$
i.e. cost function $+\ (p_{t+1}^\top \cdot$ constraint$)$. We can rewrite:
$$L(x, u, p) = \sum_{t=0}^{N-1}\left[H_t(x_t, u_t, p_{t+1}) - p_{t+1}^\top x_{t+1}\right] + m(x_N)$$
where:
$$H_t(x_t, u_t, p_{t+1}) = l_t(x_t, u_t) + p_{t+1}^\top f_t(x_t, u_t)$$

(the HAMILTONIAN OF OUR SYSTEM). Now let's look at the quantities we have to compute:
$$\nabla_w F(w) = \begin{cases}\nabla_{x_t} F(x_k, u_k) = \nabla_{x_t} l_t(x_{kt}, u_{kt})\\[2pt] \nabla_{u_t} F(x_k, u_k) = \nabla_{u_t} l_t(x_{kt}, u_{kt})\end{cases}$$
(quite easy to compute). EXAMPLE: suppose the cost is
$$l_0(x_0, u_0) + l_1(x_1, u_1) + l_2(x_2, u_2) + m(x_3).$$

If I want to compute $\nabla_{x_2} F(\cdot)$, it is just $\nabla_{x_2} l_2(x_2, u_2)$. So now let's look at the Hessian $\nabla^2_w L(w_k, p_k)$: a little more complicated, but not so much; let's compute every single term. For the Hessian I can forget the term $p_{t+1}^\top x_{t+1}$ (linear, so its second derivative is 0): I have to look only at $H_t(x_t, u_t, p_{t+1})$. At different time instants ($t \ne \tau$), the mixed derivatives vanish, $\nabla^2_{x_t x_\tau} L(w_k, p_k) = 0$, because each term of $L(x, u, p)$ depends only on its own $x_t$. What about the other quantities?
$$\begin{cases}\nabla^2_{x_t x_t} L(x, u, p) = \nabla^2_{x_t x_t} H_t(x_{kt}, u_{kt}, p_{k\,t+1}) =: Q_t \in \mathbb{R}^{n\times n}\\[2pt] \nabla^2_{x_t u_t} L(x, u, p) = \nabla^2_{x_t u_t} H_t(x_{kt}, u_{kt}, p_{k\,t+1}) =: S_t^\top \in \mathbb{R}^{n\times m}\\[2pt] \nabla^2_{u_t x_t} L(x, u, p) = \nabla^2_{u_t x_t} H_t(x_{kt}, u_{kt}, p_{k\,t+1}) =: S_t \in \mathbb{R}^{m\times n}\\[2pt] \nabla^2_{u_t u_t} L(x, u, p) = \nabla^2_{u_t u_t} H_t(x_{kt}, u_{kt}, p_{k\,t+1}) =: R_t \in \mathbb{R}^{m\times m}\end{cases}$$
The Hessian is not diagonal but block-structured: the only surviving blocks are those with the derivatives of the Hamiltonian at time $t$ wrt $u_t u_t$, $u_t x_t$, $x_t u_t$, $x_t x_t$. Here
$$w = \begin{bmatrix} x \\ u \end{bmatrix} = \begin{bmatrix} x_1 \\ \vdots \\ x_N \\ u_0 \\ \vdots \\ u_{N-1} \end{bmatrix}, \qquad x_t = \begin{bmatrix} (x_t)_1 \\ \vdots \\ (x_t)_n \end{bmatrix},\quad u_t = \begin{bmatrix} (u_t)_1 \\ \vdots \\ (u_t)_m \end{bmatrix},$$

where $N(n+m) = \dim w$. What about the constraint equation (the dynamics)?
$$h(w_k) = \begin{bmatrix} f_0(x_{k0}, u_{k0}) - x_{k1} \\ \vdots \\ f_{N-1}(x_{k\,N-1}, u_{k\,N-1}) - x_{kN} \end{bmatrix}$$
(a known vector, with $Nn$ components).
$$\nabla h(w_k)^\top\Delta w = \nabla_x h(x_k, u_k)^\top\Delta x + \nabla_u h(x_k, u_k)^\top\Delta u$$
Just approximate the dynamics to first order:
$$\Delta x_{t+1} = \left(A_t := \nabla_{x_t} f_t(x_{kt}, u_{kt})^\top\right)\Delta x_t + \left(B_t := \nabla_{u_t} f_t(x_{kt}, u_{kt})^\top\right)\Delta u_t,$$
so that the linearized constraint reads
$$f_t(x_{kt}, u_{kt}) - x_{k\,t+1} + \nabla_{x_t} f_t(x_{kt}, u_{kt})^\top\Delta x_t + \nabla_{u_t} f_t(x_{kt}, u_{kt})^\top\Delta u_t - \Delta x_{t+1} = 0:$$
the linearization of the system at $(x_{kt}, u_{kt})$;

$$\min_{\{\Delta x_1,\ldots,\Delta x_N\},\ \{\Delta u_0,\ldots,\Delta u_{N-1}\}}\ \sum_{t=0}^{N-1}\left[a_t^\top\Delta x_t + b_t^\top\Delta u_t + \frac12\begin{bmatrix}\Delta x_t\\ \Delta u_t\end{bmatrix}^\top\begin{bmatrix}Q_t & S_t^\top\\ S_t & R_t\end{bmatrix}\begin{bmatrix}\Delta x_t\\ \Delta u_t\end{bmatrix}\right] + \frac12\,\Delta x_N^\top\nabla^2 m(x_{kN})\,\Delta x_N$$
with $a_t := \nabla_{x_t} l_t(x_t, u_t)$ and $b_t := \nabla_{u_t} l_t(x_t, u_t)$, subj. to:
$$r_t + A_t\,\Delta x_t + B_t\,\Delta u_t - \Delta x_{t+1} = 0, \qquad t = 0,\ldots,N-1,$$
where $r_t := f_t(x_{kt}, u_{kt}) - x_{k\,t+1}$, and $\nabla^2 m(x_{kN})$ plays the role of $Q_N$. Note $[x_{k+1\,t+1} = x_{k\,t+1} + \Delta x_{t+1}]$ (EXACT APPROXIMATION: we are expanding a linear term, so the series expansion is exact!)

t

zione avremo un problema che sapremmo risolvere facilmente. Basta generalizzare quello che

abbiamo visto l’ultima volta.

1 1 0 0

∆x̃ = ; ∆x̃ = (

à := )∆x + ( B̃ := )∆u

t t+1 t t t t

∆x r A B

t t t t

1 c

t

∆x̃ = , ∆x̃ =

t

0 ∆x ∆x

0 t ∧

(c costante e la posso inizializzare pari ad 1). (Stato aumentato) (Augmented State).

(introduciamo uno stato costantemente uguale ad 1).
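A tiny numpy helper for the augmented-state construction above (shapes as in the text; hypothetical example code):

```python
import numpy as np

def augment(A_t, B_t, r_t):
    """Augmented-state trick: with dxtilde_t = [1; dx_t] the affine dynamics
    dx_{t+1} = r_t + A_t dx_t + B_t du_t become purely linear."""
    n, m = A_t.shape[0], B_t.shape[1]
    A_tilde = np.block([[np.ones((1, 1)), np.zeros((1, n))],
                        [r_t.reshape(n, 1), A_t]])
    B_tilde = np.vstack([np.zeros((1, m)), B_t])
    return A_tilde, B_tilde
```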

We have rewritten the constraint as an exact linear dynamics. One then solves a Riccati equation and obtains a feedback control. Rewriting the cost in these terms too, we have exactly a LINEAR QUADRATIC PROGRAM (LQP). (We know exactly how to code this problem):

• INITIALIZE:
$$x_0 = \begin{bmatrix} x_{01} \\ \vdots \\ x_{0N} \end{bmatrix}, \qquad u_0 = \begin{bmatrix} u_{00} \\ \vdots \\ u_{0\,N-1} \end{bmatrix}$$

• REPEAT: $k = 0, 1, \ldots$
$$x_{k+1} = x_k + \Delta x_k, \qquad u_{k+1} = u_k + \Delta u_k, \qquad p_{k+1} = l_k,$$
with $(\Delta x_k, \Delta u_k)$ obtained by solving the QP, and $l_k$ the Lagrange multiplier of the QP.

Advantages and disadvantages: the main advantage and the main disadvantage are exactly swapped wrt the previous (shooting) method. ADVANTAGE: at each iteration we solve a problem that yields a stabilizing feedback solution (the method with the Riccati equation, which is well posed); these methods also lend themselves well to unstable systems! Here the drawback of the shooting method has become an advantage! DISADVANTAGE: $[x_{k+1} = x_k + \Delta x_k]$ might not be a trajectory! So the residual $f_t(x_t, u_t) - x_{t+1} \ne 0$ will in fact not be 0! Proceeding with the iterations, one can however observe that in the end the constraint is satisfied (exact constraint-matching conditions; constraint satisfaction). Other possible drawbacks of the method: we compute $Q_t$ and $R_t$ as the Hessian of the Hamiltonian, and in general it can happen that $R_t$ is not positive definite in the space of $u$. The problem is solved by choosing, instead of the Hessian, another positive semidefinite matrix. Another solution: START with $\begin{bmatrix} Q_t^c & 0 \\ 0 & R_t^c \end{bmatrix}$, i.e. essentially take the Hessian of the cost and not the part of the Hessian related to the dynamics; when the iterates are sufficiently close, we can start using the full Hessian:
$$0 \le Q_t^c = \nabla^2_{x_t x_t} l(x_{kt}, u_{kt}), \qquad 0 \le R_t^c = \nabla^2_{u_t u_t} l(x_{kt}, u_{kt})$$
(in the positive semidefinite sense). Convergence is quadratic; on a logarithmic scale the difference in speed is tangible if we put a SWITCH, after which we use the Hessian of the Hamiltonian. The second drawback is therefore easily recoverable, and the method also works globally (with the SWITCH). How does one instead recover a trajectory?

PROJECTION OPERATOR NEWTON'S METHOD

PROJECTED SQP. IDEA (due to Hauser, in continuous time): instead of using the open-loop dynamics $x_{t+1} = f_t(x_t, u_t)$, use $u_t = \mu_t + K_t(\alpha_t - x_t)$, a FEEDFORWARD+FEEDBACK structure, where $(\mu_t, \alpha_t)$ is a sort of reference we want to track. If we choose $(\alpha_t, \mu_t) = (x_k + \Delta x_k,\ u_k + \Delta u_k)$, it turns out that we can move along trajectories! And the interesting thing is that $r_t = 0$!

equilibrium or track a trajectory. What is the problem? We’ve not taking into account any

cost/constraint on the state/input.

MODEL PREDICTING CONTROL

·(x).

(MPC) u = This technique is very powerful. It gives us a sort of optimality criteria.

The drawback is that it is really very computational consuming. Based on optimal control.

ALGORITHMIC IDEA:

• MEASURE/OBSERVE THE CURRENT SYSTEM STATE $x_t$ (we are at some time $t$; look at $x_t$);

• ... then predict and optimize the FUTURE BEHAVIOR of the system OVER A FINITE HORIZON ($N$ steps ahead): design a control input $u$ that optimizes the predicted behavior of the system over this time horizon;

• IMPLEMENT ONLY THE FIRST INPUT $u_t$. Some kind of optimal trajectory (wrt some criterion) starting at $t$: we find a $u(t)$ that optimizes a trajectory predicted on the model. Feedback: we don't want to run the system in open loop (we do not like open-loop controls!), so we find a way to implement feedback. What is the optimal control problem we need to solve?
$$\min_{\bar x_\tau,\ldots,\bar x_{t+N},\ \bar u_\tau,\ldots}\ \sum_{\tau=t}^{t+N-1} l_\tau(\bar x_\tau, \bar u_\tau) + m(\bar x_{t+N})$$
subj. to: $\bar x_{\tau+1} = f_\tau(\bar x_\tau, \bar u_\tau)$, $\tau = t, t+1, \ldots, t+N$, and possibly also subj. to: $\{g(\bar x_\tau, \bar u_\tau) \le 0\ \wedge\ g_{t+N}(x_{t+N}) \le 0\}$, computed at each $t$ (the same structure as before). We put a bar $\bar{(\cdot)}$ because this is the model, not the controlled system! Note that $\bar x_t = x_t$ (the one we've measured);

What is the scheme? It is the following. Suppose we can measure the entire state $x_t$ of the plant. The idea is: solve over the whole prediction horizon and set $u_t = \bar u_t^\star$ (take the first input). This is really computationally demanding: we need to run the algorithm at each time interval! There are also stability-type problems (in the long run). This control philosophy is also called RECEDING HORIZON CONTROL (the time horizon moves forward). For many years the scheme was applied mostly in the process industry (slow dynamics), but without stability conditions to be satisfied. Are the constraints respected for all $t$? There are also optimality issues: the optimality I have on the limited horizon guarantees nothing on the infinite horizon $\implies$ choose $N$ sufficiently large, $N \gg 1$. (We still get a whole series of advantages, e.g. provable constraint satisfaction for all $t$.) Numerical aspects for the efficient solution; optimality of the control law; warm-starting the initial conditions of the successive problems with the previous optimal trajectories; speed-up aspects $\implies$ REAL TIME OPTIMIZATION (RTO) (control + optimization). A minimal receding-horizon loop is sketched below.
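A minimal receding-horizon loop in Python; plant and solve_oc are placeholders for the real system and for the finite-horizon optimal control solver (e.g. one of the methods above):

```python
def mpc_loop(x0, plant, solve_oc, T_sim):
    """Receding-horizon sketch: at each t, measure x_t, solve the N-step OC
    problem starting from x_t, and apply only the first predicted input."""
    x = x0
    history = []
    for t in range(T_sim):
        u_bar = solve_oc(x)        # predicted optimal input sequence over the horizon
        u = u_bar[0]               # implement only the first input
        history.append((x, u))
        x = plant(x, u)            # the real system moves one step forward
    return history
```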

4.3.2 CONTINUOUS-TIME OPTIMAL CONTROL

Optimal control theory, so far in discrete time, now in continuous time. In discrete time we minimized over $(x_1, \ldots, x_N)$ and $(u_0, \ldots, u_{N-1})$: big vectors. For continuous-time dynamical systems we no longer optimize over one big vector, but over functions! Function spaces, no longer $\mathbb{R}^n$. The idea of the setup: this time we must optimize wrt functions:
$$\min_{x(\cdot),\,u(\cdot)}\ \int_{t_0}^{t_f} l(x(\tau), u(\tau), \tau)\,d\tau + m(x_{t_f})$$
subj. to: $\dot x(t) = f(x(t), u(t), t)$, $x(t_0) = x_0$ as initial condition. These are (unconstrained) optimal control problems. At this point, one can develop a theory (search for 1st and 2nd order stationarity conditions, 2nd order sufficient conditions, etc.) on function spaces (infinite-dimensional); in an infinite-dimensional space many of the properties are not obvious. Operationally, how are the conditions developed? By analogy:
$$J(x, u) = \int_{t_0}^{t_f} l(x(\tau), u(\tau), \tau) + p(\tau)^\top\left(\dot x(\tau) - f(x(\tau), u(\tau), \tau)\right)d\tau + m(x_{t_f}) = \int_{t_0}^{t_f} -H(x(\tau), u(\tau), \tau, p(\tau)) + p(\tau)^\top\dot x(\tau)\,d\tau + m(x_{t_f})$$
with the HAMILTONIAN:
$$\left[H(x(t), u(t), p(t)) = p(t)^\top f(x(t), u(t), t) - l(x(t), u(t), t)\right]$$
(in fact, one could define the Hamiltonian as the sum of these two terms if we wanted to use the previous convention).

Then one proceeds by computing a sort of first derivative of this quantity (the FIRST VARIATION) and setting it equal to 0: develop a first variation (the equivalent of the first derivative), and set it $= 0$. One finds conditions equivalent to those we found in discrete time.

FNC OF OPTIMALITY (stationarity)
$$\dot p^\star(t) = -\nabla_x H(x^\star, u^\star, p^\star(t), t), \qquad p^\star(t_f) = -\nabla m(x^\star(t_f))$$
$$\nabla_u H(x^\star(t), u^\star(t), p^\star(t), t) = 0$$
To these one must actually add the equation of the dynamics (it is not the classical integration we are used to):
$$\dot x^\star = \nabla_p H(x^\star(t), u^\star(t), p^\star(t), t), \quad\text{i.e.}\quad \dot x = f(x, u, t),\ x(t_0) = x_0.$$

(A boundary value problem.) The optimal control is something for which the gradient of the Hamiltonian wrt $u$ at each instant is set equal to 0 (more generally, $u^\star$ maximizes $H$). One can develop a theory for the linear quadratic case. The algorithms are all based on discretizing these problems and then applying the previous discrete-time methods.
$$\left[\dot p(t) = -A(t)^\top p(t) + \nabla_x l(x(t), u(t), t)\right]$$
(the COSTATE EQUATION). Then: the matrix Riccati differential equation; duality; writing a problem equivalent to the original one with an equivalent parallelizable and/or distributed structure.

CONTINUOUS-TIME OPTIMAL CONTROL

Let's recall what we've seen: OPTIMAL CONTROL FNC. We can write the Hamiltonian $H$:
$$\left[H(x, u, p) = p^\top f(x, u, t) - l(x, u, t)\right]$$
(we could define it also as $-H$; just a matter of convention). FNC:
$$\dot x^\star = \nabla_p H(x^\star, u^\star, p^\star)\ \left[\,\dot x^\star = f(x^\star, u^\star)\,\right],\quad x^\star(t_0) = x_0$$
$$\dot p^\star = -\nabla_x H(x^\star, u^\star, p^\star),\quad p^\star(t_f) = -\nabla m(x(t_f))$$
$$\nabla_u H(x^\star, u^\star, p^\star) = 0$$
where the first line is just the dynamics, and the second equation is the COSTATE EQUATION. As in discrete time, let's see what a quadratic problem is:
$$\min_{x(\cdot),\,u(\cdot)}\ \int_{t_0}^{t_f}\frac12\left(x(t)^\top Q(t)x(t) + u(t)^\top R(t)u(t)\right)dt + \frac12\,x(t_f)^\top Q_f\,x(t_f)$$
subj. to: $\dot x(t) = A(t)x(t) + B(t)u(t)$, $x(t_0) = x_0$.
This is the exact counterpart of the discrete-time version, with a much more difficult theory (function spaces). Let's try to write the FNC of optimality for the LQ problem! It can be proven that the solution of this problem exists: if we assume $Q(t) \ge 0$, $Q_f \ge 0$, $R(t) > 0$, we have a unique solution of this problem. One can develop the theory, with much more complex math.
$$\left[\dot x^\star(t) = A(t)x^\star(t) + B(t)u^\star(t)\right]$$

What is the Hamiltonian in this case?
$$H(x, u, p, t) = p^\top\left(A(t)x(t) + B(t)u(t)\right) - \frac12\,x^\top Q(t)x - \frac12\,u^\top R(t)u$$
The costate equation ($\dot p = -\nabla_x H$) is:
$$\dot p^\star(t) = -A^\top(t)p^\star(t) + Q(t)x^\star(t)$$
with the boundary conditions
$$x(t_0) = x_0, \qquad p^\star(t_f) = -Q_f\,x^\star(t_f).$$

This is a two-point boundary value problem (TPBV). It's really the same development, but we have to deal with differential equations. Let's write $\nabla_u H(\cdot)$ and set it to 0:
$$B(t)^\top p^\star(t) - R(t)u^\star(t) = 0,$$
and, as in discrete time, since $R$ is invertible ($\iff R > 0$):
$$u^\star(t) = R^{-1}(t)B(t)^\top p^\star(t)$$

Apart from the $(-)$ in front, it is the same structure seen in discrete time. The Euler-Lagrange equations are obtained by imposing this "first variation" equal to 0 (calculus of variations). In general the Hamiltonian is also time-dependent (the matrices $A$ and $B$ may themselves depend on time) $\implies H := H(x^\star, u^\star, p^\star, t)$ (a function of vectors). Now how do we proceed? Exactly as in the discrete-time version. At the final time, $\left[p^\star(t_f) = -Q_f\,x^\star(t_f)\right]$. It can be proven that $p^\star(t) = -P(t)x^\star(t)$, a sort of generalization with $P(t) \ne -Q_f$; this is much more difficult to prove. Then we can write:
$$u^\star(t) = R^{-1}(t)B^\top(t)\,p^\star(t), \qquad p^\star = -P\,x^\star,$$
so we can write our optimal control law as:
$$\left[u^\star(t) = -R^{-1}(t)B^\top(t)P(t)\,x^\star(t) = K_0(t)\,x^\star(t)\right]$$
where $K_0(t) = -R^{-1}(t)B^\top(t)P(t)$. If we have $P(t)$, then we can compute the optimal control as a feedback of the state. Let's differentiate $p^\star(t)$ (noting the time dependence):
$$\left[\dot p^\star = -\dot P(t)x^\star - P\dot x^\star\right] \implies -A^\top p^\star + Qx^\star = -\dot P x^\star - P\left(Ax^\star + BK_0 x^\star\right)$$
Now let's use the fact that $p^\star = -Px^\star$:
$$A^\top P x^\star + Qx^\star = -\dot P x^\star - PA x^\star - PBK_0 x^\star$$
We get:
$$\left[\left(\dot P + PA + PBK_0 + A^\top P + Q\right)x^\star = 0\right]$$
(just moving everything to the left side and collecting $x^\star$). Since this condition must hold for every possible $x^\star$, we have to set this matrix equation $= 0$:
$$\dot P + PA + PBK_0 + A^\top P + Q = 0 \implies \dot P + A^\top P + PA - PBR^{-1}B^\top P + Q = 0$$

where the last form just has $K_0$ expanded. This is a matrix differential equation $\implies$ (DIFFERENTIAL RICCATI EQUATION (DRE)), initialized (at the final time) by $P(t_f) = Q_f$. What do we have to find? $P$: all other terms are known. It can be solved numerically; $P \ge 0$ (operationally, one vectorizes $P$ and solves the differential equations computing $P$ backward, starting from $Q_f$). Then the optimal feedback is generated in this way. The interesting thing is: it can be proven that, under controllability conditions, as $t_f \to \infty$ the feedback $u(t) = -R^{-1}(t)B^\top(t)P(t)x(t)$ stabilizes the linear time-varying system (LTV system). More results: how can we stabilize a trajectory, an LTV system? Linearize a nonlinear system at some trajectory and then apply this optimal INPUT using this expression. But, having to choose $R$ and $Q$, we obtain $P$ and set $u(t) = K_0\,x^\star(t)$: reverse engineering. I specify the performance I want in closed loop, choose $R$ and $Q$ appropriately, and thus obtain the corresponding $P$. Via Lyapunov's theorem, this feedback input exponentially stabilizes our LTV system.

As in discrete time, if we have a TIME-INVARIANT system and $P(t) = \bar P$ for all $t$, then I can write:
$$\bar P A + A^\top \bar P - \bar P B R^{-1} B^\top \bar P + Q = 0$$
and now we find a constant $P \ne P(t)$ by solving an algebraic matrix equation called the ALGEBRAIC RICCATI EQUATION (ARE). Perfect analogy with discrete time: constant $P \implies$ constant $K$, a time-invariant state feedback, with the eigenvalues imposed according to how we choose $R$ and $Q$ (reverse engineering). DT-CT matching. Very powerful working tools. A sketch of the backward integration of the DRE follows.
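A minimal numpy sketch that integrates the DRE backward in time with explicit Euler for an LTI system; for large $t_f$ the returned $P$ approximates the ARE solution (step size and the Euler scheme are assumptions of the example):

```python
import numpy as np

def dre_backward(A, B, Q, R, Qf, tf, dt=1e-3):
    """Integrate the Differential Riccati Equation backward in time:
       Pdot = -(A^T P + P A - P B R^{-1} B^T P + Q),   P(tf) = Qf."""
    P = Qf.copy()
    Rinv = np.linalg.inv(R)
    t = tf
    while t > 0:
        Pdot = -(A.T @ P + P @ A - P @ B @ Rinv @ B.T @ P + Q)
        P = P - dt * Pdot          # step backward: P(t - dt) = P(t) - dt * Pdot
        t -= dt
    return P

# optimal feedback at time t: u(t) = -inv(R) @ B.T @ P @ x(t)
```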


4.4 DUALITY (FOR PARALLEL AND DISTRIBUTED OPTIMIZATION)

We now use a tool of optimization theory to solve parallel and distributed optimization problems. So, let's go back to our problem. What have we seen? FNC, SNC, algorithms, etc., for $\min_{x\in X} f(x)$ subj. to: $g_j(x) \le 0$, $\forall j \in \{1, \ldots, r\}$. We started by writing the Lagrangian:
$$L(x, \mu) = f(x) + \sum_{j=1}^{r} \mu_j g_j(x)$$

∇ ≤ ∀j

L(x , µ ) = 0 =⇒ g (x ) 0

x j

?j ?j ?

and then we have set: µ 0 =⇒ µ g (x ) = 0 (Complementary). So now, let’s keep that in

j

mind:

Let's look at this Lagrangian at a feasible point. Let $\bar x$ be a feasible point:
$$L(\bar x, \mu) = f(\bar x) + \sum_{j=1}^{r} \mu_j g_j(\bar x)$$
Suppose we impose $\mu_j \ge 0\ \forall j \in \{1, \ldots, r\}$ and look at $L$: we have $f(\bar x)$, the cost at $\bar x$, plus the quantity $\sum_{j=1}^{r}\mu_j g_j(\bar x)$, which is always $\le 0$! Let's look at what this means:
$$L(\bar x, \mu) \le f(\bar x) \qquad \forall\,\mu_j \ge 0,\ j \in \{1, \ldots, r\}.$$

j

So, L(x̄, µ) becomes a lower bound for the minimum of this function. La L, per ciascun

punto feasible. Abbiamo trovato un lower bound per il costo ottimo. Non solo per i µ ottimi,

?

∃f ∃

ma per tutti i µ! Let’s assume that = inf f (x) and at least a feasible solution. We’re

x∈X

just assuming that the problem is not unbounded. And then let’s define a Lagrange multiplier

?j ? ? ?

≥ |

µ 0 (∀j) f = inf L(x, µ ). In other words, If I plug µ in the Lagrangian, I have exactly

x∈X ∀µ,

the optimal cost. Let’s go back to our inequality. Since it holds it is sure to optimize over

µ. Let’s do it in this way: q(µ) = inf L(x, µ)

x∈X

∀x.

(the DUAL FUNCTION). That relation holds for every $x$. Let's try to maximize $q(\mu)$, which is called the DUAL FUNCTION:
$$\max_{\mu \ge 0} q(\mu) \qquad \text{(DUAL PROBLEM)},$$
i.e. find the best $\mu$, the one raising this quantity as much as possible; the main objective is to reach exactly the optimal cost. Let's look at the function $q(\mu) = \inf_{x\in X} L(x, \mu)$: notice that the inf can also be $-\infty$! Suppose, for example, a linear (SCALAR) problem: $\min cx$ subj. to: $ax \le b$ $\implies L(x, \mu) = cx + \mu(ax - b)$. Let's collect: $L(x, \mu) = (c + \mu a)x - \mu b$. Suppose I fix some $\mu$: it can happen that I get $-\infty$. Our $q(\mu)$, i.e., will have a particular domain:
$$D(q) := D = \{\mu \ge 0 \mid q(\mu) > -\infty\}$$
Let's not consider all the $\mu$'s: let's consider only $\mu$ with $q(\mu) > -\infty$ (finite value); otherwise we'll not have a well-defined function. Only some $\mu$ will be possible. In the example,
$$D = \{\mu \ge 0 \mid c + \mu a = 0\}$$

(The only way to get q(µ) = But this set is something we’ll need to take into account.

Quelle µ possibili non sono tutti! Se calcolo inf L su x per una data µ fissata, potrebbe benissimo

−∞. −∞!)

essere i.e. (Il valore più basso che una retta può avere è Se voglio avere una q(µ) ben

·µ

definita, devo dunque scegliere un dominio D(q) = D(q(µ)) = ben definito! Restringiamo

quindi il campo delle µ possibili! So, an interesting property is the following:

Proposition 9. (Very strong property) The domain $D$ of $q$ is convex, and $q$ is CONCAVE on $D$ ($-q$ is convex on $D$) $\implies$ (the dual is a convex problem).

The catch is that if I take a general problem, the dual problem will not give full information about the original problem; some structure is required.

Proposition 10. (WEAK DUALITY)
$$q^\star = \sup_{\mu\ge 0} q(\mu) \implies q^\star \le f^\star$$
Under any conditions, $q^\star$ always provides (on $D$) a lower bound for $f^\star$:
$$q^\star < f^\star \implies \text{DUALITY GAP}, \qquad q^\star = f^\star \implies \text{NO DUALITY GAP}.$$
Of course, we always get some information about the original problem (i.e. a lower bound of $f^\star$: $q^\star \le f^\star$). Let's see under what conditions we have NO DUALITY GAP.

Proposition 11. (SADDLE-POINT THEOREM)
$(x^\star, \mu^\star)$ is an optimal solution-Lagrange multiplier pair if and only if $x^\star \in X$, $\mu^\star \ge 0$, and $(x^\star, \mu^\star)$ is a saddle point of the Lagrangian, i.e.:
$$L(x^\star, \mu) \le L(x^\star, \mu^\star) \le L(x, \mu^\star) \qquad \forall x \in X,\ \mu \ge 0.$$
Really important: the saddle-point theorem, $(x^\star, \mu^\star)$ is a saddle point of our Lagrangian. So now, the point is when it is possible to get the strong duality property (STRONG

DUALITY). Of course one can state several sets of conditions; the most common one is the following. Consider $\min_{x\in X} f(x)$ subj. to: $g_j(x) \le 0$, $j \in \{1, \ldots, r\}$: the PRIMAL PROBLEM (P). Conditions for STRONG DUALITY:

• ASSUMPTION 1: the primal problem (original problem) P is feasible and $f^\star$ is finite ($f^\star < +\infty$), $X \subset \mathbb{R}^n$ is convex, and $f, g_j$ are convex over $X$. We basically require that our original problem be convex and have a finite optimal value;

• ASSUMPTION 2 (sometimes called SLATER'S CONDITION):
$$\exists\,\bar x \mid g_j(\bar x) < 0 \quad \forall j \in \{1, \ldots, r\}$$
(there exists at least one point strictly inside the constraint set);

?

∃µ

at least . ? ? ?

(Exists at least µ that in pair with (x , µ ) will be a saddle point of the Lagrangian).

L’inghippo è in questa proposizione (Si possono provare certe altre condizioni, ma questa

è la più comprensibile). Problema duale rappresentativo del primale quando è convesso il

primale. Se questo non è vero, allora il problema duale sarà ancora convesso, risolvibile,

ma ci sarà DUALITY GAP con il problema primale (original problem).

133

EXAMPLE

min c^T x subj. to: Ax ≤ b, i.e. a_j^T x ≤ b_j, j ∈ {1, . . . , r}. Linear problem (quadprog-solvable). (P). Let's write the Lagrangian:

L(x, µ) = c^T x + µ^T (Ax − b),   with f(x) = c^T x

Let's derive the dual problem of this primal problem (P):

q(µ) = inf_{x∈X⊂R^d} (c^T + µ^T A)x − µ^T b

So let's look at what the domain of the dual is. If c^T + µ^T A ≠ 0, then the inf is −∞ =⇒

D = {µ ≥ 0 | c^T + µ^T A = 0, i.e. c = −A^T µ}

How can we define q(µ)?

q(µ) = −µ^T b for µ ∈ D;   q(µ) = −∞ otherwise

q is a function of µ alone, because we have already minimized over x. Let's write the dual problem:

max_{µ≥0} q(µ) =⇒ max_µ −µ^T b subj. to: {c + A^T µ = 0 ∧ µ ≥ 0}

How can we write it? Let's look at the properties of the dual problem: linear cost function (convex function), linear constraints (convex constraints); and we suppose there exists at least one x that satisfies the inequality constraints strictly.
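As a numerical sanity check (a minimal sketch, not part of the notes; the data A, b, c below are arbitrary assumptions), one can solve both (P) and its dual with scipy.optimize.linprog and verify that the two optimal values coincide, as strong duality predicts for a feasible, bounded linear program:

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative data (assumed): min c^T x  s.t.  A x <= b, with x free
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.array([2.0, 2.0, -1.0])
c = np.array([1.0, 1.0])

# Primal (P)
primal = linprog(c, A_ub=A, b_ub=b, bounds=[(None, None)] * 2)

# Dual: max -b^T mu  s.t.  A^T mu = -c, mu >= 0
# (linprog minimizes, so minimize b^T mu and flip the sign of the value)
dual = linprog(b, A_eq=A.T, b_eq=-c, bounds=[(0, None)] * 3)

f_star = primal.fun     # optimal primal cost f*
q_star = -dual.fun      # optimal dual cost q*
print(f"f* = {f_star:.6f}, q* = {q_star:.6f}")   # equal: NO DUALITY GAP
```

Any µ ≥ 0 with A^T µ = −c already gives the weak-duality bound −b^T µ ≤ f*; the solver simply finds the best such µ.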

The point is: what happens if we also have equality constraints?

min_{x∈X} f(x), subj. to: h_i(x) = 0, i ∈ {1, . . . , m};  g_j(x) ≤ 0, j ∈ {1, . . . , r}

L(x, λ, µ) = f(x) + Σ_{i=1}^m λ_i h_i(x) + Σ_{j=1}^r µ_j g_j(x)

q(λ, µ) = inf_{x∈X} L(x, λ, µ)

Then the dual problem is: max_{λ, µ≥0} q(λ, µ) (DUAL PROBLEM). In terms of strong duality, it is enough to have an x̄ such that the inequality constraints are satisfied strictly. For the equality constraints, as soon as we have a feasible point, automatically λ_i h_i(x) = 0 ∀i ∈ {1, . . . , m} (we do not need to impose anything on the λ_i, which remain free in sign). The same goes for the domain:

D = {(λ, µ) with µ ≥ 0 | q(λ, µ) > −∞}
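For equality constraints the dual can sometimes be computed in closed form. As a worked illustration (mine, not from the lecture), take the minimum-norm problem min ½‖x‖² subj. to Ax = b: setting ∇_x L = x + A^T λ = 0 gives x_λ = −A^T λ, hence q(λ) = −½ λ^T A A^T λ − λ^T b, a concave quadratic whose unconstrained maximizer recovers the primal solution:

```python
import numpy as np

# Illustrative data (assumed): min (1/2)||x||^2  s.t.  A x = b
rng = np.random.default_rng(0)
A = rng.standard_normal((2, 4))    # wide matrix: infinitely many solutions
b = rng.standard_normal(2)

def q(lam):
    """Dual function: L is minimized over x at x_lam = -A^T lam."""
    x_lam = -A.T @ lam
    return 0.5 * x_lam @ x_lam + lam @ (A @ x_lam - b)

# Maximize q: grad q(lam) = -A A^T lam - b = 0  =>  lam* = -(A A^T)^{-1} b
lam_star = np.linalg.solve(A @ A.T, -b)
x_star = -A.T @ lam_star            # primal solution recovered from the dual
print("A x* - b       =", A @ x_star - b)                       # ~ 0: feasible
print("q(lam*), f(x*) =", q(lam_star), 0.5 * x_star @ x_star)   # equal: no gap
```

Note that λ is unconstrained in sign, exactly as in the dual problem above: only µ is required to be nonnegative.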

4.4.1 DUAL ALGORITHMS: DUAL ASCENT

max_{µ≥0} q(µ). So now, let's suppose we have strong duality and concentrate on how to solve the dual problem. In some cases, it will be more convenient to set up and solve a dual problem. Assume q(µ) has some regularity (this will not always be the case): assume it is at least differentiable. Maximization =⇒ GRADIENT ASCENT. We have a constraint (µ ≥ 0), so we can apply a PROJECTED GRADIENT METHOD (PGM) =⇒

µ_{k+1} = [µ_k + α_k ∇q(µ_k)]^+ = max{0, µ_k + α_k ∇q(µ_k)}

i.e., we project this quantity onto the positive orthant. If we have the gradient ∇q(µ) of q(µ), then this is something we know how to do. But q(µ) is only implicitly defined. The problem is: ∇q(µ) = ? It is not always possible to write q(µ) explicitly in closed form. However,

q(µ) = inf_{x∈X} L(x, µ) = inf_{x∈X} (f(x) + µ^T g(x))

where µ^T g(x) = Σ_{j=1}^r µ_j g_j(x) ≤ 0 at feasible points. The temptation is to say: for a fixed µ, find the x that gives us the inf:

x_µ = arg min_{x∈X} (f(x) + µ^T g(x))

Intuitively, we could find the x_µ that attains the inner minimum; plugging it into L and differentiating with respect to µ, we would obtain g(x_µ).

Proposition 13. Let X be a compact set and f, g continuous over X. Also assume that, for every µ ∈ R^r, L(x, µ) is minimized over X at a unique point x_µ (∀µ ∃! x_µ minimizing it). Then q(µ) ∈ C^1 everywhere, and ∇q(µ) = g(x_µ), where x_µ = arg min_{x∈X} L(x, µ).

So, what does the algorithm become here?

DUAL ASCENT

Start with some µ_0 (a possible initialization is µ_0 = 0). At each iteration k:

COMPUTE: x_{µ_k} = arg min_{x∈X} L(x, µ_k)

UPDATE: µ_{k+1} = [µ_k + α_k g(x_{µ_k})]^+

(projected onto the positive orthant).
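A minimal runnable sketch of dual ascent (my illustration; the QP data and the constant step size α are assumptions). For a strictly convex quadratic cost, the inner arg min is available in closed form, so each iteration is one linear solve plus the projected update:

```python
import numpy as np

# Illustrative QP (assumed data): min 1/2 x^T Q x + c^T x  s.t.  A x <= b
Q = np.array([[2.0, 0.0], [0.0, 2.0]])
c = np.array([-2.0, -5.0])
A = np.array([[1.0, 1.0], [-1.0, 0.0]])
b = np.array([1.0, 0.0])

alpha = 0.1                    # constant step size (assumed small enough)
mu = np.zeros(A.shape[0])      # initialization mu_0 = 0

for k in range(500):
    # COMPUTE x_mu = arg min_x L(x, mu): solve Q x = -(c + A^T mu)
    x_mu = np.linalg.solve(Q, -(c + A.T @ mu))
    # UPDATE with grad q(mu) = g(x_mu) = A x_mu - b, projected on mu >= 0
    mu = np.maximum(0.0, mu + alpha * (A @ x_mu - b))

print("x  =", x_mu)                 # ~ (0, 1): primal solution
print("mu =", mu)                   # ~ (3, 1): dual solution
print("g(x) =", A @ x_mu - b)       # <= 0 at convergence
```

Note how each iteration touches the constraints only through the values g(x_{µ_k}), exactly as Proposition 13 suggests.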

DUALITY (recap): min_{x∈X} f(x) subj. to: {g_j(x) ≤ 0, j ∈ {1, . . . , r}} ∧ {h_i(x) = 0, i ∈ {1, . . . , m}} (PRIMAL PROBLEM).

q(λ, µ) = inf_{x∈X} [f(x) + Σ_{i=1}^m λ_i h_i(x) + Σ_{j=1}^r µ_j g_j(x)]

Write q(λ, µ): q is CONCAVE over the convex domain D. max q(λ, µ) subj. to: µ ≥ 0 (DUAL PROBLEM).

q* = sup_{λ, µ≥0} q(λ, µ) ≤ f* = inf_{x∈X} f(x)