Estratto del documento

Autonomous Agents and Multiagent Systems

Notes for A.Y. 2016-2017

Alessandro Vago

February 1, 2017

1

Contents

1 Introduction 5

2 Single Agent Setting 7

2.1 Intelligent agent properties . . . . . . . . . . . . . . . . . . . . 7

2.2 Formal framework for agent definition . . . . . . . . . . . . . . 8

2.2.1 When is an agent a good agent? . . . . . . . . . . . . . 10

2.2.2 Special cases of this general framework . . . . . . . . . 10

3 Markov Decision Processes - MDP 11

3.1 Bellman Equation . . . . . . . . . . . . . . . . . . . . . . . . . 14

3.2 Value Iteration . . . . . . . . . . . . . . . . . . . . . . . . . . 15

4 Multiagent systems 18

4.1 Model of knowledge . . . . . . . . . . . . . . . . . . . . . . . . 18

4.2 Formal definition of Common knowledge . . . . . . . . . . . . 19

4.3 Cooperative Distributed Problem Solving (CDPS) . . . . . . . 20

4.4 Distributed Constraint Satisfaction Problem (DCSP) . . . . . 21

4.4.1 Asynchronous Backtracking Algorithm . . . . . . . . . 22

5 Self-interested agents in multiagent setting 24

5.1 Normal Form Strategic games . . . . . . . . . . . . . . . . . . 24

5.2 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

5.2.1 Prisoner’s Dilemma . . . . . . . . . . . . . . . . . . . . 26

5.2.2 Battle of the sexes . . . . . . . . . . . . . . . . . . . . 26

5.2.3 Chicken game . . . . . . . . . . . . . . . . . . . . . . . 27

5.2.4 Rock, Paper, Scissors . . . . . . . . . . . . . . . . . . . 27

5.3 Solution concept . . . . . . . . . . . . . . . . . . . . . . . . . 27

5.3.1 Dominant strategy . . . . . . . . . . . . . . . . . . . . 28

5.3.2 Dominated strategy . . . . . . . . . . . . . . . . . . . . 28

5.3.3 Pareto optimality . . . . . . . . . . . . . . . . . . . . . 28

5.3.4 Social welfare . . . . . . . . . . . . . . . . . . . . . . . 29

5.3.5 Nash equilibrium . . . . . . . . . . . . . . . . . . . . . 30

6 Coalitional games 32

6.1 Formal model for Cooperative games . . . . . . . . . . . . . . 33

6.2 Superadditive game . . . . . . . . . . . . . . . . . . . . . . . . 33

6.3 Convex game . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

6.4 The core . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

6.4.1 Outcome . . . . . . . . . . . . . . . . . . . . . . . . . . 34

6.4.2 Example . . . . . . . . . . . . . . . . . . . . . . . . . . 35

2

6.4.3 Objection . . . . . . . . . . . . . . . . . . . . . . . . . 35

6.4.4 Definition of Core . . . . . . . . . . . . . . . . . . . . . 36

6.5 Benefit Division . . . . . . . . . . . . . . . . . . . . . . . . . . 37

6.5.1 Shapley Values . . . . . . . . . . . . . . . . . . . . . . 37

6.5.2 Properties of Shapley value . . . . . . . . . . . . . . . 38

6.6 Coalition structure generation . . . . . . . . . . . . . . . . . . 39

6.6.1 Distributed algorithm for coalition structure generation 40

7 Negotiation 42

7.1 Elements involved in negotiation . . . . . . . . . . . . . . . . . 42

7.2 Classification of different types of negotiation . . . . . . . . . 42

7.3 Strategic approach for resource division . . . . . . . . . . . . . 43

7.3.1 Rubinstein alternating offer protocol . . . . . . . . . . 44

7.4 Axiomatic approach to negotiation . . . . . . . . . . . . . . . 45

7.4.1 Domination of deals . . . . . . . . . . . . . . . . . . . 46

7.4.2 Egalitarian solution . . . . . . . . . . . . . . . . . . . . 47

7.4.3 Egalitarian social welfare . . . . . . . . . . . . . . . . . 47

7.4.4 Utilitarian solution . . . . . . . . . . . . . . . . . . . . 47

7.4.5 Nash bargaining solution . . . . . . . . . . . . . . . . . 48

7.5 Finding the negotiation solution . . . . . . . . . . . . . . . . . 48

7.5.1 Monotonic concession protocol . . . . . . . . . . . . . . 48

7.5.2 Zeuthen strategy . . . . . . . . . . . . . . . . . . . . . 49

8 Voting 50

8.1 Formal framework . . . . . . . . . . . . . . . . . . . . . . . . . 50

8.2 Plurality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

8.3 Condorcet Winner . . . . . . . . . . . . . . . . . . . . . . . . 52

8.4 Condorcet winner condition . . . . . . . . . . . . . . . . . . . 52

8.5 Sequential majority voting . . . . . . . . . . . . . . . . . . . . 52

8.6 Majority graph . . . . . . . . . . . . . . . . . . . . . . . . . . 53

8.7 Borda count . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

8.8 Theoretical results in voting . . . . . . . . . . . . . . . . . . . 55

8.8.1 Pareto property . . . . . . . . . . . . . . . . . . . . . . 55

8.8.2 Independence of irrelevant alternatives (IIA) . . . . . . 55

8.8.3 Non dictatorship property . . . . . . . . . . . . . . . . 55

8.8.4 Arrow’s theorem . . . . . . . . . . . . . . . . . . . . . 55

9 Auctions 56

9.1 Reservation price . . . . . . . . . . . . . . . . . . . . . . . . . 56

9.2 Winner determination . . . . . . . . . . . . . . . . . . . . . . 57

9.3 English auction . . . . . . . . . . . . . . . . . . . . . . . . . . 57

3

9.4 Dutch auction . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

9.5 First-price sealed-bid . . . . . . . . . . . . . . . . . . . . . . . 58

9.6 Vickey Auction . . . . . . . . . . . . . . . . . . . . . . . . . . 58

9.7 Auctioneer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

9.8 Problems of real world . . . . . . . . . . . . . . . . . . . . . . 59

9.8.1 Collusion fo the bidders . . . . . . . . . . . . . . . . . 59

9.8.2 Honesty of the auctioneer . . . . . . . . . . . . . . . . 59

9.8.3 Shills . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

9.9 Combinatiorial auctions . . . . . . . . . . . . . . . . . . . . . 60

4

1 Introduction

Trends in Computer Science:

1. Ubiquity: Distributed Computing devices are spread everywhere (costs

are lowering every year);

2. Interconnection: These devices are more and more interconnected,

they talk to each other;

{ubiquity →

+ interconnection} Global computing

3. Intelligence: every single device is able of doing complex tasks, like

landing an aircraft;

4. Delegation: we delegate to computers more and more tasks;

{Intelligence →

+ Delegation} Autonomy

5. Human Orientation: Computing devices are coming close to how

humans live and think, they help our lives.

Action a

Agent Environment

Observation o

We talk about Intelligent (Rational) Agents, agents that try to maximize

or optimize something. (we can call it a Utility Maximizer).

When can an agent be called “Autonomous”?

1. I program what the agent will do at design time (I hard code all possible

actions), I know exactly the problem. This is NOT autonomous.

2. I can program the agent in order to make decisions according to per-

ceptions taken from environment. The agent is equipped with Decision

5

making, it is able to run a program that provides the possibilities, and

it chooses the best one based on perception. This is Autonomous.

3. I equip my agent with decision making, but also with another program,

that can modify the first program, so that the action made on a partic-

ular perception can change over time. The second meta-program can,

for example, change some thresholds in the first program. This is called

Learning.

In a Multiagent system we have multiple agents that interact, among them

and with the environment.

Agent Human

Multi-agent system

Just the presence of other agents can modify the behaviour of an agent, they

interact without communication. Common Knowledge is a basis for the

interaction of multiple agents. 6

2 Single Agent Setting

We face two kinds of problems:

• local problem (agent problem);

• social problem (interaction between agents);

Local problem: how to develop a single autonomous agent.

But, what is an agent? There are many definitions about agents, we will

refer to a very general definition:

An agent is a computer system that is able to perform autonomous

actions in an environment in order to satisfy/achieve some goals.

Action a

Agent Environment

Observation o

Perception - Decision - Action loop: the agent perceives the environ-

ment, reasons and reaches a decision and then applies this decision by acting

on the environment.

In this course we will study intelligent agents, that can make complex

reasoning.

2.1 Intelligent agent properties

Reactivity – Being able to respond to changes in the environment in a

reasonable (useful) time. E.g. think about a robot, able to change its

route when it finds an unexpected obstacle.

Proactivity – Being able to act in order to reach a final goal. E.g. if

the previous robot must reach a point B, and it is both proactive and

7

reactive, when it finds obstacles it will avoid them but then it will still

try to reach the final point.

Sociality – The property of being able to interact with other agents in

different ways:

Cooperation : there is a system-wide goal, everyone acts in order to

reach it (e.g. for every member of a team of robot-soccer, the goal

is to win the match)

Competition : agents have conflicting goals (e.g. the two teams of

robot-soccer)

Coordination : agents have different goals, not necessarily conflict-

ing, and they organize in order to not interfere, avoiding some

actions or states.

In some way, we can see an agent as an Object in an Object Oriented lan-

guage. In an OO program, when an object o wants to call a method on

1

object o , there is no native way for o to refuse the method invocation.

2 2

However, an agent may refuse to accept a message from another agent, as

it may be busy doing something else. a asks something to a , but is a to

1 2 2

choose, autonomously, what to do.

2.2 Formal framework for agent definition

We will now define some useful concepts.

• {e · · · }

E = , e , e , It’s the set of all possible states of the environ-

0 1 2

ment that the agent can distinguish. It is a partition of the real

states induced by the sensors of the agent. E.g. if an agent can only

{rainy,

distinguish if it’s sunny or rainy, E = sunny}.

• {α · · · }

Ac = , α , The set of possible actions that the agent can per-

0 1

form in order to (try to) change the environment (and, eventually, reach

the goal). α α

0 1

• · · · −→ −→

run = e , α , e , α , e e e

0 0 1 1 0 1 2

A run, defined as a sequence of interleaved states and actions. It can

be infinite.

• R = set of all possible runs (infinite set) α α

E 0 1

• −→ −→

R = set of all runs that end in a state(e.g. e e e )

0 1 2

8 α α

Ac 0 1

• −→ −→)

R = set of all runs that end with an action (e.g. e e

0 1

• Transition function, can be expressed in different ways:

Deterministic Ac →

τ : R E

Returns the next perception of the environment.

Non-deterministic Ac E

τ : R 2

E

2 is the set of all the possible subsets of E. The deterministic case

is a special case of the non-deterministic one (for every transition

there is only one possible state).

Stochastic or Probabilistic Ac →

τ : R ∆(E)

It returns a probability distribution over the states in E. It is not

a supercase of the non-deterministic one, they are independent

concepts.

Examples: (r is a possible run)

– Deterministic: τ (r) = e 0 00

{e, }

– Non-deterministic: τ (r) = e , e

 e 1/2

 0

e 1/4

– Probabilistic: τ (r) = 00

e 1/4

τ describes how the environment is working.

• hE, i.

Env = e , τ This tuple describes how the environment is per-

0

ceived by the agent.

• Agent function E →

Ag : R Ac

Given the history till some state e, it returns the next agent action α.

AG is the set of all possible agent functions.

9

• Utility function, the way with which we express what we want our

agent to do. It is crafted by the designer of the model. We do not

consider how the agent will reach the goal.

u : E R

In this way, the goal is to reach the state with highest utility.

u : R R

This utility depends on the history.

2.2.1 When is an agent a good agent?

We define p(r), r R, as the probability of a run. We then define the

expected utility as X ·

EU (Ag, Env) = u(r) p(r)

r∈R

The ideal optimal agent maximizes the EU , thus

Ag = arg max EU (Ag, Env)

Ag∈AG

These two formulations are not applicable in practice, because of infinite

sets.

2.2.2 Special cases of this general framework

Purely reactive agents Just react to events, no history involved

Ag : E Ac

Planning → {0,

ψ : R 1}

Achievement: if a run contains a particular state, return 1, else 0;

Maintenance: I don’t want to get in some particular states.

10

3 Markov Decision Processes - MDP

0

hE, i ∈

Environment: Env = e , τ Transitions: T (s, a, s ) [0, 1]

0

E → →

Agent: Ac, Ag : R Ac Policy: π : S A

∈ →

States: S, s S Utility: u : S R 0 0

P

∈ ·

Actions: A, a A EU (u, s, a) = T (s, a, s ) u(s )

0 ∈S

s

The utility in a MDP is a function that gives the benefit of being in one

state: u : S where S is the set of all states. In a Markov Decision

R,

Process, S is an instance of E.

How can an environment be described in a Markov decision process? s 1

is the initial state of the environment, s S. The transition function is

1

0 0

) [0, 1], where s is the current state, a the action taken and s

T (s, a, s

is the possible following state. It returns a probability value in [0, 1], that

0

represents the probability of going into state s from s taking action a. It’s

a stochastic process. s

T(s,a,s ) T(s,a,s )

T(s,a,s )

S S S

0 00 000

Obvious relation: T (s, a, s ) + T (s, a, s ) + T (s, a, s ) = 1

Why is this process a Markov Decision Process? The transition function

τ is history dependent, so given the whole history of interaction between

agent and environment it will return the next state. A special case of this

general definition is when the next state depends only on the last state.

Expected utility is not defined on all the runs, but on the current state. If

I start from s and perform action a, I would expect some utility and this

0

expected utility is calculated as the sum over all possible states s of the

11

0 0

probability of going in s times the utility in state s :

X 0 0

·

EU (u, s, a) = T (s, a, s ) u(s )

0 ∈S

s

The agent’s behavior is captured by a policy which is a mapping from states

to actions. We will use π to denote the agent’s policy. Given a state s, the

policy returns the action a that the agent should perform. The agent goal

then becomes finding its optimal policy, which is the one that maximizes its

expected utility, as we might expect a selfish agent would do. This strategy is

known as the principle of maximum expected utility. The agent optimal

policy can thus be defined as:

π (s) = arg max EU (u, s, a)

a∈A

Def. (Markov Decision Process) An MDP consists of an initial state

0

s taken from a set of states S, a transition function T (s, a, s ) and a

1 →

reward function r : S R.

MDPs are usually represented using graphs:

{s }.

Nodes represent states of the MDP, S = , s , s , s The initial state is s .

1 2 3 4 1

{a }.

Arcs represent transition functions and the set of actions is A = , a , a , a

1 2 3 4

Some actions can be applied in some states, others not.

12

Rewards:

s r(s)

s 0

1

s 0

2

s 1

3

s 0

4

Time is discrete, and in every timestamp the agent should perform an action.

Every time an agent reaches a state, it gets the reward associated with that

state. E.g. the agent is in s , it performs action a and goes in s , getting

3 4 2

reward 0.

Assume to have a fixed policy p, and the agent starts from s . Is the se-

1

quence of states that it will visit fixed? No, because every action is not

deterministic, but stochastic.

An intelligent agent should always follow the optimal policy.

X

∗ 0 0

·

π (s) = arg max T (s, a, s ) u(s )

a∈A 0 ∈S

s

The problem is that we don’t know the utility u!

The reward is called immediate reward because the agent gets the reward

as soon as it reaches the state. The utility is more complex: it explains

the benefit to be in a state (how good is to be in that state now? How good

is to be in this state in the future? ).

The utility is given by:

• the reward of current state;

• the reward that I will get starting from s and following the optimal

policy;

If I look only at the immediate reward, being in state s and s is the same

1 4

because the reward is zero in both cases. Looking at the utilities, I’m not

only looking at the immediate reward, but also at the reward I’ll get in

the future. Looking at the future, its better to be in s becausefrom there

4

I could go in s and get the reward 1.

3 u(s ) = r(s ) + reward I’ll get in the future

1 1

We generally prefer to use discounted rewards which allow us to smoothly

reduce the impact of rewards that are farther off in the future. We do this

13

by multiplying the agent’s future rewards by an ever-decreasing discount

factor represented by γ, which is a number between zero and one.

For example, if an agent using policy π, starts out in state s and visits states

1

· · ·

s , s , s , then we say that its discounted reward is given by

1 2 3 0 1 2 · · ·

γ r(s ) + γ r(s ) + γ r(s ) +

1 2 3 · · ·

Note that we only know s . The rest of the states s , s , depend on the

1 2 3

transition function T.

3.1 Bellman Equation

We define

Anteprima
Vedrai una selezione di 14 pagine su 63
Autonomous Agents and Multiagent Systems, prof. Amigoni, Polimi Pag. 1 Autonomous Agents and Multiagent Systems, prof. Amigoni, Polimi Pag. 2
Anteprima di 14 pagg. su 63.
Scarica il documento per vederlo tutto.
Autonomous Agents and Multiagent Systems, prof. Amigoni, Polimi Pag. 6
Anteprima di 14 pagg. su 63.
Scarica il documento per vederlo tutto.
Autonomous Agents and Multiagent Systems, prof. Amigoni, Polimi Pag. 11
Anteprima di 14 pagg. su 63.
Scarica il documento per vederlo tutto.
Autonomous Agents and Multiagent Systems, prof. Amigoni, Polimi Pag. 16
Anteprima di 14 pagg. su 63.
Scarica il documento per vederlo tutto.
Autonomous Agents and Multiagent Systems, prof. Amigoni, Polimi Pag. 21
Anteprima di 14 pagg. su 63.
Scarica il documento per vederlo tutto.
Autonomous Agents and Multiagent Systems, prof. Amigoni, Polimi Pag. 26
Anteprima di 14 pagg. su 63.
Scarica il documento per vederlo tutto.
Autonomous Agents and Multiagent Systems, prof. Amigoni, Polimi Pag. 31
Anteprima di 14 pagg. su 63.
Scarica il documento per vederlo tutto.
Autonomous Agents and Multiagent Systems, prof. Amigoni, Polimi Pag. 36
Anteprima di 14 pagg. su 63.
Scarica il documento per vederlo tutto.
Autonomous Agents and Multiagent Systems, prof. Amigoni, Polimi Pag. 41
Anteprima di 14 pagg. su 63.
Scarica il documento per vederlo tutto.
Autonomous Agents and Multiagent Systems, prof. Amigoni, Polimi Pag. 46
Anteprima di 14 pagg. su 63.
Scarica il documento per vederlo tutto.
Autonomous Agents and Multiagent Systems, prof. Amigoni, Polimi Pag. 51
Anteprima di 14 pagg. su 63.
Scarica il documento per vederlo tutto.
Autonomous Agents and Multiagent Systems, prof. Amigoni, Polimi Pag. 56
Anteprima di 14 pagg. su 63.
Scarica il documento per vederlo tutto.
Autonomous Agents and Multiagent Systems, prof. Amigoni, Polimi Pag. 61
1 su 63
D/illustrazione/soddisfatti o rimborsati
Acquista con carta o PayPal
Scarica i documenti tutte le volte che vuoi
Dettagli
SSD
Ingegneria industriale e dell'informazione ING-INF/05 Sistemi di elaborazione delle informazioni

I contenuti di questa pagina costituiscono rielaborazioni personali del Publisher alexvago93 di informazioni apprese con la frequenza delle lezioni di Autonomous Agents and Multiagent Systems e studio autonomo di eventuali libri di riferimento in preparazione dell'esame finale o della tesi. Non devono intendersi come materiale ufficiale dell'università Politecnico di Milano o del prof Amigoni Francesco.
Appunti correlati Invia appunti e guadagna

Domande e risposte

Hai bisogno di aiuto?
Chiedi alla community