Multiagent Systems - Notes

01 Introduction

An agent has the following properties:
• Autonomy: it can work without human help.
• Reactivity: it can react to what happens in the environment.
• Pro-activeness: it must be able to take the initiative to achieve its predefined objectives, even without external stimuli.
• Sociality: it must be able to interact with other agents or with human beings.

Rational agent: it must be able to choose the action that maximizes the expected value of its performance measure, given its knowledge up to that moment, for every possible sequence of perceptions. It is not omniscient or clairvoyant, but it can explore to acquire new information and it can learn.

An agent chooses its actions in order to reach a goal or to maximize its performance, for example a utility.

An Environment can be:
• Fully observable (chess) or partially observable (poker).
• Static (changes only when the agent acts) or dynamic (changes even while the agent is thinking).
• Discrete (described by discrete time or variables) or continuous (described by continuous time or variables).
• Single agent or multiagent.

The Action performed by the agent can be deterministic (if I know what will happen after my action, as in chess) or stochastic (if I do not know what will happen after my action).

The Performance measure is the criterion for evaluating the success of the behavior of an agent, and it is defined by the designer.

We can implement an agent in three ways:
• Rules: if something happens, do something. Rules can be written by the designer or learned.
• Plans: the agent imagines performing different sequences of actions and then performs the one that maximizes its performance or that reaches the goal.
• Policies: the agent decides which action to perform based on the current state and on its goals.

There exist 4 types of agent architectures (in the figures, the blue boxes represent processing and the yellow ones knowledge):
• Simple reflex agents: like a NN or the first version of Roomba. They perceive the environment, process the perceptions, decide what to do according to if-then statements, and then do it.
• Reflex agents with state: like the Dyson 360 Eye. They keep track of the environment's state.
• Goal-based agents: like robots with a goal, or game solvers, e.g. for tic-tac-toe. They imagine what will happen if they perform some actions (every state is seen as good if it leads to the goal, or bad if it does not).
• Utility-based agents: like autonomous cars. They imagine possible sequences of actions and use a utility function to evaluate (with a real number) every state.

All these agents can improve their performance with learning: every component of the decision process can be modified in order to perform better.

Multiagent systems are systems of agents that interact with each other and with the environment. They are divided into two classes:
• Cooperative systems (benevolent agents)
• Competitive systems (selfish agents)

Other types of interactions are coordination, negotiation, elections, task allocation, distributed optimization, emerging behaviors, etc.

Coordination: selecting actions so that they do not interfere with each other with negative effects. It can be of 3 types:
• Pre-programmed: as in space satellites, where each one knows where the others go.
• Convention-based: the agents have to follow some rules, like cars that have to give priority to the ones coming from their right.
• Communication-based: the agents communicate with each other to work together.

Coordination can also be divided into:
• Centralized: there is a supervisor that decides for everyone and communicates to each agent what to do, in order to find an optimal solution.
• Distributed: the agents talk only with their neighbors, so in general it is not possible to find an optimal solution.

Problems can be:
• Micro problems: how to design individual agents able to act autonomously in order to reach a goal.
• Macro problems: how to design systems in which multiple agents interact in a useful way.

The agent architecture can be reactive (the agent reacts to the environment without maintaining any internal representation of the world), deliberative (the agent maintains an internal representation of the world and plans its actions based on it), or hybrid.

The interaction mechanisms between agents can be: planning, coordination, matchmaking, auctions, negotiations, strategies, etc.

02 Agent Architectures

In this part we focus on the micro problem.

We want to build agents that enjoy the properties of autonomy, reactiveness, pro-activeness and social ability.

An agent architecture specifies how the agent can be decomposed into a set of component modules and how these modules should be made to interact. It has to answer the question of how the sensor data and the current internal state of the agent determine its actions and its future internal state.

Reasoning agents: we see agents as knowledge-based systems. This paradigm is known as symbolic AI. A deliberative agent is one that contains an explicitly represented, symbolic model of the world and makes decisions (about what action to perform) via symbolic reasoning.

To get a deliberative agent to realize some theory, we have to give it a representation of this theory as rules, or symbolic knowledge. Then there are two important problems to solve:
• Transduction problem: translating the real world into an accurate, adequate symbolic description (addressed by working on vision, learning, speech understanding, etc.).
• Representation/reasoning problem: how to symbolically represent information about complex real-world entities and processes, and how to get agents to reason with this information in time for the results to be useful.

These problems may seem trivial, but they are extremely difficult: many search-based symbol-manipulation algorithms are intractable.

Planning agents: planning is the design of a course of actions (the plan) that will achieve some desired goal (e.g., Shakey the robot, or blocks-world algorithms). The reasoning involved is complex, so it does not scale.

Reactive architectures: these architectures were created in the belief that the assumptions made by mainstream AI were wrong.

Brooks' theses are:
• Intelligent behavior can be generated without the explicit representations proposed by symbolic AI.
• Intelligent behavior can be generated without the explicit abstract reasoning proposed by symbolic AI.
• Intelligence is an emergent property of certain complex systems.

He had two key ideas:
• Situatedness and embodiment: real intelligence is situated in the world, so agents must be situated in the real world.
• Intelligence and emergence: intelligent behavior arises as a result of an agent's interaction with its environment; it is not one of the agent's innate properties.

Brooks built the subsumption architecture, which is a hierarchy of task-accomplishing behaviors. Each behavior is a rather simple rule-like structure and competes with the others to exercise control over the agent. The lower layers represent more primitive kinds of behavior, like avoiding obstacles, and have precedence over the layers further up in the hierarchy. These systems are, computationally speaking, extremely simple.

Steels created a Mars explorer system using the subsumption architecture, whose objective was to explore a distant planet and collect samples of a precious rock. The explorer does not know the location of the samples but knows that they are clustered. Its behaviors were:
1. Obstacle avoidance (lowest layer).
2. Path attraction: if you see a heavily trodden path, follow it away from the spaceship.
3. Exploration movement: if I have no samples, move away from the spaceship.
4. Return movement: if I have samples and I cannot see any denser concentration, then head towards the spaceship.
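
A minimal sketch of this kind of layered control loop, with invented percept fields and action names (they are illustrative, not Steels' actual implementation): behaviors are tried from the lowest (highest-precedence) layer upwards and the first applicable one fires.

# Hypothetical sketch of a subsumption-style controller in the spirit of
# Steels' Mars explorer. Percept keys and action names are assumptions.

def obstacle_avoidance(p):                                          # layer 1
    return "turn_away" if p["obstacle_ahead"] else None

def path_attraction(p):                                             # layer 2
    return "follow_path_outward" if p["sees_trodden_path"] else None

def explore(p):                                                     # layer 3
    return "move_away_from_ship" if not p["carrying_sample"] else None

def return_home(p):                                                 # layer 4
    return "head_to_ship" if p["carrying_sample"] else None

# Lower layers come first and therefore have precedence.
BEHAVIORS = [obstacle_avoidance, path_attraction, explore, return_home]

def select_action(percept):
    """Return the action of the first (lowest-layer) applicable behavior."""
    for behavior in BEHAVIORS:
        action = behavior(percept)
        if action is not None:
            return action
    return "idle"

# Example: an agent carrying a sample, with a clear path, heads back to the ship.
print(select_action({"obstacle_ahead": False,
                     "sees_trodden_path": False,
                     "carrying_sample": True}))  # -> head_to_ship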

Markov Decision Processes:

We can use search to solve problems only in deterministic domains (e.g., A* does not work in stochastic environments), so we introduce MDPs for encoding problems in stochastic environments.

An MDP-based architecture is a mix between planning and reactive architectures.

An MDP can be defined formally as M = (S, A, T, R), where:
• S is the state set.
• A is the action set.
• T : S x A x S --> [0, 1] is the transition function: T(s, a, s') is the probability of going from s to s' when executing action a (for every state s and action a, the sum over all states s' of S is 1).
• R : S x A --> R is the reward function (a real number for each state-action pair).

The goal is to perform actions in order to maximize the future rewards. The strategy of the agent is called a policy.

A policy can be:
• Deterministic: π : S --> A, the agent always performs the same action from a given state.
• Stochastic: π : S x A --> [0, 1], the agent selects the action to execute by drawing from the probability distribution encoded by the policy.
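
For illustration (a toy example with invented states and actions, not taken from the notes), the two kinds of policy can be represented as follows:

import random

# Deterministic policy: one action per state.
deterministic_policy = {"s0": "right", "s1": "up"}

# Stochastic policy: a probability distribution over actions per state (sums to 1).
stochastic_policy = {
    "s0": {"right": 0.8, "up": 0.2},
    "s1": {"right": 0.5, "up": 0.5},
}

def act(policy, state):
    """Return the action directly for a deterministic policy, sample otherwise."""
    choice = policy[state]
    if isinstance(choice, str):
        return choice
    actions, probs = zip(*choice.items())
    return random.choices(actions, weights=probs)[0]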

In MDPs we use the Markov assumption: the state X_t depends only on X_{t-1}, and not on X_{t-2}, X_{t-3}, etc.

Expected future rewards: the simplest formulation is the sum of all the future rewards, E[r_0 + r_1 + r_2 + ...]. The problem of this formulation is that it can grow to infinity. Two possible solutions are to limit the horizon to a finite number of steps, or to discount future rewards with a factor gamma in [0, 1):

E[r_0 + gamma * r_1 + gamma^2 * r_2 + ...]

With bounded rewards, this discounted sum is bounded by r_max / (1 - gamma). If gamma is 0 we are looking only at the next reward; if it is close to 1 we are looking very far into the future.

Value function V(s): it is the expected discounted reward that an agent expects to gain if it acts optimally starting from state s.

The game plan is:
1. Calculate the optimal value function.
2. Calculate the optimal policy from the optimal value function.

In order to calculate the optimal value function we use value iteration, whose update rule is:

V_{i+1}(s) = max_a [ R(s, a) + gamma * sum_{s'} T(s, a, s') * V_i(s') ]

Written without the index i (because then it is an equation, not an update rule), this is the Bellman Equation:

V(s) = max_a [ R(s, a) + gamma * sum_{s'} T(s, a, s') * V(s') ]

It can be proved that this V converges to the optimal value function.

Value iteration is a slow method: O(S^2 * A) per iteration, the max rarely changes at each state, and the policy converges only after a long time.

Another, equivalent formulation of the Bellman Equation is the one used in the example.

Now we have to understand when to stop the algorithm. We stop when the maximum change of the values between two consecutive iterations is very small:

max_s | V_{i+1}(s) - V_i(s) | < epsilon

This is called the Termination Condition. We use this formula because it can be proved that, when we stop, the remaining error with respect to the optimal values is bounded and proportional to epsilon (scaled by a factor that depends on gamma). So the values that we have found are approximately the same as the optimal ones. Of course, if we choose a small epsilon we will find better values, but the convergence will be slower, so we set epsilon as a trade-off between quality of the solution and computation time.

The smaller gamma is, the easier it is to converge.
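
A compact sketch of value iteration with this termination condition, on a small hand-made MDP (the states, actions, transition probabilities and rewards below are invented for illustration):

# Value iteration on a toy 2-state MDP (all numbers are made up).
# T[s][a] is a list of (next_state, probability); R[s][a] is the immediate reward.
T = {
    "s0": {"stay": [("s0", 1.0)], "go": [("s1", 0.8), ("s0", 0.2)]},
    "s1": {"stay": [("s1", 1.0)], "go": [("s0", 1.0)]},
}
R = {
    "s0": {"stay": 0.0, "go": 1.0},
    "s1": {"stay": 2.0, "go": 0.0},
}
gamma, epsilon = 0.9, 1e-4

V = {s: 0.0 for s in T}                       # initial value function
while True:
    V_new = {}
    for s in T:
        # Bellman update: V_{i+1}(s) = max_a [ R(s,a) + gamma * sum_s' T(s,a,s') V_i(s') ]
        V_new[s] = max(R[s][a] + gamma * sum(p * V[s2] for s2, p in T[s][a])
                       for a in T[s])
    # Termination condition: max_s |V_{i+1}(s) - V_i(s)| < epsilon
    if max(abs(V_new[s] - V[s]) for s in T) < epsilon:
        V = V_new
        break
    V = V_new

print(V)  # approximately optimal values V*(s0), V*(s1)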

Given the optimal value function V*, we can calculate the optimal policy:

π*(s) = argmax_a [ R(s, a) + gamma * sum_{s'} T(s, a, s') * V*(s') ]

If we do not know the values V*(s_i), we can calculate them by writing a system with n equations and n unknown variables. The problem is that the equations are not linear, because they contain the max function, so we have to make assumptions in order to solve it (assume that one expression inside a max is greater than another one). Then we have to verify that the assumptions were correct and, if they were not, change them.
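
Continuing the sketch above (same toy T, R, gamma and computed V), the greedy extraction of the policy from the values looks like this:

# Extract the optimal policy from the (approximately) optimal value function:
# pi*(s) = argmax_a [ R(s,a) + gamma * sum_s' T(s,a,s') * V*(s') ]
policy = {}
for s in T:
    policy[s] = max(T[s],
                    key=lambda a: R[s][a] + gamma * sum(p * V[s2] for s2, p in T[s][a]))

print(policy)  # {'s0': 'go', 's1': 'stay'} for the toy MDP above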

Game Theory

The basic model for games is the Normal Form Representation (N, A, U), where N are the players, A the actions and U the utilities.

Two-player zero-sum games are games with 2 players in which the sum of the utilities is 0 for every pair of actions (for example, Rock-Paper-Scissors modeled as a zero-sum game). These games are solvable with linear programming.
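
A possible sketch of this, assuming numpy and scipy are available: it computes the row player's maxmin (randomized) strategy for Rock-Paper-Scissors by maximizing the value v that the player can guarantee against any opponent action. The formulation and variable names are standard, not taken from the notes.

# Solving a two-player zero-sum game (Rock-Paper-Scissors) by linear programming.
# Maximize v subject to sum_i x_i * U[i][j] >= v for every column j,
# sum_i x_i = 1, x_i >= 0.
import numpy as np
from scipy.optimize import linprog

U = np.array([[ 0, -1,  1],   # row player's utilities: rock
              [ 1,  0, -1],   # paper
              [-1,  1,  0]])  # scissors

n = U.shape[0]
c = np.zeros(n + 1); c[-1] = -1          # variables [x_1..x_n, v]; minimize -v
A_ub = np.hstack([-U.T, np.ones((U.shape[1], 1))])  # v - sum_i x_i U[i][j] <= 0
b_ub = np.zeros(U.shape[1])
A_eq = np.array([[1.0] * n + [0.0]])     # probabilities sum to 1
b_eq = np.array([1.0])
bounds = [(0, 1)] * n + [(None, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print(res.x[:n], res.x[-1])  # approximately [1/3, 1/3, 1/3] with game value 0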

Another, more general type of game is the two-player general-sum game (for example the Bach or Stravinsky concert game, with two players and utilities that do not sum to 0). When there are more than 2 players, it makes no sense to have a zero-sum game.

To solve these games we use strategies: G_i : A_i --> [0, 1], which gives the probability with which player i plays each of its actions.

For example, we can have a mixed strategy: in Rock-Paper-Scissors we can play every action with probability 1/3 (we are randomizing the strategy). Or we can have a pure strategy: for example, in Rock-Paper-Scissors we can play Paper with probability 1 and the other two actions with probability 0. Of course, the sum of the probabilities over all the actions must be 1.

Then we can compute the expected utility of an agent playing the actions: E[U_i] = p_1 * u_1 + ... + p_n * u_n. For example, for the mixed strategy of Rock-Paper-Scissors with p_i = 1/3 against the pure strategy G = (1, 0, 0), the expected utility is: E = 1/3 * 0 + 1/3 * 1 + 1/3 * (-1) = 0.
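
A small sketch of this computation, with the Rock-Paper-Scissors payoff matrix for player 1 written out explicitly using the usual 0 / 1 / -1 zero-sum convention (the function and variable names are illustrative):

# Expected utility of player 1's strategy against player 2's strategy in RPS.
ACTIONS = ["rock", "paper", "scissors"]
U1 = {  # U1[a1][a2] = utility of player 1 when playing a1 against a2
    "rock":     {"rock": 0,  "paper": -1, "scissors": 1},
    "paper":    {"rock": 1,  "paper": 0,  "scissors": -1},
    "scissors": {"rock": -1, "paper": 1,  "scissors": 0},
}

def expected_utility(strategy1, strategy2):
    """E[U1] = sum over action pairs of p1(a1) * p2(a2) * U1(a1, a2)."""
    return sum(strategy1[a1] * strategy2[a2] * U1[a1][a2]
               for a1 in ACTIONS for a2 in ACTIONS)

mixed = {"rock": 1/3, "paper": 1/3, "scissors": 1/3}      # uniform mixed strategy
pure_rock = {"rock": 1.0, "paper": 0.0, "scissors": 0.0}  # pure strategy (1, 0, 0)

print(expected_utility(mixed, pure_rock))  # 0.0, matching the computation above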
