Scarica il documento per vederlo tutto.
Scarica il documento per vederlo tutto.
Scarica il documento per vederlo tutto.
Scarica il documento per vederlo tutto.
Scarica il documento per vederlo tutto.
Scarica il documento per vederlo tutto.
Scarica il documento per vederlo tutto.
Scarica il documento per vederlo tutto.
vuoi
o PayPal
tutte le volte che vuoi
Y
1 t
use of to predict .
Y Y
t t+1
Then: ∣ ...Y ) = ∣ )
P r(Y Y P r(Y Y
1
t+1 t t+1 t
05 - Dynamic models 2
We can write the so called Markov model as:
) = ) ∣ )...P ∣ )
P r(Y P r(Y P r(Y Y r(Y Y
1:T 1 2 1 −1
T T
This is a first order Markov Model. , ...,
If we assume that each depends on the previous N values than we can
Y Y Y
t t−1 t−N
write the N-th order markov model as:
∣ , ..., ) = ∣ , ..., )
P r(Y Y Y P r(Y Y Y
1 +1
t+1 t t+1 t t−N
Example
In temporal data there is the auto-regressive model of order N.
The model is formulated as:
where is the noise that can’t be explained by the model.
e(t)
5.1.4 Dynamic models as graphical models
Dynamic models are also called Dynamic Bayesian Networks.
They may involve both hidden and observed variables, which can have
complex interdependencies.
The graphical structure provides an easy way to specify these conditional
independencies, and hence to provide a compact parameterization of the model.
We need to add time to the graphical representation and in order to do that typically samples
are divided into time slices that then can be interconnected.
So, to specify a dynamic model as graphical models we need to define:
Intra-slice topology
Inter-slice topology
The model parameters for the first two slices
Two example of dynamic models are:
Hidden Markov Models
05 - Dynamic models 3
Example
It’s the simplest type of dynamic model.
There is an observable continue variable (grey one) and a non-observable categorical
variable (white one) which is time dependent.
The Q variable at a given time depends on the value of the Q variable at the previous
time. This is a first order Markov Model. These variables are often called state
variables.
The observable variables are not time dependent, so there is no memory at this level.
These are often called output variables.
Linear Dynamical Systems
5.1.5 Time invariance
We assume that the parameters do not change. So we’ll deal with time-invariant model.
If model parameters change with time we need to change the model structure over time.
5.2 Hidden Markov Models
A tool for representing probability distributions over sequences of observations.
The observation of time is denoted as and it can be discrete (categorical variable) or
t Y
t
continuous.
5.2.1 Markov property
There are three properties:
Observation at time t was generated by some dynamic stochastic variable which is
Y S
t t
hidden.
is discrete-valued and can take one of K values. The hidden variable represents the
S S
t t
underlying state of the system or process. These states often represent different
configurations, conditions, or modes of the system being modeled.
The state of the hidden process satisfies the Markov property: given the value the
S t−1
− 1
current state is independent of all the states prior to .
S t
t
This means that the current state encapsulates all relevant information from the past
S t
and serves as a sufficient statistic for predicting the future.
In other words, the hidden state is a Markov model.
05 - Dynamic models 4
Example
Imagine deploying a wearable device onto an elderly person with the primary objective of
closely monitoring their daily activities—specifically, the durations of walking, lying down,
standing and sitting.
In this context, the "state variable" refers to the person's activity state at any given
moment, which can take on one of three values: walking, lying, standing, or sitting.
To infer these activity states, the wearable device collects data from an accelerometer,
capturing the person's movements over time.
The accelerometer provides a continuous stream of information, and by analyzing these
values, we can deduce the person's current activity state—whether they are actively
walking, in a resting position, or engaged in a seated posture.
Importantly, there exists a logical relationship between these activity states. Specifically,
there is a dependence or sequence in the transitions between different movements. For
instance, before engaging in a walking activity, it is typical for the person to be in a
standing position. This dependency reflects a real-world scenario where certain activities
often follow a logical order or pattern. In this case, the person is more likely to transition
from a standing state to walking, establishing a connection between consecutive activities.
5.2.2 HMM: joint probability
The joint probability of the S and Y can be written as:
T T
∏ ∏
, ) = ) ∣S ) ∣S )
P r(S Y P r(S P r(Y P r(S
1:T 1:T 1 t t t t−1
t=1 t=2
State 1 doesn’t depend on anything
)
is a vector of K values and it’s the probability of initial state.
P r(S 1
Importantly, it doesn't depend on any prior states. It's a vector with values, where is the
K K
number of possible states. Each value in the vector corresponds to the probability of the
system starting in a specific state.
∣S ) ×
is the probability of state transition from state to state . It’s a
P r(S S S K
t t−1 t−1 t
(i,
matrix, where each entry corresponds to the probability of transitioning from state
K j)
05 - Dynamic models 5
to . Every state depends only on the previous state.
i j ∣ )
is the probability of observing a particular output (also called emission
P r(Y S Y
t t t
probability) given the current state . It specifies the relation between the observation
S t
and the state.
If the observation is a continuous variable, we have to consider the mean and the variance
for each State. ×
If the observation is a discrete variable, it can be represented as a matrix , where
L K
is the number of possible observations, while are the different states.
L K
5.2.3 HMM: time invariance
Usually the state transition matrix and the output model are assumed to be independent on t,
i.e. we assume that the model is time-invariant.
5.2.4 HMM with external input
There may be also an external input that could be observable or not.
U
t
If it’s not observable, we can just forget about it.
We will not consider this case.
5.2.5 Variants
5.2.6 Learning in a HMM {Y , ..., }
Suppose we have a data sequence .
Y
1 T
Let’s define the set of all the parameters (transition probability matrix, initial state probability,
θ
parameters of the output model).
In order to estimate it from the data sequence, we can use the log form of likelihood.
T T
∏ ∏
∣θ) ∣S , ∣S ,
= , ∣θ) =
L(θ) P r(S Y P r(S P r(Y θ) P r(S θ)
1:T 1:T 1 t t t t−1
t=1 t=2
05 - Dynamic models 6
Log form: T T
∑ ∑
= ∣ + ∣S , + ∣S ,
logL(θ) logP r(S θ) logP r(Y θ) logP r(S θ)
1 t t t t−1
t=1 t=2
We can then procede as usual taking the derivative and put it equal to 0.
The difficulty arises from the fact that the components of the HMM (initial state probabilities,
state transition probabilities, and emission probabilities) are distinct and involve different
calculations.
A significant challenge in the HMM learning process is the unobservable nature of the hidden
states ( ). Unlike the observed data sequence, the states are not directly known.
S
It’s the exactly same problem that we dealt in mixture gaussians where there were an hidden
categorical and unknown variable.
Solution:
To address these challenges, the process begins by taking the logarithm of the
probabilities to simplify calculations.
One-hot encoding is then introduced to represent the hidden states ( ). This encoding
S
transforms each state into a k-dimensional column unit vector, where is the number of
k
possible states.
For instance, this is the state taking the value 2 at time t:
T
= [0 1 0... 0]
s
t
In fact:
Using the unit vector notation, to express the fact that:
⎧ = 1∣θ) =
P r(S π
1 1
⎨
...
⎩
= =
P r(S K∣θ) π
1 K
where is the value of the probability, we can simply write:
π
i K
∏ ,
is i
= ∣ = 1
P r(S s θ) π
1 !
i=1
NB: is the i-th component of the unit vector and it’s used as exponent.
s s
1,i 1
= 1
Only one peer is different from 0, so in the end we obtain the element for which .
s S
This is a more compact way to write the system above.
05 - Dynamic models 7
Taking the log of this compact form we obtain a more simple expression, that we can
vectorize to obtain k
∑ ⋅
= T
= ∣θ) = (logπ)
logP r(S s s logπ s
1 1 1,i 1
i
i=1
NB: both and are vectors and log is applied element wise.
s π
1
Similarly, if the probability of transition from state j to state i is expressed as follows:
= = = = 1∣S = 1) =
P r(S i∣S j, θ) P r(S ϕ
t t−1 t,i t−1,j ij
We get: K K
∏ ∏ s s
= ∣ = , = (ϕ )
P r(S s S s θ) t,i t−1,j
t t t−1 t−1 ij
i=1 j=1
NB: In this double product there is just one element that is different from 0.
Same trick as before.
We can take the logarithm: K K
∑ ∑ = tT
= ∣ = , = (logΦ)S
ogϕ
log(P r(S s S s θ) s s l S
t t t−1 t−1 t,i t−1,j ij t−1
i=1 j=1
Φ
where is the matrix of elements .
ϕ
ij
5.2.7 HMM: Emission probability
We need now to consider the emission probability, which depends on the nature of
observations .
Y
t
We have to distinguish the two cases:
discrete-valued
Y
t
Using a coding similar to that used for we can write the emission probabilities as
S t L K
∏ ∏ y s
= ∣ = , = (E )
P r(Y y S s θ) t,i t,j
t t t t ij
i=1