- Strategy 1: start from the saturated model, estimating the variance/covariance matrix by maximum likelihood, and test each partial correlation coefficient against zero, possibly taking into account the multiple testing issue. E.g.: package SIN in R
- Strategy 2: Stability Selection algorithm: it looks for a stable graph structure using resampling. E.g.: package stabs in R
- Strategy 3: use a regularised estimator that shrinks small partial correlation coefficients towards zero. E.g.: lasso, elastic net, adaptive lasso estimators, package glmnet in R (see the sketch below)
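As a sketch of Strategy 3, here is a minimal neighbourhood-selection example with glmnet: each variable is regressed on all the others with a lasso penalty, and an edge is kept when the corresponding coefficient is non-zero. This is one lasso-based way to estimate the graph, not necessarily the exact procedure of the lecture; the data is simulated purely for illustration.

```r
# Sketch of a lasso-based strategy: nodewise lasso regressions,
# keeping an edge when a coefficient survives the penalty.
# Simulated data; a sketch, not a definitive implementation.
library(glmnet)

set.seed(1)
n <- 200; p <- 5
X <- matrix(rnorm(n * p), n, p)

adj <- matrix(0, p, p)                                # estimated adjacency matrix
for (j in 1:p) {
  fit  <- cv.glmnet(X[, -j], X[, j])                  # lasso of variable j on the rest
  beta <- as.matrix(coef(fit, s = "lambda.min"))[-1]  # drop the intercept
  adj[j, -j] <- as.numeric(beta != 0)                 # selected neighbours of node j
}
adj <- adj * t(adj)   # "AND" rule: keep edges selected in both directions
adj
```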
We can estimate $\Sigma$, then invert it obtaining $\Omega = \Sigma^{-1}$. From that we can obtain the partial correlation between two variables given the rest:
$$\rho_{ij\cdot\text{rest}} = \frac{-\omega_{ij}}{\sqrt{\omega_{ii}\,\omega_{jj}}}$$
obtaining the matrix $\Omega$ with $\frac{p(p-1)}{2}$ parameters, which are the values under the diagonal. Since there are $\frac{p(p-1)}{2}$ different tests, this leads to a multiple testing issue.
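A minimal R sketch of the computation above, on simulated data (names and sizes are illustrative): estimate $\Sigma$ with the sample covariance, invert it, and rescale the off-diagonal entries of $\Omega$:

```r
# From the sample covariance matrix to the matrix of partial correlations.
set.seed(1)
n <- 100; p <- 4
X <- matrix(rnorm(n * p), n, p)   # simulated data, for illustration only

Sigma.hat <- cov(X)               # estimate of Sigma
Omega.hat <- solve(Sigma.hat)     # concentration matrix Omega = Sigma^{-1}

# rho_ij.rest = -omega_ij / sqrt(omega_ii * omega_jj)
parcor <- -Omega.hat / sqrt(diag(Omega.hat) %o% diag(Omega.hat))
diag(parcor) <- 1
parcor
```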
The test for partial correlation is stated by the null hypothesis
$$H_0: \rho_{ij\cdot\text{rest}} = 0 \qquad \text{against} \qquad H_1: \rho_{ij\cdot\text{rest}} \neq 0,$$
and the statistic to compute this test is a Student's $t$:
$$T = r_{ij\cdot\text{rest}} \, \sqrt{\frac{(n-2)-(p-2)}{1 - r^2_{ij\cdot\text{rest}}}} \;\sim\; t_{(n-2)-(p-2)}.$$
This statistic is derived using the sample partial correlation coefficient $r_{ij\cdot\text{rest}}$, in analogy with the test for the pairwise correlation coefficient.
To improve this test there is a transformation to apply, the Fisher transformation, which under $H_0$ asymptotically converges to a standard Gaussian:
$$\sqrt{n-3-k}\;\cdot\;\frac{1}{2}\log\!\left[\frac{1+r_{ij\cdot\text{rest}}}{1-r_{ij\cdot\text{rest}}}\right] \;\sim\; N(0,1) \qquad \text{as } n \to \infty,$$
where $k = p-2$ is the dimension of the conditioning set. It is proven that the test derived using the Fisher transformation converges to the standard Gaussian faster than the previous statistic converges to the $t_{(n-2)-(p-2)}$ distribution.
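A small R sketch of this t-test (the helper name is made up for the example): given the sample partial correlation $r$ between two variables of a $p$-variate sample of size $n$:

```r
# t-test for H0: rho_ij.rest = 0, with df = (n - 2) - (p - 2) = n - p
partial.cor.test <- function(r, n, p) {
  df <- (n - 2) - (p - 2)
  t.stat  <- r * sqrt(df / (1 - r^2))
  p.value <- 2 * pt(-abs(t.stat), df = df)   # two-sided p-value
  c(statistic = t.stat, df = df, p.value = p.value)
}

partial.cor.test(r = 0.25, n = 100, p = 4)
```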
As we defined a statistic for the test, we can also define confidence intervals for the partial correlation. We define the Fisher transformation as
$$z = \frac{1}{2}\log\!\left[\frac{1+r_{ij\cdot\text{rest}}}{1-r_{ij\cdot\text{rest}}}\right]$$
and then from the Fisher transformation we obtain
$$z \;\sim\; N\!\left(\frac{1}{2}\log\!\left[\frac{1+\rho_{ij\cdot\text{rest}}}{1-\rho_{ij\cdot\text{rest}}}\right],\; \frac{1}{n-3-k}\right)$$
where $k = p - 2$ is the dimension of the conditioning set.
The interval for the Fisher-transformed parameter is
$$CI = [L,\; U] \qquad \text{with} \qquad L = z - \frac{z_{\alpha/2}}{\sqrt{n-3-k}}, \qquad U = z + \frac{z_{\alpha/2}}{\sqrt{n-3-k}}.$$
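A companion sketch for this confidence interval, under the same assumptions ($k = p - 2$ conditioning variables; the function name is made up). The interval is built on the $z$ scale and mapped back to the $\rho$ scale with the inverse transform $\tanh$:

```r
# CI for a partial correlation via the Fisher transform,
# z ~ N((1/2)log((1+rho)/(1-rho)), 1/(n-3-k)) with k = p - 2.
fisher.ci <- function(r, n, k, alpha = 0.05) {
  z  <- 0.5 * log((1 + r) / (1 - r))   # Fisher transform of r
  se <- 1 / sqrt(n - 3 - k)
  q  <- qnorm(1 - alpha / 2)
  ci.z <- c(z - q * se, z + q * se)    # interval on the z scale
  tanh(ci.z)                           # back-transform to the rho scale
}

fisher.ci(r = 0.25, n = 100, k = 2)
```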
In the context of graphical models we call the saturated model the one corresponding to a
complete graph.
An adjacency matrix is a matrix composed of 0s and 1s formed this way:
- $a_{ij} = 1$ if there is an edge connecting the nodes $i$ and $j$
- $a_{ij} = 0$ if there is no edge connecting the nodes $i$ and $j$
The adjacency matrix of an undirected graph is symmetric; the converse is not always true.
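For instance, a minimal adjacency matrix for an undirected graph on 4 nodes with edges 1-2, 2-3 and 2-4 (a graph chosen purely for illustration):

```r
# Adjacency matrix of an undirected 4-node graph (edges 1-2, 2-3, 2-4).
A <- matrix(0, 4, 4)
edges <- rbind(c(1, 2), c(2, 3), c(2, 4))
for (e in 1:nrow(edges)) {
  A[edges[e, 1], edges[e, 2]] <- 1
  A[edges[e, 2], edges[e, 1]] <- 1   # undirected graph: mirror each entry
}
A
```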
If we have an independence like $1 \perp 3 \mid 2,4$, the inverse of the variance/covariance matrix will be indicated as $\Sigma^{-1}_{1234}$ (obtained by inverting the matrix $\Sigma_{1234}$) and there will be zeros in the entries $[1,3]$ and $[3,1]$.
In this case, in order to check a sub-independence like $1 \perp 3 \mid 2$, we have to consider the matrix indicated as $\Sigma^{-1}_{123}$, obtained by inverting the matrix $\Sigma_{123}$, which is the matrix obtained by removing the 4th column and the 4th row. Then we compute the same test explained before considering this submatrix. If the null hypothesis is rejected, the partial correlation coefficient is most likely not 0, which means that 1 is not independent from 3 given 2.
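A sketch of this sub-matrix check in R, reusing the simulated `Sigma.hat` and the `partial.cor.test` helper from the sketches above:

```r
# Check the sub-independence 1 _||_ 3 | 2: drop the 4th row and column,
# invert the reduced covariance matrix and test the [1,3] partial correlation.
Sigma.123 <- Sigma.hat[-4, -4]   # remove the 4th column and the 4th row
Omega.123 <- solve(Sigma.123)

r13.2 <- -Omega.123[1, 3] / sqrt(Omega.123[1, 1] * Omega.123[3, 3])
partial.cor.test(r = r13.2, n = 100, p = 3)
```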
Recap:
We have $p$ variables and we want to study their conditional independence structure using a
graph, learning it from the data. In this situation we’re assuming that the data comes from a
multivariate Gaussian distribution and the models resulting from this data are called
Concentration Graph Models.
In this context a missing edge represents conditional independence between those two
variables. We construct the graph based on the Pairwise Markov properties. Since the joint
Gaussian distribution is strictly positive, the Pairwise Markov property will imply the Global
Markov property, so all the conditional independences that we can read off the graph using
the Global Markov properties are implied by the construction of the graph.
There are two cases:
- when we know the graph, we want to do inference, estimating the parameters given the graph → iterative procedures (Iterative Fitting Algorithm); a very rare situation since usually we don't know the graph
- inference when we don't know the graph, 3 possible strategies:
  - start with the maximum likelihood estimator of the variance/covariance matrix for the complete graph and then test $\rho_{ij\cdot\text{rest}} = 0$ with partial correlations, possibly taking into account the multiple testing issue → this way we recover the Pairwise Markov properties
  - Stability Selection algorithm
  - use a regularised estimator that shrinks small partial correlation coefficients towards zero
Multiple testing issue:
the probability of committing an error (first type or second type) on the whole graph, not just on a specific edge. Since the number of possible edges grows as $\frac{p(p-1)}{2}$, the number of tests grows really quickly as the number of variables grows.
Consider for example a scenario in which we have a multivariate regression:
$$Y = \beta_0 + \sum_{j=1}^{p} \beta_j X_j + \varepsilon$$
here a possible solution to know if a given coefficient $\beta_j$ is zero is to compute the Wald $t$-test. If you do this though, you lose the "big picture" since here you need to compute $p$ tests. For this reason you actually need to compute the $F$-test instead, where you compare the model with and the model without the $j$-th variable.
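A small sketch of this model comparison in R (simulated data; the variable names are made up): `anova` on two nested `lm` fits computes the $F$-test for dropping the extra variable:

```r
# Partial F-test: compare the models with and without x3.
set.seed(1)
n <- 100
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
y  <- 1 + 0.5 * x1 + rnorm(n)     # x3 has no real effect here

full    <- lm(y ~ x1 + x2 + x3)
reduced <- lm(y ~ x1 + x2)        # model without the 3rd variable
anova(reduced, full)              # F-test comparing the nested models
```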
Multiple testing issue
Type 1 and 2 errors:
- type 1 error = reject $H_0$ when $H_0$ is true. The probability of committing a type 1 error is $\alpha$
- type 2 error = don't reject $H_0$ when $H_0$ is false. The probability of committing a type 2 error is $\beta$
The multiple testing problem concerns a situation in which we want to consider many hypotheses at the same time.
Some notation:
We set $m$ as the number of possible edges:
$$m = \frac{p(p-1)}{2}$$
where $p$ is the number of variables.
Consider then the set of null hypotheses $\{H_{0,1}, \ldots, H_{0,m}\}$; we define the Intersection Null (or Global Null) as
$$H_0 = \bigcap_{i=1}^{m} H_{0,i}.$$
The Global Null is rejected if at least one single $H_{0,i}$ is rejected.
We then denote the set of p-values of the set of tests as $\{p_1, \ldots, p_m\}$.
We compare the scenarios of a single test against multiple tests:
- when we compute a single test we consider the following probabilities:
$$P(\text{reject } H_0 \mid H_0 \text{ true}) = \alpha, \qquad P(\text{don't reject } H_0 \mid H_0 \text{ true}) = 1 - \alpha$$
- when we compute $m$ multiple tests we consider the following probabilities:
$$P(\text{no type 1 errors} \mid H_0) = (1-\alpha)^m, \qquad P(\text{at least one type 1 error} \mid H_0) = 1 - (1-\alpha)^m$$
where we assume each test is independent from the others.
The bigger the value of $\alpha$ (e.g. 0.1, 0.05, 0.01), the faster the probability of at least one type 1 error reaches 1.
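A quick numeric illustration of how fast $1 - (1-\alpha)^m$ approaches 1 as the number of independent tests grows:

```r
# P(at least one type 1 error) among m independent tests at level alpha.
m <- c(1, 10, 45, 190, 4950)   # p(p-1)/2 for p = 2, 5, 10, 20, 100
for (alpha in c(0.1, 0.05, 0.01)) {
  cat("alpha =", alpha, ":", round(1 - (1 - alpha)^m, 3), "\n")
}
```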
The Bonferroni global test is a way to test the Global Null hypothesis:
- choose the overall significance level $\alpha$
- test each null hypothesis $H_{0,i}$ at level $\frac{\alpha}{m}$
- accept $H_0$ if $p_i > \frac{\alpha}{m} \;\; \forall i = 1, \ldots, m$
- reject $H_0$ if $\min_{i \in [1,m]} p_i \leq \frac{\alpha}{m}$
The idea here is to set the significance level of each test to $\frac{\alpha}{m}$ since we compute $m$ tests. The overall significance level is
$$P(\text{at least one type 1 error} \mid H_0) = P\!\left(\bigcup_{i=1}^{m} \left\{ p_i < \frac{\alpha}{m} \right\}\right) \;\leq\; \sum_{i=1}^{m} P\!\left(p_i < \frac{\alpha}{m}\right) = \sum_{i=1}^{m} \frac{\alpha}{m} = \alpha$$
since $p_i \sim U(0,1)$ under $H_{0,i}$.
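A sketch of the Bonferroni rule on a vector of p-values (the values are made up):

```r
# Bonferroni: test each hypothesis at level alpha / m;
# reject the Global Null if any p-value falls below that threshold.
pvals <- c(0.0004, 0.031, 0.18, 0.47, 0.72)
alpha <- 0.05
m <- length(pvals)

pvals <= alpha / m                               # per-hypothesis decisions
any(pvals <= alpha / m)                          # Global Null decision
p.adjust(pvals, method = "bonferroni") <= alpha  # same, via adjusted p-values
```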
This bound is now considered too strong, especially for graphical models, since in some way we are looking for independences. Whenever we put an edge where it is not present, we are considering an overparametrised model (which won't be wrong in any sense, but still). Instead, when we remove an edge which is actually present, estimating a model that puts a 0 on that partial correlation coefficient, we cause a bias on the other coefficients, and so the model will be wrong.
This means that in graphical models a type 2 error (removing an edge which is present) is a much harder error than the opposite kind. So if we fix $\alpha$ to be too small, $\beta$ will be too high, and for graphical models this is a big problem.
In the context of graphical models it is wrong to state more independences than the real amount. In particular we need to control the type 1 errors, but we can't be too hard on $\alpha$, otherwise $\beta$ will be too high.
Multiple testing classical scheme:

                        Actual situation
Decision                H_0 true     H_0 false    Total
Don't reject H_0        U            T            m - R
Reject H_0              V            S            R
Total                   m_0          m - m_0      m

(U and S count the correct decisions in each cell.)
We consider the following notation:
- $m$ → total number of tests → total number of possible edges $\frac{p(p-1)}{2}$
- $m_0$ → number of true null hypotheses → number of independences (missing edges on the graph)
- $m - m_0$ → number of total edges on the graph in the model
- $R$ → number of rejected null hypotheses
- $V$ → number of rejected true null hypotheses → type 1 errors
- $T$ → type 2 errors
Remember that we only know the last column of the table, that is $R$ and $m - R$.