The larger value is the second one, so our prediction for that vector is: churner.
The two values are not complementary (they do not sum to 1) because we only maximize the numerator, dropping P(x),
since it does not depend on y. Indeed, the notation does not state equality but proportionality.
There are many possible vectors, so each of them has a very small probability P(x). The confidence
is expressed as the ratio between the value maximizing the expression and the sum of the two values
(≈ 3): the confidence level is approximately 2.5/3 = 5/6, and 1/6 is the probability of error.
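The prediction and its confidence can be sketched as follows: pick the class with the largest unnormalized score P(x|y)P(y), then divide that score by the sum of all scores, which plays the role of the omitted P(x). This is a minimal sketch: the two scores are hypothetical values chosen only to match the rounded figures above (2.5 and a total of 3), and the helper name `predict_with_confidence` is illustrative.

```python
def predict_with_confidence(scores):
    """Return the class with the largest unnormalized score P(x|y)P(y)
    and its confidence: that score divided by the sum of all scores.
    Dividing by the sum stands in for the omitted denominator P(x)."""
    total = sum(scores.values())
    best = max(scores, key=scores.get)
    return best, scores[best] / total

# Hypothetical unnormalized scores (common scale factors dropped).
scores = {"non-churner": 0.5, "churner": 2.5}
label, confidence = predict_with_confidence(scores)
print(label, round(confidence, 4))  # churner 0.8333
print(round(1 - confidence, 4))     # 0.1667, the probability of error
```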
A last remark on this method: is it possible to make a weaker assumption about
independence, allowing some partial dependence? Yes: Bayesian networks allow us to define
dependency relationships between (small) subsets of attributes.
The idea is partial dependence: hierarchical, network-like links are introduced, through
which it is possible to specify selected stochastic dependencies.
The two core elements are:
Directed acyclic graph
- in which the nodes correspond to the predictive variables and the
arcs indicate relationships of stochastic dependence.
Conditional probability tables
- which combine the effects of the variables on which each node depends: the idea is to
consider only small, limited combinations of variables.
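The two elements fit together through the standard factorization of a Bayesian network: the joint distribution is the product of the conditional probabilities stored in the tables, with each variable conditioned only on its parents in the graph (written $\mathrm{pa}(X_j)$ here, a notation introduced just for this sketch).

```latex
P(X_1, \dots, X_n) = \prod_{j=1}^{n} P\big(X_j \mid \mathrm{pa}(X_j)\big)
```

When every node has only one or two parents, each table stays small, which is what keeps the partial-dependence assumption tractable.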
The main application is in health care, where experts believe they know the relationships
in the data well (e.g. between smoking or family history and cancer). The consequence of
partial dependence: within a small subset of variables a dependence can be specified, but in a large
group of variables this is not possible; we can combine groups of two or three variables together (a joint
conditional distribution is feasible for small subsets of variables).
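A minimal sketch of such a network, assuming the structure Smoking → Cancer ← FamilyHistory from the health care example. All probabilities are made-up illustrative numbers, not medical data, and the function names are hypothetical. Queries are answered by enumerating the joint distribution, which the network expresses as P(S) P(F) P(C | S, F).

```python
from itertools import product

# Hypothetical network: Smoking -> Cancer <- FamilyHistory.
# All numbers are invented for illustration only.
p_smoking = {1: 0.3, 0: 0.7}
p_family = {1: 0.1, 0: 0.9}
# Conditional probability table: P(cancer = 1 | smoking, family history).
p_cancer_given = {
    (1, 1): 0.20,
    (1, 0): 0.10,
    (0, 1): 0.05,
    (0, 0): 0.01,
}

def joint(s, f, c):
    """Joint probability via the factorization P(S) * P(F) * P(C | S, F)."""
    pc1 = p_cancer_given[(s, f)]
    return p_smoking[s] * p_family[f] * (pc1 if c == 1 else 1 - pc1)

def prob_cancer(smoking=None):
    """P(C = 1), or P(C = 1 | S = smoking) if smoking is given, by enumeration."""
    states = [smoking] if smoking is not None else [0, 1]
    num = sum(joint(s, f, 1) for s, f in product(states, [0, 1]))
    den = sum(joint(s, f, c) for s, f, c in product(states, [0, 1], [0, 1]))
    return num / den

print(round(prob_cancer(), 4))   # 0.0428
print(round(prob_cancer(1), 4))  # 0.11
print(round(prob_cancer(0), 4))  # 0.014
```

The table for Cancer needs only four rows, one per combination of its two parents: this is the "small subset of variables" idea in practice.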
Logistic regression is another technique, which converts binary classification problems into linear regression ones.
Regression is much like classification, but for numerical target variables. Logistic
regression is a way to adapt regression to classification; it is based on parametric assumptions
(none of the other methods seen so far were).
The idea is to calculate the posterior probability but, unlike the Bayesian methods, in a
parametric way (we make an assumption about the form of the probability distribution).
Consider a function of the form $f(z) = \frac{1}{1 + e^{-z}} = \frac{e^{z}}{1 + e^{z}}$.
This kind of expression is called a sigmoid (or logistic function). It is used, for
example, to describe technology development: when a technology is at an early stage,
its efficiency increases slowly; then the improvement accelerates as the technology
takes off; finally, the technology becomes mature and further improvement becomes
very hard. The behaviour is S-shaped.
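The S shape can be checked numerically with a small sketch of the logistic function $f(z) = 1/(1 + e^{-z})$. The function name `logistic` is illustrative. The two branches are algebraically the same formula, written so that `math.exp` never receives a large positive argument.

```python
import math

def logistic(z):
    """Logistic (sigmoid) function 1 / (1 + e^(-z)), with values in (0, 1).
    Uses the equivalent form e^z / (1 + e^z) for negative z to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Slow growth, fast growth around z = 0, then saturation: the S shape.
for z in [-6, -3, -1, 0, 1, 3, 6]:
    print(f"{z:>3} {logistic(z):.3f}")
```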
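Logistic regression postulates that the posterior probability of the response variable,
conditioned on the vector of explanatory variables $\mathbf{x}$, follows a logistic function:
$$P(y=1 \mid \mathbf{x}) = \frac{1}{1 + e^{-\mathbf{w}^\top\mathbf{x}}}, \qquad P(y=0 \mid \mathbf{x}) = \frac{1}{1 + e^{\mathbf{w}^\top\mathbf{x}}}.$$
Given this assumption, it is possible to calculate the odds (the ratio between the probabilities
of the two classes) and their logarithm:
$$\frac{P(y=1 \mid \mathbf{x})}{P(y=0 \mid \mathbf{x})} = e^{\mathbf{w}^\top\mathbf{x}}, \qquad \log \frac{P(y=1 \mid \mathbf{x})}{1 - P(y=1 \mid \mathbf{x})} = \mathbf{w}^\top\mathbf{x}.$$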
Instead of explaining a binary variable, we explain the logarithm of the odds, which is a continuous
variable: we have moved from a categorical (binary, 0/1) variable to a numerical quantity, the logarithm
of the ratio above. At this point, we have a standard linear regression problem.
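The link between the logistic posterior and the linear log-odds can be verified directly: applying the logit transform log(p / (1 − p)) to the posterior gives back the linear combination of the inputs. In this sketch the coefficients are hypothetical, and `x[0] = 1` is an assumed convention so that `w[0]` acts as an intercept.

```python
import math

def posterior(w, x):
    """P(y = 1 | x) under the logistic model; x[0] = 1 carries the intercept w[0]."""
    z = sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

def log_odds(p):
    """Logit transform log(p / (1 - p)): maps (0, 1) onto the whole real line."""
    return math.log(p / (1.0 - p))

w = [-1.0, 0.5]  # hypothetical coefficients: intercept and one slope
for x1 in [0.0, 2.0, 4.0]:
    p = posterior(w, [1.0, x1])
    # The second printed column equals -1 + 0.5 * x1, a linear function of x1.
    print(x1, round(p, 3), round(log_odds(p), 3))
```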
Logistic regression works well with a very small number of variables. When the number
of variables increases, it becomes unstable and it is difficult to obtain a solution.
Example: the explosion of the Space Shuttle Challenger.
Based on past data, it had been observed that at lower temperatures the seals (O-rings) meant to
contain the fuel could fail, letting it leak out before it was consumed, with the risk of it catching fire.
The data showed that a failure (Y = 1) was more likely at lower temperatures. But there was pressure to start the
mission, and NASA wanted more financial support. They decided to launch, and the shuttle
exploded, which led to an investigation. One of the elements considered was the regression model,
whose single explanatory variable is the temperature: the odds were calculated and
the parameters estimated.
The temperature at the time of the launch was very low: what was the probability of a
failure? Very close to 100%. With such a high probability of failure, the damage occurred
right at the start, and the fuel began leaking from the very beginning.
In this case there is a single explanatory variable, so the model is applicable and can lead to successful predictions.
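A one-variable fit of this kind can be sketched in a few lines. This is a minimal sketch under stated assumptions: the temperatures and failure flags below are invented for illustration (they are not the shuttle records), and the helper names `fit_logistic_1d` and `predict` are hypothetical. The parameters are estimated by maximum likelihood with Newton's method, which is how logistic regression is usually fitted in practice, since the log-odds of a single 0/1 observation is infinite and cannot be regressed on directly.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic_1d(xs, ys, iterations=50, tol=1e-10):
    """Maximum-likelihood fit of P(y=1|x) = sigmoid(b0 + b1*(x - mean)) by Newton's method.
    x is centred on its mean for numerical stability; returns (b0, b1, mean)."""
    mean = sum(xs) / len(xs)
    cs = [x - mean for x in xs]
    b0 = b1 = 0.0
    for _ in range(iterations):
        ps = [sigmoid(b0 + b1 * c) for c in cs]
        # Gradient of the log-likelihood.
        g0 = sum(y - p for y, p in zip(ys, ps))
        g1 = sum((y - p) * c for y, p, c in zip(ys, ps, cs))
        # Entries of the negated Hessian (weights p * (1 - p)).
        ws = [p * (1 - p) for p in ps]
        s0 = sum(ws)
        s1 = sum(w * c for w, c in zip(ws, cs))
        s2 = sum(w * c * c for w, c in zip(ws, cs))
        det = s0 * s2 - s1 * s1
        # Newton step: solve the 2x2 system [[s0, s1], [s1, s2]] d = g.
        d0 = (s2 * g0 - s1 * g1) / det
        d1 = (s0 * g1 - s1 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1, mean

# Made-up illustrative data: temperature and failure indicator (1 = failure).
temps = [50, 54, 57, 60, 62, 65, 66, 68, 70, 71, 73, 75, 77, 80]
fails = [1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0]

b0, b1, mean = fit_logistic_1d(temps, fails)

def predict(t):
    """Estimated probability of failure at temperature t."""
    return sigmoid(b0 + b1 * (t - mean))

print(f"slope per degree: {b1:.3f}")
print(f"P(failure | 31) = {predict(31):.3f}, P(failure | 75) = {predict(75):.3f}")
```

A negative slope means the odds of failure grow as the temperature drops, so extrapolating to a temperature far below the observed range pushes the estimated probability towards 1.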
Neural networks are a technique introduced very early (1956) and used for predictive purposes.
Basically, a neural network is a directed graph whose nodes represent neurons, connected
by arcs that represent synapses. Each arc has an associated weight and each node
an activation function.
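The graph view can be made concrete with a small sketch: weights are stored on the arcs, an activation function is attached to each non-input node, and values are propagated along the arcs. The network shape, the weights and the choice of `tanh` and logistic activations are all hypothetical, picked only to illustrate the structure.

```python
import math

# Hypothetical network: inputs x1, x2 -> hidden h1, h2 -> output o.
# Each arc (source, destination) carries a weight.
weights = {
    ("x1", "h1"): 0.5, ("x2", "h1"): -0.4,
    ("x1", "h2"): 0.3, ("x2", "h2"): 0.8,
    ("h1", "o"): 1.2, ("h2", "o"): -0.7,
}
# Each non-input node has its own activation function.
activation = {
    "h1": math.tanh,
    "h2": math.tanh,
    "o": lambda z: 1.0 / (1.0 + math.exp(-z)),
}
order = ["h1", "h2", "o"]  # nodes listed so that every arc points forward

def forward(inputs):
    """Propagate input values along the arcs of the directed graph."""
    values = dict(inputs)
    for node in order:
        # Weighted sum over all arcs entering this node.
        z = sum(w * values[src] for (src, dst), w in weights.items() if dst == node)
        values[node] = activation[node](z)
    return values

print(forward({"x1": 1.0, "x2": 0.5}))
```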
Through the perceptron, represented in the figure below, we can simulate the electrical
behaviour of a single neuron, modelling its level of energy and its output; this is still far from
the real human brain. The analogy is based on the following structure. There are n values entering
the input layer (the independent variables: one input node for each variable). Then there are the
weights, a vector of coefficients associated with the input connections (a linear combination, even though
real neurons do not perform linear combinations of their inputs). Next we calculate the weighted
sum of the inputs and correct it with a constant term, the distortion; finally, the activation function
produces the output.
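The single-neuron computation can be sketched as: weighted sum of the inputs, minus a constant θ (the distortion), passed through an activation function. The sign convention (subtracting θ), the step activation and the weights are assumptions chosen for illustration; with these hypothetical values the neuron computes a logical AND of two binary inputs.

```python
def perceptron(x, w, theta, activation):
    """Single neuron: weighted sum of the inputs, minus the constant theta
    (the distortion), passed through the activation function."""
    s = sum(wj * xj for wj, xj in zip(w, x))
    return activation(s - theta)

def step(z):
    """Threshold activation: fires (1) when the corrected sum is non-negative."""
    return 1 if z >= 0 else 0

# Hypothetical weights and distortion that make the neuron compute logical AND.
w, theta = [1.0, 1.0], 1.5
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron(x, w, theta, step))
```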
The contents of this page are personal reworkings by the Publisher franciig_ of information learned by attending the Business Intelligence and Data Mining lectures and by independently studying any reference books in preparation for the final exam or thesis. They are not to be considered official material of Politecnico di Milano - Polimi or of prof Vercellis Carlo.