WEIBULL
The Weibull distribution is characterized by two parameters: the shape k and the scale It resembles an
λ.
exponential distribution, raised to the power of k.
It is has the following mean, variance and moments:
It is used for modeling the lifetime of components subject to aging.
PARETO
The Pareto distribution has a simple monomial expression that is characterized by a shape parameter a
and a scale parameter m:
Since the integral of a PDF must equal to 1, the Pareto is a proper distribution only for α > 0, and it
generates values in the range [m,oo].
The distribution has a finite mean only for α > 1.
It has finite mean, but infinite variance for 1 < α <= 2. The distribution has finite variance only for α > 2.
It is used to model systems where very large values can have a nonnegligible probability to occur. This
phenomenon is called Heavy tail, and it can shown plotting 1-F(x) in log-log-scale.
Heavy tail behavior has been observed in Internet traffic, where sometimes very large files are transferred.
GENERATION AND ADVANCED DISTRIBUTIONS
PSEUDO RANDOM NUMBERS
Most of mathematical packages and programming languages have primitives for generating numbers
uniformly distributed in the zero-one interval. In order to produce a trace, samples from the random
variables that describe the service times and the inter-arrival times in the systems must be drawn. This
requires the generation of random numbers distributed according to the given distributions.
Let’s start focusing on algorithms that generate pseudo random numbers in the interval [0,1]. Since the
behavior of a computing machine is deterministic, random numbers are emulated by generating
sequences of values that seems to be completely unrelated one-another. The sequences however are
deterministic and there is a point after which they are reproduced identically.
Sequences are identified by their initial value, which is called the seed. When the same seed is used, the
sequence is reproduced identically. In order to produce arbitrary sequences, the seed is set to an always
changing value (usually the system time) at the beginning of the experiment.
The ability of setting a specific seed allows studies to repeat identically in different runs. This feature is
especially useful when debugging models where some events can occur only rarely.
LINEAR CONGRUENTIAL GENERATOR
Many algorithms have been defined to generate random numbers. The simplest, yet effective one, is the
Linear Congruential Generator. Let us call Xn the n-th number of the sequence, with X0 corresponding to
the seed. The algorithm computes Xn in the following way: Where a, c and m
are constants that identify the sequence. In particular, m is the maximum period: the maximum number of
samples after which the sequence repeats.
The algorithm generates numbers in the range [0, m-1]. To obtain a sufficiently dense approximation of a
uniform distribution in the [0,1) or [0,1] range, the value is divided by either m or m-1, depending on
whether the limit value 1 should be included or not in the generation process. The most common
implementation, excludes the upper bound from the generation.
The choices of the parameters is crucial for the algorithm. Wikepedia reports the values used in the most
known libraries.
RANDOM VARIABLES GENERATION
For finite discrete random variables, the uniform value ui ~ Unif can be used in a simple algorithm to select
the sampled value:
Usually, an array C(k) with N elements is constructed
The simplified procedure on the right can then be used.
The particular case of an integer uniform discrete distribution, between two extremes a and b, can be
generated with a simpler expression:
The minimum is necessary only if the uniform random variable can also generate its upper bound 1. Since in
general this is not the case, most of the times the minimum is omitted, leading to a much simpler
expression:
The Inverse transform sampling algorithm allows generating samples according to any distributions,
starting from a set of random numbers uniformly distributed between [0,1]. It is based on the fact that for
every CDF, the values on the y-axis are uniformly distributed in the [0,1] range. If FX(x) is the CDF of the
considered distribution, then X can be computed from a number u uniformly distributed between [0,1] in
the following way:
If vi is Unif, then, also ui = 1-vi is Unif<0,1> and we have:
Although the Inverse transform sampling algorithm is theoretically always applicable, it might not be the
best solution in many cases. In particular, if the inverse of the CDF, function FX -1(u), does not have an
analytical closed form expression, or it is particularly complex to compute, the Inverse transform sampling
algorithm can be particularly inefficient. In many cases, taking into account the definitions of the
considered distributions, we can determine procedures for generating samples in a more efficient way.
For instance, an Erlang distribution with k stages of rate , can be computed by summing up k samples
λ
generated from an exponential distribution with parameter λ.
The HypoExponential distribution, can be generated by summing up n samples from n exponential
distributions, each one characterized by its rate .
λ
i
The Hyper exponential can be generated by first using a discrete random variable, characterized by
probabilities p1 ... pn, to select a stage i. Then, the corresponding exponential distribution of parameter λi
can be used to determine the final sample. This technique always requires two samples, u1 and u2,
regardless of the number of stages n.
A similar procedure can be used also for the HyperErlang, combining both the discrete distribution to select
the component, and the technique seen for the Erlang distribution to determine the final sample from the
selected stage.
DISTRIBUTIONS WITH GIVEN STATISTICS
In some cases, we want to find a distribution with a given average and variance, maybe even skewness and
kurtosis. We will now see a set of distributions whose parameters can be directly derived from these
values. In this case, it is generally not necessary performing fitting, but we compute parameters starting
from the statistics of the trace.
NORMAL
The Normal (or Gaussian) distribution is defined by its mean and its variance.
, erf=error function
It is very important since, for the central limit theorem, the sum of a large number of independent random
variables tends to a Normal distribution.
When µ=0 and =1, the distribution is called Standard Normal. If N is a
Standard Normal distribution, then we can obtain normal distributions with different average and variance
with a simple linear transform:
NORMAL RANDOM VARIABLES GENERATION
The Box-Muller method can be used to generate two independent samples of a Standard Normal
distribution starting from two random variables u and v in the range [0,1].
TRUNCATED NORMAL
The main problem with the Normal distribution is that it has an infinite support which can produce
negative numbers that are not meaningful as service or inter-arrival times.
The Truncated Normal distribution discards all the values that are negative when sampling a conventional
Normal distribution. The truncated distribution, however, has different mean and variance
with respect to the original Normal distribution. It can still be a good
approximation if the c.v. is not too large, since the difference between
the two values can be negligible. To be more precise, solving a non-linear system of equations,
it would be possible to obtain parameters of a truncated
Normal distribution to match a given average and standard
deviation (if possible).
14/10/2022
Generation of samples from a truncated normal distribution can be done with a simple rejection
algorithm:
LOG NORMAL
The Log Normal distribution, is computed by using a Normal distribution sample (with a given average and
standard deviation) as the exponent for the exponential function.
Thanks to the properties of the exponential, it produces only positive numbers. Its PDF, CDF, average and
variance are:
Given a target average E[X] and coefficient of variation cv, its parameters can be computed in the following
way: Random numbers can then be generated by simply applying its
definition, starting from a normally distributed random value (generated for example with the Box-Muller
technique).
THE PEARSON FAMILY OF DISTRIBUTIONS
As the Normal distribution is defined by its first two moments, the Pearson family of distributions can be
defined by the first four moments. It is a family of distributions that includes many other cases (such as the
Exponential, the Gamma, and the Uniform), which are selected depending on their parameters.
Distributions belonging to the family are defined by four parameters (a, b0, b1 and b2), and their PDF
satisfies the following differential equation:
Parameters a, b0, b1 and b2 can be related to the first four standardized moments with analytical formulas
that considers two values:
• b1 corresponding to the square of the Skewness
• b2 denoting the Kurtosis (fourth standardized moment, corresponding to the excess Kurtosis without the
“-3”)
The proposed differential equation can be easily solved, and a closed form expression for the distribution
can be obtained:
However, the integral inside the exponential, can assume many different expressions, depending on the
roots r1 and r2 (which might also be complex numbers) of , that can simplify the expression in
many cases.
Since the probability distribution must always be positive, the roots (whether they are real or complex)
determine the support of the distribution, which can vary between [-oo,+oo], [l, +oo] or [-oo,u], or even [l,
u] . Twelve different special cases have been outlined, and they are usually represented in a chart as
functions of the moments b1 and b2.
CHECKING RESULTS AND CONFIDENCE INTERVALS
CHECKING DISTRIBUTIONS
Once a distribution to match data acquired in trace has been identified, and the corresponding
parameters have been fit, it must be analyzed to ensure that the process returned a correct result.
Goodness of fit techniques aim at quantifying the effectiveness of fitting process. Many techniques exist,
each one with its goals and use case. However, in performance evaluation, the choice of proper one
depends on the specific type of system being modelled: a complete description is thus too advanced and
outside the scope of this course. Here we will only present a couple of simple graphical procedures.
Q-Q PLOT AND P-P PLOT
The Q-Q plot is a graph that allows comparing two distributions X and Y using their CDF. In particular, the
(x,y) coordinates of the graph are defined in this way:
Similarly, the P-P plot is defined as follows:
If the two distributions are identical, the graphs are aligned over the 45° line y = x.
P-P plots have both axis in the [0,1] range, while Q-Q plots considers the domain of the distributions.
If the distributions are diff
Scarica il documento per vederlo tutto.
Scarica il documento per vederlo tutto.
Scarica il documento per vederlo tutto.
Scarica il documento per vederlo tutto.
Scarica il documento per vederlo tutto.
Scarica il documento per vederlo tutto.
Scarica il documento per vederlo tutto.
Scarica il documento per vederlo tutto.
Scarica il documento per vederlo tutto.
Scarica il documento per vederlo tutto.
Scarica il documento per vederlo tutto.
Scarica il documento per vederlo tutto.
Scarica il documento per vederlo tutto.
Scarica il documento per vederlo tutto.
Scarica il documento per vederlo tutto.
Scarica il documento per vederlo tutto.
Scarica il documento per vederlo tutto.
Scarica il documento per vederlo tutto.
-
Probiotic science and applications
-
Appunti di Evaluation and Management of Real Estate
-
Istologia - Appunti
-
Informatica - Appunti