Appunti per l'esame di Performance evaluation and applications

Appunti presi durante il corso di Performance evaluation and applications, essenziali per la mia preparazione e il mio successo. Ho ottenuto una votazione finale di 30/30.Hanno fornito una …

Esame Performance evaluation and applications

Facoltà Ingegneria dell'informazione

Dal corso del Prof. Gribaudo Marco

Università Politecnico di Milano

Publisher nicole_perrotta

A.A. 2022-2023

107 pagine

Appunti esame

Vota

Scarica

Estratto del documento

WEIBULL

The Weibull distribution is characterized by two parameters: the shape k and the scale It resembles an

λ.

exponential distribution, raised to the power of k.

It is has the following mean, variance and moments:

It is used for modeling the lifetime of components subject to aging.

PARETO

The Pareto distribution has a simple monomial expression that is characterized by a shape parameter a

and a scale parameter m:

Since the integral of a PDF must equal to 1, the Pareto is a proper distribution only for α > 0, and it

generates values in the range [m,oo].

The distribution has a finite mean only for α > 1.

It has finite mean, but infinite variance for 1 < α <= 2. The distribution has finite variance only for α > 2.

It is used to model systems where very large values can have a nonnegligible probability to occur. This

phenomenon is called Heavy tail, and it can shown plotting 1-F(x) in log-log-scale.

Heavy tail behavior has been observed in Internet traffic, where sometimes very large files are transferred.

GENERATION AND ADVANCED DISTRIBUTIONS

PSEUDO RANDOM NUMBERS

Most of mathematical packages and programming languages have primitives for generating numbers

uniformly distributed in the zero-one interval. In order to produce a trace, samples from the random

variables that describe the service times and the inter-arrival times in the systems must be drawn. This

requires the generation of random numbers distributed according to the given distributions.

Let’s start focusing on algorithms that generate pseudo random numbers in the interval [0,1]. Since the

behavior of a computing machine is deterministic, random numbers are emulated by generating

sequences of values that seems to be completely unrelated one-another. The sequences however are

deterministic and there is a point after which they are reproduced identically.

Sequences are identified by their initial value, which is called the seed. When the same seed is used, the

sequence is reproduced identically. In order to produce arbitrary sequences, the seed is set to an always

changing value (usually the system time) at the beginning of the experiment.

The ability of setting a specific seed allows studies to repeat identically in different runs. This feature is

especially useful when debugging models where some events can occur only rarely.

LINEAR CONGRUENTIAL GENERATOR

Many algorithms have been defined to generate random numbers. The simplest, yet effective one, is the

Linear Congruential Generator. Let us call Xn the n-th number of the sequence, with X0 corresponding to

the seed. The algorithm computes Xn in the following way: Where a, c and m

are constants that identify the sequence. In particular, m is the maximum period: the maximum number of

samples after which the sequence repeats.

The algorithm generates numbers in the range [0, m-1]. To obtain a sufficiently dense approximation of a

uniform distribution in the [0,1) or [0,1] range, the value is divided by either m or m-1, depending on

whether the limit value 1 should be included or not in the generation process. The most common

implementation, excludes the upper bound from the generation.

The choices of the parameters is crucial for the algorithm. Wikepedia reports the values used in the most

known libraries.

RANDOM VARIABLES GENERATION

For finite discrete random variables, the uniform value ui ~ Unif can be used in a simple algorithm to select

the sampled value:

Usually, an array C(k) with N elements is constructed

The simplified procedure on the right can then be used.

The particular case of an integer uniform discrete distribution, between two extremes a and b, can be

generated with a simpler expression:

The minimum is necessary only if the uniform random variable can also generate its upper bound 1. Since in

general this is not the case, most of the times the minimum is omitted, leading to a much simpler

expression:

The Inverse transform sampling algorithm allows generating samples according to any distributions,

starting from a set of random numbers uniformly distributed between [0,1]. It is based on the fact that for

every CDF, the values on the y-axis are uniformly distributed in the [0,1] range. If FX(x) is the CDF of the

considered distribution, then X can be computed from a number u uniformly distributed between [0,1] in

the following way:

If vi is Unif, then, also ui = 1-vi is Unif<0,1> and we have:

Although the Inverse transform sampling algorithm is theoretically always applicable, it might not be the

best solution in many cases. In particular, if the inverse of the CDF, function FX -1(u), does not have an

analytical closed form expression, or it is particularly complex to compute, the Inverse transform sampling

algorithm can be particularly inefficient. In many cases, taking into account the definitions of the

considered distributions, we can determine procedures for generating samples in a more efficient way.

For instance, an Erlang distribution with k stages of rate , can be computed by summing up k samples

generated from an exponential distribution with parameter λ.

The HypoExponential distribution, can be generated by summing up n samples from n exponential

distributions, each one characterized by its rate .

The Hyper exponential can be generated by first using a discrete random variable, characterized by

probabilities p1 ... pn, to select a stage i. Then, the corresponding exponential distribution of parameter λi

can be used to determine the final sample. This technique always requires two samples, u1 and u2,

regardless of the number of stages n.

A similar procedure can be used also for the HyperErlang, combining both the discrete distribution to select

the component, and the technique seen for the Erlang distribution to determine the final sample from the

selected stage.

DISTRIBUTIONS WITH GIVEN STATISTICS

In some cases, we want to find a distribution with a given average and variance, maybe even skewness and

kurtosis. We will now see a set of distributions whose parameters can be directly derived from these

values. In this case, it is generally not necessary performing fitting, but we compute parameters starting

from the statistics of the trace.

NORMAL

The Normal (or Gaussian) distribution is defined by its mean and its variance.

, erf=error function

It is very important since, for the central limit theorem, the sum of a large number of independent random

variables tends to a Normal distribution.

When µ=0 and =1, the distribution is called Standard Normal. If N is a

Standard Normal distribution, then we can obtain normal distributions with different average and variance

with a simple linear transform:

NORMAL RANDOM VARIABLES GENERATION

The Box-Muller method can be used to generate two independent samples of a Standard Normal

distribution starting from two random variables u and v in the range [0,1].

TRUNCATED NORMAL

The main problem with the Normal distribution is that it has an infinite support which can produce

negative numbers that are not meaningful as service or inter-arrival times.

The Truncated Normal distribution discards all the values that are negative when sampling a conventional

Normal distribution. The truncated distribution, however, has different mean and variance

with respect to the original Normal distribution. It can still be a good

approximation if the c.v. is not too large, since the difference between

the two values can be negligible. To be more precise, solving a non-linear system of equations,

it would be possible to obtain parameters of a truncated

Normal distribution to match a given average and standard

deviation (if possible).

14/10/2022

Generation of samples from a truncated normal distribution can be done with a simple rejection

algorithm:

LOG NORMAL

The Log Normal distribution, is computed by using a Normal distribution sample (with a given average and

standard deviation) as the exponent for the exponential function.

Thanks to the properties of the exponential, it produces only positive numbers. Its PDF, CDF, average and

variance are:

Given a target average E[X] and coefficient of variation cv, its parameters can be computed in the following

way: Random numbers can then be generated by simply applying its

definition, starting from a normally distributed random value (generated for example with the Box-Muller

technique).

THE PEARSON FAMILY OF DISTRIBUTIONS

As the Normal distribution is defined by its first two moments, the Pearson family of distributions can be

defined by the first four moments. It is a family of distributions that includes many other cases (such as the

Exponential, the Gamma, and the Uniform), which are selected depending on their parameters.

Distributions belonging to the family are defined by four parameters (a, b0, b1 and b2), and their PDF

satisfies the following differential equation:

Parameters a, b0, b1 and b2 can be related to the first four standardized moments with analytical formulas

that considers two values:

• b1 corresponding to the square of the Skewness

• b2 denoting the Kurtosis (fourth standardized moment, corresponding to the excess Kurtosis without the

“-3”)

The proposed differential equation can be easily solved, and a closed form expression for the distribution

can be obtained:

However, the integral inside the exponential, can assume many different expressions, depending on the

roots r1 and r2 (which might also be complex numbers) of , that can simplify the expression in

many cases.

Since the probability distribution must always be positive, the roots (whether they are real or complex)

determine the support of the distribution, which can vary between [-oo,+oo], [l, +oo] or [-oo,u], or even [l,

u] . Twelve different special cases have been outlined, and they are usually represented in a chart as

functions of the moments b1 and b2.

CHECKING RESULTS AND CONFIDENCE INTERVALS

CHECKING DISTRIBUTIONS

Once a distribution to match data acquired in trace has been identified, and the corresponding

parameters have been fit, it must be analyzed to ensure that the process returned a correct result.

Goodness of fit techniques aim at quantifying the effectiveness of fitting process. Many techniques exist,

each one with its goals and use case. However, in performance evaluation, the choice of proper one

depends on the specific type of system being modelled: a complete description is thus too advanced and

outside the scope of this course. Here we will only present a couple of simple graphical procedures.

Q-Q PLOT AND P-P PLOT

The Q-Q plot is a graph that allows comparing two distributions X and Y using their CDF. In particular, the

(x,y) coordinates of the graph are defined in this way:

Similarly, the P-P plot is defined as follows:

If the two distributions are identical, the graphs are aligned over the 45° line y = x.

P-P plots have both axis in the [0,1] range, while Q-Q plots considers the domain of the distributions.

If the distributions are diff

Anteprima

Vedrai una selezione di 20 pagine su 107