DESCRIPTIVE STATISTICS
DEFINITION
- Population : Total collection of all the elements that we are interested in
- Statistical units : Single elements of the population
- Variable : Features or characteristics of statistical units
- Sample : Subgroup of the population that we are able to study in detail
VARIABLES TYPE
- Quantitative (numerical)
  → continuous (weight, height, age)
  → discrete (number of ...)
- Qualitative (categorical)
  → ordinal (disagree, neutral, agree)
  → nominal (alive or dead)

          Quantitative   Ordinal   Nominal
mean      yes            no        no
median    yes            yes       no
mode      yes            yes       yes

graph
  → Quantitative : line graph, bar plot, frequency polygon
  → Qualitative : bar chart (absolute or relative frequencies) or pie chart (relative frequencies)
CLASSES
- Pairs of values that have some relationship to each other → (x, y)
  → x qualitative and y quantitative → distinct histograms, one for each category of x
  → x and y quantitative → scatter plot (↗ positive | ↘ negative correlation)
- GROUPED DATA : class intervals use the convention [ ; ) → "having $n_1$ to $n_2$" means that $n_2 - n_1$ is the interval
  → Take the mid-value of each class in order to compute the sample mean [mid-value = average of the extremes = $\frac{n_1 + n_2}{2}$]
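As a rough illustration of the mid-value rule for grouped data, here is a minimal Python sketch. The class boundaries and frequencies are made-up numbers, and the name `grouped_mean` is hypothetical, not something from these notes.

```python
# Hypothetical grouped data: (lower bound, upper bound, absolute frequency).
# Each class is read as the half-open interval [lower, upper).
classes = [(0, 10, 3), (10, 20, 5), (20, 30, 2)]


def grouped_mean(classes):
    """Approximate the sample mean using the mid-value of each class."""
    n = sum(freq for _, _, freq in classes)
    # mid-value = average of the extremes of the class
    total = sum((lo + hi) / 2 * freq for lo, hi, freq in classes)
    return total / n


# mid-values 5, 15, 25 -> (5*3 + 15*5 + 25*2) / 10 = 14.0
print(grouped_mean(classes))
```

The result is only an approximation of the true sample mean, since every observation in a class is treated as if it sat at the mid-value.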
SYMMETRY
- x is symmetric about $x_0$ : the frequencies of $x_0 - c$ and $x_0 + c$ are equal for any c
  → mode = $m$ = $\bar{x}$ (mode = median = mean)
FREQUENCY
- Frequency : $f = n \cdot w$
- Relative frequency : $w = \frac{f}{n}$
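The relation between absolute and relative frequencies can be sketched in a few lines of Python. The sample values are invented for illustration, and `abs_freq` / `rel_freq` are names chosen here, not notation from the notes.

```python
from collections import Counter

data = ["a", "b", "a", "c", "a", "b"]  # hypothetical sample

n = len(data)
abs_freq = Counter(data)                            # f for each distinct value
rel_freq = {v: f / n for v, f in abs_freq.items()}  # w = f / n

for value, f in abs_freq.items():
    # f = n * w recovers the absolute frequency (up to floating-point rounding)
    assert abs(f - n * rel_freq[value]) < 1e-9

print(abs_freq.most_common(1))  # the mode is the value with the highest frequency
```

Relative frequencies always sum to 1, which is a quick sanity check on any frequency table.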
SAMPLE STATISTICS
- CENTRALITY MEASURES
  SAMPLE MEAN
  → $\bar{x} = \frac{x_1 + \dots + x_n}{n} = \frac{1}{n} \sum_{i=1}^{n} x_i$
  → weighted mean : $\bar{x} = \frac{1}{n} \sum_{i=1}^{k} f_i x_i = \sum_{i=1}^{k} w_i x_i$
  SAMPLE MEDIAN
  1. Order from smallest to largest
  → n odd : $m = x_{\frac{n-1}{2}+1}$
  → n even : $m = \frac{x_{n/2} + x_{n/2+1}}{2}$
  SAMPLE MODE
  → the data value that occurs most frequently
- VARIATION - SPREAD
  SAMPLE VARIANCE
  → $s^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n-1}$
  STANDARD DEVIATION
  → $s = \sqrt{s^2}$
  SAMPLE RANGE
  → maximum - minimum
  → larger range → larger variability
  sample covariance
  → $s_{xy} = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{n-1}$
- POPULATION
  → sample → population : $\bar{x} \to \mu$, $s^2 \to \sigma^2$, $\hat{P} \to p$
- MEAN - MEDIAN RELATIONSHIP
  → symmetric : $\bar{x} = m$
  → right-skewed : $\bar{x} > m$
  → left-skewed : $\bar{x} < m$
- CORRELATION
  → $r = \frac{s_{xy}}{s_x \cdot s_y} = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sqrt{(\sum x_i^2 - n\bar{x}^2) \cdot (\sum y_i^2 - n\bar{y}^2)}}$
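The shortcut formulas for the sample statistics can be checked numerically. The following is a minimal Python sketch using only the standard library. The function names and the two small data sets are assumptions made for illustration.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def median(xs):
    s = sorted(xs)                      # 1. order from smallest to largest
    n = len(s)
    if n % 2 == 1:
        return s[(n - 1) // 2]          # 1-based position (n-1)/2 + 1
    return (s[n // 2 - 1] + s[n // 2]) / 2   # average of positions n/2 and n/2 + 1


def variance(xs):
    n = len(xs)
    return (sum(x * x for x in xs) - n * mean(xs) ** 2) / (n - 1)


def covariance(xs, ys):
    n = len(xs)
    return (sum(x * y for x, y in zip(xs, ys)) - n * mean(xs) * mean(ys)) / (n - 1)


def correlation(xs, ys):
    # r = s_xy / (s_x * s_y)
    return covariance(xs, ys) / (math.sqrt(variance(xs)) * math.sqrt(variance(ys)))


x = [1, 2, 3, 4, 5]      # hypothetical data
y = [2, 4, 6, 8, 10]     # exactly y = 2x, so r should be (numerically) 1

print(mean(x), median(x), variance(x))
print(covariance(x, y), correlation(x, y))
```

The shortcut form of the variance is convenient by hand but can lose precision in floating point when the mean is large relative to the spread. In that case the equivalent form based on squared deviations from the mean is safer.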
BOX-PLOT
- variability index
1. Median → position n/2 or n · 0.5 in the ordered data
2. First-third quartile
   → 25th p. = $Q_1$ → position n · 0.25 in the ordered data
   → 75th p. = $Q_3$ → position n · 0.75 in the ordered data
3. IQR → IQR = 75th p. - 25th p.
4. Whiskers
   → LW = 25th p. - 1.5 · IQR
   → UW = 75th p. + 1.5 · IQR
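The four box-plot ingredients can be computed as in the sketch below. It relies on `statistics.quantiles` with `method="inclusive"`, which interpolates between ordered values, so its quartiles may differ slightly from the n · 0.25 position rule on some data sets. The helper name `box_plot_summary` and the sample are hypothetical.

```python
import statistics


def box_plot_summary(data):
    """Return median, quartiles, IQR and whisker limits for a data set."""
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3
    q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr   # LW
    upper = q3 + 1.5 * iqr   # UW
    outliers = [x for x in data if x < lower or x > upper]
    return {"q1": q1, "median": med, "q3": q3, "iqr": iqr,
            "lw": lower, "uw": upper, "outliers": outliers}


sample = [1, 2, 3, 4, 5, 6, 7, 8, 30]   # hypothetical data with one extreme value
summary = box_plot_summary(sample)
print(summary)
```

With this sample the value 30 lies above UW and is reported as an outlier, while the median is barely affected by it.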
SAMPLE PERCENTILES
LINEAR TRANSFORMATIONS
NORMAL DATA
- Data set normal if histogram has :
  → Highest at the middle interval (mode = sample mean = median)
  → Bell-shaped
  → Symmetry in middle interval
PROBABILITY THEORY
RANDOM VARIABLES
- Support $S_X$ : set of possible values which X can take
- DISCRETE RANDOM VARIABLES : take values in a countable set (typically integers)
  → Probability function
  → X and Y independent : $\Pr(X = x, Y = y) = \Pr(X = x) \cdot \Pr(Y = y)$
- CONTINUOUS RANDOM VARIABLES : take real (decimal) values
  → Density function
  → X and Y independent : $f_{X,Y}(x, y) = f_X(x) \cdot f_Y(y)$
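For discrete random variables, the product rule can be verified directly on a joint probability table. The sketch below is one possible way to do it, with two fair dice as an assumed example. `is_independent` is a hypothetical helper, and exact fractions are used to avoid rounding issues.

```python
from fractions import Fraction
from itertools import product


def is_independent(joint):
    """Check Pr(X = x, Y = y) == Pr(X = x) * Pr(Y = y) for every pair (x, y)."""
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    # marginal probability functions obtained by summing the joint table
    px = {x: sum(p for (a, _), p in joint.items() if a == x) for x in xs}
    py = {y: sum(p for (_, b), p in joint.items() if b == y) for y in ys}
    return all(joint.get((x, y), 0) == px[x] * py[y] for x in xs for y in ys)


faces = range(1, 7)

# Two separate fair dice: every pair has probability 1/36
two_dice = {(x, y): Fraction(1, 36) for x, y in product(faces, faces)}

# Same die read twice (Y = X): only the diagonal has positive probability
same_die = {(x, x): Fraction(1, 6) for x in faces}

print(is_independent(two_dice))   # True
print(is_independent(same_die))   # False
```

The second table fails the check because, for example, Pr(X = 1, Y = 2) = 0 while Pr(X = 1) · Pr(Y = 2) = 1/36.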
PROPERTIES E(·) and Var(·)
NORMAL R.V.