SECTION 2 [max 20 points]
Use the dataset "pharmacy_2a" and the related questionnaire "pharmacy_questionnaire" (ID=ID)
Exercise 1:
Consider the variables from d11_1 to d11_19:
a) [max 2 points] Considering only question d11_1, what is the most frequent value expressed? Report the frequency and motivate how that value is obtained.
D11_1
D11_1  Frequency  Percent  Cumulative  Cumulative
                            Frequency     Percent
    1         19     9.84          19        9.84
    2          8     4.15          27       13.99
    3         10     5.18          37       19.17
    4         15     7.77          52       26.94
    5         20    10.36          72       37.31
    6         24    12.44          96       49.74
    7         34    17.62         130       67.36
    8         40    20.73         170       88.08
    9         10     5.18         180       93.26
   10         13     6.74         193      100.00
Through the PROC FREQ procedure we find that the most frequent value expressed (the mode) is 8, because it is the category with the largest absolute frequency, $n_j = 40$. Its relative frequency is $F_j = n_j / N$, where $n_j$ is the number of times category j (here 8) appears in the data and $N$ is the sum of all the absolute frequencies ($N = 193$). $F_j$ is the proportion of observations in each category, and the Percent column reports it multiplied by 100: $F_8 = 40/193 \approx 0.2073$, i.e. 20.73%.
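A frequency table like the one above can be obtained with a minimal PROC FREQ step such as the following sketch. It assumes the BASE library is already assigned, as in the PROC CORR step of point b). The ORDER=FREQ option is an addition not present in the original answer: it lists the levels by decreasing frequency so the mode appears in the first row.

```sas
/* One-way frequency table of d11_1.
   ORDER=FREQ sorts the levels by descending frequency,
   so the most frequent value (the mode) is printed first. */
proc freq data=base.pharmacy_2a order=freq;
    tables d11_1;
run;
```

With ORDER=FREQ the Frequency and Percent of each level are unchanged, but the cumulative columns follow the new row order and will therefore differ from the table above.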
b) [max 2 points] Motivate and document why in this case a PCA could be performed.
/* Pearson correlation matrix of the items d11_1-d11_19, saved to the data set CORR */
proc corr data=base.pharmacy_2a outp=corr;
    var d11_1-d11_19;
run;
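PCA reduces the number of variables with a minimal loss of information by replacing the original variables with new ones, the principal components (PCs), which are uncorrelated linear combinations of the original ones. It can be performed here because d11_1-d11_19 are numeric items of the same question, and the reduction is only worthwhile if they are correlated with one another: the correlation matrix produced by PROC CORR is used to check that many pairs of items show sizeable, significant correlations, i.e. that the items share redundant information that a few components can summarize.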
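Once the correlations support it, the PCA itself can be run with PROC PRINCOMP. The step below is a sketch that is not part of the original answer, and the output data set name PC_SCORES is hypothetical. By default PROC PRINCOMP extracts the components from the correlation matrix (so the items are implicitly standardized) and excludes observations with a missing value in any of the listed items.

```sas
/* PCA on the items d11_1-d11_19 (sketch; PC_SCORES is a hypothetical name).
   Components are extracted from the correlation matrix by default. */
proc princomp data=base.pharmacy_2a out=pc_scores;
    var d11_1-d11_19;
run;
/* PC_SCORES holds the original variables plus the component scores Prin1, Prin2, ... */
```

The Eigenvalues table in the output reports the proportion of total variance explained by each component, which is the usual basis (for example, keeping components with eigenvalue greater than 1) for deciding how many components to retain.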