# Pharmacy exercise

Data Mining exam, instructor: Prof. I. D'Attoma

SECTION 2 [max 20 points]

Use the dataset "pharmacy_2a" and the related questionnaire "pharmacy_questionnaire" (ID=ID)

Exercise 1:

Consider the variables from d11_1 to d11_19:

[max 2 points] Considering only the question d11_1, what is the most frequent value expressed?

a) Report the frequency and motivate how that value is obtained.

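| D11_1 | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| 1 | 19 | 9.84 | 19 | 9.84 |
| 2 | 8 | 4.15 | 27 | 13.99 |
| 3 | 10 | 5.18 | 37 | 19.17 |
| 4 | 15 | 7.77 | 52 | 26.94 |
| 5 | 20 | 10.36 | 72 | 37.31 |
| 6 | 24 | 12.44 | 96 | 49.74 |
| 7 | 34 | 17.62 | 130 | 67.36 |
| 8 | 40 | 20.73 | 170 | 88.08 |
| 9 | 10 | 5.18 | 180 | 93.26 |
| 10 | 13 | 6.74 | 193 | 100.00 |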
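The table above has the layout of default PROC FREQ output. A minimal sketch of a call that would produce it, assuming the `base` libref from the PROC CORR step of this exercise is already assigned:

```sas
/* One-way frequency table for d11_1: counts, percents and cumulative values. */
/* Assumes the libref "base" is already assigned, as in the PROC CORR step.   */
proc freq data=base.pharmacy_2a;
    tables d11_1;
run;
```

Adding `order=freq` to the `proc freq` statement would sort the rows by descending count, so that the most frequent value appears in the first row.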

The PROC FREQ output shows that the most frequent value expressed is 8, with a relative frequency of 20.73%. The relative frequency $F_j$ is given by $F_j = n_j / N$, where $n_j$ is the number of times category $j$ (here, value 8) appears in the data and $N$ is the sum of all the absolute frequencies. $F_j$ is therefore the proportion of observations falling in category $j$. The absolute frequency of value 8 is $n_j = 40$.
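As a check, the relative frequency of value 8 can be recomputed from the frequency table, taking $N$ as the last cumulative frequency (193):

$$F_8 = \frac{n_8}{N} = \frac{40}{193} \approx 0.2073 = 20.73\%$$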

[max 2 points] Motivate and document why in this case a PCA could be performed.

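b)

/* Correlation matrix of the 19 items; outp= saves the Pearson correlations to the data set corr */
proc corr data=base.pharmacy_2a outp=corr;
    var d11_1--d11_19;
run;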

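PCA reduces the number of variables with minimal loss of information by replacing them with new variables, the principal components (PCs), which are uncorrelated linear combinations of the original ones. It is worthwhile only if the original variables are correlated with one another, which is what the correlation matrix of d11_1 to d11_19 computed above is meant to document.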
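A minimal sketch of the PCA step itself, assuming PROC PRINCOMP is the intended procedure (the source does not name one). The output data set name `pc_scores` is hypothetical:

```sas
/* PCA on the 19 items. By default PROC PRINCOMP analyses the correlation matrix. */
/* out= stores the original variables plus the component scores (Prin1, Prin2, ...). */
proc princomp data=base.pharmacy_2a out=pc_scores;
    var d11_1--d11_19;
run;
```

The eigenvalue table in the output reports the proportion of variance explained by each component, which is the usual basis for deciding how many components to keep.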

Exam: Data mining
Degree programme: Master's degree in Economics and Management
SSD:
University: Bologna - Unibo
Academic year: 2017-2018

