AVERAGE RECIPROCAL HIT-RANK
The Average Reciprocal Hit-Rank (ARHR) is a modified version of recall: the denominator is the same, but in the numerator each hit is weighted by a fraction, 1 divided by the ranking of item i, that is, the position of the item in the recommendation list. The weight is equal to 1 if the item is in position 1, 0.5 if the item is in position 2, and so on. This is a useful metric, but technically speaking it is not a ranking metric in the strict sense, since it does not compare the ranking provided by the user with the ranking provided by the system.
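For illustration, here is a minimal Python sketch of how the per-user ARHR described above could be computed; the function name, the arguments, and the use of the test-set size as the recall-style denominator are assumptions made for this example.

```python
def average_reciprocal_hit_rank(recommended, relevant):
    """ARHR for one user: like recall, but each hit is weighted by 1/rank."""
    score = 0.0
    for rank, item in enumerate(recommended, start=1):
        if item in relevant:
            score += 1.0 / rank        # weight 1 at position 1, 0.5 at position 2, ...
    return score / len(relevant)       # same denominator as recall

# Example: items 10 and 30 are relevant; 10 is ranked 1st, 30 is ranked 3rd.
print(average_reciprocal_hit_rank([10, 20, 30, 40], {10, 30}))  # (1 + 1/3) / 2 ≈ 0.67
```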
22/09/2022
CONTENT BASED FILTERING
In content-based filtering, we recommend items based on their attributes. In other words, we want to understand how similar two items are based on their attributes, and we compare items through those attributes (for example, liking films that share the same characteristics). The main assumption at the foundation of content-based methods is that a user who expressed a preference for an item will probably like similar items.
In order to represent the attributes of a pool of items, we use the Item Content Matrix, or ICM (row = item, column = attribute of the items). For example, an attribute can be the presence of the actor Harrison Ford for the movie Star Wars. Each cell of the matrix contains a value that is either zero or one: if a cell contains a one, it means that item i has attribute a.
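To make the ICM concrete, here is a tiny hypothetical example in Python (numpy); apart from Star Wars and Harrison Ford, the movies and attributes are chosen only for illustration.

```python
import numpy as np

# Hypothetical ICM: rows = items (movies), columns = attributes.
# A 1 in cell (i, a) means that item i has attribute a.
attributes = ["Harrison Ford", "Sci-Fi", "Western"]
items      = ["Star Wars", "Blade Runner", "Back to the Future 3"]

ICM = np.array([[1, 1, 0],    # Star Wars: Harrison Ford, Sci-Fi
                [1, 1, 0],    # Blade Runner: Harrison Ford, Sci-Fi
                [0, 1, 1]])   # Back to the Future 3: Sci-Fi, Western
```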
MEASURING SIMILARITY
How do we measure the similarity between two items i and j, based on their attributes? We can look at items i and j as two vectors. Each vector has a number of elements equal to the total number of attributes available in the ICM. The vectors are binary: their values can be either 0 or 1. The value 1 means that the item has that attribute, while the value 0 means that the item doesn't have it. (In the example figure, the 0 values are left empty to keep the figure clearer.)
If two items have many attributes in common, we can assume that they are very similar. In a more formal way, the number of common elements between two binary vectors can be computed with the dot product: the similarity is the dot product of the two items' vectors.
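As a minimal sketch (with made-up attribute vectors), the count of common attributes is exactly the dot product:

```python
import numpy as np

# Two items as binary attribute vectors (illustrative values).
item_i = np.array([1, 1, 0, 1, 0])
item_j = np.array([1, 1, 1, 0, 0])

# The dot product counts the attributes the two items have in common.
print(item_i @ item_j)   # 2 shared attributes
```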
COSINE SIMILARITY
In some cases, the similarity can be improved by normalizing the dot product. We take the dot product between the two vectors and divide it by the product of the lengths (norms) of the two vectors. The similarity computed this way is the cosine of the angle between the two vectors of attributes. Graphically, this angle is the one included between the two vectors, that is, the two items. The more similar i and j are, the larger the cosine.
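Using the same toy vectors as before, a sketch of the cosine similarity (dot product divided by the product of the vector lengths):

```python
import numpy as np

item_i = np.array([1, 1, 0, 1, 0])
item_j = np.array([1, 1, 1, 0, 0])

# Cosine similarity: dot product normalized by the Euclidean norms of the two vectors.
cosine = (item_i @ item_j) / (np.linalg.norm(item_i) * np.linalg.norm(item_j))
print(cosine)   # 2 / (sqrt(3) * sqrt(3)) ≈ 0.67
```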
SUPPORT: THE SIZE OF THE SAMPLE
Another important concept is the support. The support of a vector is the number of non-zero elements in the vector. For example, in the matrix on the left the support is small, while in the matrix on the right the support is larger. Looking at the similarity between the items in the two matrices, through the cosine similarity given before, we would say that the items in the first matrix are more similar to each other than the items in the second matrix. But is this true? Or do we have to take something else into account? (left: small support, right: large support)
SHRINKING
In order to shrink the similarity, so that only items that are very similar and have a large support get a high value, we introduce a new term in the cosine similarity. This new term is a constant, added to the denominator, called the Shrink Term.
Coming back to the previous matrices, and choosing 3 as the shrink term, we can notice that the two similarities have changed: 0.25 < 0.5 (i = rows, j = columns). Now the items in the second matrix are more similar than the ones contained in the first one. This happens because, if the denominator of the cosine similarity is large, the shrink term has a smaller effect than in the situation where the denominator is small.
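A sketch of the shrunk cosine; the vectors below are not the actual matrices from the figure, they are simply chosen so that the two results reproduce the 0.25 < 0.5 comparison mentioned above.

```python
import numpy as np

def shrunk_cosine(v_i, v_j, shrink=3):
    """Cosine similarity with a constant shrink term added to the denominator.

    When the support is small the shrink term dominates and pushes the
    similarity down; when the support is large its effect is negligible.
    """
    return (v_i @ v_j) / (np.linalg.norm(v_i) * np.linalg.norm(v_j) + shrink)

small = (np.array([1, 0, 0]), np.array([1, 0, 0]))   # small support, plain cosine = 1
large = (np.array([1, 1, 1]), np.array([1, 1, 1]))   # large support, plain cosine = 1
print(shrunk_cosine(*small))   # 1 / (1 + 3) = 0.25
print(shrunk_cosine(*large))   # 3 / (3 + 3) = 0.5
```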
SIMILARITY MATRIX
The similarities computed across all pairs of items constitute the similarity matrix. The element (i, j) of the matrix describes how much item i is similar to item j. The similarity matrix, computed with the dot product or with the shrunk cosine, is a symmetric matrix: the similarity between i and j is equal to the similarity between j and i.
ESTIMATING RATING
Once you have the similarity matrix, you can estimate the rating of user u on item i by relying on the past ratings of the same user on other items j. The estimated rating of user u on item i is the summation of the user's past ratings on items j, each multiplied by the similarity between item j and item i, all divided by a normalization term. This normalization is useful if we want to estimate the ratings as accurately as possible. However, if your goal is a Top-N recommendation, you may leave the normalization out. Top-N task: recommend the N items with the highest estimated rating.
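A minimal sketch of the estimation just described (the function and variable names are assumptions): the score is the sum of the user's past ratings weighted by the similarities, optionally divided by the sum of the similarities.

```python
import numpy as np

def estimate_rating(user_ratings, sim_to_i, normalize=True):
    """Estimate the rating of one user on item i.

    user_ratings: the user's past ratings on all items (0 = not rated)
    sim_to_i:     similarities between every item j and the target item i
    """
    score = user_ratings @ sim_to_i
    if normalize:                       # useful for accurate rating prediction
        score /= sim_to_i.sum() + 1e-9  # small constant avoids division by zero
    return score                        # for a Top-N task the normalization can be dropped
```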
MATRIX NOTATION
In this section we describe content-based filtering using a matrix notation that helps us write the equations of our recommender models in a more compact way. As we have seen, we can estimate the rating of user u on item i by relying on the past ratings of the same user on other items j. We can rewrite the formula so that the summation over the ratings becomes a vector-matrix product: the vector is the whole user profile, not just the single element j as before, and the similarity is no longer just the similarity between items i and j but the entire item similarity matrix. We can generalize this formula even further and extend it to all users: the estimated rating matrix is the product of the User Rating Matrix and the Similarity Matrix.
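In code, the matrix form is a single matrix product; the toy URM and S below are illustrative.

```python
import numpy as np

# URM: User Rating Matrix (users x items); S: item-item similarity matrix (items x items).
URM = np.array([[5.0, 0.0, 3.0],
                [0.0, 4.0, 0.0]])
S   = np.array([[1.0, 0.2, 0.7],
                [0.2, 1.0, 0.1],
                [0.7, 0.1, 1.0]])

# Estimated ratings for all users and all items at once.
R_hat = URM @ S
```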
K-NEAREST NEIGHBOURS (KNN)
K-Nearest Neighbours is a technique used to simplify, and get the most out of, a similarity matrix. The similarity matrix is a dense matrix (few empty cells), which makes it heavy in a computational sense: storing a big data structure is expensive in terms of both memory and time. Moreover, the similarity matrix contains values that are mostly small, which introduces a lot of noise in the data: since most similarity values are small and very close to each other, it becomes difficult to distinguish items through them. Overall, these problems lead to lower quality recommendations when using the full similarity matrix.
The solution is the K-Nearest-Neighbours technique. This method consists in keeping, for each item, only the K most similar items in the similarity matrix, where K is an integer. As pointed out in the example, with K equal to two, only the green similarity values are kept in the matrix. The formula for the estimation of the ratings now keeps track of the K nearest neighbours: in practice, the estimated ratings are calculated only on the items j that are part of the K nearest neighbours of item i.
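A sketch of the KNN pruning of a similarity matrix, assuming we keep, for every item (column), only its K largest similarities and zero out the rest:

```python
import numpy as np

def keep_k_nearest(S, k=2):
    """Keep only the k most similar items for each column of S, zeroing the rest."""
    S_knn = np.zeros_like(S)
    for i in range(S.shape[1]):
        col = S[:, i].copy()
        col[i] = 0                      # ignore the self-similarity
        top_k = np.argsort(col)[-k:]    # indices of the k most similar items
        S_knn[top_k, i] = col[top_k]
    return S_knn
```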
THE CHOICE OF K
The quality of the recommender algorithm depends on the value of K. If K is too small, the model doesn't have enough data to make reliable estimations. On the other hand, if K is too large, the data will contain too much noise.
NON-BINARY ATTRIBUTES
One main improvement can be obtained by introducing non-binary attributes. We have seen that an item can either have or not have a specific attribute. However, there can be intermediate cases. What if, for example, Back to the Future 3 is both a science fiction and a western film? Using binary weights, we have to say that it has both the science fiction attribute and the western attribute. But is that right? It is arguably more science fiction than western, but how can we depict this difference in the ICM? By using non-binary weights: now we can specify how much "Back to the Future 3" is science fiction and how much it is western. The Item Content Matrix can be further improved by introducing attribute weights.
As we can see, movies 1 and 3 have some attributes in common (similar title, actors and directors), but a different year of production and cost. Are these attributes equally important when comparing movies? In the same example, movies 1 and 2 are similar for year and cost, but really different for title, actors and directors. So, is the first movie more similar to the second or to the third one? We can decide it if we add attribute weights. Here, for example, we consider the attribute "title", which has a weight of 0.8, more important than the "year" attribute, which has a weight of only 0.5.
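A small sketch of how attribute weights could be applied to the ICM before computing similarities; the 0.8 (title) and 0.5 (year) weights come from the example above, the rest of the numbers are made up.

```python
import numpy as np

# Non-binary ICM (how much each item has each attribute), illustrative values.
#                 title   year
ICM = np.array([[1.0,    1.0],
                [0.0,    1.0],
                [1.0,    0.0]])

# Attribute weights: "title" (0.8) counts more than "year" (0.5).
weights = np.array([0.8, 0.5])

# Scale every attribute column by its weight before computing item similarities.
ICM_weighted = ICM * weights
```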
TF-IDF TECHNIQUES
The techniques presented are used to automatically adjust the weights of attributes in the Item Content
Matrix. The TF-IDF is given by the product of two terms, the Term Frequency (TF), and the Inverse
Document Frequency (IDF). The main principle of these techniques is to balance the weights of the
attributes depending on their frequency of appearance in the items.
The first component is the TF (Term Frequency): TF(a, i) = N_{a,i} / N_i, where N_{a,i} is the number of appearances of attribute a in item i (often equal to 1, which can give a lot of attributes with very small term frequency values) and N_i is the total number of attributes of item i.
The second component is the IDF (Inverse Document Frequency): IDF(a) = log10(N_items / N_a), where N_items is the total number of items and N_a is the number of items with attribute a. It aims at solving the problem of small Term Frequency values for rare attributes.
Example using the TF technique: for the highlighted item i (third row), the TF for attribute a has the value 1/3; in another case, for the highlighted item i, the TF for attribute c has the value zero; for item j, the TF for attribute a has the value 1/6. If the analyzed item has many attributes, the weight of each single attribute becomes small.
Example using the IDF technique for normalizing the weights: let's start by computing the IDF for attribute a. The result is zero: in fact, if the attribute has value 1 in all the items, it has no informative content! Analyzing the IDF for attribute b, we can see that the value becomes different from zero. Finally, for attribute c the value of the IDF is 0.6. In conclusion, the product between TF and IDF gives more balanced values for the weights of the attributes.
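A minimal sketch of TF-IDF reweighting of a binary ICM; the base-10 logarithm is an assumption, chosen because it reproduces an IDF of roughly 0.6 for an attribute appearing in 1 item out of 4, as in the example above.

```python
import numpy as np

def tf_idf(ICM):
    """Reweight a binary ICM with TF-IDF.

    TF(a, i) = N_ai / N_i           (appearances of attribute a in item i over
                                     the number of attributes of item i)
    IDF(a)   = log10(N_items / N_a) (total items over items having attribute a)
    """
    n_items = ICM.shape[0]
    tf  = ICM / np.maximum(ICM.sum(axis=1, keepdims=True), 1)  # row-wise normalization
    idf = np.log10(n_items / np.maximum(ICM.sum(axis=0), 1))   # per-attribute rarity
    return tf * idf
```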
12/10/2022
ITEM SIMILARITY: IMPLICIT RATINGS
In the item-based collaborative filtering technique, the idea is to calculate the similarity between each pair of items according to how many users have rated them both. Then we use the ratings specified by the user for those items to predict whether he or she will like the target items.
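As a minimal sketch of the co-rating idea with implicit ratings, the item-item similarity can be obtained directly from the URM (toy data below):

```python
import numpy as np

# Implicit URM: 1 if the user interacted with the item, 0 otherwise (toy data).
URM = np.array([[1, 1, 0, 1],
                [0, 1, 1, 0],
                [1, 1, 0, 0]])

# Entry (i, j) counts how many users have rated both item i and item j.
co_rated = URM.T @ URM
```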