CPSC 330 Lecture 17: Recommendation systems

Andrew Roth (Slides adapted from Varada Kolhatkar and Firas Moosvi)

Announcements

Select all of the following statements that are true

iClicker join link: https://join.iclicker.com/HTRZ

1. In hierarchical clustering we do not have to worry about initialization.
1. Hierarchical clustering can only be applied to smaller datasets because dendrograms are hard to visualize for large datasets.
1. In all the clustering methods we have seen (K-Means, DBSCAN, hierarchical clustering), there is a way to decide the granularity of clustering (i.e., how many clusters to pick).
1. To get robust clustering we can naively ensemble cluster labels (e.g., pick the most popular label) produced by different clustering methods.
1. If you have a high silhouette score and very clean and robust clusters, it means that the algorithm has captured the semantic meaning in the data that we are interested in.

What percentage of watch time on YouTube do you think comes from recommendations?

This question is based on this source. The statistic may have changed since then.

Select all of the following statements that are true

1. In the context of recommendation systems, the validation utility matrix and the training utility matrix have the same shape.
1. RMSE perfectly captures what we want to measure in the context of recommendation systems.
1. It would be reasonable to impute missing values in the utility matrix by taking the average of the ratings given to an item by similar users.
1. In KNN-type imputation, if a user has not rated any items yet, a reasonable strategy would be to recommend them the most popular item.
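
For reference when discussing the statements above, here is a minimal sketch of the setup they refer to: utility matrices built from rating triples, RMSE computed only over observed entries, and a popularity fallback for a user with no ratings. The toy ratings, the helper names (`build_utility_matrix`, `rmse`), and the global-average baseline are all assumptions for illustration, not the course's own code.

```python
import numpy as np

# Hypothetical toy data: (user index, item index, rating).
train_ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (2, 2, 2.0)]
valid_ratings = [(1, 1, 3.0), (2, 0, 5.0)]
n_users, n_items = 3, 3


def build_utility_matrix(ratings, n_users, n_items):
    """Return a users x items matrix with NaN for unobserved ratings."""
    Y = np.full((n_users, n_items), np.nan)
    for user, item, rating in ratings:
        Y[user, item] = rating
    return Y


def rmse(pred, Y):
    """RMSE over the observed (non-NaN) entries of Y only."""
    observed = ~np.isnan(Y)
    return float(np.sqrt(np.mean((pred[observed] - Y[observed]) ** 2)))


# Both matrices are indexed by the full set of users and items.
train_mat = build_utility_matrix(train_ratings, n_users, n_items)
valid_mat = build_utility_matrix(valid_ratings, n_users, n_items)

# Baseline (an assumption): predict the global average training rating everywhere.
global_avg = np.nanmean(train_mat)
pred = np.full((n_users, n_items), global_avg)

train_rmse = rmse(pred, train_mat)
valid_rmse = rmse(pred, valid_mat)

# Fallback for a user with no ratings: the item with the most training ratings.
rating_counts = (~np.isnan(train_mat)).sum(axis=0)
most_popular_item = int(np.argmax(rating_counts))
```

Swapping the baseline for a KNN-style imputer changes only how `pred` is filled in; the evaluation code stays the same.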

Select all of the following statements that are true

1. In content-based filtering, we leverage available item features in addition to similarity between users.
1. In content-based filtering, you represent each user in terms of known features of items.
1. In the setup of content-based filtering we discussed, if you have a new movie, you would have problems predicting ratings for that movie.
1. In content-based filtering, if a user has a number of ratings in the training utility matrix but does not have any ratings in the validation utility matrix, then we won’t be able to calculate RMSE for the validation utility matrix.
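
As a companion to these statements, here is a minimal sketch of one way content-based filtering can be set up: each user gets their own regression model, fit on the features of the items that user has rated. The feature matrix `Z`, the toy ratings, the helper names, and the closed-form ridge regression (with a regularized bias term) are assumptions for illustration and may differ from the setup discussed in lecture.

```python
import numpy as np

# Hypothetical item features (rows: items, columns: e.g. genre indicators).
Z = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0],
              [0.0, 0.0]])

# Hypothetical training utility matrix (users x items), NaN = not rated.
train_mat = np.array([[5.0, 4.0, np.nan, np.nan],
                      [np.nan, 2.0, 1.0, 3.0]])


def add_bias(X):
    """Append a column of ones so the model can learn an intercept."""
    return np.hstack([X, np.ones((X.shape[0], 1))])


def fit_user_profile(Z, user_ratings, alpha=1.0):
    """Ridge regression on the items this user rated (closed form).

    The bias weight is regularized too, which keeps the sketch short.
    """
    rated = ~np.isnan(user_ratings)
    X, y = add_bias(Z[rated]), user_ratings[rated]
    A = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)


def predict_for_user(Z, weights):
    """Predicted ratings for every item whose features are in Z."""
    return add_bias(Z) @ weights


# One weight vector per user: the user is described in item-feature space.
profiles = [fit_user_profile(Z, row) for row in train_mat]
pred_mat = np.vstack([predict_for_user(Z, w) for w in profiles])

# An item with known features but no ratings can still be scored per user.
new_item = np.array([[1.0, 1.0]])
new_item_preds = [float(predict_for_user(new_item, w)[0]) for w in profiles]
```

Each row of `pred_mat` comes from a model trained only on that user's own ratings, so the sketch uses item features but never compares users with each other.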