CPSC 330 Lecture 15: K-Means

Andrew Roth (Slides adapted from Varada Kolhatkar and Firas Moosvi)

Announcements

  • MT2 registration open. See Piazza.
  • Final exam dates and registration posted. See Piazza.
  • MT1 marked. Viewing in CBTF bookings open. See Piazza.
  • HW5 is due March 10th (start now!)
  • HW6 is posted

(iClicker) Midterm poll

iClicker cloud join link: https://join.iclicker.com/HTRZ

Select all of the following statements which are TRUE.

    1. I’m happy with my progress and learning in this course.
    1. I find the course content interesting, but the pace is a bit overwhelming. Balancing this course with other responsibilities is challenging.
    1. I’m doing okay, but I feel stressed and worried about upcoming assessments.
    1. I’m confused about some concepts and would appreciate more clarification or review sessions.
    1. I’m struggling to keep up with the material. I am not happy with my learning in this course and my morale is low :(.

Class notebook (recap)

(iClicker) Exercise 14.2

iClicker cloud join link: https://join.iclicker.com/HTRZ

Select all of the following statements which are TRUE.

    1. L2 regularized logistic regression sets correlated coefficients to zero *
    1. Lasso regression predicts values using ŷᵢ = ∑ⱼ wⱼxᵢⱼ + b *
    1. Lasso and Ridge regression optimize different measures of “error” between predicted and observed target values *
    1. KNN and SVM RBF can be used with SelectFromModel
    1. I saw way too much math today! *

* Denotes optional details only covered in our section
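
Statement 2’s formula can be checked directly in code. Here is a minimal sketch (the dataset and alpha value are illustrative choices, not from the lecture) that reproduces scikit-learn’s Lasso predictions by hand as ŷᵢ = ∑ⱼ wⱼxᵢⱼ + b:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Toy regression data (illustrative, not the lecture's dataset).
X, y = make_regression(n_samples=100, n_features=5, noise=10, random_state=42)

lasso = Lasso(alpha=1.0).fit(X, y)

# Reproduce predict() by hand: y_hat_i = sum_j w_j * x_ij + b
manual = X @ lasso.coef_ + lasso.intercept_
assert np.allclose(manual, lasso.predict(X))
```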

(iClicker) Exercise 14.3

iClicker cloud join link: https://join.iclicker.com/HTRZ

Select all of the following statements which are TRUE.

    1. Simple association-based feature selection approaches do not take into account the interaction between features.
    1. You can carry out feature selection using linear models by pruning the features which have very small weights (i.e., coefficients less than a threshold).
    1. The order of features removed given by rfe.ranking_ is the same as the order of original feature importances given by the model.
    1. If you remove 10 features in a single step based on feature importance, the same 10 features would be removed if you performed sequential removal, recalculating feature importance after removing each feature.
    1. Forward search is guaranteed to find the best feature set. *

* Denotes optional question
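
As a concrete illustration of statements 2 and 3, here is a minimal RFE sketch (the dataset and model are my own choices, not the lecture’s). Because RFE re-fits the model after every elimination, its `ranking_` need not match the importance order from a single fit:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Illustrative classification data.
X, y = make_classification(n_samples=200, n_features=8, random_state=42)

# RFE drops one feature at a time, re-fitting the model after each removal.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=3)
rfe.fit(X, y)

print(rfe.support_)   # boolean mask of the 3 surviving features
print(rfe.ranking_)   # 1 = selected; larger values were eliminated earlier
```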

Class notebook

15.1 Select all of the following statements which are True (iClicker)

iClicker cloud join link: https://join.iclicker.com/HTRZ

Select all that are true.

    1. K-Means algorithm always converges to the same solution.
    1. K in K-Means should always be ≤ # of features.
    1. In K-Means, it makes sense to have K ≤ # of examples.
    1. In K-Means, in some iterations some points may be left unassigned.
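
To see why statement 4 is worth thinking about, here is a bare-bones NumPy sketch of the K-Means loop (my own illustrative code, not the lecture’s): the assignment step labels every point with its nearest centroid in every iteration, so no point is ever left unassigned.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))   # illustrative 2-D data
k = 3
centroids = X[rng.choice(len(X), size=k, replace=False)]  # random init

for _ in range(10):
    # Assignment step: EVERY point gets the label of its nearest centroid.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Update step: move each centroid to the mean of its assigned points
    # (keep the old centroid if a cluster happens to end up empty).
    centroids = np.array(
        [X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
         for j in range(k)]
    )

assert len(labels) == len(X)  # all 100 points carry a cluster label
```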

15.2 Select all of the following statements which are True (iClicker)

iClicker cloud join link: https://join.iclicker.com/HTRZ

Select all that are true.

    1. K-Means is sensitive to initialization and the solution may change depending upon the initialization.
    1. K-Means terminates when the number of clusters does not increase between iterations.
    1. K-Means terminates when the centroid locations do not change between iterations.
    1. K-Means is guaranteed to find the optimal solution.
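
Statements 1 and 4 can be probed empirically. In this hedged sketch (data, seeds, and cluster count are arbitrary illustrative choices), restarts are disabled with n_init=1 so that each run reflects a single random initialization; the final inertia may then differ from seed to seed, showing K-Means only reaches a local optimum.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Overlapping blobs make differing local optima more likely (illustrative).
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=2.0, random_state=42)

for seed in range(3):
    # n_init=1 disables the usual restarts, exposing the initialization.
    km = KMeans(n_clusters=4, init="random", n_init=1, random_state=seed)
    km.fit(X)
    print(f"seed={seed}  inertia={km.inertia_:.1f}")  # may differ per seed
```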

Class notebook

15.3 Select all of the following statements which are True (iClicker)

iClicker cloud join link: https://join.iclicker.com/HTRZ

Select all that are TRUE.

    1. If you train K-Means with n_clusters equal to the number of examples, the inertia value will be 0.
    1. The elbow plot shows the tradeoff between within-cluster distance and the number of clusters.
    1. Unlike the Elbow method, the Silhouette method is not dependent on the notion of cluster centers.
    1. The elbow plot is not a reliable method to obtain the optimal number of clusters in all cases.
    1. The Silhouette score ranges between -1 and 1, where higher scores indicate better cluster assignments.
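
Statements 1 and 5 are easy to verify on a toy dataset. The sketch below (a tiny illustrative dataset, not the lecture’s) shows inertia shrinking as K grows, hitting exactly 0 at K = number of examples, alongside the corresponding silhouette scores in [-1, 1]:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Tiny illustrative dataset: 30 points drawn from 3 blobs.
X, _ = make_blobs(n_samples=30, centers=3, random_state=42)

for k in [2, 3, 5, len(X)]:
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    msg = f"k={k:2d}  inertia={km.inertia_:.2f}"
    if k <= len(X) - 1:  # silhouette_score needs 2 <= k <= n - 1
        msg += f"  silhouette={silhouette_score(X, km.labels_):.2f}"
    print(msg)  # at k = number of examples, inertia is exactly 0
```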

Class demo