CPSC 330 Lecture 15: K-Means

Andrew Roth (Slides adapted from Varada Kolhatkar and Firas Moosvi)

Announcements

  • MT2 registration open. See Piazza.
  • Final exam dates and registration posted. See Piazza.
  • MT1 marked. Viewing in CBTF bookings open. See Piazza.
  • HW5 is due March 10th (start now!)
  • HW6 is posted

(iClicker) Midterm poll

iClicker cloud join link: https://join.iclicker.com/HTRZ

Select all of the following statements which are TRUE.

    1. I’m happy with my progress and learning in this course.
    1. I find the course content interesting, but the pace is a bit overwhelming. Balancing this course with other responsibilities is challenging.
    1. I’m doing okay, but I feel stressed and worried about upcoming assessments.
    1. I’m confused about some concepts and would appreciate more clarification or review sessions.
    1. I’m struggling to keep up with the material. I am not happy with my learning in this course and my morale is low :(.

Class notebook (recap)

(iClicker) Exercise 14.2

iClicker cloud join link: https://join.iclicker.com/HTRZ

Select all of the following statements which are TRUE.

    1. L2 regularized logistic regression sets correlated coefficients to zero *
    1. Lasso regression predicts values using ŷᵢ = ∑ⱼ wⱼxᵢⱼ + b *
    1. Lasso and Ridge regression optimize different measures of “error” between predicted and observed target values *
    1. KNN and SVM RBF can be used with SelectFromModel
    1. I saw way too much math today! *

* Denotes optional details only covered in our section
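
Statement 2’s formula can be checked directly in code. Here is a minimal sketch (the dataset and alpha value are illustrative choices, not from the lecture) that reproduces scikit-learn’s Lasso predictions by hand as ŷᵢ = ∑ⱼ wⱼxᵢⱼ + b:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Toy regression data (illustrative, not the lecture's dataset).
X, y = make_regression(n_samples=100, n_features=5, noise=10, random_state=42)

lasso = Lasso(alpha=1.0).fit(X, y)

# Reproduce predict() by hand: y_hat_i = sum_j w_j * x_ij + b
manual = X @ lasso.coef_ + lasso.intercept_
assert np.allclose(manual, lasso.predict(X))
```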

(iClicker) Exercise 14.3

iClicker cloud join link: https://join.iclicker.com/HTRZ

Select all of the following statements which are TRUE.

    1. Simple association-based feature selection approaches do not take into account the interaction between features.
    1. You can carry out feature selection using linear models by pruning the features which have very small weights (i.e., coefficients less than a threshold).
    1. The order of features removed given by rfe.ranking_ is the same as the order of original feature importances given by the model.
    1. If you remove 10 features in a single step based on feature importance, the same 10 features would be removed if you performed sequential removal, recalculating feature importance after removing each feature.
    1. Forward search is guaranteed to find the best feature set. *

* Denotes optional question
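
As a concrete illustration of statements 2 and 3, here is a minimal RFE sketch (the dataset and model are my own choices, not the lecture’s). Because RFE re-fits the model after every elimination, its `ranking_` need not match the importance order from a single fit:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Illustrative classification data.
X, y = make_classification(n_samples=200, n_features=8, random_state=42)

# RFE drops one feature at a time, re-fitting the model after each removal.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=3)
rfe.fit(X, y)

print(rfe.support_)   # boolean mask of the 3 surviving features
print(rfe.ranking_)   # 1 = selected; larger values were eliminated earlier
```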

Class notebook

15.1 Select all of the following statements which are True (iClicker)

iClicker cloud join link: https://join.iclicker.com/HTRZ

Select all that are true.

    1. K-Means algorithm always converges to the same solution.
    1. K in K-Means should always be ≤ # of features.
    1. In K-Means, it makes sense to have K ≤ # of examples.
    1. In K-Means, in some iterations some points may be left unassigned.
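
To see why statement 4 is worth thinking about, here is a bare-bones NumPy sketch of the K-Means loop (my own illustrative code, not the lecture’s): the assignment step labels every point with its nearest centroid in every iteration, so no point is ever left unassigned.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))   # illustrative 2-D data
k = 3
centroids = X[rng.choice(len(X), size=k, replace=False)]  # random init

for _ in range(10):
    # Assignment step: EVERY point gets the label of its nearest centroid.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Update step: move each centroid to the mean of its assigned points
    # (keep the old centroid if a cluster happens to end up empty).
    centroids = np.array(
        [X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
         for j in range(k)]
    )

assert len(labels) == len(X)  # all 100 points carry a cluster label
```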

15.2 Select all of the following statements which are True (iClicker)

iClicker cloud join link: https://join.iclicker.com/HTRZ

Select all that are true.

    1. K-Means is sensitive to initialization and the solution may change depending upon the initialization.
    1. K-Means terminates when the number of clusters does not increase between iterations.
    1. K-Means terminates when the centroid locations do not change between iterations.
    1. K-Means is guaranteed to find the optimal solution.
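
Statements 1 and 4 can be probed empirically. In this hedged sketch (data, seeds, and cluster count are arbitrary illustrative choices), restarts are disabled with n_init=1 so that each run reflects a single random initialization; the final inertia may then differ from seed to seed, showing K-Means only reaches a local optimum.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Overlapping blobs make differing local optima more likely (illustrative).
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=2.0, random_state=42)

for seed in range(3):
    # n_init=1 disables the usual restarts, exposing the initialization.
    km = KMeans(n_clusters=4, init="random", n_init=1, random_state=seed)
    km.fit(X)
    print(f"seed={seed}  inertia={km.inertia_:.1f}")  # may differ per seed
```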

Class notebook

15.3 Select all of the following statements which are True (iClicker)

iClicker cloud join link: https://join.iclicker.com/HTRZ

Select all that are TRUE.

    1. If you train K-Means with n_clusters equal to the number of examples, the inertia value will be 0.
    1. The elbow plot shows the tradeoff between within-cluster distance and the number of clusters.
    1. Unlike the Elbow method, the Silhouette method is not dependent on the notion of cluster centers.
    1. The elbow plot is not a reliable method to obtain the optimal number of clusters in all cases.
    1. The Silhouette score ranges between -1 and 1, where higher scores indicate better cluster assignments.
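
Statements 1 and 5 are easy to verify on a toy dataset. The sketch below (a tiny illustrative dataset, not the lecture’s) shows inertia shrinking as K grows, hitting exactly 0 at K = number of examples, alongside the corresponding silhouette scores in [-1, 1]:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Tiny illustrative dataset: 30 points drawn from 3 blobs.
X, _ = make_blobs(n_samples=30, centers=3, random_state=42)

for k in [2, 3, 5, len(X)]:
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    msg = f"k={k:2d}  inertia={km.inertia_:.2f}"
    if k <= len(X) - 1:  # silhouette_score needs 2 <= k <= n - 1
        msg += f"  silhouette={silhouette_score(X, km.labels_):.2f}"
    print(msg)  # at k = number of examples, inertia is exactly 0
```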

Class demo