Lecture 4: \(k\)-nearest neighbours and SVM RBFs
Andrew Roth (Slides adapted from Varada Kolhatkar and Firas Moosvi)
Announcements
- Homework 2 due Jan 20
- Syllabus quiz due Jan 24
- The lecture notes within these notebooks align with the content presented in the videos. Even though we do not cover all the content from these notebooks during lectures, it’s your responsibility to go through them on your own.
Learning outcomes
From this lecture, you will be able to
- Describe the curse of dimensionality
- Explain the notion of similarity-based algorithms
- Describe how \(k\)-NNs and SVMs with RBF kernel work
- Describe the effects of hyper-parameters for \(k\)-NNs and SVMs
Recap
Which of the following scenarios do NOT necessarily imply overfitting?
- Training accuracy is 0.98 while validation accuracy is 0.60.
- The model is too specific to the training data.
- The decision boundary of a classifier is wiggly and highly irregular.
- Training and validation accuracies are both approximately 0.88.
Recap
Which of the following statements about overfitting is true?
- Overfitting is always beneficial for model performance on unseen data.
- Some degree of overfitting is common in most real-world problems.
- Overfitting ensures the model will perform well in real-world scenarios.
- Overfitting occurs when the model learns the training data too closely, including its noise and outliers.
Recap
How might one address the issue of underfitting in a machine learning model?
- Introduce more noise to the training data.
- Remove features that might be relevant to the prediction.
- Increase the model’s complexity, possibly by adding more parameters or features.
- Use a smaller dataset for training.
Overfitting and underfitting
- An overfit model matches the training set so closely that it fails to make correct predictions on new unseen data.
- An underfit model is too simple and does not even make good predictions on the training data.
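To make the overfitting/underfitting contrast concrete, here is a small hedged sketch (the synthetic dataset and decision-tree models are illustrative choices, not from the lecture): an unrestricted tree memorizes the training set, while a depth-1 stump is too simple to fit even the training data well.

```python
# Hypothetical illustration: overfit vs. underfit models on synthetic data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=42)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=42)

# A very deep tree tends to overfit: near-perfect train score, lower validation score.
deep = DecisionTreeClassifier(max_depth=None, random_state=42).fit(X_train, y_train)

# A depth-1 tree (a decision stump) tends to underfit: mediocre scores on both splits.
stump = DecisionTreeClassifier(max_depth=1, random_state=42).fit(X_train, y_train)

print(f"deep tree: train={deep.score(X_train, y_train):.2f}, valid={deep.score(X_valid, y_valid):.2f}")
print(f"stump:     train={stump.score(X_train, y_train):.2f}, valid={stump.score(X_valid, y_valid):.2f}")
```
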
Recap
- Why do we split the data? What are train/valid/test splits?
- What are the benefits of cross-validation?
- What’s the fundamental trade-off in supervised machine learning?
- What is the golden rule of machine learning?
Cross validation
Summary of train, validation, test, and deployment data
|            | `.fit` | `.score` | `.predict` |
|------------|--------|----------|------------|
| Train      | ✔️     | ✔️       | ✔️         |
| Validation |        | ✔️       | ✔️         |
| Test       |        | once     | once       |
| Deployment |        |          | ✔️         |
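As a minimal sketch of this split discipline in scikit-learn (the dataset and model here are illustrative choices, not from the lecture): fit on the training set, score on validation as often as needed, and touch the test set only once at the very end.

```python
# Hedged sketch: train / validation / test usage in scikit-learn.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)

# First carve off the test set; it gets scored only once, at the very end.
X_trainvalid, X_test, y_trainvalid, y_test = train_test_split(X, y, random_state=123)

# Split the remainder into train (fit + score) and validation (score only).
X_train, X_valid, y_train, y_valid = train_test_split(
    X_trainvalid, y_trainvalid, random_state=123
)

model = KNeighborsClassifier()
model.fit(X_train, y_train)                   # fit on train only
valid_score = model.score(X_valid, y_valid)   # score on validation, repeatedly if needed
test_score = model.score(X_test, y_test)      # score on test, exactly once
print(valid_score, test_score)
```
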
Recap: The fundamental tradeoff
As you increase the model complexity, training score tends to go up and the gap between train and validation scores tends to go up.
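For \(k\)-NN specifically, smaller `n_neighbors` means a more complex model, so the tradeoff can be seen by sweeping \(k\). The sketch below (example values chosen here, not from the slides) shows the train score and the train–validation gap both growing as \(k\) shrinks.

```python
# Hedged illustration of the fundamental tradeoff with k-NN: lower k
# (more complex model) raises the train score and widens the gap.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=0)

for k in [1, 5, 25, 100]:
    scores = cross_validate(
        KNeighborsClassifier(n_neighbors=k), X, y, return_train_score=True
    )
    train = scores["train_score"].mean()
    valid = scores["test_score"].mean()
    print(f"k={k:3d}  train={train:.2f}  valid={valid:.2f}  gap={train - valid:.2f}")
```
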
Motivation