
From this lecture, you will be able to

- explain how the `max_depth` hyperparameter relates to model complexity.

In machine learning we want to learn a mapping function from labeled data so that we can predict labels of unlabeled data.
For example, suppose we want to build a spam filtering system. We will take a large number of spam/non-spam messages from the past, learn patterns associated with spam/non-spam from them, and predict whether a new incoming message in someone’s inbox is spam or non-spam based on these patterns.
So we want to learn from past messages, but ultimately we want to apply what we learn to future email messages.
Select the TRUE statement.

| | ml_experience | class_attendance | lab1 | lab2 | lab3 | lab4 | quiz1 | quiz2 |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 92 | 93 | 84 | 91 | 92 | A+ |
| 1 | 1 | 0 | 94 | 90 | 80 | 83 | 91 | not A+ |
| 2 | 0 | 0 | 78 | 85 | 83 | 80 | 80 | not A+ |
| 3 | 0 | 1 | 91 | 94 | 92 | 91 | 89 | A+ |




Rule 1: If class_attendance == 1, then quiz2 grade is A+.
Rule 2: If lab3 > 83.5 and quiz1 <= 83.5 and lab2 <= 88, then quiz2 grade is A+.
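Read as code, each rule is just a predicate on a row's feature values. A minimal sketch, assuming the column names from the table above (the function names `rule1` and `rule2` are hypothetical):

```python
def rule1(row):
    """Rule 1: predict A+ for quiz2 if class_attendance == 1."""
    return "A+" if row["class_attendance"] == 1 else "not A+"


def rule2(row):
    """Rule 2: predict A+ for quiz2 if lab3 > 83.5, quiz1 <= 83.5, and lab2 <= 88."""
    if row["lab3"] > 83.5 and row["quiz1"] <= 83.5 and row["lab2"] <= 88:
        return "A+"
    return "not A+"


# Example: the first row of the table above.
row0 = {"ml_experience": 1, "class_attendance": 1,
        "lab1": 92, "lab2": 93, "lab3": 84, "lab4": 91, "quiz1": 92}
print(rule1(row0))  # A+
print(rule2(row0))  # not A+ (quiz1 = 92 is above 83.5)
```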
Think about these questions on your own or discuss them with your friend/neighbour.
Our goal is to generalize beyond what we see in the training examples.
We only have access to a limited amount of training data, and we want to learn a mapping function that predicts targets reasonably well for examples beyond this training data.
… But we do not have access to the entire distribution 😞
A common way to estimate generalization performance is data splitting.
- `fit` (train) a model on the training portion only.
- `score` (assess) the trained model on the set-aside data to get a sense of how well the model would be able to generalize.

We can create such a split with scikit-learn's `train_test_split`. The shapes below show an example split:

| Data portion | Shape |
|---|---|
| X | (21, 7) |
| y | (21,) |
| X_train | (16, 7) |
| y_train | (16,) |
| X_test | (5, 7) |
| y_test | (5,) |
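
A minimal sketch of how such a split could be produced with `train_test_split` (the DataFrame name `grades_df` and the `random_state` value are assumptions; with 21 examples, `test_size=0.2` yields the 16/5 split shown above):

```python
from sklearn.model_selection import train_test_split

# Separate features and target (column names follow the table above;
# the DataFrame name `grades_df` is an assumption).
X = grades_df.drop(columns=["quiz2"])
y = grades_df["quiz2"]

# 80/20 split: with 21 examples this gives 16 train and 5 test rows.
# random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123
)
print(X_train.shape, X_test.shape)  # (16, 7) (5, 7)
```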
With `DecisionTreeClassifier(max_depth=2)`:

Train error: 0.125
Test error: 0.400

With `DecisionTreeClassifier(max_depth=6)`:

Train error: 0.000
Test error: 0.600
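
A sketch of how these numbers could be produced, assuming the train/test split from above (the `random_state` value is an assumption; error is reported as 1 − accuracy):

```python
from sklearn.tree import DecisionTreeClassifier

for depth in [2, 6]:
    model = DecisionTreeClassifier(max_depth=depth, random_state=123)
    model.fit(X_train, y_train)
    # Report error as 1 - accuracy on both portions.
    print(f"max_depth={depth}")
    print(f"  Train error: {1 - model.score(X_train, y_train):.3f}")
    print(f"  Test error:  {1 - model.score(X_test, y_test):.3f}")
```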
| | fit | score | predict |
|---|---|---|---|
| Train | ✔️ | ✔️ | ✔️ |
| Validation | | ✔️ | ✔️ |
| Test | | once | once |
| Deployment | | | ✔️ |
You can typically expect \(E_\textrm{train} < E_\textrm{validation} < E_\textrm{test} < E_\textrm{deployment}\).
iClicker cloud join link: https://join.iclicker.com/HTRZ
Select all of the following statements which are TRUE.
… (`max_depth` in sklearn) is likely to perform very well on the deployment data.

We can carry out cross-validation with scikit-learn:

from sklearn.model_selection import cross_val_score, cross_validate
model = DecisionTreeClassifier(max_depth=4)
cv_scores = cross_val_score(model, X_train, y_train, cv=4)
cv_scores
array([0.5 , 0.75, 0.5 , 0.75])
Average cross-validation score = 0.62
Standard deviation of cross-validation score = 0.12
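
These summary numbers can be reproduced directly from the `cv_scores` array above (a minimal sketch):

```python
# Mean and standard deviation of the fold scores shown above.
print(f"Average cross-validation score = {cv_scores.mean():.2f}")
print(f"Standard deviation of cross-validation score = {cv_scores.std():.2f}")
```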
To summarize:

- Split the data `X` and target `y` into `X_train, y_train, X_test, y_test` using `train_test_split`.
- `fit` the model on `X_train` and `y_train`.
- `score` the model on `X_test` and `y_test`.

Imagine that your train and validation errors do not align with each other. How do you diagnose the problem?
We’re going to think about 4 types of errors:
Train error: 0.229
Validation error: 0.438

Train error: 0.000
Validation error: 0.438
As you increase model complexity, \(E_\textrm{train}\) tends to go down but \(E_\textrm{valid}-E_\textrm{train}\) tends to go up.
There are many subtleties here and there is no perfect answer, but a common practice is to pick the model with the minimum cross-validation error.
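
As a sketch of what that could look like in code (the candidate `max_depth` values and `random_state` are assumptions):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Mean cross-validation error (1 - accuracy) for each candidate depth.
cv_errors = {}
for depth in [1, 2, 4, 6, 8, None]:
    model = DecisionTreeClassifier(max_depth=depth, random_state=123)
    scores = cross_val_score(model, X_train, y_train, cv=4)
    cv_errors[depth] = 1 - np.mean(scores)

# Pick the depth with the minimum cross-validation error.
best_depth = min(cv_errors, key=cv_errors.get)
print(cv_errors)
print("Best max_depth:", best_depth)
```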
Splitting: Before doing anything, split the data X and y into X_train, X_test, y_train, y_test or train_df and test_df using train_test_split.
Select the best model using cross-validation: Use cross_validate with return_train_score=True so that we get access to the training scores in each fold (useful, for instance, if we want to plot train vs. validation errors); see the sketch after this list.
Scoring on test data: Finally score on the test data with the chosen hyperparameters to examine the generalization performance.
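
A minimal sketch of this workflow, assuming the train/test split from earlier and a chosen `max_depth` (the specific values are assumptions):

```python
import pandas as pd
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

# Cross-validation with training scores for the chosen hyperparameter.
model = DecisionTreeClassifier(max_depth=4, random_state=123)
cv_results = cross_validate(model, X_train, y_train, cv=4, return_train_score=True)
print(pd.DataFrame(cv_results)[["train_score", "test_score"]].mean())

# After settling on the hyperparameters, retrain on the full training set
# and score on the test set once to estimate generalization performance.
model.fit(X_train, y_train)
print("Test score:", model.score(X_test, y_test))
```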
Again, there are many subtleties here; we'll discuss the golden rule multiple times throughout the course.
iClicker cloud join link: https://join.iclicker.com/HTRZ
Select all of the following statements which are TRUE.
Copy this notebook to your working directory and follow along.
