From this lecture, students are expected to be able to:
- Explain how predict works for linear regression
- Explain how the alpha hyperparameter of Ridge is related to the fundamental tradeoff
- Use scikit-learn’s Ridge model
- Use scikit-learn’s LogisticRegression model and predict_proba to get probability scores

sklearn CountVectorizer

- Use scikit-learn’s CountVectorizer to encode text data
- CountVectorizer: Transforms text into a matrix of token counts
- max_features: Controls the number of features used in the model
- max_df, min_df: Control document frequency thresholds
- ngram_range: Defines the range of n-grams to be extracted
- stop_words: Enables the removal of common words that are typically uninformative in most applications, such as “and”, “the”, etc.

iClicker cloud join link: https://join.iclicker.com/HTRZ
Select all of the following statements which are TRUE.
- [...] ColumnTransformer object to cross_validate.
- [...] fit_transform on a ColumnTransformer object, you get a numpy ndarray.
Select all of the following statements which are TRUE.
- handle_unknown="ignore" would treat all unknown categories equally.
- [...] max_features hyperparameter of CountVectorizer the training score is likely to go up.
- [...] CountVectorizer. If you encounter a word in the validation or the test split that’s not available in the training data, we’ll get an error.
- [...] cross_validate, each fold might have a slightly different number of features (columns) in the fold.

Linear models assume that the relationship between X and y is linear.

\[ \hat{y} = w_1 \times \text{\# hours studied} + w_0 \]
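The prediction formula above can be checked directly against scikit-learn. The sketch below uses made-up hours-studied and score values (they are assumptions for illustration, not lecture data) and compares a manual computation of w_1 × hours + w_0 with the output of predict.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: hours studied -> exam score (made-up numbers)
hours = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
scores = np.array([52.0, 58.0, 65.0, 70.0, 77.0])

lr = LinearRegression().fit(hours, scores)
w_1 = lr.coef_[0]     # learned weight for "# hours studied"
w_0 = lr.intercept_   # learned intercept

# Prediction by hand: y_hat = w_1 * (# hours studied) + w_0
new_hours = np.array([[6.0]])
manual = w_1 * new_hours[:, 0] + w_0

print(manual)                 # manual prediction
print(lr.predict(new_hours))  # should agree with the manual value
```

With more than one feature the same idea applies: predict computes a weighted sum of the feature values plus the intercept.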
Ridge vs. LinearRegression

Ridge adds a hyperparameter, alpha, to control the complexity of the model. It finds a line that balances fit to the training data against overly large coefficients.

[Panels: LinearRegression, Ridge]

[...] Ridge.
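To make the effect of alpha concrete, the following sketch fits Ridge with several alpha values on synthetic data and prints the size of the learned coefficients. The data, the true weights, and the alpha grid are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic regression data (hypothetical): 30 examples, 5 features
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
true_w = np.array([3.0, -2.0, 0.5, 0.0, 1.0])
y = X @ true_w + rng.normal(scale=0.5, size=30)

# Baseline: ordinary least squares, no complexity control
lr = LinearRegression().fit(X, y)
print("LinearRegression coefficient norm:", np.linalg.norm(lr.coef_))

# Ridge: larger alpha means a stronger penalty on large coefficients
norms = {}
for alpha in [0.01, 1.0, 100.0, 10000.0]:
    ridge = Ridge(alpha=alpha).fit(X, y)
    norms[alpha] = np.linalg.norm(ridge.coef_)
    print(f"alpha={alpha:>8}: coefficient norm = {norms[alpha]:.4f}")
```

The coefficient norm shrinks as alpha grows, which is one way to see alpha's link to the fundamental tradeoff: a very small alpha behaves much like LinearRegression, while a very large alpha pushes the model toward underfitting.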
Select all of the following statements which are TRUE.
- [...] alpha of Ridge is likely to decrease model complexity.
- Ridge can be used with datasets that have multiple features.
- [...] Ridge, we learn one coefficient per training example.
Select all of the following statements which are TRUE.
- [...] C hyperparameter increases model complexity.
- [...] sklearn because it tends to work better in many cases.
- [...] kernel="linear" to create a linear SVM.
- [...] predict method of linear SVM and logistic regression works the same way.
- [...] coef_ associated with the features and intercept_ using a Linear SVM model
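The notes on coef_, intercept_, and predict for a linear SVM can be sketched as follows. The synthetic dataset and the value C=1.0 are assumptions for illustration. For both models, the raw score is a weighted sum of the features plus the intercept, and the predicted class depends on the sign of that score.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Synthetic binary classification data (hypothetical)
X, y = make_classification(
    n_samples=100, n_features=4, n_informative=2, n_redundant=0, random_state=0
)

models = {
    "linear SVM": SVC(kernel="linear", C=1.0),
    "logistic regression": LogisticRegression(C=1.0),
}

raw_ok = {}
agreement = {}
for name, model in models.items():
    model.fit(X, y)
    # One coefficient per feature, plus one intercept
    print(name, "coef_:", model.coef_.ravel(), "intercept_:", model.intercept_)

    # Raw score by hand: w . x + b
    raw = X @ model.coef_.ravel() + model.intercept_[0]
    # Positive raw score -> classes_[1], otherwise classes_[0]
    manual_pred = model.classes_[(raw > 0).astype(int)]

    raw_ok[name] = np.allclose(raw, model.decision_function(X), atol=1e-6)
    agreement[name] = np.mean(manual_pred == model.predict(X))
    print(name, "agreement with predict:", agreement[name])
```

Note that coef_ is only available on SVC when kernel="linear"; with other kernels the model is not expressed as one weight per feature.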