Why?
How could we do this?
What is the difference? Which is typically larger?
💡 Let’s instead find a way to estimate the test error by holding out a subset of the training observations from the model fitting process, and then applying the statistical learning method to those held out observations
If we have a quantitative predictor what metric would we use to calculate this test error?
If we have a qualitative predictor what metric would we use to calculate this test error?
\[\Large\color{orange}{MSE_{\texttt{test-split}} = \textrm{Ave}_{i\in\texttt{test-split}}[y_i-\hat{f}(x_i)]^2}\]
\[\Large\color{orange}{Err_{\texttt{test-split}} = \textrm{Ave}_{i\in\texttt{test-split}}I[y_i\neq \mathcal{\hat{C}}(x_i)]}\]
Auto example:
mpg
from horsepower
.\(\color{orange}{MSE_{\texttt{test-split}}}\)
\(\color{orange}{MSE_{\texttt{test-split}}}\)
\(\color{orange}{MSE_{\texttt{test-split}}}\)
\(\color{orange}{MSE_{\texttt{test-split}}}\)
Auto example:
mpg
from horsepower
.💡 The idea is to do the following:
\(\color{orange}{MSE_{\texttt{test-split-1}}}\)
\(\color{orange}{MSE_{\texttt{test-split-2}}}\)
\(\color{orange}{MSE_{\texttt{test-split-3}}}\)
\(\color{orange}{MSE_{\texttt{test-split-4}}}\)
Take the mean of the \(k\) MSE values
Application Exercise
If we use 10 folds:
02:00
\[\dots\]
Auto example:
mpg
from horsepower
Dr. Lucy D’Agostino McGowan adapted from slides by Hastie & Tibshirani