Classification trees are very similar to regression trees, except that they are used to predict a qualitative response rather than a quantitative one
We predict that each observation belongs to the most commonly occurring class of the training observations in the region to which it belongs
We use recursive binary splitting to grow the tree
Instead of RSS, we can use:
Gini index: \(G = \sum_{k=1}^K \hat{p}_{mk}(1-\hat{p}_{mk})\)
This is a measure of total variance across the \(K\) classes. If all of the \(\hat{p}_{mk}\) values are close to zero or one, this will be small
The Gini index is a measure of node purity: small values indicate that a node contains predominantly observations from a single class
In R, this can be estimated using the gain_capture() function.
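To make the Gini index formula and the majority-class rule concrete, here is a minimal Python sketch. It is not the gain_capture() function mentioned above. The helper names (class_proportions, gini_index, majority_class, best_split) and the toy data are hypothetical and made up for illustration. It assumes a single numeric predictor and shows only one step of binary splitting, choosing the cutpoint with the lowest size-weighted Gini index.

```python
from collections import Counter


def class_proportions(labels):
    """Return {class: proportion} for the labels in one region (node)."""
    n = len(labels)
    return {k: count / n for k, count in Counter(labels).items()}


def gini_index(labels):
    """G = sum over classes k of p_mk * (1 - p_mk) for one region."""
    return sum(p * (1 - p) for p in class_proportions(labels).values())


def majority_class(labels):
    """Predicted class for a region: its most commonly occurring class."""
    return Counter(labels).most_common(1)[0][0]


def best_split(x, y):
    """One step of binary splitting on a single numeric predictor x.

    Tries each midpoint between consecutive distinct x values and returns
    (cutpoint, weighted Gini) for the split with the lowest weighted Gini.
    """
    values = sorted(set(x))
    best = None
    for lo, hi in zip(values, values[1:]):
        cut = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi < cut]
        right = [yi for xi, yi in zip(x, y) if xi >= cut]
        # Weight each child's Gini index by its share of the observations
        score = (len(left) * gini_index(left)
                 + len(right) * gini_index(right)) / len(y)
        if best is None or score < best[1]:
            best = (cut, score)
    return best


# Toy data, made up for illustration
pure = ["Yes", "Yes", "Yes", "Yes"]
mixed = ["Yes", "No", "Yes", "No"]
print(gini_index(pure))                      # 0.0 (a pure node)
print(gini_index(mixed))                     # 0.5 (maximally mixed, K = 2)
print(majority_class(["Yes", "No", "Yes"]))  # Yes

age = [30, 35, 40, 60, 65, 70]
hd = ["No", "No", "No", "Yes", "Yes", "Yes"]
print(best_split(age, hd))                   # (50.0, 0.0)
```

A full tree would apply best_split recursively to each resulting region and across all predictors. The sketch stops after one split to keep the link to the formula visible.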
Age, Sex, Chol, etc)
How many folds do I have?
What \(\alpha\)s am I trying?
Dr. Lucy D’Agostino McGowan adapted from slides by Hastie & Tibshirani