What are some examples of classification problems?
- eye color \(\in\) {blue, brown, green}
- email \(\in\) {spam, not spam}
# scatterplot of balance vs. income, plus boxplots of balance and income by default status
set.seed(1)
Default |>
  sample_frac(size = 0.25) |>
  ggplot(aes(balance, income, color = default)) +
  geom_point(pch = 4) +
  scale_color_manual(values = c("cornflowerblue", "red")) +
  theme_classic() +
  theme(legend.position = "top") -> p1

p2 <- ggplot(Default, aes(x = default, y = balance, fill = default)) +
  geom_boxplot() +
  scale_fill_manual(values = c("cornflowerblue", "red")) +
  theme_classic() +
  theme(legend.position = "none")

p3 <- ggplot(Default, aes(x = default, y = income, fill = default)) +
  geom_boxplot() +
  scale_fill_manual(values = c("cornflowerblue", "red")) +
  theme_classic() +
  theme(legend.position = "none")

grid.arrange(p1, p2, p3, ncol = 3, widths = c(2, 1, 1))
We can code Default as
\[Y = \begin{cases} 0 & \textrm{if }\texttt{No}\\ 1&\textrm{if }\texttt{Yes}\end{cases}\]
Can we fit a linear regression of \(Y\) on \(X\) and classify as Yes if \(\hat{Y}> 0.5\)?
What may do a better job?
# add predicted probabilities from the logistic regression (p), fitted values from a
# linear probability model (p2), and a 0/1 version of default (def)
Default <- Default |>
  mutate(
    p = glm(default ~ balance, data = Default, family = "binomial") |>
      predict(type = "response"),
    p2 = lm(I(default == "Yes") ~ balance, data = Default) |> predict(),
    def = ifelse(default == "Yes", 1, 0)
  )
# left panel: linear probability model fit (p2); right panel: logistic regression fit (p)
Default |>
  sample_frac(0.25) |>
  ggplot(aes(balance, p2)) +
  geom_hline(yintercept = c(0, 1), lty = 2, size = 0.2) +
  geom_line(color = "cornflowerblue") +
  geom_point(aes(balance, def), shape = "|", color = "orange") +
  theme_classic() +
  labs(y = "probability of default") -> p1

Default |>
  sample_frac(0.25) |>
  ggplot(aes(balance, p)) +
  geom_hline(yintercept = c(0, 1), lty = 2, size = 0.2) +
  geom_line(color = "cornflowerblue") +
  geom_point(aes(balance, def), shape = "|", color = "orange") +
  theme_classic() +
  labs(y = "probability of default") -> p2

grid.arrange(p1, p2, ncol = 2)
Which does a better job at predicting the probability of default?
What if we have \(>2\) possible outcomes? For example, someone comes to the emergency room and we need to classify them according to their symptoms
\[ \begin{align} Y = \begin{cases} 1& \textrm{if }\texttt{stroke}\\2&\textrm{if }\texttt{drug overdose}\\3&\textrm{if }\texttt{epileptic seizure}\end{cases} \end{align} \]
What could go wrong here?
This coding implies an ordering of the outcomes, and it says that the difference between stroke and drug overdose is the same as the difference between drug overdose and epileptic seizure.
Logistic regression models the probability \(p(X) = P(Y = 1 \mid X)\) as
\[ p(X) = \frac{e^{\beta_0+\beta_1X}}{1+e^{\beta_0+\beta_1X}} \]
We can rearrange this into the following form:
\[ \log\left(\frac{p(X)}{1-p(X)}\right) = \beta_0 + \beta_1 X \]
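For completeness, the algebra behind this rearrangement:
\[
p(X)\left(1+e^{\beta_0+\beta_1X}\right)=e^{\beta_0+\beta_1X}
\quad\Rightarrow\quad
p(X)=\left(1-p(X)\right)e^{\beta_0+\beta_1X}
\quad\Rightarrow\quad
\frac{p(X)}{1-p(X)}=e^{\beta_0+\beta_1X},
\]
and taking the log of both sides gives the logit form above.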
What is this transformation called?
Logistic regression ensures that our estimates for \(p(X)\) are between 0 and 1 🎉
Refresher: How did we estimate \(\hat\beta\) in linear regression?
In logistic regression, we use maximum likelihood to estimate the parameters
\[\ell(\beta_0,\beta_1)=\prod_{i:y_i=1}p(x_i)\prod_{i:y_i=0}(1-p(x_i))\]
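In practice we maximize the log of this likelihood. A minimal sketch of evaluating it for the Default data (not from the slides; the names y, x, log_lik, and fit are ours) and checking it against glm():

y <- ifelse(Default$default == "Yes", 1, 0)
x <- Default$balance

# log of the likelihood above, as a function of the two coefficients
log_lik <- function(b0, b1) {
  p <- plogis(b0 + b1 * x)                # p(x_i) under the logistic model
  sum(y * log(p) + (1 - y) * log(1 - p))
}

fit <- glm(default ~ balance, data = Default, family = "binomial")
log_lik(coef(fit)[1], coef(fit)[2])  # matches logLik(fit)
log_lik(-10, 0.005)                  # any other (beta0, beta1) gives a smaller value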
We let R do the heavy lifting here, using the logistic_reg() function in R with the glm engine:

# A tibble: 2 × 5
  term         estimate std.error statistic   p.value
  <chr>           <dbl>     <dbl>     <dbl>     <dbl>
1 (Intercept) -10.7      0.361        -29.5 3.62e-191
2 balance       0.00550  0.000220      25.0 1.98e-137

What is our estimated probability of default for someone with a balance of $1000?
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | -10.6513306 | 0.3611574 | -29.49221 | 0 |
balance | 0.0054989 | 0.0002204 | 24.95309 | 0 |
\[ \hat{p}(X) = \frac{e^{\hat{\beta}_0+\hat{\beta}_1X}}{1+e^{\hat{\beta}_0+\hat{\beta}_1X}}=\frac{e^{-10.65+0.0055\times 1000}}{1+e^{-10.65+0.0055\times 1000}}=0.006 \]
What is our estimated probability of default for someone with a balance of $2000?
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | -10.6513306 | 0.3611574 | -29.49221 | 0 |
balance | 0.0054989 | 0.0002204 | 24.95309 | 0 |
\[ \hat{p}(X) = \frac{e^{\hat{\beta}_0+\hat{\beta}_1X}}{1+e^{\hat{\beta}_0+\hat{\beta}_1X}}=\frac{e^{-10.65+0.0055\times 2000}}{1+e^{-10.65+0.0055\times 2000}}=0.586 \]
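Both hand calculations can be checked with predict(); a quick sketch (refitting the model so the chunk stands alone):

# estimated probability of default at balances of $1000 and $2000
fit <- glm(default ~ balance, data = Default, family = "binomial")
predict(fit, newdata = data.frame(balance = c(1000, 2000)), type = "response")
# about 0.006 and 0.586, matching the calculations above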
Let’s refit the model to predict the probability of default given the customer is a student
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | -3.5041278 | 0.0707130 | -49.554219 | 0.0000000 |
studentYes | 0.4048871 | 0.1150188 | 3.520181 | 0.0004313 |
\[P(\texttt{default = Yes}|\texttt{student = Yes}) = \frac{e^{-3.5041+0.4049\times1}}{1+e^{-3.5041+0.4049\times1}}=0.0431\]
How will this change if student = No?
\[P(\texttt{default = Yes}|\texttt{student = No}) = \frac{e^{-3.5041+0.4049\times0}}{1+e^{-3.5041+0.4049\times0}}=0.0292\]
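A sketch of that refit and the two predicted probabilities (the object name fit_student is ours):

# default ~ student, with predicted probabilities for students and non-students
fit_student <- glm(default ~ student, data = Default, family = "binomial")
predict(fit_student, newdata = data.frame(student = c("Yes", "No")), type = "response")
# about 0.0431 and 0.0292, as computed above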
\[\log\left(\frac{p(X)}{1-p(X)}\right)=\beta_0+\beta_1X_1+\dots+\beta_pX_p\] \[p(X) = \frac{e^{\beta_0+\beta_1X_1+\dots+\beta_pX_p}}{1+e^{\beta_0+\beta_1X_1+\dots+\beta_pX_p}}\]
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | -10.8690452 | 0.4922555 | -22.080088 | 0.0000000 |
balance | 0.0057365 | 0.0002319 | 24.737563 | 0.0000000 |
income | 0.0000030 | 0.0000082 | 0.369815 | 0.7115203 |
studentYes | -0.6467758 | 0.2362525 | -2.737646 | 0.0061881 |
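A sketch of the call behind this table (assuming the broom package is available for tidy()):

# multiple logistic regression with balance, income, and student
glm(default ~ balance + income + student, data = Default, family = "binomial") |>
  broom::tidy()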
Why is the coefficient for student negative now when it was positive before? What is going on here?
\[P(Y=k|X) = \frac{e ^{\beta_{0k}+\beta_{1k}X_1+\dots+\beta_{pk}X_p}}{\sum_{l=1}^Ke^{\beta_{0l}+\beta_{1l}X_1+\dots+\beta_{pl}X_p}}\]
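One way to fit a model of this form in R is nnet::multinom(). A minimal sketch on the built-in iris data, purely to show the call shape (the emergency-room example above is hypothetical, not a dataset we have here):

# multinomial (softmax) logistic regression via nnet::multinom()
library(nnet)
fit_multi <- multinom(Species ~ Sepal.Length + Sepal.Width, data = iris)
head(predict(fit_multi, type = "probs"))  # one column of P(Y = k | X) per class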
How would you get the odds from the log(odds)?
Form | Model |
---|---|
Logit form | \(\log\left(\frac{\pi}{1-\pi}\right) = \beta_0 + \beta_1x\) |
Probability form | \(\Large\pi = \frac{e^{\beta_0 + \beta_1x}}{1+e^{\beta_0 + \beta_1x}}\) |
probability | odds | log(odds) |
---|---|---|
\(\pi\) | \(\frac{\pi}{1-\pi}\) | \(\log\left(\frac{\pi}{1-\pi}\right)=l\) |
Going back the other direction ⬅️
log(odds) | odds | probability |
---|---|---|
\(l\) | \(e^l\) | \(\frac{e^l}{1+e^l} = \pi\) |
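A quick numeric check of these conversions, for an arbitrary log(odds) value:

# log(odds) -> odds -> probability
l <- 0.5
exp(l)                 # odds
exp(l) / (1 + exp(l))  # probability (equivalently, plogis(l))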
A study investigated whether a handheld device that sends a magnetic pulse into a person’s head might be an effective treatment for migraine headaches.

- What is the explanatory variable?
- What type of variable is this?
- What is the outcome variable?
- What type of variable is this?
  | TMS | Placebo | Total |
---|---|---|---|
Pain-free two hours later | 39 | 22 | 61 |
Not pain-free two hours later | 61 | 78 | 139 |
Total | 100 | 100 | 200 |

What if we wanted to calculate this in terms of Not pain-free (with Pain-free as the referent)?

What changed here?

In general, it’s more natural to interpret odds ratios > 1; you can flip the referent to do so.
\(\Large OR = \frac{78/22}{61/39} = \frac{3.545}{1.564} = 2.27\)
The odds of still being in pain for the placebo group are 2.27 times the odds of still being in pain for the TMS group.
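The same arithmetic in R, straight from the two-by-two table:

# odds of still being in pain, placebo vs. TMS, and their ratio
odds_placebo <- 78 / 22
odds_tms     <- 61 / 39
odds_placebo / odds_tms  # about 2.27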
Application Exercise
5157 women were enrolled in a study and were randomly assigned to receive either letrozole or a placebo. The primary response variable of interest was disease-free survival.
  | letrozole | placebo | total |
---|---|---|---|
death or disease | 185 | 341 | 526 |
no death or disease | 2390 | 2241 | 4631 |
total | 2575 | 2582 | 5157 |
Let’s look at some Titanic data. We are interested in whether a passenger’s reported sex (female vs. male) is related to whether they survived.
  | Female | Male | Total |
---|---|---|---|
Survived | 308 | 142 | 450 |
Died | 154 | 709 | 863 |
Total | 462 | 851 | 1313 |
What are the odds of surviving for females versus males?
\[\Large OR = \frac{308/154}{142/709} = \frac{2}{0.2003} = 9.99\]
How do you interpret this?
The odds of surviving for the female passengers were 9.99 times the odds of surviving for the male passengers.
What if we wanted to fit a model? What would the equation be?
\[\Large \log(\textrm{odds of survival}) = \beta_0 + \beta_1 \textrm{Female}\]
How do you interpret this result?
logistic_reg() |>
  set_engine("glm") |>
  fit(Survived ~ Sex, data = Titanic) |>
  tidy(exponentiate = TRUE)

# A tibble: 2 × 5
  term        estimate std.error statistic  p.value
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)    0.200    0.0919     -17.5 1.70e-68
2 Sexfemale      9.99     0.135       17.1 2.91e-65
The exponentiated coefficient for Sexfemale is 9.99, matching the odds ratio from the table: the odds of surviving for the female passengers were 9.99 times the odds of surviving for the male passengers.
What if the explanatory variable is continuous?
# A tibble: 2 × 5
  term        estimate std.error statistic  p.value
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)   -19.2       5.63     -3.41 0.000644
2 GPA             5.45      1.58      3.45 0.000553
A one unit increase in GPA yields a 5.45 increase in the log odds of acceptance
logistic_reg() |>
  set_engine("glm") |>
  fit(Acceptance ~ GPA, data = MedGPA) |>
  tidy(exponentiate = TRUE)
# A tibble: 2 × 5
  term        estimate std.error statistic  p.value
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)  4.56e-9       5.63     -3.41 0.000644
2 GPA          2.34e+2       1.58      3.45 0.000553
A one unit increase in GPA yields a 234-fold increase in the odds of acceptance
How could we get the odds associated with increasing GPA by 0.1?
# A tibble: 2 × 5
  term        estimate std.error statistic  p.value
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)   -19.2       5.63     -3.41 0.000644
2 GPA             5.45      1.58      3.45 0.000553
A one-tenth unit increase in GPA yields a 1.73-fold increase in the odds of acceptance
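This comes from scaling the coefficient before exponentiating (the displayed 5.45 is rounded, so the quick check lands just under 1.73). Equivalently, we can rescale GPA before fitting, as in the next chunk.

# odds ratio for a 0.1-unit increase in GPA
exp(0.1 * 5.45)  # about 1.72 with the rounded coefficient; 1.73 with full precision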
MedGPA <- MedGPA |>
  mutate(GPA_10 = GPA * 10)

logistic_reg() |>
  set_engine("glm") |>
  fit(Acceptance ~ GPA_10, data = MedGPA) |>
  tidy(exponentiate = TRUE)
# A tibble: 2 × 5
  term           estimate std.error statistic  p.value
  <chr>             <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept) 0.00000000456    5.63      -3.41 0.000644
2 GPA_10      1.73             0.158      3.45 0.000553
A one-tenth unit increase in GPA yields a 1.73-fold increase in the odds of acceptance
Application Exercise
Using the Default data from the ISLR package, fit a logistic regression model predicting whether a customer defaults with whether they are a student and their current balance.
Here is some code to get you started:
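A sketch following the logistic_reg() pattern from earlier in these slides:

# starter sketch: logistic regression of default on student and balance
logistic_reg() |>
  set_engine("glm") |>
  fit(default ~ student + balance, data = Default) |>
  tidy()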
Dr. Lucy D’Agostino McGowan, adapted from slides by Hastie & Tibshirani.