https://github.com/sta-363-s23/lab-03-logistic.git
Lab 03 - Logistic Regression
Due: Tuesday 2023-02-21
Getting started
- Find the lab instructions under the course syllabus on our website bit.ly/sta-363-s23
- Go to our RStudio Pro workspace and create a new project using my template.
For this assignment, go to RStudio Pro and click:
Step 1. File > New Project
Step 2. “Version Control”
Step 3. Git
Step 4. Copy the following into the “Repository URL”:
Packages
In this lab we will work with four packages: ISLR
which is a package that accompanies your textbook, tidyverse
which is a collection of packages for doing data analysis in a “tidy” way, tidymodels
a collection of packages for statistical modeling, and GGally
, a package to help us visualize the data.
library(tidyverse)
library(tidymodels)
library(ISLR)
library(GGally)
Data
For this lab, we are using the Smarket
data from the ISLR
package.
Exercises
Conceptual questions
- Using algebra, rearrange equation (1) below to get equation (2). Show all steps. Be sure to use Latex, remember you can insert a Latex equation by wrapping it in
$$
.
Equation (1)
\[p(X)=\frac{e^{\beta_0+\beta_1X}}{1+e^{\beta_0+\beta_1X}}\]
Equation (2)
\[\textrm{log}\left(\frac{p(X)}{1-p(X)}\right)=\beta_0+\beta_1X\]
- Suppose that an individual has a 23% chance of defaulting on their credit card payment. What are the odds that they will default?
Logistic Regression
For this lab we are using the
Smarket
data. Examine this data set - how many observations are there? How many columns? What are the variables?Let’s look at the correlation between all of the variables. To do this, if you haven’t done so already, we need to install the
GGally
package. Run the following code in your Console one time.
install.packages("GGally")
Then add the code below to your .qmd file. What can you learn from this visualization? Which pair of variables have the highest correlation?
ggpairs(Smarket,
lower = list(combo = wrap(ggally_facethist, binwidth = 0.5)),
progress = FALSE)
Inference Fit a logistic regression model to predict
Direction
usingLag1
,Lag2
,Lag3
,Lag4
,Lag5
, andVolume
. Show a table that contains the coefficients and p-values along with the confidence intervals for each of the 6 predictors. Which predictor has the smallest p-value? Interpret the coefficient, confidence interval, and p-value for this predictor.Inference Exponentiate the results from Exercise 5. Interpret the odds ratio for the same predictor you selected in Exercise 5.
Prediction Using 5-fold cross validation, fit the same logistic regression model as Exercise 5. What is the test Accuracy for this model? Interpret this result.
Inference Fit a logistic regression model to predict
Direction
using onlyLag1
andLag2
. Show a table that contains the coefficients and p-values along with the confidence intervals for each of the 2 predictors. Which predictor has the smallest p-value? Interpret the coefficient, confidence interval, and p-value for this predictor.Inference Exponentiate the results from Exercise 8. Interpret the odds ratio for the same predictor you selected in Exercise 8.
Prediction Using 5-fold cross validation, fit the same logistic regression model as Exercise 8. What is the test Accuracy for this model? Interpret this result.
If you had to choose between the model fit in Exercise 5 and the one fit in Exercise 8, which would you choose? Why?