Appex 11 – Penalized Regression in R

STA 363 - Spring 2023

Set up

Click File > New Project

Click the third option (Version Control).

Click the first option (Git).

Paste this link in the top box (Repository URL):

https://github.com/sta-363-s23/11-appex.git

Examine the Hitters dataset by running ?Hitters in the Console.
We want to predict a major league player’s Salary from all of the other 19 variables in this dataset. Create a visualization of Salary.
Create a recipe to estimate this model.
Add a preprocessing step to your recipe that scales each of the predictors.
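For the Salary visualization asked for above, a histogram is one reasonable choice. The sketch below assumes the Hitters data come from the ISLR package (the handout does not say which package is loaded) and uses ggplot2; the object name salary_plot is only illustrative.

```r
library(ISLR)     # assumption: Hitters is loaded from the ISLR package
library(ggplot2)

# Histogram of the outcome; na.rm = TRUE drops players with a missing Salary
salary_plot <- ggplot(Hitters, aes(x = Salary)) +
  geom_histogram(bins = 30, na.rm = TRUE) +
  labs(x = "Salary (thousands of dollars)", y = "Number of players")

salary_plot
```

Salary is recorded in thousands of dollars and is missing for some players, which is why the later recipe steps deal with missing values.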

Add a preprocessing step to your recipe to convert nominal variables into indicators.
Add a step to your recipe to remove missing values for the outcome.
Add a step to your recipe to impute missing values for the predictors using the average of the remaining values. Note: this is not the best way to do this; we will learn better techniques!
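The recipe steps described above could be put together roughly as follows. This is a sketch rather than the official solution: it assumes the tidymodels and ISLR packages, the name hitters_rec is made up for illustration, and the steps are ordered so that missing values are handled first and scaling happens last.

```r
library(tidymodels)
library(ISLR)     # assumption: Hitters is loaded from the ISLR package

hitters_rec <- recipe(Salary ~ ., data = Hitters) |>
  # drop rows where the outcome is missing
  step_naomit(Salary) |>
  # replace missing numeric predictors with the column mean
  step_impute_mean(all_numeric_predictors()) |>
  # convert nominal predictors (League, Division, NewLeague) to indicators
  step_dummy(all_nominal_predictors()) |>
  # scale every predictor (including the new indicators) to unit standard deviation
  step_scale(all_predictors())

hitters_rec
```

Scaling after creating the indicators is a design choice made here so that all_predictors() contains only numeric columns; scaling only the original numeric predictors with all_numeric_predictors() before step_dummy() would be another option.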

Set a seed: set.seed(1)
Create a cross-validation object for the Hitters dataset.
Using the recipe from the previous exercise, fit the model using ridge regression with a penalty \(\lambda\) = 300.
What is the estimate of the test RMSE for this model?
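One possible way to carry out the cross-validation and ridge fit described above is sketched below. It assumes 10-fold cross-validation (the handout does not fix the number of folds), that the glmnet package is installed, and it repeats a recipe like the one from the previous exercise so that the code runs on its own. All object names are illustrative.

```r
library(tidymodels)
library(ISLR)     # assumption: Hitters is loaded from the ISLR package

# Recipe from the previous exercise, repeated so this block is self-contained
hitters_rec <- recipe(Salary ~ ., data = Hitters) |>
  step_naomit(Salary) |>
  step_impute_mean(all_numeric_predictors()) |>
  step_dummy(all_nominal_predictors()) |>
  step_scale(all_predictors())

set.seed(1)
hitters_cv <- vfold_cv(Hitters, v = 10)   # assumption: 10 folds

# mixture = 0 gives ridge regression; penalty is lambda
ridge_spec <- linear_reg(penalty = 300, mixture = 0) |>
  set_engine("glmnet")

ridge_wf <- workflow() |>
  add_recipe(hitters_rec) |>
  add_model(ridge_spec)

ridge_res <- fit_resamples(ridge_wf, resamples = hitters_cv)

# Cross-validated estimate of the test RMSE
ridge_rmse <- collect_metrics(ridge_res) |>
  filter(.metric == "rmse")

ridge_rmse
```

The mean column of ridge_rmse is the cross-validated RMSE averaged over the folds, which serves as the estimate of the test RMSE.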

Using the Hitters cross-validation object and recipe created in the previous exercise, use tune_grid to pick the optimal penalty and mixture values.
Update the code below to create a grid that includes penalties from 0 to 50 by 1 and mixtures from 0 to 1 by 0.5.
Use this grid in the tune_grid function. Then use collect_metrics and filter to only include the RMSE estimates.
Create a figure to examine the estimated test RMSE for the grid of penalty and mixture values. Which values should you choose?
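The tuning steps above might look something like the following sketch. It is not the starter code referred to in the exercise; it assumes 10-fold cross-validation, the glmnet engine, and the ISLR copy of Hitters, and every object name is made up for illustration.

```r
library(tidymodels)
library(ISLR)     # assumption: Hitters is loaded from the ISLR package

hitters_rec <- recipe(Salary ~ ., data = Hitters) |>
  step_naomit(Salary) |>
  step_impute_mean(all_numeric_predictors()) |>
  step_dummy(all_nominal_predictors()) |>
  step_scale(all_predictors())

set.seed(1)
hitters_cv <- vfold_cv(Hitters, v = 10)   # assumption: 10 folds

# Mark both penalty and mixture as parameters to be tuned
tune_spec <- linear_reg(penalty = tune(), mixture = tune()) |>
  set_engine("glmnet")

tune_wf <- workflow() |>
  add_recipe(hitters_rec) |>
  add_model(tune_spec)

# Penalties 0, 1, ..., 50 crossed with mixtures 0, 0.5, 1
grid <- expand_grid(
  penalty = seq(0, 50, by = 1),
  mixture = seq(0, 1, by = 0.5)
)

tuned <- tune_grid(tune_wf, resamples = hitters_cv, grid = grid)

# Keep only the RMSE estimates
rmse_tbl <- collect_metrics(tuned) |>
  filter(.metric == "rmse")

# One line per mixture value, RMSE against penalty
ggplot(rmse_tbl, aes(x = penalty, y = mean, color = factor(mixture))) +
  geom_line() +
  labs(y = "Estimated test RMSE", color = "mixture")
```

The combination with the lowest line in the figure is the natural candidate; select_best(tuned, metric = "rmse") returns the same answer as a one-row tibble.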

Using the final model specification, extract the coefficients from the model by creating a workflow.
Filter out any coefficients exactly equal to 0.
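A sketch of the final step is given below. The penalty and mixture used here (penalty = 1, mixture = 1) are placeholders chosen only so the code runs on its own; they are not the tuning result and should be replaced with the values you selected. The sketch again assumes the glmnet engine and the ISLR copy of Hitters, and the object names are illustrative.

```r
library(tidymodels)
library(ISLR)     # assumption: Hitters is loaded from the ISLR package

hitters_rec <- recipe(Salary ~ ., data = Hitters) |>
  step_naomit(Salary) |>
  step_impute_mean(all_numeric_predictors()) |>
  step_dummy(all_nominal_predictors()) |>
  step_scale(all_predictors())

# Placeholder values: substitute the penalty and mixture chosen by tuning
final_spec <- linear_reg(penalty = 1, mixture = 1) |>
  set_engine("glmnet")

# Fit the final workflow on the full dataset
final_fit <- workflow() |>
  add_recipe(hitters_rec) |>
  add_model(final_spec) |>
  fit(data = Hitters)

# Extract the coefficients and drop those shrunk exactly to zero
nonzero_coefs <- final_fit |>
  extract_fit_parsnip() |>
  tidy() |>
  filter(estimate != 0)

nonzero_coefs
```

With a mixture of 0 (pure ridge) no coefficient is expected to be exactly zero, so the filter only removes terms when the chosen mixture includes some lasso penalty.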