What have we used so far to deal with non-linear relationships?
\[y_i = \beta_0 + \beta_1x_i + \beta_2x_i^2+\beta_3x_i^3 + \dots + \beta_dx_i^d+\epsilon_i\]
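As a reminder of the mechanics, here is a minimal sketch of fitting a degree-4 polynomial in R; the data are simulated stand-ins, not the survey data.

```r
library(broom)

set.seed(1)
d <- data.frame(x = runif(200, 0, 100))                    # illustrative data only
d$y <- 50 + 0.5 * d$x - 0.01 * d$x^2 + rnorm(200, sd = 2)

# degree-4 polynomial regression; raw = TRUE keeps the
# beta_1 x + beta_2 x^2 + ... parameterization written above
fit <- lm(y ~ poly(x, 4, raw = TRUE), data = d)
tidy(fit)                                                  # estimates, SEs, statistics
```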
This is data from the Colombia World Fertility Survey (1975-76), used to examine household composition
\[y_i = \beta_0 + \beta_1x_i + \beta_2x_i^2+\beta_3x_i^3 + \dots + \beta_dx_i^d+\epsilon_i\]
\[\hat{f}(b) -\hat{f}(a) =\hat\beta_1(b-a) + \hat\beta_2(b^2-a^2)+\hat\beta_3(b^3-a^3)+\hat\beta_4(b^4-a^4)\]
How do you pick \(a\) and \(b\)?
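Whatever \(a\) and \(b\) you pick, you can check the algebra numerically with `predict()`; this sketch reuses `fit` and `d` from the snippet above, and the values of `a` and `b` here are arbitrary examples.

```r
# predicted change f_hat(b) - f_hat(a) for two chosen x values
a <- 25
b <- 75
preds <- predict(fit, newdata = data.frame(x = c(a, b)))
diff(preds)
```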
Application Exercise
\[pop = \beta_0 + \beta_1age + \beta_2age^2 + \beta_3age^3 +\beta_4age^4+ \epsilon\]
Using the information below, write out the equation for the predicted change in population from a change in age from the 25th percentile (24.5) to the 75th percentile (73.5).
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | 1672.0854 | 64.5606 | 25.8995 | 0.0000 |
age | -10.6429 | 9.2268 | -1.1535 | 0.2516 |
I(age^2) | -1.1427 | 0.3857 | -2.9627 | 0.0039 |
I(age^3) | 0.0216 | 0.0059 | 3.6498 | 0.0004 |
I(age^4) | -0.0001 | 0.0000 | -3.6540 | 0.0004 |
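For reference, output like the table above comes from a fit of this form in R; the data frame name `households` and its columns are assumptions about the exercise data, not given in the slides.

```r
library(broom)

# quartic polynomial in age, written with I() so each term enters raw,
# matching the term labels in the table above (data frame name assumed)
fit_age <- lm(pop ~ age + I(age^2) + I(age^3) + I(age^4), data = households)
tidy(fit_age)
```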
\[y_i = \beta_0 + \beta_1x_i + \beta_2x_i^2+\beta_3x_i^3 + \dots + \beta_dx_i^d+\epsilon_i\]
Why?
Polynomials have notoriously bad tail behavior (so they can be bad for extrapolation)
What does this mean?
Another way to create a transformation is to cut the variable into distinct regions
\[C_1(X) = I(X < 35), C_2(X) = I(35\leq X<65), C_3(X) = I(X \geq 65)\]
\[C_1(X) = I(X < 35), C_2(X) = I(35\leq X<65), C_3(X) = I(X \geq 65)\]
What is the predicted value when \(age = 25\)?
\[C_1(X) = I(X < 15), C_2(X) = I(15\leq X<65), C_3(X) = I(X \geq 65)\]
What is the predicted value when \(age = 25\)?
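A minimal sketch of fitting these step functions in R, using simulated stand-in data; `cut()` builds the region indicators.

```r
set.seed(2)
d2 <- data.frame(age = runif(300, 0, 90))                  # illustrative data only
d2$pop <- 100 - 0.5 * d2$age + rnorm(300, sd = 5)

# C_1, C_2, C_3: indicators for age < 35, 35 <= age < 65, age >= 65
d2$age_bin <- cut(d2$age, breaks = c(-Inf, 35, 65, Inf), right = FALSE)
fit_step <- lm(pop ~ age_bin, data = d2)

# the prediction at age = 25 is the fitted mean of the region containing 25
new_bin <- cut(25, breaks = c(-Inf, 35, 65, Inf), right = FALSE)
predict(fit_step, newdata = data.frame(age_bin = new_bin))
```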
Instead of a single polynomial in \(X\) over its whole domain, we can use different polynomials in regions defined by knots
\[y_i = \begin{cases}\beta_{01}+\beta_{11}x_i + \beta_{21}x^2_i+\beta_{31}x^3_i+\epsilon_i& \textrm{if } x_i < c\\ \beta_{02}+\beta_{12}x_i + \beta_{22}x_i^2 + \beta_{32}x_{i}^3+\epsilon_i&\textrm{if }x_i\geq c\end{cases}\]
What could go wrong here?
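To see the concern concretely, here is a sketch of the naive version, reusing the simulated data from the step-function sketch: fit separate cubics on each side of an illustrative knot. Nothing in this setup forces the two fits to agree at the knot.

```r
c_knot <- 50                                               # illustrative knot location
left  <- lm(pop ~ age + I(age^2) + I(age^3), data = d2, subset = age <  c_knot)
right <- lm(pop ~ age + I(age^2) + I(age^3), data = d2, subset = age >= c_knot)

# the two fits generally give different predictions at the knot itself
predict(left,  newdata = data.frame(age = c_knot))
predict(right, newdata = data.frame(age = c_knot))
```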
A linear spline with knots at \(\xi_k\), \(k = 1,\dots, K\) is a piecewise linear polynomial continuous at each knot
\[y_i = \beta_0 + \beta_1b_1(x_i)+\beta_2b_2(x_i)+\dots+\beta_{K+1}b_{K+1}(x_i)+\epsilon_i\]
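One standard choice for these bases is the truncated power form (the same convention used for the cubic spline later), stated here so the exercise below can be done by hand:

\[\begin{align}b_1(x_i)&=x_i\\b_{k+1}(x_i)&=(x_i-\xi_k)_+, k = 1,\dots,K\end{align}\]

where

\[(x_i-\xi_k)_+=\begin{cases}x_i-\xi_k&\textrm{if }x_i>\xi_k\\0&\textrm{otherwise}\end{cases}\]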
Application Exercise
Let’s create a data set to fit a linear spline with 2 knots: 35 and 65.
x |
---|
4 |
15 |
25 |
37 |
49 |
66 |
70 |
80 |
x |
---|
4 |
15 |
25 |
37 |
49 |
66 |
70 |
80 |
\(b_1(x)\) | \(b_2(x)\) | \(b_3(x)\) |
---|---|---|
4 | 0 | 0 |
15 | 0 | 0 |
25 | 0 | 0 |
37 | 2 | 0 |
49 | 14 | 0 |
66 | 31 | 1 |
70 | 35 | 5 |
80 | 45 | 15 |
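As a check on the table above, here is a sketch of building the same bases in R with `pmax()`:

```r
x <- c(4, 15, 25, 37, 49, 66, 70, 80)

# truncated power basis for a linear spline with knots at 35 and 65
b1 <- x
b2 <- pmax(x - 35, 0)   # (x - 35)_+
b3 <- pmax(x - 65, 0)   # (x - 65)_+
data.frame(b1, b2, b3)
```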
Application Exercise
Below is a linear regression model fit using the 3 bases you just created (knots at 35 and 65). Use the information here to draw the relationship between \(x\) and \(y\).
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | -0.3 | 0.2 | -1.3 | 0.3 |
b1 | 2.0 | 0.0 | 231.3 | 0.0 |
b2 | -2.0 | 0.0 | -130.0 | 0.0 |
b3 | -3.0 | 0.0 | -116.5 | 0.0 |
\(b_1(x)\) | \(b_2(x)\) | \(b_3(x)\) |
---|---|---|
4 | 0 | 0 |
15 | 0 | 0 |
25 | 0 | 0 |
37 | 2 | 0 |
49 | 14 | 0 |
66 | 31 | 1 |
70 | 35 | 5 |
80 | 45 | 15 |
A cubic spline with knots at \(\xi_k\), \(k = 1, \dots, K\) is a piecewise cubic polynomial with continuous derivatives up to order 2 at each knot.
Again, we can represent this model with truncated power basis functions
\[y_i = \beta_0 + \beta_1b_1(x_i)+\beta_2b_2(x_i)+\dots+\beta_{K+3}b_{K+3}(x_i) + \epsilon_i\]
\[\begin{align}b_1(x_i)&=x_i\\b_2(x_i)&=x_i^2\\b_3(x_i)&=x_i^3\\b_{k+3}(x_i)&=(x_i-\xi_k)^3_+, k = 1,\dots,K\end{align}\]
where
\[(x_i-\xi_k)^{3}_+=\begin{cases}(x_i-\xi_k)^3&\textrm{if }x_i>\xi_k\\0&\textrm{otherwise}\end{cases}\]
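A sketch of building this basis in R for arbitrary knots; the helper name `cubic_spline_basis` is made up for illustration.

```r
# truncated power basis for a cubic spline with knots xi_1, ..., xi_K
cubic_spline_basis <- function(x, knots) {
  X <- cbind(x, x^2, x^3)
  for (k in seq_along(knots)) {
    X <- cbind(X, pmax(x - knots[k], 0)^3)                 # (x - xi_k)^3_+
  }
  colnames(X) <- paste0("b", seq_len(ncol(X)))
  X
}
```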
Application Exercise
Let’s create a data set to fit a cubic spline with 2 knots: 35 and 65.
x |
---|
4 |
15 |
25 |
37 |
49 |
66 |
70 |
80 |
x |
---|
4 |
15 |
25 |
37 |
49 |
66 |
70 |
80 |
b1 | b2 | b3 | b4 | b5 |
---|---|---|---|---|
4 | 16 | 64 | 0 | 0 |
15 | 225 | 3375 | 0 | 0 |
25 | 625 | 15625 | 0 | 0 |
37 | 1369 | 50653 | 8 | 0 |
49 | 2401 | 117649 | 2744 | 0 |
66 | 4356 | 287496 | 29791 | 1 |
70 | 4900 | 343000 | 42875 | 125 |
80 | 6400 | 512000 | 91125 | 3375 |
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | 1.172 | 8.282 | 0.141 | 0.900 |
b1 | 1.520 | 1.565 | 0.971 | 0.434 |
b2 | 0.040 | 0.075 | 0.528 | 0.650 |
b3 | -0.001 | 0.001 | -0.855 | 0.483 |
b4 | 0.001 | 0.002 | 0.635 | 0.590 |
b5 | -0.006 | 0.007 | -0.860 | 0.480 |
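In practice we would rarely type these bases out by hand; `splines::bs()` builds an equivalent cubic-spline basis directly. This sketch assumes the exercise's responses live in a data frame `dat` with columns `x` and `y`.

```r
library(splines)

# bs() builds a B-spline basis spanning the same space as b1, ..., b5 above,
# so the fitted values match even though the coefficients differ
fit_bs <- lm(y ~ bs(x, knots = c(35, 65)), data = dat)
```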
A natural cubic spline extrapolates linearly beyond the boundary knots
This adds 4 extra constraints and allows us to place more internal knots for the same degrees of freedom as a regular cubic spline
Here is a comparison of a degree-14 polynomial and a natural cubic spline (both have 15 degrees of freedom)
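A sketch of how such a comparison could be set up with `splines::ns()`; the data below are simulated stand-ins, not the data behind the figure.

```r
library(splines)

set.seed(3)
dat2 <- data.frame(x = runif(200, 0, 100))                 # illustrative data only
dat2$y <- sin(dat2$x / 15) + rnorm(200, sd = 0.3)

fit_poly <- lm(y ~ poly(x, 14), data = dat2)               # degree-14 polynomial
fit_ns   <- lm(y ~ ns(x, df = 14), data = dat2)            # natural cubic spline
# both spend 15 degrees of freedom (14 basis terms + intercept), but the
# natural spline is constrained to be linear beyond its boundary knots
```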
Dr. Lucy D’Agostino McGowan adapted from slides by Hastie & Tibshirani