In linear regression, what are we minimizing? How can I write this in matrix form?
What is \(\mathbf{X}\)?
How did we get \((\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}\)? We minimize the residual sum of squares, \((\mathbf{y}-\mathbf{X}\beta)^T(\mathbf{y}-\mathbf{X}\beta)\); taking the derivative with respect to \(\beta\) and setting it to zero gives:
\[ \begin{align} -2\mathbf{X}^T\mathbf{y}+2\mathbf{X}^T\mathbf{X}\hat\beta &= 0\\ 2\mathbf{X}^T\mathbf{X}\hat\beta & = 2\mathbf{X}^T\mathbf{y} \\ \mathbf{X}^T\mathbf{X}\hat\beta & =\mathbf{X}^T\mathbf{y} \\ (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{X}\hat\beta &=(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}\\ \underbrace{(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{X}}_{\mathbf{I}}\hat\beta &=(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}\\ \mathbf{I}\hat\beta &= (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}\\ \hat\beta & = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y} \end{align} \]
Let’s try to find an \(\mathbf{X}\) for which it would be impossible to calculate \(\hat\beta\)
y | x |
---|---|
4 | 1 |
3 | 2 |
1 | 5 |
3 | 1 |
5 | 5 |
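Here is a minimal R sketch that builds the design matrix from the table above, computes \(\hat\beta\) with the closed-form solution we just derived, and checks it against lm():

```r
y <- c(4, 3, 1, 3, 5)
x <- c(1, 2, 5, 1, 5)

# Design matrix: a column of 1s for the intercept, then x
X <- cbind(1, x)

# Closed-form solution: (X'X)^{-1} X'y
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y
beta_hat                  # roughly (3.5, -0.107)

# Should match the coefficients from lm()
coef(lm(y ~ x))

# To break it, try making the columns of X linearly dependent,
# e.g. x <- rep(1, 5): solve() will then fail, since X'X is singular
```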
Application Exercise
In R, find a design matrix \(\mathbf{X}\) for which it is not possible to calculate \(\hat\beta\).
\(\hat\beta = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}\)
Under what circumstances is \(\hat\beta\) not estimable?
\[\mathbf{X} = \begin{bmatrix}1 & 2 & 3 & 1 \\ 1 & 3 & 4& 0 \end{bmatrix}\]
What is \(n\) here? What is \(p\)?
Is \(\mathbf{X}^T\mathbf{X}\) going to be invertible?
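No: here \(n = 2\) and there are 4 columns, so the \(4\times 4\) matrix \(\mathbf{X}^T\mathbf{X}\) has rank at most 2 and cannot be inverted. A quick R check:

```r
# 2 observations, 4 columns: X'X is 4 x 4 but has rank at most 2
X <- matrix(c(1, 2, 3, 1,
              1, 3, 4, 0), nrow = 2, byrow = TRUE)
det(t(X) %*% X)         # 0 (up to floating-point noise)
try(solve(t(X) %*% X))  # errors: the matrix is singular
```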
\[\mathbf{X} = \begin{bmatrix} 1 & 3 & 6 \\ 1 & 4 & 8 \\1 & 5& 10\\ 1 & 2 & 4\end{bmatrix}\]
Is \(\mathbf{X}^T\mathbf{X}\) going to be invertible?
What was the problem this time? Look closely: the third column is exactly twice the second, so the columns of \(\mathbf{X}\) are linearly dependent even though \(n > p\).
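The same check in R (the exact error message may vary by platform):

```r
# The third column is exactly 2x the second, so X'X is singular
# even though there are more rows than columns
X <- cbind(1, c(3, 4, 5, 2), c(6, 8, 10, 4))
det(t(X) %*% X)         # 0 (up to floating-point noise)
try(solve(t(X) %*% X))  # errors: the matrix is singular
```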
What is a sure-fire way to tell whether \(\mathbf{X}^T\mathbf{X}\) will be invertible? Compute its determinant: a square matrix is invertible exactly when its determinant is nonzero.
It looks funky, but it follows a nice pattern!
\[\mathbf{A} = \begin{bmatrix}a&b&c\\d&e&f\\g&h&i\end{bmatrix}\] \[|\mathbf{A}| = a(ei-fh)-b(di-fg) +c(dh-eg)\]
\[|\mathbf{A}| = a \left|\begin{matrix}e&f\\h&i\end{matrix}\right|-b\left|\begin{matrix}d&f\\g&i\end{matrix}\right|+c\left|\begin{matrix}d&e\\g&h\end{matrix}\right|\]
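A sketch in R, using a small example matrix of my own, that carries out the cofactor expansion by hand and compares it to the built-in det():

```r
A <- matrix(c(1, 2, 3,
              4, 5, 6,
              7, 8, 10), nrow = 3, byrow = TRUE)

# Expand along the first row: each term is an entry times the
# determinant of the 2 x 2 matrix left after deleting its row and column
manual <- A[1, 1] * det(A[-1, -1]) -
  A[1, 2] * det(A[-1, -2]) +
  A[1, 3] * det(A[-1, -3])

manual   # -3
det(A)   # same answer
```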
Application Exercise
Calculate the determinant of the following matrices in R using the det() function:
\[\mathbf{A} = \begin{bmatrix} 1 & 2 \\ 4 & 5 \end{bmatrix}\]
\[\mathbf{B} = \begin{bmatrix} 1 & 2 & 3 \\ 3 & 6 & 9 \\ 2 & 5 & 7\end{bmatrix}\] Are these both invertible?
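A sketch of the check:

```r
A <- matrix(c(1, 2,
              4, 5), nrow = 2, byrow = TRUE)
B <- matrix(c(1, 2, 3,
              3, 6, 9,
              2, 5, 7), nrow = 3, byrow = TRUE)

det(A)  # -3: nonzero, so A is invertible
det(B)  #  0: the second row is 3 times the first, so B is not invertible
```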
\[\mathbf{X} = \begin{bmatrix} 1 & 3.01 & 6 \\ 1 & 4 & 8 \\1 & 5& 10\\ 1 & 2 & 4\end{bmatrix}\]
Is \(\mathbf{X}^T\mathbf{X}\) going to be invertible?
Technically, yes: the columns are no longer exactly collinear. But look at the estimated coefficients:

\[\begin{bmatrix}\hat\beta_0\\\hat\beta_1\\\hat\beta_2\end{bmatrix} = \begin{bmatrix}1.28\\-114.29\\57.29\end{bmatrix}\]
What is the equation for the variance of \(\hat\beta\)?
\[var(\hat\beta) = \hat\sigma^2(\mathbf{X}^T\mathbf{X})^{-1}\]
\[var(\hat\beta) = \begin{bmatrix} \mathbf{0.918} & -24.489 & 12.132\\ -24.489 & \mathbf{4081.571} & -2038.745 \\ 12.132 & -2038.745 & \mathbf{1018.367}\end{bmatrix}\]
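To see where numbers like these come from, here is a sketch computing \((\mathbf{X}^T\mathbf{X})^{-1}\) for this design matrix. \(var(\hat\beta)\) is this matrix scaled by \(\hat\sigma^2\); the response values behind the slide's exact numbers are not shown here.

```r
X <- cbind(1, c(3.01, 4, 5, 2), c(6, 8, 10, 4))
XtX_inv <- solve(t(X) %*% X)  # invertible, but only just
round(XtX_inv, 1)
# The second and third diagonal entries are in the thousands because
# those two columns are nearly collinear; multiplying this matrix by
# the estimate of sigma^2 gives var(beta-hat)
```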
What is the variance for \(\hat\beta_0\)?
\[var(\hat\beta) = \begin{bmatrix} \color{blue}{\mathbf{0.918}} & -24.489 & 12.132\\ -24.489 & \mathbf{4081.571} & -2038.745 \\ 12.132 & -2038.745 & \mathbf{1018.367}\end{bmatrix}\]
What is the variance for \(\hat\beta_1\)?
\[var(\hat\beta) = \begin{bmatrix} \mathbf{0.918} & -24.489 & 12.132\\ -24.489 & \color{blue}{\mathbf{4081.571}} & -2038.745 \\ 12.132 & -2038.745 & \mathbf{1018.367}\end{bmatrix}\]
It's 4081.571 😱
Why? The second and third columns of \(\mathbf{X}\) are nearly collinear, so \(\mathbf{X}^T\mathbf{X}\) is nearly singular and the entries of its inverse, and hence the variances, blow up.
Ridge regression tames this by adding a penalty, controlled by a tuning parameter \(\lambda\), to the least-squares objective. What happens when \(\lambda=0\)? What happens as \(\lambda\rightarrow\infty\)?
Let’s solve for the \(\hat\beta\) coefficients using Ridge Regression. What are we minimizing?
\[(\mathbf{y}-\mathbf{X}\beta)^T(\mathbf{y}-\mathbf{X}\beta)+\lambda\beta^T\beta\]
Try it!
Find the \(\hat\beta\) that minimizes the expression above.
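Differentiating the penalized objective with respect to \(\beta\) and setting it to zero, the same steps as in the least-squares derivation give:

\[ \begin{align} -2\mathbf{X}^T\mathbf{y}+2\mathbf{X}^T\mathbf{X}\hat\beta+2\lambda\hat\beta &= 0\\ (\mathbf{X}^T\mathbf{X}+\lambda\mathbf{I})\hat\beta &= \mathbf{X}^T\mathbf{y}\\ \hat\beta &= (\mathbf{X}^T\mathbf{X}+\lambda\mathbf{I})^{-1}\mathbf{X}^T\mathbf{y} \end{align} \]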
\[\hat\beta_{ridge} = (\mathbf{X}^T\mathbf{X}+\lambda\mathbf{I})^{-1}\mathbf{X}^T\mathbf{y}\]
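A minimal R sketch, reusing the near-collinear \(\mathbf{X}\) from before with a made-up response \(\mathbf{y}\) (the slide's actual data are not shown), to see the penalty stabilize the coefficients:

```r
X <- cbind(1, c(3.01, 4, 5, 2), c(6, 8, 10, 4))
y <- c(11, 13, 16, 7)   # hypothetical response values
lambda <- 1

beta_ols   <- solve(t(X) %*% X) %*% t(X) %*% y
beta_ridge <- solve(t(X) %*% X + lambda * diag(ncol(X))) %*% t(X) %*% y

# OLS is roughly (1, 100, -48.5); the ridge coefficients are
# far smaller in magnitude
cbind(beta_ols, beta_ridge)
```

(Note that, as in the objective above, this penalizes the intercept too.)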
How do you think ridge regression fits into the bias-variance tradeoff?
The ridge estimator is biased: \(E[\hat\beta_{ridge}] = (\mathbf{X}^T\mathbf{X}+\lambda\mathbf{I})^{-1}\mathbf{X}^T\mathbf{X}\beta\). What would this be if \(\lambda\) were 0?
Its variance is \(var(\hat\beta_{ridge}) = \sigma^2(\mathbf{X}^T\mathbf{X}+\lambda\mathbf{I})^{-1}\mathbf{X}^T\mathbf{X}(\mathbf{X}^T\mathbf{X}+\lambda\mathbf{I})^{-1}\). Is this bigger or smaller than \(\sigma^2(\mathbf{X}^T\mathbf{X})^{-1}\)? What is this when \(\lambda = 0\)? As \(\lambda\rightarrow\infty\), does this go up or down?
Why?
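A numeric check (sketch), using the same \(\mathbf{X}\) and taking \(\sigma^2 = 1\) for illustration:

```r
X <- cbind(1, c(3.01, 4, 5, 2), c(6, 8, 10, 4))
XtX <- t(X) %*% X

# var(beta_ridge) = sigma^2 (X'X + lambda I)^{-1} X'X (X'X + lambda I)^{-1}
ridge_var <- function(lambda, sigma2 = 1) {
  W <- solve(XtX + lambda * diag(ncol(X)))
  sigma2 * W %*% XtX %*% W
}

diag(solve(XtX))      # OLS variances: two entries in the thousands
diag(ridge_var(1))    # far smaller
diag(ridge_var(100))  # smaller still: variance shrinks toward 0 as lambda grows
```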