
Lecture 6 Notes - Linear Regression Continued

Simple linear regression

Last time we spoke about regression models where we have two parameters, $\beta_0$ and $\beta_1$, that we use to estimate the relationship between our response variable $y$ and our covariate $x$:

$$y = \beta_0 + \beta_1 x + \epsilon$$

For example, $y$ is the height of an adult and $x$ is the height of their parent, or $y$ is the price of chicken and $x$ is time.
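As a quick illustration, we could simulate data from this model. This is just a sketch: the parameter values and the distribution of $x$ are made up for the example, not taken from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameter values (assumptions for this example)
beta0, beta1, sigma = 2.0, 0.5, 1.0
n = 100

x = rng.uniform(0, 10, size=n)        # covariate
eps = rng.normal(0, sigma, size=n)    # i.i.d. noise term epsilon
y = beta0 + beta1 * x + eps           # response from the linear model
```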

Revisiting OLS

We also talked about ordinary least squares (OLS), where we want to minimize:

$$Q = \sum_{i=1}^n (y_i - \hat{y}_i)^2 = \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2$$

We can do this by differentiating $Q$ with respect to each parameter and setting the derivative equal to zero:

First, for $\beta_0$:

$$\begin{aligned} \frac{\partial Q}{\partial \beta_0} &= \sum_{i=1}^n 2(y_i - \beta_0 - \beta_1 x_i)(-1) \\ &= 2\sum_{i=1}^n (\beta_0 + \beta_1 x_i - y_i) \\ &= 2n\beta_0 + 2n\beta_1 \bar{x} - 2n\bar{y} = 0 \end{aligned}$$

$$\therefore \beta_0 = \bar{y} - \beta_1 \bar{x}$$

And for $\beta_1$:

$$\begin{aligned} Q &= \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2 \\ &= \sum_{i=1}^n (y_i - \bar{y} + \beta_1 \bar{x} - \beta_1 x_i)^2 \\ \frac{\partial Q}{\partial \beta_1} &= \sum_{i=1}^n 2(y_i - \bar{y} + \beta_1 \bar{x} - \beta_1 x_i)(\bar{x} - x_i) \\ &= 2\sum_{i=1}^n \left[\left((y_i - \bar{y}) + \beta_1(\bar{x} - x_i)\right)(\bar{x} - x_i)\right] \\ &= -2\sum_{i=1}^n (y_i - \bar{y})(x_i - \bar{x}) + 2\beta_1 \sum_{i=1}^n (x_i - \bar{x})^2 \end{aligned}$$

Setting this equal to zero, it follows that:

$$\beta_1 = \frac{\sum_{i=1}^n (y_i - \bar{y})(x_i - \bar{x})}{\sum_{i=1}^n (x_i - \bar{x})^2} = \frac{\operatorname{Cov}(x, y)}{\operatorname{Var}(x)}$$

Notice now how our solution for the slope, $\beta_1$, is related to the covariance of $x$ and $y$ and the variance of $x$!
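These closed-form expressions are easy to check numerically. A minimal sketch, using simulated data with assumed parameter values, and cross-checking against NumPy's built-in least-squares line fit:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=200)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, size=200)  # assumed true model for the example

# Slope: sample covariance of x and y divided by sample variance of x
beta1_hat = np.sum((y - y.mean()) * (x - x.mean())) / np.sum((x - x.mean()) ** 2)
# Intercept: beta0 = ybar - beta1 * xbar
beta0_hat = y.mean() - beta1_hat * x.mean()

# Cross-check: np.polyfit returns [slope, intercept] for deg=1
b1_np, b0_np = np.polyfit(x, y, deg=1)
```

The two computations should agree to floating-point precision, and the fitted slope should land near the true value used in the simulation.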

Another note here is that to calculate $\bar{x}$ and $\bar{y}$, we often must use the sample means across all observations (e.g. across all time points), because we do not have multiple samples for each time point.

Maximum likelihood estimation (MLE)

We can also calculate these terms using MLE. In this case, we normally assume that the errors are i.i.d. and normally distributed, i.e.:

$$\epsilon_1, \dots, \epsilon_n \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$$

With this assumption, we can rewrite our linear regression model as:

$$y_i \overset{\text{independent}}{\sim} N(\beta_0 + \beta_1 x_i, \sigma^2)$$

The likelihood is expressed as:

$$f_{y_1, \dots, y_n \mid \beta_0, \beta_1, \sigma}(y_1, \dots, y_n) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(y_i - \beta_0 - \beta_1 x_i)^2}{2\sigma^2}\right)$$

To maximize the likelihood, we can maximize the log-likelihood, which is easier to deal with:

$$\begin{aligned} \text{log-likelihood} &= \sum_{i=1}^n \log \frac{1}{\sqrt{2\pi}\,\sigma} - \sum_{i=1}^n \frac{(y_i - \beta_0 - \beta_1 x_i)^2}{2\sigma^2} \\ &= -\frac{n}{2}\log(2\pi) - n\log\sigma - \frac{1}{2\sigma^2}\sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2 \end{aligned}$$
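We can sanity-check this algebra numerically: summing the individual Normal log-densities should give exactly the simplified expression. A sketch with simulated data and assumed parameter values:

```python
import math
import numpy as np

rng = np.random.default_rng(2)
beta0, beta1, sigma = 2.0, 0.5, 1.0  # assumed values for the example
n = 50
x = rng.uniform(0, 10, size=n)
y = beta0 + beta1 * x + rng.normal(0, sigma, size=n)

resid = y - beta0 - beta1 * x

# Simplified form: -(n/2) log(2 pi) - n log(sigma) - sum(resid^2) / (2 sigma^2)
loglik = -n / 2 * math.log(2 * math.pi) - n * math.log(sigma) \
         - np.sum(resid ** 2) / (2 * sigma ** 2)

# Term-by-term sum of the Normal log-densities
loglik_direct = np.sum(np.log(1 / (math.sqrt(2 * math.pi) * sigma))
                       - resid ** 2 / (2 * sigma ** 2))
```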

Writing the sum of squared errors as $Q = \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2 = S(\beta_0, \beta_1)$, we can take the derivative with respect to each unknown parameter $\beta_0$, $\beta_1$, and $\sigma$:

$$\begin{aligned} \frac{\partial}{\partial \beta_0} \text{log-likelihood} &= -\frac{1}{2\sigma^2}\frac{\partial}{\partial \beta_0} S(\beta_0, \beta_1) = 0 \implies \frac{\partial}{\partial \beta_0} S(\beta_0, \beta_1) = 0 \\ \frac{\partial}{\partial \beta_1} \text{log-likelihood} &= -\frac{1}{2\sigma^2}\frac{\partial}{\partial \beta_1} S(\beta_0, \beta_1) = 0 \implies \frac{\partial}{\partial \beta_1} S(\beta_0, \beta_1) = 0 \\ \frac{\partial}{\partial \sigma} \text{log-likelihood} &= -\frac{n}{\sigma} + \frac{S(\beta_0, \beta_1)}{\sigma^3} = 0 \implies \sigma = \sqrt{\frac{S(\beta_0, \beta_1)}{n}} \end{aligned}$$

The first two equations are the same as what we derived before. However, we now also get an MLE estimate for $\sigma$ from the third equation (with $\beta_0$ and $\beta_1$ replaced by $\hat{\beta}_0$ and $\hat{\beta}_1$, respectively):

$$\hat{\sigma}_{\text{MLE}} = \sqrt{\frac{S(\hat{\beta}_0, \hat{\beta}_1)}{n}}$$

From this, we can get $\hat{\sigma}^2_{\text{MLE}} = \frac{1}{n} S(\hat{\beta}_0, \hat{\beta}_1)$.
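In code, $\hat{\sigma}^2_{\text{MLE}}$ is just the sum of squared residuals at the fitted parameters, divided by $n$. A sketch with simulated data (the true $\sigma^2 = 1$ here is an assumption of the example):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
x = rng.uniform(0, 10, size=n)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, size=n)  # true sigma^2 = 1 (assumed)

# OLS fit; np.polyfit returns [slope, intercept] for deg=1
b1_hat, b0_hat = np.polyfit(x, y, deg=1)

# S(beta0_hat, beta1_hat): sum of squared residuals at the fitted parameters
S = np.sum((y - b0_hat - b1_hat * x) ** 2)

sigma2_mle = S / n  # the MLE divides by n
```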

But as it turns out, this is a biased estimator (unlike those for $\beta_0$ and $\beta_1$, which are not!). We often instead use a corrected, unbiased estimator of $\sigma^2$ where we divide by $n - p$, where $p$ is the number of parameters estimated (here, $p = 2$: one for $\beta_0$ and one for $\beta_1$). That is:

$$\hat{\sigma}^2_{\text{unbiased}} = \frac{S(\hat{\beta}_0, \hat{\beta}_1)}{n - 2}$$

Let’s look at a simulation to show how this happens!
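One possible version of such a simulation (the true parameter values are assumptions of the example): fit many small datasets, then compare the average of each estimator to the true $\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(4)
beta0, beta1, sigma2 = 2.0, 0.5, 1.0  # assumed true parameters
n, n_reps = 10, 5000                   # small n makes the bias visible

mle_ests, unbiased_ests = [], []
for _ in range(n_reps):
    x = rng.uniform(0, 10, size=n)
    y = beta0 + beta1 * x + rng.normal(0, np.sqrt(sigma2), size=n)
    b1, b0 = np.polyfit(x, y, deg=1)   # [slope, intercept]
    S = np.sum((y - b0 - b1 * x) ** 2) # sum of squared residuals
    mle_ests.append(S / n)             # biased: divides by n
    unbiased_ests.append(S / (n - 2))  # corrected: divides by n - p, p = 2

# With small n, the average MLE falls noticeably below the true sigma^2 = 1,
# while the corrected estimator is close to it on average.
print(np.mean(mle_ests), np.mean(unbiased_ests))
```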

Next time, we will look at how this extends to multiple linear regression. Later this bias will also become important when we think about regularization.

A note on assumptions