
Lecture 7 Notes - Multiple Linear Regression

Today

Simple and multiple linear regression

Last time we spoke about regression models where we have just two parameters, $\beta_0$ and $\beta_1$:

$$y = \beta_0 + \beta_1 x + \epsilon$$

For example, $y$ is the height of an adult and $x$ is the height of their parent, or $y$ is the price of chicken and $x$ is time.

However, we often have multiple independent series that may be contributing to $y$. We can then express this through multiple linear regression:

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + w_i$$

Here we will still be estimating $\beta_0, \dots, \beta_p$.

We can do this by rewriting the equation above as:

$$y_i = x_i^\intercal \beta + w_i,$$

for $i=1,\dots,n$. To add an intercept, we can redefine each vector $x_i$ so that it has a 1 prepended to it:

$$x_i = (1, x_{i1}, x_{i2}, \dots, x_{ip})$$

Now, the first entry of $\beta$ will be the intercept, while the rest are the coefficients for each $x_j$.

We now have the least squares problem:

$$\min_{\beta \in \mathbb{R}^p} \sum_{i=1}^n (y_i - x_i^\intercal \beta)^2$$

We can then set the derivatives with respect to $\beta$ equal to zero to obtain:

$$\hat{\beta} = \left( \sum_{i=1}^n x_i x_i^\intercal \right)^{-1} \sum_{i=1}^n x_i y_i$$
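This closed-form solution can be computed directly by accumulating the two sums. Below is a minimal sketch on hypothetical simulated data (the design, true coefficients, and noise level are all assumptions for illustration):

```python
import numpy as np

# Hypothetical data: n = 50 samples, p = 3 coefficients (intercept included
# as a leading column of ones, as described above).
rng = np.random.default_rng(0)
n = 50
x = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0, -0.5])     # assumed true coefficients
y = x @ beta_true + 0.1 * rng.normal(size=n)

# beta_hat = (sum_i x_i x_i^T)^(-1) * sum_i x_i y_i
A = sum(np.outer(xi, xi) for xi in x)       # sum of x_i x_i^T, a p x p matrix
b = sum(xi * yi for xi, yi in zip(x, y))    # sum of x_i y_i, a p-vector
beta_hat = np.linalg.solve(A, b)            # solve A beta = b rather than invert A
```

With this much data and little noise, `beta_hat` lands close to `beta_true`.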

Matrix notation

We can also write this equation in a more general matrix form:

$$\underset{n\times 1}{y} = \underset{n \times p}{X}\, \underset{p \times 1}{\beta}$$

$$y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix}, \quad \beta = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_p \end{bmatrix}$$

Here $n$ might represent the number of time points and $p$ is the number of parameters. We can then write our least squares problem as:

$$\min_{\beta \in \mathbb{R}^p} \| y - X\beta \|^2_2$$

Recall that the $\ell_2$ norm $\|\cdot\|_2$ of a vector $a \in \mathbb{R}^d$ is defined by $\|a\|^2_2 = \sum_{i=1}^d a_i^2$. Then, we can solve to get the estimate $\hat{\beta}$:

$$\underset{p \times 1}{\hat{\beta}} = \underset{p \times p}{(X^\intercal X)^{-1}}\,\underset{p \times n}{X^\intercal}\, \underset{n \times 1}{y}$$
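The matrix form maps directly to code. A sketch on hypothetical data (the design matrix, coefficients, and noise are assumptions for illustration); in practice, solving the normal equations or calling a least-squares routine is preferable to forming $(X^\intercal X)^{-1}$ explicitly:

```python
import numpy as np

# Hypothetical design: n = 100 samples, p = 4 columns (first column is ones).
rng = np.random.default_rng(1)
n, p = 100, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([0.5, 1.0, -2.0, 3.0]) + rng.normal(size=n)

# Solve the normal equations (X^T X) beta = X^T y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# np.linalg.lstsq solves the same minimization with a more stable factorization.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Both routes give the same $\hat{\beta}$ when the columns of $X$ are linearly independent; `lstsq` is the more numerically robust default.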

An important note here is that we assume the columns of $X$ (sometimes called our features) are linearly independent. This can only happen when $p \leq n$, i.e., when we have no more features than samples. Otherwise, $X^\intercal X$ will not have an inverse, but we will be able to deal with this using regularization (which we will cover later).
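We can see this failure mode numerically. In the sketch below (random data and the ridge penalty `lam` are assumptions for illustration), with more features than samples $X^\intercal X$ is rank-deficient, and adding a small multiple of the identity, one common form of regularization, restores invertibility:

```python
import numpy as np

# More features than samples: the columns of X cannot be linearly independent.
rng = np.random.default_rng(2)
n, p = 5, 8
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

# rank(X^T X) is at most n = 5, strictly less than p = 8, so it is singular.
rank = np.linalg.matrix_rank(X.T @ X)

# A ridge penalty lam * I makes the system invertible (covered later).
lam = 0.1
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
```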

Finger tapping demo

Next we will try collecting some data in class and fitting a multiple linear regression model to the data. We will collect data based on a finger tapping task, which is sometimes used as a clinical diagnostic tool to assess fine motor speed, coordination, and brain function. For these tests, typically the experimenter assesses the number of taps that a person makes over a given period of time. As you might imagine, we can also look at whether the number of taps is fairly steady over time, or whether the person is showing some fatigue (slowing down their taps over time).
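Before collecting real data, we can sketch what a fatigue analysis might look like on simulated taps. Everything here is a hypothetical setup (the base tapping rate, fatigue rate, and noise level are assumptions): we regress each inter-tap interval on elapsed time, so a positive slope indicates slowing over the trial.

```python
import numpy as np

# Simulate a hypothetical 10-second tapping trial in which the interval
# between taps grows slowly with elapsed time (i.e., the tapper fatigues).
rng = np.random.default_rng(3)
base_interval = 0.20   # assumed seconds between taps at the start
fatigue_rate = 0.005   # assumed slowdown per second elapsed

t, tap_times = 0.0, []
while t < 10.0:
    t += base_interval + fatigue_rate * t + 0.01 * rng.normal()
    tap_times.append(t)

tap_times = np.array(tap_times)
intervals = np.diff(tap_times)   # time between consecutive taps
times = tap_times[:-1]           # when each interval started

# Linear model interval = b0 + b1 * time, with an intercept column as above.
X = np.column_stack([np.ones_like(times), times])
beta_hat = np.linalg.solve(X.T @ X, X.T @ intervals)
# beta_hat[1] > 0 suggests the taps are slowing down over the trial.
```

The same regression applied to the class's real tapping data would let us test whether a fatigue trend is actually present.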

We will use the following website to test your finger taps, then analyze the resulting data.

We will also chat about what external factors might influence the data and which might make the most sense to add to our model predicting the data.