Reading: Chapter 2 – Shumway and Stoffer
Review
Week 3 survey: go to pollev.com/stat153
Notes on covariance
To add to your materials from last week, I’ve created some visualizations.
Auto-correlation of fMRI data
This shows an animation of calculating the ACF for the fmri1 dataset from astsa across a range of lags.
Cross-correlation of fMRI data
This shows an animation of calculating the CCF for the fmri1 dataset from astsa across a range of lags.
Autocovariance on random walk after differencing
Today
Simple linear regression
Let’s say we want to learn the relationship between two variables $x$ and $y$: the goal is to predict $y$ given $x$. For example, we might want to predict the height of an adult man ($y$) given the height of his father ($x$). $y$ is called the response variable or dependent variable, and $x$ is the covariate or independent variable. We will start with the more general scenario and then extend this concept specifically to time series.
In simple linear regression, we are predicting $y$ given one covariate $x$. In the case of multiple covariates, say $x_1, \ldots, x_p$, this is called multiple regression.
For the simple case, we are predicting $y$ from one covariate $x$. For example:

$$y = \beta_0 + \beta_1 x + \epsilon.$$

$\beta_0$ and $\beta_1$ are parameters that we will estimate from the data, where $\beta_0$ is the intercept, which corresponds to the value of $y$ when $x = 0$, and $\beta_1$ represents the change in $y$ when $x$ changes by one unit.
We could observe, for example, $n$ pairs of data $(x_1, y_1), \ldots, (x_n, y_n)$, where each of these samples is a pair of (father, son) heights. We then might write:

$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \quad i = 1, \ldots, n.$$
We can then estimate $\beta_0$ and $\beta_1$ from the data, and use the estimates $\hat{\beta}_0$ and $\hat{\beta}_1$ to predict the value of the response variable (the height of a new adult man) for a new covariate value $x$. We will talk about how to do this using the Python library statsmodels, but we will also talk about how to derive the solution mathematically.
For time series regression, we often have some observed data $x_1, \ldots, x_n$ that are observations of a single variable over time. In this case, we may try two main ways of predicting $x_t$:
Use time $t$ as a covariate (many of the examples we’ve seen already do this -- for example, the DJIA data, fMRI data, population data over time)
Use a lagged version of $x_t$ as the covariate. For this case, we might take $x_{t-1}$ as the covariate: we are predicting a current value based on a prior time point. This is called lagged regression or autoregression.

How do we estimate $\beta_0$ and $\beta_1$?
There are a couple of ways to estimate $\beta_0$ and $\beta_1$. Standard libraries such as statsmodels use the least squares method. In this method, we minimize the error sum of squares:

$$Q(\beta_0, \beta_1) = \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^2.$$
Ordinary Least Squares (OLS)
In ordinary least squares, we solve for

$$(\hat{\beta}_0, \hat{\beta}_1) = \arg\min_{\beta_0, \beta_1} \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^2,$$

which gives the best-fitting line for the observed data. Solving this gives:

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

and

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x},$$

where

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \quad \text{and} \quad \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i.$$
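These closed-form estimates can be checked numerically. The sketch below simulates data with a true intercept of 2.0 and slope of 0.5 (arbitrary values chosen for illustration) and applies the formulas directly:

```python
# Sketch: the closed-form OLS estimates computed directly in NumPy.
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=100)
y = 2.0 + 0.5 * x + 0.1 * rng.normal(size=100)  # true intercept 2.0, slope 0.5

xbar, ybar = x.mean(), y.mean()
beta1_hat = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
beta0_hat = ybar - beta1_hat * xbar
print(beta0_hat, beta1_hat)  # should land close to 2.0 and 0.5
```

Note that `np.polyfit(x, y, 1)` minimizes the same sum of squares and should agree with these formulas up to floating-point error.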
Let’s look at what this looks like in statsmodels with an example.