Reading: Ch 3 - Shumway and Stoffer
Autoregressive Moving Average Models¶
Before the break, we started talking about AR models:
AR(p) process:
$$x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + \cdots + \phi_p x_{t-p} + w_t, \qquad w_t \sim wn(0, \sigma_w^2)$$
We showed that AR models are stationary iff the roots of their characteristic polynomial $\phi(z) = 1 - \phi_1 z - \cdots - \phi_p z^p$ lie outside the unit circle.
If this is satisfied, we can rewrite our AR(p) model as:
$$x_t = \sum_{j=0}^{\infty} \psi_j w_{t-j}$$
for some coefficients $\psi_j$ satisfying $\sum_{j=0}^{\infty} |\psi_j| < \infty$. This is sometimes also referred to as a causal process (though not in the same way that the word causal is used in a lot of statistics -- here it just means that the model can be expressed purely as a function of current and past information).
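To make the causal representation concrete, here is a short numerical sketch (with $\phi = 0.6$ chosen arbitrarily for illustration): for an AR(1) process the coefficients are $\psi_j = \phi^j$, and the truncated MA($\infty$) sum reproduces the simulated series up to a tiny truncation error.

```python
import numpy as np

rng = np.random.default_rng(0)
phi = 0.6                      # AR(1) coefficient, chosen for illustration
n, burn = 200, 500
w = rng.standard_normal(n + burn)

# Simulate x_t = phi * x_{t-1} + w_t by direct recursion.
x = np.zeros(n + burn)
for t in range(1, n + burn):
    x[t] = phi * x[t - 1] + w[t]

# Causal MA(infinity) form: x_t = sum_j phi^j w_{t-j}, truncated at J terms.
J = 50
psi = phi ** np.arange(J)      # psi_j = phi^j for AR(1)
x_ma = np.array([psi @ w[t - np.arange(J)] for t in range(burn, n + burn)])

# The two representations agree up to the truncation error, of order phi^J.
err = np.max(np.abs(x[burn:] - x_ma))
print(err < 1e-6)              # True
```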
As an alternative, we will now introduce the idea of the moving average model of order $q$ (MA($q$)), which is defined as:
MA($q$): $x_t = w_t + \theta_1 w_{t-1} + \theta_2 w_{t-2} + \cdots + \theta_q w_{t-q}$, where $w_t \sim wn(0, \sigma_w^2)$ and $\theta_1, \dots, \theta_q$ are parameters. We can also write this equivalently as:
$$x_t = \theta(B) w_t, \qquad \theta(B) = 1 + \theta_1 B + \theta_2 B^2 + \cdots + \theta_q B^q,$$
using the backshift operator, which we also defined last time as $B x_t = x_{t-1}$ (so $B^k x_t = x_{t-k}$).
An aside - why does the backshift operator make sense?¶
The backshift operator allows us to express a shift in time as a linear operator, which allows us to form expressions like the characteristic polynomial to tell us whether a signal is stationary.
Linear operators¶
A function $f$ is a linear operator if it satisfies two properties for all inputs $x$, $y$, and scalars $a$:
Additivity: $f(x+y) = f(x) + f(y)$
Scalar homogeneity: $f(ax) = a f(x)$
Sometimes these are combined into one condition: $f(ax + by) = a f(x) + b f(y)$.
The backshift operator is a linear operator because shifting a scaled or summed series is the same as scaling or summing the shifted series. This is what allows us to use tricks like the characteristic polynomial as a diagnostic tool for checking the stationarity of an AR model and (as we will discuss later) the invertibility of the MA part of the model.
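As a minimal sketch of this linearity, using array slicing as a finite-sample stand-in for $B$ (the function name `B` and the test values are just for illustration):

```python
import numpy as np

def B(x):
    """Backshift: (Bx)_t = x_{t-1}; dropping the last element aligns the lagged series."""
    return x[:-1]

rng = np.random.default_rng(1)
x, y = rng.standard_normal(10), rng.standard_normal(10)
a, b = 2.0, -3.0

# Linearity: shifting a linear combination equals the linear combination of shifts.
lhs = B(a * x + b * y)
rhs = a * B(x) + b * B(y)
print(np.allclose(lhs, rhs))   # True
```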
Back to the MA model¶
Let’s consider an example MA(1) process:
$$x_t = w_t + \theta w_{t-1}$$
The MA model, unlike the AR model, is stationary for any value of $\theta$. We have $E[x_t] = 0$, and the autocovariance is:
$$\gamma(h) = \begin{cases} (1 + \theta^2)\sigma_w^2 & h = 0 \\ \theta \sigma_w^2 & |h| = 1 \\ 0 & |h| > 1 \end{cases}$$
and the ACF is:
$$\rho(h) = \begin{cases} \dfrac{\theta}{1 + \theta^2} & |h| = 1 \\ 0 & |h| > 1 \end{cases}$$
Note that $x_t$ is correlated with $x_{t-1}$, but not with $x_{t-2}, x_{t-3}, \dots$. On the other hand, in an AR(1) model, the correlation between $x_t$ and $x_{t-h}$ is never exactly 0. An example is shown below for an MA(1) model with a positive and a negative value of $\theta$. The time series is smoother for positive $\theta$ than for negative $\theta$, but we don’t have the same correlation structure as an AR model.
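A quick simulation illustrates the cutoff in the ACF (a sketch; $\theta = 0.9$ and the sample size are arbitrary choices for illustration, not necessarily the values in the figure):

```python
import numpy as np

def ma1(theta, n, seed=0):
    """Simulate x_t = w_t + theta * w_{t-1} with standard normal noise."""
    w = np.random.default_rng(seed).standard_normal(n + 1)
    return w[1:] + theta * w[:-1]

def sample_acf(x, h):
    """Sample autocorrelation at lag h."""
    x = x - x.mean()
    return (x[h:] @ x[:-h]) / (x @ x)

theta = 0.9
x = ma1(theta, n=20_000)

# Lag 1 matches theta / (1 + theta^2); lags beyond 1 are near zero.
print(abs(sample_acf(x, 1) - theta / (1 + theta**2)) < 0.03)  # True
print(abs(sample_acf(x, 2)) < 0.05)                           # True
```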

Invertibility of the MA model¶
Another important characteristic of the MA model is its invertibility. Invertibility means that the MA polynomial $\theta(B)$ can be inverted as a power series, giving $w_t = \theta(B)^{-1} x_t = \sum_{j=0}^{\infty} \pi_j x_{t-j}$.
Let’s first look at an important property - for an MA(1) model, the autocovariance $\gamma(h)$ is the same for $\theta$ and $1/\theta$. For example, you could try calculating $\gamma(0)$ and $\gamma(1)$ for both and comparing. Similarly, if we have parameters $(\theta, \sigma_w^2)$, this will yield the same autocovariance as $(1/\theta, \theta^2 \sigma_w^2)$. This means that we cannot distinguish between the two MA(1) processes:
$$x_t = w_t + \theta w_{t-1}, \qquad w_t \sim iid\ N(0, \sigma_w^2)$$
and
$$y_t = v_t + \tfrac{1}{\theta} v_{t-1}, \qquad v_t \sim iid\ N(0, \theta^2 \sigma_w^2);$$
they are stochastically equal because of normality. Because we only observe the time series $x_t$ or $y_t$, we do not observe the noise independently and cannot distinguish between these models, so we have to choose one. We want to choose the version where $|\theta| < 1$, because this is the version we can invert into an infinite AR representation:
$$w_t = \sum_{j=0}^{\infty} (-\theta)^j x_{t-j}$$
This is the MA equivalent of requiring AR models to be causal/stationary by having their characteristic roots outside the unit circle. So in this particular case, we’d choose whichever of the two representations has $|\theta| < 1$.
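As a numerical check on this equivalence (a sketch; $\theta = 5$ and $\sigma_w^2 = 1$ are arbitrary illustrative values), the two parameterizations give identical MA(1) autocovariances:

```python
import math

def gamma01(theta, sigma2):
    """MA(1) autocovariances: gamma(0) = (1 + theta^2)*sigma2, gamma(1) = theta*sigma2."""
    return (1 + theta**2) * sigma2, theta * sigma2

theta, sigma2 = 5.0, 1.0                       # illustrative values (assumption)
g_a = gamma01(theta, sigma2)                   # parameters (theta, sigma_w^2)
g_b = gamma01(1 / theta, theta**2 * sigma2)    # parameters (1/theta, theta^2 * sigma_w^2)

print(all(math.isclose(a, b) for a, b in zip(g_a, g_b)))  # True
```

Only the pair with $|\theta| < 1$ (here $1/5$) is invertible, since the AR($\infty$) weights $(-\theta)^j$ must decay for the infinite sum to converge.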
ARMA models¶
We can now use these two concepts (the AR and MA models) to extend our models to Autoregressive Moving Average (ARMA) models, which contain elements of both. The autocovariance of an AR model generally decays gradually toward 0 as the lag increases, while the autocovariance of an MA($q$) process becomes exactly 0 for lags greater than $q$.
Definition: A time series $x_t$ is ARMA(p,q) if it is stationary and:
$$x_t = \phi_1 x_{t-1} + \cdots + \phi_p x_{t-p} + w_t + \theta_1 w_{t-1} + \cdots + \theta_q w_{t-q},$$
with $\phi_p \neq 0$, $\theta_q \neq 0$, and $\sigma_w^2 > 0$. The parameters $p$ and $q$ are called the autoregressive and moving average orders. If $x_t$ has a nonzero mean $\mu$, we also often set $\alpha = \mu(1 - \phi_1 - \cdots - \phi_p)$ and write:
$$x_t = \alpha + \phi_1 x_{t-1} + \cdots + \phi_p x_{t-p} + w_t + \theta_1 w_{t-1} + \cdots + \theta_q w_{t-q}.$$
We can also write the ARMA(p,q) model more concisely as
$$\phi(B) x_t = \theta(B) w_t.$$
Since ARMA is just an extension of AR and MA models, ARMA(p,0) = AR(p) and ARMA(0,q) = MA(q).
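A minimal pure-NumPy sketch of this nesting (coefficients chosen arbitrarily for illustration): simulating an ARMA(1,1) by direct recursion, then recovering AR(1) or MA(1) behavior by zeroing one set of coefficients.

```python
import numpy as np

def arma11_sim(phi, theta, n, burn=500, seed=0):
    """Simulate ARMA(1,1): x_t = phi * x_{t-1} + w_t + theta * w_{t-1}."""
    w = np.random.default_rng(seed).standard_normal(n + burn)
    x = np.zeros(n + burn)
    for t in range(1, n + burn):
        x[t] = phi * x[t - 1] + w[t] + theta * w[t - 1]
    return x[burn:]                 # drop burn-in so the start-up transient is gone

def acf1(x):
    """Sample lag-1 autocorrelation."""
    x = x - x.mean()
    return (x[1:] @ x[:-1]) / (x @ x)

# ARMA(1,0) = AR(1): rho(1) = phi.   ARMA(0,1) = MA(1): rho(1) = theta/(1+theta^2).
x_ar = arma11_sim(phi=0.5, theta=0.0, n=20_000)
x_ma = arma11_sim(phi=0.0, theta=0.5, n=20_000)
print(abs(acf1(x_ar) - 0.5) < 0.05)         # True
print(abs(acf1(x_ma) - 0.5 / 1.25) < 0.05)  # True
```

In practice one would typically fit such models with an existing library rather than by hand, e.g. `statsmodels`' `ARIMA` class with `order=(p, 0, q)`.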
When do we use ARMA models?¶
To forecast stationary time series data by combining previous values (AR) and previous error terms (MA)
Ex: sales, temperature, financial time series
When not to use ARMA models?¶
When data are nonstationary / have trends
When the data have strong seasonal patterns
If the data have complex, nonlinear relationships