import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
Some simulations from the MA(1) model¶
Below are some simulated data from the MA(1) model. Recall that the MA(1) model is
$$y_t = \mu + \epsilon_t + \theta \epsilon_{t-1},$$
where $\epsilon_t$ is i.i.d. $N(0, \sigma^2)$. Note that if $\theta = 0$, the model is just $y_t = \mu + \epsilon_t$, which is i.i.d. $N(\mu, \sigma^2)$, so the standard i.i.d. normal model is a special case of MA(1). Below is simulated data from i.i.d. $N(\mu, \sigma^2)$ with $\mu = 0$ and $\sigma = 1$ (in other words, this is MA(1) with $\theta = 0$). We call this Gaussian White Noise.
#Simulating from Gaussian white noise:
n = 600
seed = 43
rng = np.random.default_rng(seed)
sig = 1
y_wn = rng.normal(loc = 0, scale = sig, size = n)
plt.figure(figsize = (12, 6))
plt.plot(y_wn)
plt.xlabel('time')
plt.ylabel('Data')
plt.title('Simulated White Noise')
plt.show()

Next we shall simulate observations from the MA(1) model $y_t = \epsilon_t + \theta \epsilon_{t-1}$ (taking $\mu = 0$) with non-zero $\theta$. These data will have some dependence between neighboring data points.
#Simulating MA(1)
y_0 = np.concatenate(([0], y_wn))
theta = 0.8
y_ma = y_0[1:] + theta * y_0[:-1]
plt.figure(figsize = (12, 6))
plt.plot(y_ma)
plt.xlabel('time')
plt.ylabel('Data')
plt.title(f'Simulated MA(1) with theta = {theta: .2f}')
plt.show()

#Here is a scatter plot of y_t vs y_{t-1} for the simulated MA(1) data
plt.scatter(y_ma[:-1], y_ma[1:], s = 10, color = 'black')
plt.xlabel('$y_{t-1}$')
plt.ylabel('$y_{t}$')
plt.title(f'Scatter plot of $y_t$ vs $y_{{t-1}}$ for Simulated MA(1) with theta = {theta: .2f}')
plt.show()
This positive dependence between $y_t$ and $y_{t-1}$ will make the simulated dataset look smoother compared to white noise. On the other hand, if $\theta < 0$, then the simulated dataset will look rougher compared to white noise.
Below we plot the white noise observations, together with the simulated MA(1) observations for $\theta = 0.8$ and $\theta = -0.8$. When $\theta > 0$, the autocorrelation at lag one is positive, making the data look smoother (compared to white noise). When $\theta < 0$, the autocorrelation at lag one is negative, making the data look more wiggly (compared to white noise).
fig, axes = plt.subplots(nrows = 3, ncols = 1, figsize = (12, 6))
axes[0].plot(y_wn)
axes[0].set_title('Simulated White Noise')
theta = 0.8
y_ma_1 = y_0[1:] + theta * y_0[:-1]
axes[1].plot(y_ma_1)
axes[1].set_title(f'Simulated MA(1) with theta = {theta: .2f}')
theta = -0.8
y_ma_2 = y_0[1:] + theta * y_0[:-1] #y_ma_2 is simulated data from MA(1) with negative theta
axes[2].plot(y_ma_2)
axes[2].set_title(f'Simulated MA(1) with theta = {theta: .2f}')
plt.tight_layout()
plt.show()
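For reference, an MA(1) model has lag-one autocorrelation $\theta/(1+\theta^2)$ and zero autocorrelation at all lags beyond one (this standard formula is quoted here as an aside; it is checked empirically via the sample ACF below). A quick computation for the two values of $\theta$ used above:
#Theoretical lag-one autocorrelation of MA(1): rho(1) = theta/(1 + theta^2)
for th in [0.8, -0.8]:
    print(f"theta = {th: .1f}: rho(1) = {th / (1 + th ** 2): .4f}")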
Sample ACF¶
We can distinguish between these datasets by computing the sample ACF, which is defined as follows. Given a time series dataset $y_1, \dots, y_n$ and a lag $h$, define
$$r_h = \frac{\sum_{t=1}^{n-h} (y_t - \bar{y})(y_{t+h} - \bar{y})}{\sum_{t=1}^{n} (y_t - \bar{y})^2}$$
for $h = 0, 1, 2, \dots$. Here $\bar{y}$ is simply the mean of $y_1, \dots, y_n$. The ACF plot graphs $h$ on the x-axis and $r_h$ on the y-axis. The quantity $r_h$ is known as the sample autocorrelation of the data at lag $h$ (ACF stands for “Autocorrelation Function”).
Below we compute the sample ACF.
def sample_acf(y, h_max):
    n = len(y)
    y_mean = sum(y) / n
    denominator = sum((y_t - y_mean) ** 2 for y_t in y)
    autocorr = []
    for h in range(h_max + 1):
        numerator = sum((y[t] - y_mean) * (y[t + h] - y_mean) for t in range(n - h))
        r_h = numerator / denominator
        autocorr.append(r_h)
    return autocorr
h_max = 50
sample_acf_vals = sample_acf(y_wn, h_max)
print(sample_acf_vals[0]) #the sample_acf at lag = 0 always equals 1
plt.plot(range(0, h_max + 1), sample_acf_vals)
plt.show()
1.0

Visually, one gets nicer plots for the sample ACF by using the stem plot.
h_max = 50
sample_acf_vals = sample_acf(y_wn, h_max)
plt.figure(figsize = (12, 6))
markerline, stemline, baseline = plt.stem(range(0, h_max + 1), sample_acf_vals)
markerline.set_marker("None")
plt.title('Sample AutoCorrelation')
plt.show()
Note that the sample ACF always takes the value 1 at lag $h = 0$, so the first spike is not giving us any information. We only look at the sample ACF for lags $h \geq 1$. From the above ACF plot, it appears that the sample ACF values at all lags $h \geq 1$ are negligible.
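One common yardstick for “negligible” (the standard large-sample approximation for white noise, mentioned here as an aside rather than something derived above) is the band $\pm 1.96/\sqrt{n}$: for i.i.d. data, the sample autocorrelations at lags $h \geq 1$ mostly fall inside this band. A small sketch counting how many of the white-noise sample ACF values escape it:
#Approximate 95% band for the sample ACF of white noise: +/- 1.96/sqrt(n)
band = 1.96 / np.sqrt(len(y_wn))
outside = [h for h in range(1, h_max + 1) if abs(sample_acf_vals[h]) > band]
print(f"band = {band: .4f}, lags outside the band: {outside}")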
Next we plot the sample ACF for MA(1) for positive as well as negative $\theta$.
h_max = 50
sample_acf_vals = sample_acf(y_ma_1, h_max)
plt.figure(figsize = (12, 6))
markerline, stemline, baseline = plt.stem(range(0, h_max + 1), sample_acf_vals)
markerline.set_marker("None")
plt.title('Sample AutoCorrelation')
plt.show()
Observe the peak at lag one above. This is indicative of the MA(1) model.
fig, axes = plt.subplots(nrows = 3, ncols = 1, figsize = (12, 6))
sample_acf_vals_wn = sample_acf(y_wn, h_max)
markerline, stemline, baseline = axes[0].stem(range(0, h_max + 1), sample_acf_vals_wn)
markerline.set_marker("None")
axes[0].set_title('Sample ACF of Simulated White Noise')
sample_acf_vals_ma_1 = sample_acf(y_ma_1, h_max)
markerline, stemline, baseline = axes[1].stem(range(0, h_max + 1), sample_acf_vals_ma_1)
markerline.set_marker("None")
theta = 0.8
axes[1].set_title(f'Simulated ACF of MA(1) with theta = {theta: .2f}')
sample_acf_vals_ma_2 = sample_acf(y_ma_2, h_max)
markerline, stemline, baseline = axes[2].stem(range(0, h_max + 1), sample_acf_vals_ma_2)
markerline.set_marker("None")
theta = -0.8
axes[2].set_title(f'Simulated ACF of MA(1) with theta = {theta: .2f}')
plt.tight_layout()
plt.show()
Clearly, there are spikes at lag one for the second and third datasets. The remaining ACF values are negligible.
There is an inbuilt function in statsmodels for computing the sample ACF (and another inbuilt function for plotting it). Its use is demonstrated below. Let us check that it gives identical answers to our computation.
from statsmodels.tsa.stattools import acf
acf_values_wn = acf(y_wn, nlags=h_max)
print(np.column_stack([sample_acf_vals_wn, acf_values_wn]))
[[ 1.00000000e+00 1.00000000e+00]
[-6.18986853e-02 -6.18986853e-02]
[-3.16190776e-02 -3.16190776e-02]
[ 3.37684035e-02 3.37684035e-02]
[-5.08194511e-02 -5.08194511e-02]
[ 3.25331941e-02 3.25331941e-02]
[ 8.74858962e-03 8.74858962e-03]
[-1.16498128e-02 -1.16498128e-02]
[ 3.26472220e-02 3.26472220e-02]
[ 8.33178480e-02 8.33178480e-02]
[-3.27430872e-02 -3.27430872e-02]
[ 1.14743957e-02 1.14743957e-02]
[-6.46748355e-02 -6.46748355e-02]
[-9.26355222e-03 -9.26355222e-03]
[-1.48030902e-03 -1.48030902e-03]
[ 5.98188096e-03 5.98188096e-03]
[-4.20248527e-02 -4.20248527e-02]
[ 5.47736551e-02 5.47736551e-02]
[ 9.57612253e-03 9.57612253e-03]
[-5.55307936e-02 -5.55307936e-02]
[ 7.32655261e-02 7.32655261e-02]
[ 8.66336997e-03 8.66336997e-03]
[ 5.21590566e-02 5.21590566e-02]
[-1.51121264e-02 -1.51121264e-02]
[-2.06604710e-02 -2.06604710e-02]
[-2.69178655e-02 -2.69178655e-02]
[ 3.49261541e-02 3.49261541e-02]
[ 3.86381603e-02 3.86381603e-02]
[-2.07370593e-02 -2.07370593e-02]
[ 3.56664414e-02 3.56664414e-02]
[ 3.73042789e-02 3.73042789e-02]
[-1.58643239e-03 -1.58643239e-03]
[-2.62720921e-02 -2.62720921e-02]
[-7.90709599e-03 -7.90709599e-03]
[ 1.72123791e-02 1.72123791e-02]
[ 1.64248945e-02 1.64248945e-02]
[-4.29481074e-02 -4.29481074e-02]
[-4.08484611e-02 -4.08484611e-02]
[ 2.76715612e-02 2.76715612e-02]
[-1.97488528e-02 -1.97488528e-02]
[ 2.31475734e-03 2.31475734e-03]
[-4.12288416e-02 -4.12288416e-02]
[-1.05493982e-03 -1.05493982e-03]
[-1.60649886e-02 -1.60649886e-02]
[ 8.55000861e-06 8.55000861e-06]
[-2.36297467e-02 -2.36297467e-02]
[ 6.36622276e-02 6.36622276e-02]
[ 3.89111470e-02 3.89111470e-02]
[ 1.70688163e-02 1.70688163e-02]
[ 5.94731208e-02 5.94731208e-02]
[-1.23706214e-02 -1.23706214e-02]]
The following function plots the sample_acf values.
from statsmodels.graphics.tsaplots import plot_acf
fig, axes = plt.subplots(nrows = 3, ncols = 1, figsize = (12, 6))
sample_acf_vals_wn = sample_acf(y_wn, h_max)
plot_acf(y_wn, lags = h_max, ax = axes[0])
axes[0].set_title('Sample ACF of Simulated White Noise')
sample_acf_vals_ma_1 = sample_acf(y_ma_1, h_max)
plot_acf(y_ma_1, lags = h_max, ax = axes[1])
theta = 0.8
axes[1].set_title(f'Simulated ACF of MA(1) with theta = {theta: .2f}')
sample_acf_vals_ma_2 = sample_acf(y_ma_2, h_max)
plot_acf(y_ma_2, lags = h_max, ax = axes[2])
theta = -0.8
axes[2].set_title(f'Simulated ACF of MA(1) with theta = {theta: .2f}')
plt.tight_layout()
plt.show()
There are also real datasets whose sample ACF displays the above MA(1) behavior. Here is one example.
Varve Dataset¶
This dataset is from the Shumway and Stoffer book. It records the thickness of yearly glacial varves—layers of sediment deposited by melting glaciers—collected from a single site in Massachusetts. Each varve represents one year of sedimentation, formed when meltwater streams carried sand and silt into a glacial lake during the spring and summer melt seasons at the end of the last Ice Age. The dataset contains 634 annual measurements, covering a continuous period that began about 11,834 years ago, when the glaciers in New England were retreating. Thicker varves indicate warmer years with more melting and heavier sediment flow, while thinner varves reflect colder years with less melting. Although these deposits stopped forming thousands of years ago when the glacial lake drained, the preserved layers now provide scientists with a high-resolution record of past climate variations, helping reconstruct temperature and melting patterns during the planet’s transition out of the Ice Age.
See Example 2.6 of the Shumway-Stoffer book, 4th edition, for more details on the dataset.
varve_data = pd.read_csv("varve.csv")
yraw = varve_data['x']
plt.plot(yraw)
plt.xlabel('Time')
plt.ylabel('Thickness')
plt.title('Glacial Varve Thickness')
plt.show()
We shall work with the logarithms of the raw data.
#We shall work with the logarithms:
ylog = np.log(yraw)
plt.plot(ylog)
plt.xlabel('Time')
plt.ylabel('log(Thickness)')
plt.title('Logarithm of Glacial Varve Thickness')
plt.show()
Fitting a stationary model such as MA($q$) is not appropriate for the log-transformed data, because the data does not seem to have a constant mean. To deal with this, we difference the data as follows.
ylogdiff = np.diff(ylog)
plt.plot(ylogdiff)
plt.xlabel('Time')
plt.ylabel('diff(log(Thickness))')
plt.title('Differenced Logarithm of Glacial Varve Thickness')
plt.show()
Now let us look at the sample acf of this dataset.
h_max = 50
fig, ax = plt.subplots()
plot_acf(ylogdiff, lags = h_max, ax = ax)
ax.set_title("Sample ACF of log differenced varve series")
plt.show()
The sample ACF shows one non-negligible (negative) spike at lag one; the other spikes seem insignificant. This is MA(1) behaviour, which suggests that an MA(1) model should be appropriate for this data (the differenced logarithms of the varve data).
Fitting MA models using the ARIMA function¶
Fitting MA models to data is more complicated than fitting AR models. We use an inbuilt function in statsmodels called ARIMA for this. ARIMA models (which we will define later) are determined by three orders $(p, d, q)$: $p$ denotes the AR order, $q$ denotes the MA order, and $d$ denotes the order of differencing. To fit the MA(1) model, we take $p = 0$, $d = 0$ and $q = 1$.
from statsmodels.tsa.arima.model import ARIMA
mamod = ARIMA(ylogdiff, order = (0, 0, 1)).fit()
print(mamod.summary())
print(mamod.params)
mu_hat = mamod.params[0]
theta_hat = mamod.params[1]
sigma_hat = np.sqrt(mamod.params[2])
print(f"mu_hat = {mu_hat: .4f}, theta_hat = {theta_hat: .4f}, sigma_hat = {sigma_hat: .4f}")
SARIMAX Results
==============================================================================
Dep. Variable: y No. Observations: 633
Model: ARIMA(0, 0, 1) Log Likelihood -440.678
Date: Sat, 15 Nov 2025 AIC 887.356
Time: 22:14:29 BIC 900.707
Sample: 0 HQIC 892.541
- 633
Covariance Type: opg
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
const -0.0013 0.004 -0.280 0.779 -0.010 0.008
ma.L1 -0.7710 0.023 -33.056 0.000 -0.817 -0.725
sigma2 0.2353 0.012 18.881 0.000 0.211 0.260
===================================================================================
Ljung-Box (L1) (Q): 9.16 Jarque-Bera (JB): 7.58
Prob(Q): 0.00 Prob(JB): 0.02
Heteroskedasticity (H): 0.95 Skew: -0.22
Prob(H) (two-sided): 0.69 Kurtosis: 3.30
===================================================================================
Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
[-0.00125667 -0.77099236 0.23528045]
mu_hat = -0.0013, theta_hat = -0.7710, sigma_hat = 0.4851
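As a quick sanity check (an aside using the standard MA(1) formula, not part of the fitting above): an MA(1) model has lag-one autocorrelation $\theta/(1+\theta^2)$, so the fitted $\hat{\theta}$ should roughly reproduce the sample lag-one autocorrelation of the differenced log data.
#Implied lag-one autocorrelation from the fitted MA(1) vs the sample lag-one ACF
implied_rho1 = theta_hat / (1 + theta_hat ** 2)
sample_rho1 = sample_acf(ylogdiff, 1)[1]
print(f"implied rho(1) = {implied_rho1: .4f}, sample rho(1) = {sample_rho1: .4f}")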
This ARIMA function can also be applied to the log varve data directly, instead of first differencing and then applying it to the differenced log data. If we are not differencing the data ourselves, we have to use $d = 1$ in the ARIMA function (i.e., order = (0, 1, 1)).
mamod_ylog = ARIMA(ylog, order = (0, 1, 1)).fit()
print(mamod_ylog.summary())
SARIMAX Results
==============================================================================
Dep. Variable: x No. Observations: 634
Model: ARIMA(0, 1, 1) Log Likelihood -440.718
Date: Sat, 15 Nov 2025 AIC 885.435
Time: 22:14:30 BIC 894.336
Sample: 0 HQIC 888.892
- 634
Covariance Type: opg
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
ma.L1 -0.7705 0.023 -32.948 0.000 -0.816 -0.725
sigma2 0.2353 0.012 19.124 0.000 0.211 0.259
===================================================================================
Ljung-Box (L1) (Q): 9.10 Jarque-Bera (JB): 7.59
Prob(Q): 0.00 Prob(JB): 0.02
Heteroskedasticity (H): 0.95 Skew: -0.22
Prob(H) (two-sided): 0.69 Kurtosis: 3.30
===================================================================================
Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
Note that there is now no constant. This ARIMA function, by default, fits the model without a mean term (i.e., with $\mu = 0$) whenever the differencing order $d$ is strictly positive.
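As a small check (this comparison is a sketch, not part of the original analysis): apart from the constant, the two fits should give essentially the same MA coefficient and innovation variance.
#Compare the MA(1) fit on the differenced data with the ARIMA(0, 1, 1) fit on ylog
print(mamod.params)      #const, ma.L1, sigma2 from the fit on ylogdiff
print(mamod_ylog.params) #ma.L1, sigma2 from the ARIMA(0, 1, 1) fit on ylog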
MA(2) model¶
The MA(2) model is $y_t = \mu + \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2}$. Under this model, there are autocorrelations between $y_t$ and $y_{t-1}$ as well as between $y_t$ and $y_{t-2}$. To understand these autocorrelations, we can look at the theoretical ACF of MA(2).
from statsmodels.tsa.arima_process import arma_acf
ar = [1]
ma = [1, 0.2, 0.2] #this is for MA(2) with theta_1 = 0.2 and theta_2 = 0.2
lag_max = 50
acf_vals = arma_acf(ar, ma, lags = lag_max)
print(acf_vals)
plt.stem(range(len(acf_vals)), acf_vals)
plt.xlabel("Lag")
plt.ylabel("ACF")
plt.title("Theoretical ACF of MA(2)")
plt.show()
[1. 0.22222222 0.18518519 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. ]

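The first few values printed above can be checked against the standard MA(2) formulas $\rho(1) = \theta_1(1+\theta_2)/(1+\theta_1^2+\theta_2^2)$ and $\rho(2) = \theta_2/(1+\theta_1^2+\theta_2^2)$, with $\rho(h) = 0$ for $h \geq 3$ (a small verification sketch):
#Check the theoretical MA(2) ACF values by direct computation
theta1, theta2 = 0.2, 0.2
denom = 1 + theta1 ** 2 + theta2 ** 2
print(theta1 * (1 + theta2) / denom) #should match the lag-1 value above
print(theta2 / denom)                #should match the lag-2 value above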
If we simulate data from this MA(2) model, the resulting data will have a sample ACF that looks like a noisy version of the above theoretical ACF.
theta1 = 0.2
theta2 = 0.2
n = 500
#First generate \epsilon_t
np.random.seed(42)
eps = np.random.normal(0, 1, n+2)
y = np.zeros(n)
for t in range(2, n+2):
    y[t-2] = eps[t] + theta1 * eps[t-1] + theta2 * eps[t-2]
# Plot the simulated series
plt.plot(y, color='black')
plt.title("Manually Simulated MA(2) Process (θ₁=0.2, θ₂=0.2)")
plt.xlabel("Time")
plt.ylabel("y_t")
plt.show()
h_max = 50
fig, ax = plt.subplots()
plot_acf(y, lags = h_max, ax = ax)
ax.set_title("Sample ACF of simulated MA(2) data")
plt.show()
GDP Growth Rate Data¶
Our second dataset is the GDP growth rate data. This can be calculated from the raw GDP data by first taking logarithms and then differencing the logarithms. Because this is a common preprocessing step, one can directly obtain the growth rate data from FRED (https://
gdp = pd.read_csv("GDP_12Nov2025.csv")
print(gdp.head())
y = gdp['GDP'].to_numpy()
plt.plot(y, color = 'black')
plt.xlabel('Quarter')
plt.show()
  observation_date      GDP
0 1947-01-01 243.164
1 1947-04-01 245.968
2 1947-07-01 249.585
3 1947-10-01 259.745
4 1948-01-01 265.742

We cannot fit MA models directly to this dataset. So we do the usual preprocessing by first taking logarithms and then differences.
ylog = np.log(y)
ylogdiff = np.diff(ylog)*100
plt.plot(ylogdiff, color = 'black')
plt.xlabel('Quarter')
plt.title('Differenced Logarithm of GDP')
plt.show()
Let us compute the sample acf of this dataset.
h_max = 50
fig, ax = plt.subplots()
plot_acf(ylogdiff, lags = h_max, ax = ax)
ax.set_title("Sample ACF of GDP percent change")
plt.show()
There are two spikes sticking out at lags 1 and 2. This indicates that MA(2) is a reasonable model. We fit MA(2) by using the ARIMA function as follows.
mamod = ARIMA(ylogdiff, order = (0, 0, 2)).fit()
print(mamod.summary())
SARIMAX Results
==============================================================================
Dep. Variable: y No. Observations: 313
Model: ARIMA(0, 0, 2) Log Likelihood -503.717
Date: Sat, 15 Nov 2025 AIC 1015.435
Time: 22:14:36 BIC 1030.419
Sample: 0 HQIC 1021.423
- 313
Covariance Type: opg
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
const 1.5418 0.118 13.081 0.000 1.311 1.773
ma.L1 0.1893 0.026 7.250 0.000 0.138 0.240
ma.L2 0.2232 0.060 3.749 0.000 0.107 0.340
sigma2 1.4628 0.039 37.788 0.000 1.387 1.539
===================================================================================
Ljung-Box (L1) (Q): 0.18 Jarque-Bera (JB): 6483.73
Prob(Q): 0.67 Prob(JB): 0.00
Heteroskedasticity (H): 1.59 Skew: -0.03
Prob(H) (two-sided): 0.02 Kurtosis: 25.30
===================================================================================
Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
Fitting MA models is more complicated than fitting AR models, and this can lead to numerical issues. For example, in the above code, if you fit the MA(2) model to ylogdiff = np.diff(np.log(y)), you might get a warning saying that the optimization for computing the MLE is not converging. The warning seems to go away if you work with np.diff(np.log(y))*100 (i.e., after multiplying by 100).
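If you want to see that the rescaling only changes the scale of the fit (a hypothetical check; the unscaled fit may emit the convergence warning mentioned above), note that multiplying the data by 100 should leave the MA coefficients essentially unchanged while multiplying sigma2 by 100^2:
#Hypothetical check of the effect of the 100x rescaling (may warn about convergence)
mamod_unscaled = ARIMA(np.diff(np.log(y)), order = (0, 0, 2)).fit()
print(mamod_unscaled.params[1:3], mamod.params[1:3])        #ma.L1 and ma.L2 should be similar
print(mamod_unscaled.params[3] * 100 ** 2, mamod.params[3]) #sigma2 scales by 100^2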
AR Models¶
Let us now see if AR models can be fit to the GDP growth rate data. The sample PACF gives an indication of a good value of $p$ for which AR($p$) is reasonable.
p_max = 40
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
plot_pacf(ylogdiff, lags = p_max)
plt.title("Sample PACF of diff(log(GDP))*100")
plt.show()

We can fit the AR model using AutoReg (as we have been doing previously). We can also use the ARIMA function with order (2, 0, 0) (you will see this in the Monday section); a sketch is given below, after the AutoReg output.
from statsmodels.tsa.ar_model import AutoReg
armod = AutoReg(ylogdiff, lags = 2).fit()
print(armod.summary())
AutoReg Model Results
==============================================================================
Dep. Variable: y No. Observations: 313
Model: AutoReg(2) Log Likelihood -499.916
Method: Conditional MLE S.D. of innovations 1.207
Date: Sat, 15 Nov 2025 AIC 1007.833
Time: 22:14:39 BIC 1022.792
Sample: 2 HQIC 1013.812
313
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
const 0.9099 0.125 7.293 0.000 0.665 1.154
y.L1 0.2100 0.056 3.780 0.000 0.101 0.319
y.L2 0.2010 0.056 3.616 0.000 0.092 0.310
Roots
=============================================================================
Real Imaginary Modulus Frequency
-----------------------------------------------------------------------------
AR.1 1.7686 +0.0000j 1.7686 0.0000
AR.2 -2.8136 +0.0000j 2.8136 0.5000
-----------------------------------------------------------------------------
AutoReg fit a causal stationary model (because the roots have moduli strictly larger than 1) to the data.
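As mentioned earlier, essentially the same AR(2) model can be fit with the ARIMA function using order (2, 0, 0). A minimal sketch (the AR coefficients should be close to the AutoReg estimates, though not identical since ARIMA uses full maximum likelihood while AutoReg above used conditional MLE; also, the const reported by ARIMA is the process mean rather than the AutoReg-style intercept):
#AR(2) via the ARIMA function on the same data
armod_arima = ARIMA(ylogdiff, order = (2, 0, 0)).fit()
print(armod_arima.params) #const (process mean), ar.L1, ar.L2, sigma2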
This AR(2) model fits the equation
$$y_t = 0.9099 + 0.2100 \, y_{t-1} + 0.2010 \, y_{t-2} + \epsilon_t,$$
where $\epsilon_t$ is i.i.d. $N(0, \sigma^2)$ with $\sigma = 1.207$ (the S.D. of innovations in the summary). But when we used the MA(2) model, we got the equation
$$y_t = 1.5418 + \epsilon_t + 0.1893 \, \epsilon_{t-1} + 0.2232 \, \epsilon_{t-2},$$
with $\sigma^2 = 1.4628$. Since both these models are fit to the same dataset, it makes sense to ask whether they are similar models in some sense. This is actually true, and it can be verified in a few ways. First note that the mean of the MA(2) model is clearly 1.5418. To compute the mean $\mu$ of the AR(2) model, take expectations on both sides ($\epsilon_t$ has mean zero and, by stationarity, each of $y_t$, $y_{t-1}$ and $y_{t-2}$ has the same mean $\mu$) to get
$$\mu = \frac{0.9099}{1 - 0.2100 - 0.2010} \approx 1.545.$$
So the two means are almost the same.
print(armod.params[0]/(1 - armod.params[1] - armod.params[2]))
1.5446358356501164
Next let us compute the autocovariance values of the two models. There is an inbuilt function called arma_acovf which computes the autocovariances of an ARMA model.
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.arima_process import arma_acovf
# ---- AR(2) model ----
ar = np.array([1, -0.21, -0.2010]) # AR polynomial
ma = np.array([1]) # no MA terms
sigma2 = 1.207**2
acov_ar2 = arma_acovf(ar=ar, ma=ma, sigma2=sigma2, nobs=20)
# ---- MA(2) model ----
ar2 = np.array([1])
ma2 = np.array([1, 0.1893, 0.2232])
sigma2_ma = 1.4628
acov_ma2 = arma_acovf(ar=ar2, ma=ma2, sigma2=sigma2_ma, nobs=20)
# ---- Plot both ----
plt.figure(figsize=(8,5))
lags = np.arange(len(acov_ar2))
plt.stem(lags, acov_ar2, markerfmt='bo', label="AR(2)")
plt.stem(lags, acov_ma2, markerfmt='ro', label="MA(2)")
plt.title("Autocovariance Comparison: AR(2) vs MA(2)")
plt.xlabel("Lag")
plt.ylabel("Gamma(h)")
plt.legend()
plt.tight_layout()
plt.show()

The autocovariances of the two models are somewhat close to each other. This suggests that the AR(2) and MA(2) models are not very different from each other.
ARIMA Modeling¶
We apply the Box-Jenkins philosophy to a real dataset from FRED. The Box-Jenkins philosophy deals with nonstationarity by differencing, and then attempts to fit causal, stationary AR or MA models to the differenced data.
TTLCONS data¶
The following data gives the monthly total construction spending in the United States (after taking logs)
ttlcons = pd.read_csv('TTLCONS_13Nov2025.csv')
print(ttlcons.head())
y = np.log(ttlcons['TTLCONS'])
print(len(y))
plt.plot(y)
plt.title('Logarithm of Total Construction Spending')
plt.xlabel('Time (months)')
plt.show()
observation_date TTLCONS
0 1993-01-01 458080
1 1993-02-01 462967
2 1993-03-01 458399
3 1993-04-01 469425
4 1993-05-01 468998
391

It is clearly not appropriate to fit a causal stationary model to this dataset. So we first difference the data and visualize the differenced series.
ydiff = np.diff(y)
plt.plot(ydiff)
plt.title('Differenced Logarithm of Total Construction Spending')
plt.show()
Now we look for an appropriate AR($p$) or MA($q$) model for the differenced data. For this, let us first plot the sample ACF and PACF values.
h_max = 50
fig, axes = plt.subplots(nrows = 2, ncols = 1, figsize = (12, 6))
plot_acf(ydiff, lags = h_max, ax = axes[0])
axes[0].set_title("Sample ACF")
plot_pacf(ydiff, lags = h_max, ax = axes[1])
axes[1].set_title("Sample PACF")
plt.tight_layout()
plt.show()
The sample PACF appears to be negligible after lag 3. This suggests fitting an AR(3) model to the data.
p = 3
armod = AutoReg(ydiff, lags = p).fit()
print(armod.summary())
AutoReg Model Results
==============================================================================
Dep. Variable: y No. Observations: 390
Model: AutoReg(3) Log Likelihood 1198.770
Method: Conditional MLE S.D. of innovations 0.011
Date: Thu, 13 Nov 2025 AIC -2387.541
Time: 16:53:22 BIC -2367.748
Sample: 3 HQIC -2379.692
390
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
const 0.0019 0.001 3.159 0.002 0.001 0.003
y.L1 0.2221 0.049 4.491 0.000 0.125 0.319
y.L2 0.0644 0.050 1.278 0.201 -0.034 0.163
y.L3 0.2077 0.049 4.209 0.000 0.111 0.304
Roots
=============================================================================
Real Imaginary Modulus Frequency
-----------------------------------------------------------------------------
AR.1 1.3956 -0.0000j 1.3956 -0.0000
AR.2 -0.8529 -1.6500j 1.8574 -0.3259
AR.3 -0.8529 +1.6500j 1.8574 0.3259
-----------------------------------------------------------------------------
From the Modulus values reported near the end of the summary table, it is clear that the fitted AR(3) model is causal and stationary. Let us use the fitted model to predict future values of the time series. Because the model is fitted to the differenced data, we need to convert it to a model for the original data before making predictions. The fitted model is
$$y_t - y_{t-1} = 0.0019 + 0.2221 \, (y_{t-1} - y_{t-2}) + 0.0644 \, (y_{t-2} - y_{t-3}) + 0.2077 \, (y_{t-3} - y_{t-4}) + \epsilon_t,$$
which is equivalent to
$$y_t = 0.0019 + (1 + 0.2221) \, y_{t-1} + (0.0644 - 0.2221) \, y_{t-2} + (0.2077 - 0.0644) \, y_{t-3} - 0.2077 \, y_{t-4} + \epsilon_t.$$
This is an AR(4) model and we can predict with it in the usual way as follows.
This AR(4) model is for the original data, and it will not be stationary, as can be checked below.
phi0_hat, phi1_hat, phi2_hat, phi3_hat = armod.params
# Convert differenced AR(3) model to AR(4) model for the original data
a1 = 1 + phi1_hat
a2 = phi2_hat - phi1_hat
a3 = phi3_hat - phi2_hat
a4 = -phi3_hat
# Characteristic polynomial: 1 - a1*z - a2*z^2 - a3*z^3 - a4*z^4 = 0
coeffs = [1, -a1, -a2, -a3, -a4]
roots = np.roots(coeffs)
moduli = np.abs(roots)
# Display results
print("AR(4) coefficients (on original data):")
print(f"a1 = {a1:.4f}, a2 = {a2:.4f}, a3 = {a3:.4f}, a4 = {a4:.4f}\n")
print("Roots of the characteristic polynomial:")
for i, r in enumerate(roots, 1):
    print(f"Root {i}: {r:.4f}, Modulus: {moduli[i-1]:.4f}")
# Check for stationarity
if np.all(moduli > 1):
    print("The model is causal and stationary.")
else:
    print("The model is NOT stationary (some roots have modulus ≤ 1).")
AR(4) coefficients (on original data):
a1 = 1.2221, a2 = -0.1577, a3 = 0.1433, a4 = -0.2077
Roots of the characteristic polynomial:
Root 1: 1.0000+0.0000j, Modulus: 1.0000
Root 2: 0.7166+0.0000j, Modulus: 0.7166
Root 3: -0.2472+0.4783j, Modulus: 0.5384
Root 4: -0.2472-0.4783j, Modulus: 0.5384
The model is NOT stationary (some roots have modulus ≤ 1).
Predictions can be made based on the AR(4) model fitted above.
#The following are the coefficients of the above AR(4) model:
phi_vals = np.array([armod.params[0], armod.params[1] + 1, armod.params[2] - armod.params[1], armod.params[3]-armod.params[2], -armod.params[3]])
print(phi_vals)
k = 100
n = len(y)
yhat = np.concatenate([y.astype(float), np.full(k, -9999)]) #extend data by k placeholder values
p = len(phi_vals)-1
for i in range(1, k+1):
    ans = phi_vals[0]
    for j in range(1, p+1):
        ans += phi_vals[j] * yhat[n+i-j-1]
    yhat[n+i-1] = ans
predvalues = yhat[n:]
print(predvalues)
[ 0.00194823 1.22210763 -0.15767478 0.14326653 -0.20769939]
[14.5770252 14.57834935 14.58052477 14.58327511 14.5862494 14.58948729
14.59291757 14.59645407 14.60008131 14.60377551 14.6075125 14.61128214
14.61507571 14.61888557 14.62270738 14.62653787 14.63037443 14.63421539
14.63805951 14.64190589 14.64575388 14.64960303 14.65345301 14.65730359
14.66115459 14.6650059 14.66885742 14.67270911 14.67656091 14.68041278
14.68426472 14.68811669 14.6919687 14.69582073 14.69967277 14.70352482
14.70737688 14.71122895 14.71508102 14.71893309 14.72278517 14.72663724
14.73048932 14.7343414 14.73819348 14.74204556 14.74589764 14.74974972
14.7536018 14.75745388 14.76130596 14.76515804 14.76901012 14.7728622
14.77671428 14.78056636 14.78441844 14.78827052 14.7921226 14.79597468
14.79982677 14.80367885 14.80753093 14.81138301 14.81523509 14.81908717
14.82293925 14.82679133 14.83064341 14.83449549 14.83834757 14.84219965
14.84605173 14.84990381 14.85375589 14.85760797 14.86146005 14.86531213
14.86916422 14.8730163 14.87686838 14.88072046 14.88457254 14.88842462
14.8922767 14.89612878 14.89998086 14.90383294 14.90768502 14.9115371
14.91538918 14.91924126 14.92309334 14.92694542 14.9307975 14.93464959
14.93850167 14.94235375 14.94620583 14.95005791]
Let us plot these predictions along with the original data (on the log scale and then on the original scale).
tme = range(1, n+1)
tme_future = range(n+1, n+k+1)
plt.plot(tme, y, label = 'Data')
plt.plot(tme_future, predvalues, label = 'Forecast', color = 'black')
plt.axvline(x=n, color='gray', linestyle='--')
plt.title('Log Scale')
plt.legend()
plt.show()
tme = range(1, n+1)
tme_future = range(n+1, n+k+1)
plt.plot(tme, np.exp(y), label = 'Data')
plt.plot(tme_future, np.exp(predvalues), label = 'Forecast', color = 'black')
plt.axvline(x=n, color='gray', linestyle='--')
plt.title('Original Scale')
plt.legend()
plt.show()
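The same forecasts can be obtained (this is a cross-check sketch, not part of the original analysis) by letting AutoReg forecast the differenced series and then undoing the differencing with a cumulative sum; this should match the manual AR(4) recursion up to floating-point error.
#Cross-check: forecast ydiff with AutoReg, then undo the differencing by cumulative summation
diff_fcast = armod.predict(start = len(ydiff), end = len(ydiff) + k - 1)
pred_check = y.to_numpy()[-1] + np.cumsum(diff_fcast)
print(np.max(np.abs(pred_check - predvalues))) #should be negligibly small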
AR(3) without intercept¶
To illustrate a technical point, suppose that we do not want an intercept term when fitting the AR(3) model after differencing. We can do this by setting trend = 'n' in AutoReg as follows.
p = 3
armod_nointercept = AutoReg(ydiff, lags = p, trend = 'n').fit()
print(armod_nointercept.summary())
AutoReg Model Results
==============================================================================
Dep. Variable: y No. Observations: 390
Model: AutoReg(3) Log Likelihood 1193.843
Method: Conditional MLE S.D. of innovations 0.011
Date: Thu, 13 Nov 2025 AIC -2379.686
Time: 16:54:50 BIC -2363.852
Sample: 3 HQIC -2373.408
390
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
y.L1 0.2542 0.049 5.185 0.000 0.158 0.350
y.L2 0.0923 0.050 1.837 0.066 -0.006 0.191
y.L3 0.2400 0.049 4.909 0.000 0.144 0.336
Roots
=============================================================================
Real Imaginary Modulus Frequency
-----------------------------------------------------------------------------
AR.1 1.2919 -0.0000j 1.2919 -0.0000
AR.2 -0.8383 -1.5882j 1.7959 -0.3273
AR.3 -0.8383 +1.5882j 1.7959 0.3273
-----------------------------------------------------------------------------
The above AR(3) model is also in the causal, stationary regime. It can be converted to the following AR(4) model for the original (undifferenced) data $y_t$:
print(armod_nointercept.params)
phi_vals_nointercept = np.array([0, armod_nointercept.params[0] + 1, armod_nointercept.params[1] - armod_nointercept.params[0], armod_nointercept.params[2]-armod_nointercept.params[1], -armod_nointercept.params[2]])
print(phi_vals_nointercept)
[0.25418633 0.09234799 0.2400083 ]
[ 0. 1.25418633 -0.16183835 0.14766032 -0.2400083 ]
Predictions using this AR model (which does not have an intercept) are computed as follows.
yhat = np.concatenate([y.astype(float), np.full(k, -9999)]) #extend data by k placeholder values
p = len(phi_vals_nointercept)-1
for i in range(1, k+1):
    ans = phi_vals_nointercept[0]
    for j in range(1, p+1):
        ans += phi_vals_nointercept[j] * yhat[n+i-j-1]
    yhat[n+i-1] = ans
predvalues_nointercept = yhat[n:]
print(predvalues_nointercept)
[14.57487898 14.57359749 14.57301636 14.57250515 14.57201397 14.57170244
14.5714552 14.57124569 14.57109484 14.57097781 14.57088384 14.57081295
14.57075816 14.57071513 14.57068212 14.57065661 14.57063675 14.57062142
14.57060957 14.57060037 14.57059326 14.57058776 14.5705835 14.5705802
14.57057765 14.57057567 14.57057414 14.57057296 14.57057204 14.57057133
14.57057078 14.57057036 14.57057003 14.57056977 14.57056958 14.57056942
14.5705693 14.57056921 14.57056914 14.57056909 14.57056905 14.57056901
14.57056899 14.57056897 14.57056895 14.57056894 14.57056893 14.57056892
14.57056892 14.57056891 14.57056891 14.57056891 14.57056891 14.57056891
14.5705689 14.5705689 14.5705689 14.5705689 14.5705689 14.5705689
14.5705689 14.5705689 14.5705689 14.5705689 14.5705689 14.5705689
14.5705689 14.5705689 14.5705689 14.5705689 14.5705689 14.5705689
14.5705689 14.5705689 14.5705689 14.5705689 14.5705689 14.5705689
14.5705689 14.5705689 14.5705689 14.5705689 14.5705689 14.5705689
14.5705689 14.5705689 14.5705689 14.5705689 14.5705689 14.5705689
14.5705689 14.5705689 14.5705689 14.5705689 14.5705689 14.5705689
14.5705689 14.5705689 14.5705689 14.5705689 ]
Let us plot the different predictions together.
plt.plot(tme, np.exp(y), label = 'Data')
plt.plot(tme_future, np.exp(predvalues), label = 'Forecast (with intercept)', color = 'black')
plt.plot(tme_future, np.exp(predvalues_nointercept), label = 'Forecast (no intercept)', color = 'red')
plt.axvline(x=n, color='gray', linestyle='--')
plt.legend()
plt.show()
The two predictions are quite different. The absence of the intercept term seems to make a big difference in the predictions in this example.
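One way to see why (a small aside): with the intercept, the fitted AR(3) for the differences has a positive long-run mean, so the level forecasts eventually keep growing by roughly that amount per month; without the intercept, the forecasted differences decay to zero and the level forecasts flatten out.
#Implied long-run mean of the differenced series under the AR(3) fit with intercept;
#the no-intercept fit forces this long-run mean to be zero.
drift = armod.params[0] / (1 - np.sum(armod.params[1:]))
print(f"implied long-run monthly drift (log scale): {drift: .5f}")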
Model Fitting and Prediction using the function ARIMA¶
Below, we fit the AR(3) model to the differenced data using the ARIMA function with order (3, 1, 0). Note that the ARIMA function is applied directly to the original data (and not to the differenced series).
ar_arima = ARIMA(y, order = (3, 1, 0)).fit()
print(ar_arima.summary())
SARIMAX Results
==============================================================================
Dep. Variable: TTLCONS No. Observations: 391
Model: ARIMA(3, 1, 0) Log Likelihood 1200.965
Date: Thu, 13 Nov 2025 AIC -2393.930
Time: 16:56:04 BIC -2378.066
Sample: 0 HQIC -2387.642
- 391
Covariance Type: opg
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
ar.L1 0.2542 0.046 5.516 0.000 0.164 0.345
ar.L2 0.0923 0.042 2.217 0.027 0.011 0.174
ar.L3 0.2400 0.045 5.293 0.000 0.151 0.329
sigma2 0.0001 7.34e-06 16.845 0.000 0.000 0.000
===================================================================================
Ljung-Box (L1) (Q): 0.99 Jarque-Bera (JB): 47.28
Prob(Q): 0.32 Prob(JB): 0.00
Heteroskedasticity (H): 0.48 Skew: -0.23
Prob(H) (two-sided): 0.00 Kurtosis: 4.64
===================================================================================
Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
One thing to note is that the ARIMA function does not fit an intercept term by default when the differencing order $d$ is strictly positive. Below we obtain forecasts for the data using the fitted ARIMA model. The predictions output by the get_prediction() function apply directly to $y_t$ (there is no need to convert predictions of the differenced series into predictions of the original series; this conversion is done automatically by get_prediction()).
fcast = ar_arima.get_prediction(start = n, end = n+k-1)
fcast_mean = fcast.predicted_mean #this gives the point predictions
plt.plot(tme, np.exp(y), label = 'Data')
plt.plot(tme_future, np.exp(predvalues), label = 'Forecast (AutoReg)', color = 'black')
plt.plot(tme_future, np.exp(predvalues_nointercept), label = 'Forecast (AutoReg with no intercept)', color = 'red')
plt.plot(tme_future, np.exp(fcast_mean), label = 'Forecast (ARIMA)', color = 'green')
plt.axvline(x=n, color='gray', linestyle='--')
plt.legend()
plt.show()
The predictions obtained by the ARIMA model coincide with those obtained by the AutoReg model with no intercept.
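A quick numerical check of this claim (the two sets of coefficient estimates agree only to a few decimal places, so expect a very small but nonzero gap):
#Maximum absolute gap between the ARIMA forecasts and the AutoReg (no intercept) forecasts
print(np.max(np.abs(np.asarray(fcast_mean) - predvalues_nointercept)))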
Double Differenced Data (ARIMA models with d = 2)¶
The differenced series (ydiff) plotted above might be said to have some trends. One can get rid of these by differencing the series again.
ydiff2 = np.diff(np.diff(y))
plt.figure(figsize = (12, 7))
plt.plot(ydiff2)
plt.title("Double Differenced Data")
plt.show()
Let us look at the sample acf and pacf of the double differenced data.
h_max = 50
fig, axes = plt.subplots(nrows = 2, ncols = 1)
plot_acf(ydiff2, lags = h_max, ax = axes[0])
axes[0].set_title("Sample ACF")
plot_pacf(ydiff2, lags = h_max, ax = axes[1])
axes[1].set_title("Sample PACF")
plt.tight_layout()
plt.show()
From the ACF above, MA(1) is a simple model that can be used for this double-differenced data. We shall fit this model. Instead of fitting it to the double-differenced data, we can directly apply ARIMA with order (0, 2, 1) to the original data. This model can then be used to obtain forecasts for the original data directly.
ma_arima = ARIMA(y, order = (0, 2, 1)).fit()
print(ma_arima.summary())
SARIMAX Results
==============================================================================
Dep. Variable: TTLCONS No. Observations: 391
Model: ARIMA(0, 2, 1) Log Likelihood 1199.732
Date: Thu, 13 Nov 2025 AIC -2395.464
Time: 16:57:39 BIC -2387.537
Sample: 0 HQIC -2392.321
- 391
Covariance Type: opg
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
ma.L1 -0.8730 0.026 -33.690 0.000 -0.924 -0.822
sigma2 0.0001 6.74e-06 18.126 0.000 0.000 0.000
===================================================================================
Ljung-Box (L1) (Q): 1.68 Jarque-Bera (JB): 47.87
Prob(Q): 0.20 Prob(JB): 0.00
Heteroskedasticity (H): 0.55 Skew: -0.20
Prob(H) (two-sided): 0.00 Kurtosis: 4.67
===================================================================================
Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
The model that is fit here is
$$(y_t - y_{t-1}) - (y_{t-1} - y_{t-2}) = \epsilon_t - 0.8730 \, \epsilon_{t-1},$$
i.e., an MA(1) model for the twice-differenced series. Note that there is no mean term on the right hand side.
fcast_ma = ma_arima.get_prediction(start = n, end = n+k-1)
fcast_mean_ma = fcast_ma.predicted_mean #this gives the point predictions
plt.plot(tme, np.exp(y), label = 'Data')
plt.plot(tme_future, np.exp(predvalues), label = 'Forecast (AR(3) with intercept on differenced data)', color = 'black')
plt.plot(tme_future, np.exp(fcast_mean), label = 'Forecast (AR(3) without intercept on differenced data)', color = 'green')
plt.plot(tme_future, np.exp(fcast_mean_ma), label = 'Forecast (MA(1) on twice differenced data)', color = 'red')
plt.axvline(x=n, color='gray', linestyle='--')
plt.legend()
plt.show()
This example illustrates that these different models can give predictions for future data that behave quite differently.
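If one wants a rough quantitative comparison of the ARIMA(3, 1, 0) and ARIMA(0, 2, 1) fits above (a sketch only; information criteria across different differencing orders should be interpreted with some care), the AIC values reported in the two summaries can be extracted directly:
#AIC values of the two fitted ARIMA models (lower is better)
print(f"ARIMA(3, 1, 0) AIC: {ar_arima.aic: .2f}")
print(f"ARIMA(0, 2, 1) AIC: {ma_arima.aic: .2f}")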