
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm

Dataset One: Personal Consumption Expenditures from FRED

The following data is on personal consumption expenditures from FRED (https://fred.stlouisfed.org/series/PCEC). It measures the total amount of money that U.S. households spend on goods and services. This is a time-series record of that spending reported quarterly in billions of dollars (with seasonal adjustment). This measure is compiled by the Bureau of Economic Analysis (BEA) and is a key indicator used to track consumer demand and overall economic activity in the United States.

pcec = pd.read_csv("PCEC_30Oct2025.csv")
print(pcec.head())
y = pcec['PCEC'].to_numpy()
plt.plot(y, color = 'black')
plt.xlabel('Quarter')
plt.ylabel('Billions of Dollars')
plt.title('Personal Consumption Expenditure')
plt.show()
  observation_date     PCEC
0       1947-01-01  156.161
1       1947-04-01  160.031
2       1947-07-01  163.543
3       1947-10-01  167.672
4       1948-01-01  170.372
<Figure size 640x480 with 1 Axes>

We will fit AR(p) models to this dataset and use the fitted models to predict future values. Let us set aside the last 19 observations (just under five years of quarterly data) as test values; the remaining observations form the training data. We will fit AR(p) models to the training data, predict the test values, and then compare the predictions with the actual test values.

n = len(y)
tme = range(1, n+1)
#n_test = 50
n_test = 19 
n_train = n - n_test
y_train = y[:n_train]
tme_train = tme[:n_train]
y_test = y[n_train:]
tme_test = tme[n_train:]

Below, we fit AR(1) to the training data. There are two ways of doing this: either manually create the response vector $y$ and the design matrix $X$ and use OLS, or use the AutoReg function from statsmodels.
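Either way, the model being fit is the linear regression of the series on its own lags:

$$
y_t = \phi_0 + \phi_1 y_{t-1} + \dots + \phi_p y_{t-p} + \epsilon_t, \qquad t = p+1, \dots, n,
$$

so the response vector consists of $y_{p+1}, \dots, y_n$ and the design matrix has a column of ones followed by the lagged copies of the series.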

p = 1
yreg = y_train[p:] #these are the response values in the autoregression
Xmat = np.ones((n_train-p, 1)) #this will be the design matrix (X) in the autoregression
for j in range(1, p+1):
    col = y_train[p-j : n_train-j].reshape(-1, 1)
    Xmat = np.column_stack([Xmat, col])

armod = sm.OLS(yreg, Xmat).fit()
print(armod.params)
print(armod.summary())
[16.73313785  1.00766445]
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.999
Model:                            OLS   Adj. R-squared:                  0.999
Method:                 Least Squares   F-statistic:                 4.192e+05
Date:                Fri, 31 Oct 2025   Prob (F-statistic):               0.00
Time:                        01:46:54   Log-Likelihood:                -1813.7
No. Observations:                 294   AIC:                             3631.
Df Residuals:                     292   BIC:                             3639.
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         16.7331      9.377      1.784      0.075      -1.722      35.189
x1             1.0077      0.002    647.467      0.000       1.005       1.011
==============================================================================
Omnibus:                      302.301   Durbin-Watson:                   2.181
Prob(Omnibus):                  0.000   Jarque-Bera (JB):           160905.939
Skew:                          -3.206   Prob(JB):                         0.00
Kurtosis:                     117.429   Cond. No.                     8.35e+03
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 8.35e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
from statsmodels.tsa.ar_model import AutoReg
armod_sm = AutoReg(y_train, lags = p).fit()
print(armod_sm.summary())
                            AutoReg Model Results                             
==============================================================================
Dep. Variable:                      y   No. Observations:                  295
Model:                     AutoReg(1)   Log Likelihood               -1813.666
Method:               Conditional MLE   S.D. of innovations            115.583
Date:                Fri, 31 Oct 2025   AIC                           3633.332
Time:                        01:46:55   BIC                           3644.382
Sample:                             1   HQIC                          3637.757
                                  295                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const         16.7331      9.345      1.791      0.073      -1.583      35.050
y.L1           1.0077      0.002    649.681      0.000       1.005       1.011
                                    Roots                                    
=============================================================================
                  Real          Imaginary           Modulus         Frequency
-----------------------------------------------------------------------------
AR.1            0.9924           +0.0000j            0.9924            0.0000
-----------------------------------------------------------------------------

Here are some observations comparing the above two outputs:

  1. The parameter estimates are exactly the same.

  2. The regression summary gives t-scores corresponding to each parameter estimate, while the AutoReg summary only gives z-scores.

  3. The standard errors are slightly different.

The standard errors given by the regression summary correspond to the square roots of the diagonal entries of

$$
\hat{\sigma}^2 (X^T X)^{-1}, \qquad \text{where } \hat{\sigma} = \sqrt{\frac{\|Y - X \hat{\beta}\|^2}{n - 2p - 1}},
$$

while the standard errors reported by the AutoReg summary correspond to the square roots of the diagonal entries of

$$
\hat{\sigma}_{\text{MLE}}^2 (X^T X)^{-1}, \qquad \text{where } \hat{\sigma}_{\text{MLE}} = \sqrt{\frac{\|Y - X \hat{\beta}\|^2}{n - p}}.
$$
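As a quick sanity check (a sketch assuming the fitted objects armod and armod_sm from above are still in scope), rescaling the OLS standard errors by the ratio of the two $\sigma$ estimates should reproduce the AutoReg standard errors up to rounding:

resid = armod.resid
m = len(resid)                                       # number of rows in the autoregression, i.e. n - p
sig_mle = np.sqrt(np.sum(resid ** 2) / m)            # denominator n - p (AutoReg convention)
sig_ols = np.sqrt(np.sum(resid ** 2) / (m - p - 1))  # denominator n - 2p - 1 (OLS convention)
print(armod.bse * sig_mle / sig_ols)                 # should match armod_sm.bse
print(armod_sm.bse)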

Here is how the fitted model can be used to predict the test observations.

k = n_test
fcast = armod_sm.get_prediction(start = n_train, end = n_train+k-1)
fcast_mean = fcast.predicted_mean #this gives point predictions for the future values of the time series 

The method that the above function uses to calculate the predictions was detailed in class, and is captured in the code below.
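In symbols: for $i = 1, \dots, k$ (writing $n$ for the length of the training data),

$$
\hat{y}_{n+i} = \hat{\phi}_0 + \hat{\phi}_1 \hat{y}_{n+i-1} + \dots + \hat{\phi}_p \hat{y}_{n+i-p},
$$

where $\hat{y}_t$ is simply the observed value $y_t$ whenever $t \le n$, so each forecast is fed back in as an input for the later horizons.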

#Predictions
yhat = np.concatenate([y_train.astype(float), np.full(k, -9999)]) #extend data by k placeholder values
phi_vals = armod_sm.params
for i in range(1, k+1):
    ans = phi_vals[0]
    for j in range(1, p+1):
        ans += phi_vals[j] * yhat[n_train+i-j-1]
    yhat[n_train+i-1] = ans
predvalues = yhat[n_train:]

#Check that both predictions are identical:
print(np.column_stack([predvalues, fcast_mean]))
preds_nolog = fcast_mean
[[14606.9142231  14606.9142231 ]
 [14735.60125891 14735.60125891]
 [14865.2746095  14865.2746095 ]
 [14995.94183441 14995.94183441]
 [15127.61055115 15127.61055115]
 [15260.28843559 15260.28843559]
 [15393.98322245 15393.98322245]
 [15528.70270571 15528.70270571]
 [15664.45473911 15664.45473911]
 [15801.24723657 15801.24723657]
 [15939.08817267 15939.08817267]
 [16077.98558312 16077.98558312]
 [16217.94756519 16217.94756519]
 [16358.98227825 16358.98227825]
 [16501.09794419 16501.09794419]
 [16644.30284789 16644.30284789]
 [16788.60533778 16788.60533778]
 [16934.01382624 16934.01382624]
 [17080.53679013 17080.53679013]]

Below we plot the predictions along with the actual test observations.

plt.plot(tme_train, y_train, label = 'Training Data')
plt.plot(tme_test, fcast_mean, label = 'Forecast', color = 'black')
plt.plot(tme_test, y_test, color = 'red',  label = 'Actual future values')
plt.axvline(x=n_train, color='gray', linestyle='--')
plt.legend()
plt.show()
<Figure size 640x480 with 1 Axes>

Next we discuss prediction uncertainty quantification. The prediction standard errors, and the associated uncertainty intervals, are calculated as follows.
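These are normal-approximation intervals: for each horizon $i$, the interval is

$$
\hat{y}_{n+i} \pm z_{1-\alpha/2} \, \widehat{\mathrm{se}}(\hat{y}_{n+i}),
$$

where $z_{1-\alpha/2} \approx 1.96$ for $\alpha = 0.05$ and the standard errors are those returned by se_mean.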

fcast_se = fcast.se_mean
alpha = 0.05
from scipy import stats
z_alpha_half = stats.norm.ppf(1 - alpha/2)
predlower = predvalues - z_alpha_half * fcast_se
predupper = predvalues + z_alpha_half * fcast_se
fcast_int = fcast.conf_int()
print(np.column_stack([predlower, predupper, fcast_int]))
[[14380.37479095 14833.45365525 14380.37479095 14833.45365525]
 [14413.99602915 15057.20648867 14413.99602915 15057.20648867]
 [14469.87407837 15260.67514062 14469.87407837 15260.67514062]
 [14537.61070798 15454.27296085 14537.61070798 15454.27296085]
 [14613.19828795 15642.02281435 14613.19828795 15642.02281435]
 [14694.59243385 15825.98443733 14694.59243385 15825.98443733]
 [14780.58587329 16007.3805716  14780.58587329 16007.3805716 ]
 [14870.39721667 16187.00819474 14870.39721667 16187.00819474]
 [14963.48775649 16365.42172173 14963.48775649 16365.42172173]
 [15059.4685554  16543.02591775 15059.4685554  16543.02591775]
 [15158.04876856 16720.12757678 15158.04876856 16720.12757678]
 [15259.00483287 16896.96633336 15259.00483287 16896.96633336]
 [15362.16107099 17073.7340594  15362.16107099 17073.7340594 ]
 [15467.37693502 17250.58762149 15467.37693502 17250.58762149]
 [15574.53830884 17427.65757953 15574.53830884 17427.65757953]
 [15683.5513957  17605.05430009 15683.5513957  17605.05430009]
 [15794.33831069 17782.87236487 15794.33831069 17782.87236487]
 [15906.83383123 17961.19382125 15906.83383123 17961.19382125]
 [16020.98295454 18140.09062572 16020.98295454 18140.09062572]]

Below we plot the predictions along with uncertainty intervals.

#Plotting predictions along with uncertainty:
plt.plot(tme_train, y_train, label = 'Original Data')
plt.plot(tme_test, predvalues, label = 'Forecast', color = 'black')
plt.plot(tme_test, predlower, color = 'green', label = 'Prediction lower bound')
plt.plot(tme_test, predupper, color = 'green', label = 'Prediction upper bound')
plt.plot(tme_test, y_test, color = 'red', label = 'Actual Future Values')
plt.legend()
plt.axvline(x=n_train, color='gray', linestyle='--')
plt.show()
<Figure size 640x480 with 1 Axes>

In this example, the predicted values are somewhat below the actual test values.

Instead of applying AR models directly to the raw data, it is common practice to apply them to the logarithms of the data.
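A rough motivation (a heuristic, not something the data guarantees): on the log scale an AR(1) model with $\phi_1$ close to one,

$$
\log y_t = \phi_0 + \phi_1 \log y_{t-1} + \epsilon_t,
$$

says that the series grows by roughly a constant percentage each quarter plus noise, which is often a better description of expenditure-type series than constant additive growth.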

ylog_train = np.log(y_train)
ylog_test = np.log(y_test)
armod_sm = AutoReg(ylog_train, lags = p).fit()
print(armod_sm.summary())
                            AutoReg Model Results                             
==============================================================================
Dep. Variable:                      y   No. Observations:                  295
Model:                     AutoReg(1)   Log Likelihood                 877.497
Method:               Conditional MLE   S.D. of innovations              0.012
Date:                Fri, 31 Oct 2025   AIC                          -1748.994
Time:                        01:47:02   BIC                          -1737.943
Sample:                             1   HQIC                         -1744.569
                                  295                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0273      0.004      7.271      0.000       0.020       0.035
y.L1           0.9984      0.000   2033.940      0.000       0.997       0.999
                                    Roots                                    
=============================================================================
                  Real          Imaginary           Modulus         Frequency
-----------------------------------------------------------------------------
AR.1            1.0016           +0.0000j            1.0016            0.0000
-----------------------------------------------------------------------------

Predictions are obtained as follows.

fcast = armod_sm.get_prediction(start = n_train, end = n_train+k-1)
fcast_mean = fcast.predicted_mean #this gives the point predictions

Prediction uncertainty intervals are obtained as follows.

fcast_se = fcast.se_mean
alpha = 0.05
z_alpha_half = stats.norm.ppf(1 - alpha/2)
predlower = fcast_mean - z_alpha_half * fcast_se
predupper = fcast_mean + z_alpha_half * fcast_se
fcast_int = fcast.conf_int()
print(np.column_stack([predlower, predupper, fcast_int]))
[[9.56863084 9.61658311 9.56863084 9.61658311]
 [9.57084537 9.63860634 9.57084537 9.63860634]
 [9.57536344 9.65828756 9.57536344 9.65828756]
 [9.58106762 9.67674429 9.58106762 9.67674429]
 [9.5875247  9.69440978 9.5875247  9.69440978]
 [9.59451235 9.71150643 9.59451235 9.71150643]
 [9.60189838 9.72816647 9.60189838 9.72816647]
 [9.60959661 9.74447617 9.60959661 9.74447617]
 [9.61754713 9.76049548 9.61754713 9.76049548]
 [9.6257063  9.7762681  9.6257063  9.7762681 ]
 [9.63404116 9.79182707 9.63404116 9.79182707]
 [9.64252607 9.80719807 9.64252607 9.80719807]
 [9.65114064 9.82240156 9.65114064 9.82240156]
 [9.65986833 9.83745414 9.65986833 9.83745414]
 [9.6686955  9.8523695  9.6686955  9.8523695 ]
 [9.67761074 9.86715912 9.67761074 9.86715912]
 [9.68660442 9.88183269 9.68660442 9.88183269]
 [9.69566826 9.89639854 9.69566826 9.89639854]
 [9.70479516 9.91086384 9.70479516 9.91086384]]
plt.plot(tme_train, y_train, label = 'Training Data')
plt.plot(tme_test, np.exp(fcast_mean), label = 'Forecast', color = 'black') #Note the exponentiation
plt.plot(tme_test, y_test, color = 'red',  label = 'Actual future values')
plt.plot(tme_test, np.exp(predlower), color = 'green', label = 'Prediction lower bound')
plt.plot(tme_test, np.exp(predupper), color = 'green', label = 'Prediction upper bound')
plt.axvline(x=n_train, color='gray', linestyle='--')
plt.legend()
plt.show()
<Figure size 640x480 with 1 Axes>

Note that now the predictions are slightly closer to the actual observations.

plt.plot(tme_train, y_train, label = 'Training Data')
plt.plot(tme_test, np.exp(fcast_mean), label = 'Predictions with logs', color = 'black') #Note the exponentiation
plt.plot(tme_test, y_test, color = 'red',  label = 'Actual future values')
plt.plot(tme_test, preds_nolog, color = 'darkgreen',  label = 'Predictions without logs')
plt.axvline(x=n_train, color='gray', linestyle='--')
plt.legend()
plt.show()
<Figure size 640x480 with 1 Axes>
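To put a number on this comparison (a quick check using the variables already defined above: preds_nolog holds the raw-scale forecasts and fcast_mean the log-scale ones), we can compare root mean squared errors on the test set:

rmse_nolog = np.sqrt(np.mean((preds_nolog - y_test) ** 2))          # forecasts from the model on raw data
rmse_log = np.sqrt(np.mean((np.exp(fcast_mean) - y_test) ** 2))     # exponentiated forecasts from the log-scale model
print(rmse_nolog, rmse_log)  # based on the plot above, the log-scale model is expected to come out slightly ahead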

Dataset Two: House Price Data from FRED

#The following is FRED data on Average Sales Price of Houses Sold for the US (https://fred.stlouisfed.org/series/ASPUS)
hprice = pd.read_csv('ASPUS_October2025.csv')
print(hprice.head())
y = hprice['ASPUS'].to_numpy()
plt.figure(figsize = (12, 6))
plt.plot(y)
plt.xlabel('Quarter')
plt.ylabel('Dollars')
plt.title('Average Sales Price of Houses Sold in the US')
plt.show()
  observation_date  ASPUS
0       1963-01-01  19300
1       1963-04-01  19400
2       1963-07-01  19200
3       1963-10-01  19600
4       1964-01-01  19600
<Figure size 1200x600 with 1 Axes>

Training-test split:

n = len(y)
tme = range(1, n+1)
#n_test = 50
n_test = 100
n_train = n - n_test
y_train = y[:n_train]
tme_train = tme[:n_train]
y_test = y[n_train:]
tme_test = tme[n_train:]
p = 1
from statsmodels.tsa.ar_model import AutoReg
armod_sm = AutoReg(y_train, lags = p).fit()
print(armod_sm.summary())
                            AutoReg Model Results                             
==============================================================================
Dep. Variable:                      y   No. Observations:                  150
Model:                     AutoReg(1)   Log Likelihood               -1371.075
Method:               Conditional MLE   S.D. of innovations           2399.236
Date:                Thu, 30 Oct 2025   AIC                           2748.150
Time:                        16:14:00   BIC                           2757.161
Sample:                             1   HQIC                          2751.811
                                  150                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const        597.9359    366.248      1.633      0.103    -119.896    1315.768
y.L1           1.0071      0.003    290.953      0.000       1.000       1.014
                                    Roots                                    
=============================================================================
                  Real          Imaginary           Modulus         Frequency
-----------------------------------------------------------------------------
AR.1            0.9930           +0.0000j            0.9930            0.0000
-----------------------------------------------------------------------------
k = n_test
fcast = armod_sm.get_prediction(start = n_train, end = n_train+k-1)
fcast_mean = fcast.predicted_mean #this gives point predictions for the future values of the time series 
preds_nolog = fcast_mean
plt.plot(tme_train, y_train, label = 'Training Data')
plt.plot(tme_test, fcast_mean, label = 'Forecast (no logs)', color = 'black')
plt.plot(tme_test, y_test, color = 'red',  label = 'Actual future values')
plt.axvline(x=n_train, color='gray', linestyle='--')
plt.legend()
plt.show()
<Figure size 640x480 with 1 Axes>
fcast_se = fcast.se_mean
alpha = 0.05
z_alpha_half = stats.norm.ppf(1 - alpha/2)
predlower = fcast_mean - z_alpha_half * fcast_se
predupper = fcast_mean + z_alpha_half * fcast_se
fcast_int = fcast.conf_int()
print(np.column_stack([predlower, predupper, fcast_int]))
[[199725.76615832 209130.59668745 199725.76615832 209130.59668745]
 [199796.937386   213144.45231145 199796.937386   213144.45231145]
 [200324.99344089 216730.28966636 200324.99344089 216730.28966636]
 [201093.84012565 220104.40693304 201093.84012565 220104.40693304]
 [202020.13480094 223350.35217579 202020.13480094 223350.35217579]
 [203061.32477668 226510.88496127 203061.32477668 226510.88496127]
 [204192.2527195  229611.3709613  204192.2527195  229611.3709613 ]
 [205396.61252049 232668.32609549 205396.61252049 232668.32609549]
 [206663.14243429 235693.22340245 206663.14243429 235693.22340245]
 [207983.69414988 238694.4239796  207983.69414988 238694.4239796 ]
 [209352.15867435 241678.25110984 209352.15867435 241678.25110984]
 [210763.82585245 244649.63075271 210763.82585245 244649.63075271]
 [212214.98112465 247612.49479698 212214.98112465 247612.49479698]
 [213702.64028876 250570.04630982 213702.64028876 250570.04630982]
 [215224.36863025 253524.94041734 215224.36863025 253524.94041734]
 [216778.15379801 256479.41143975 216778.15379801 256479.41143975]
 [218362.31412526 259435.36458148 218362.31412526 259435.36458148]
 [219975.43102932 262394.44354244 219975.43102932 262394.44354244]
 [221616.29819339 265358.08134745 221616.29819339 265358.08134745]
 [223283.8827094  268327.53921462 223283.8827094  268327.53921462]
 [224977.29491605 271303.93672862 224977.29491605 271303.93672862]
 [226695.76466946 274288.27558144 226695.76466946 274288.27558144]
 [228438.62244804 277281.45847901 228438.62244804 277281.45847901]
 [230205.28414185 280284.30436337 230205.28414185 280284.30436337]
 [231995.23868641 283297.56079054 231995.23868641 283297.56079054]
 [233808.03791795 286321.91408698 233808.03791795 286321.91408698]
 [235643.28818218 289357.99775262 235643.28818218 289357.99775262]
 [237500.64334064 292406.39946642 237500.64334064 292406.39946642]
 [239379.79890105 295467.66696798 239379.79890105 295467.66696798]
 [241280.48705903 298542.31302792 241280.48705903 298542.31302792]
 [243202.47248461 301630.81967345 243202.47248461 301630.81967345]
 [245145.54872154 304733.64180134 245145.54872154 304733.64180134]
 [247109.53509442 307851.21028311 247109.53509442 307851.21028311]
 [249094.27403901 310983.93464711 249094.27403901 310983.93464711]
 [251099.62878731 314132.20540593 251099.62878731 314132.20540593]
 [253125.48135186 317296.39608467 253125.48135186 317296.39608467]
 [255171.73076343 320476.864996   255171.73076343 320476.864996  ]
 [257238.29152441 323673.95679938 257238.29152441 323673.95679938]
 [259325.09224688 326888.00387603 259325.09224688 326888.00387603]
 [261432.07444898 330119.3275453  261432.07444898 330119.3275453 ]
 [263559.19148819 333368.23914461 263559.19148819 333368.23914461]
 [265706.40761272 336635.04099104 265706.40761272 336635.04099104]
 [267873.69711589 339920.02724025 267873.69711589 339920.02724025]
 [270061.04358008 343223.48465581 270061.04358008 343223.48465581]
 [272268.43919912 346545.69330024 272268.43919912 346545.69330024]
 [274495.88416947 349886.92715726 274495.88416947 349886.92715726]
 [276743.38614204 353247.45469368 276743.38614204 353247.45469368]
 [279010.95972741 356627.53936775 279010.95972741 356627.53936775]
 [281298.62604845 360027.44009041 281298.62604845 360027.44009041]
 [283606.41233488 363447.41164461 283606.41233488 363447.41164461]
 [285934.35155523 366887.70506728 285934.35155523 366887.70506728]
 [288282.48208213 370348.56799816 288282.48208213 370348.56799816]
 [290650.8473874  373830.2449989  290650.8473874  373830.2449989 ]
 [293039.4957638  377332.97784556 293039.4957638  377332.97784556]
 [295448.48007082 380857.00579724 295448.48007082 380857.00579724]
 [297877.85750191 384402.56584331 297877.85750191 384402.56584331]
 [300327.68937131 387969.89293117 300327.68937131 387969.89293117]
 [302798.04091826 391559.22017672 302798.04091826 391559.22017672]
 [305288.98112727 395170.77905893 305288.98112727 395170.77905893]
 [307800.58256265 398804.79960013 307800.58256265 398804.79960013]
 [310332.92121622 402461.51053338 310332.92121622 402461.51053338]
 [312886.07636684 406141.13945801 312886.07636684 406141.13945801]
 [315460.13045087 409843.91298439 315460.13045087 409843.91298439]
 [318055.16894238 413570.056869   318055.16894238 413570.056869  ]
 [320671.28024257 417319.79614052 320671.28024257 417319.79614052]
 [323308.55557733 421093.35521777 323308.55557733 421093.35521777]
 [325967.08890245 424890.95802012 325967.08890245 424890.95802012]
 [328646.97681575 428712.82807118 328646.97681575 428712.82807118]
 [331348.31847566 432559.18859607 331348.31847566 432559.18859607]
 [334071.21552561 436430.26261297 334071.21552561 436430.26261297]
 [336815.77202394 440326.27301941 336815.77202394 440326.27301941]
 [339582.09437871 444247.44267354 339582.09437871 444247.44267354]
 [342370.29128725 448193.99447104 342370.29128725 448193.99447104]
 [345180.47367988 452166.15141776 345180.47367988 452166.15141776]
 [348012.75466763 456164.13669854 348012.75466763 456164.13669854]
 [350867.24949369 460188.17374253 350867.24949369 460188.17374253]
 [353744.07548814 464238.48628514 353744.07548814 464238.48628514]
 [356643.352026   468315.29842701 356643.352026   468315.29842701]
 [359565.20048807 472418.83469011 359565.20048807 472418.83469011]
 [362509.74422461 476549.32007124 362509.74422461 476549.32007124]
 [365477.10852156 480706.98009312 365477.10852156 480706.98009312]
 [368467.42056912 484892.04085316 368467.42056912 484892.04085316]
 [371480.80943258 489104.72907017 371480.80943258 489104.72907017]
 [374517.40602518 493345.27212912 374517.40602518 493345.27212912]
 [377577.34308299 497613.89812404 377577.34308299 497613.89812404]
 [380660.75514158 501910.83589926 380660.75514158 501910.83589926]
 [383767.77851437 506236.31508904 383767.77851437 506236.31508904]
 [386898.55127262 510590.56615575 386898.55127262 510590.56615575]
 [390053.21322695 514973.82042663 390053.21322695 514973.82042663]
 [393231.90591018 519386.31012931 393231.90591018 519386.31012931]
 [396434.77256163 523828.2684261  396434.77256163 523828.2684261 ]
 [399661.95811255 528299.92944718 399661.95811255 528299.92944718]
 [402913.60917286 532801.52832269 402913.60917286 532801.52832269]
 [406189.87401887 537333.30121388 406189.87401887 537333.30121388]
 [409490.90258217 541895.48534337 409490.90258217 541895.48534337]
 [412816.84643945 546488.31902446 412816.84643945 546488.31902446]
 [416167.85880327 551112.04168968 416167.85880327 551112.04168968]
 [419544.09451375 555766.89391864 419544.09451375 555766.89391864]
 [422945.71003107 560453.1174651  422945.71003107 560453.1174651 ]
 [426372.86342878 565170.95528336 426372.86342878 565170.95528336]]
#Plotting predictions along with uncertainty:
plt.plot(tme_train, y_train, label = 'Original Data')
plt.plot(tme_test, fcast_mean, label = 'Forecast', color = 'black')
plt.plot(tme_test, predlower, color = 'green', label = 'Prediction lower bound')
plt.plot(tme_test, predupper, color = 'green', label = 'Prediction upper bound')
plt.plot(tme_test, y_test, color = 'red', label = 'Actual Future Values')
plt.legend()
plt.axvline(x=n_train, color='gray', linestyle='--')
plt.show()
<Figure size 640x480 with 1 Axes>

Now we take logs.

ylog_train = np.log(y_train)
ylog_test = np.log(y_test)
armod_sm = AutoReg(ylog_train, lags = p).fit()
print(armod_sm.summary())
                            AutoReg Model Results                             
==============================================================================
Dep. Variable:                      y   No. Observations:                  150
Model:                     AutoReg(1)   Log Likelihood                 349.431
Method:               Conditional MLE   S.D. of innovations              0.023
Date:                Thu, 30 Oct 2025   AIC                           -692.861
Time:                        16:14:04   BIC                           -683.849
Sample:                             1   HQIC                          -689.200
                                  150                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0530      0.028      1.912      0.056      -0.001       0.107
y.L1           0.9967      0.002    401.775      0.000       0.992       1.002
                                    Roots                                    
=============================================================================
                  Real          Imaginary           Modulus         Frequency
-----------------------------------------------------------------------------
AR.1            1.0033           +0.0000j            1.0033            0.0000
-----------------------------------------------------------------------------
fcast = armod_sm.get_prediction(start = n_train, end = n_train+k-1)
fcast_mean = fcast.predicted_mean #this gives the point predictions
plt.plot(tme_train, y_train, label = 'Training Data')
plt.plot(tme_test, np.exp(fcast_mean), label = 'Predictions with logs', color = 'black') #Note the exponentiation
plt.plot(tme_test, y_test, color = 'red',  label = 'Actual future values')
plt.plot(tme_test, preds_nolog, color = 'darkgreen',  label = 'Predictions without logs')
plt.axvline(x=n_train, color='gray', linestyle='--')
plt.legend()
plt.show()
<Figure size 640x480 with 1 Axes>
fcast_se = fcast.se_mean
alpha = 0.05
z_alpha_half = stats.norm.ppf(1 - alpha/2)
predlower = fcast_mean - z_alpha_half * fcast_se
predupper = fcast_mean + z_alpha_half * fcast_se
fcast_int = fcast.conf_int()
print(np.column_stack([predlower, predupper, fcast_int]))
[[12.18473427 12.27563044 12.18473427 12.27563044]
 [12.17815671 12.30648897 12.17815671 12.30648897]
 [12.17596656 12.33287905 12.17596656 12.33287905]
 [12.17603963 12.35692514 12.17603963 12.35692514]
 [12.17755168 12.37945174 12.17755168 12.37945174]
 [12.18007928 12.40088258 12.18007928 12.40088258]
 [12.18337047 12.42146985 12.18337047 12.42146985]
 [12.18726086 12.44137823 12.18726086 12.44137823]
 [12.19163608 12.46072235 12.19163608 12.46072235]
 [12.19641275 12.47958586 12.19641275 12.47958586]
 [12.20152783 12.49803205 12.20152783 12.49803205]
 [12.20693229 12.51611023 12.20693229 12.51611023]
 [12.21258708 12.5338597  12.21258708 12.5338597 ]
 [12.21846052 12.55131241 12.21846052 12.55131241]
 [12.22452647 12.56849476 12.22452647 12.56849476]
 [12.23076305 12.58542888 12.23076305 12.58542888]
 [12.23715175 12.60213355 12.23715175 12.60213355]
 [12.24367673 12.61862486 12.24367673 12.61862486]
 [12.25032431 12.63491676 12.25032431 12.63491676]
 [12.25708259 12.65102139 12.25708259 12.65102139]
 [12.26394112 12.66694947 12.26394112 12.66694947]
 [12.2708907  12.68271044 12.2708907  12.68271044]
 [12.27792314 12.69831276 12.27792314 12.69831276]
 [12.28503115 12.71376396 12.28503115 12.71376396]
 [12.29220816 12.72907086 12.29220816 12.72907086]
 [12.2994483  12.74423961 12.2994483  12.74423961]
 [12.3067462  12.75927579 12.3067462  12.75927579]
 [12.31409704 12.77418449 12.31409704 12.77418449]
 [12.32149639 12.78897039 12.32149639 12.78897039]
 [12.32894023 12.80363777 12.32894023 12.80363777]
 [12.33642484 12.81819056 12.33642484 12.81819056]
 [12.34394682 12.83263243 12.34394682 12.83263243]
 [12.35150305 12.84696676 12.35150305 12.84696676]
 [12.35909062 12.86119668 12.35909062 12.86119668]
 [12.36670685 12.87532512 12.36670685 12.87532512]
 [12.37434926 12.88935481 12.37434926 12.88935481]
 [12.38201551 12.90328832 12.38201551 12.90328832]
 [12.38970347 12.91712804 12.38970347 12.91712804]
 [12.39741112 12.93087622 12.39741112 12.93087622]
 [12.40513657 12.94453499 12.40513657 12.94453499]
 [12.41287806 12.95810633 12.41287806 12.95810633]
 [12.42063395 12.97159216 12.42063395 12.97159216]
 [12.42840268 12.98499424 12.42840268 12.98499424]
 [12.43618279 12.99831429 12.43618279 12.99831429]
 [12.44397292 13.01155389 12.44397292 13.01155389]
 [12.45177176 13.02471459 12.45177176 13.02471459]
 [12.45957811 13.03779783 12.45957811 13.03779783]
 [12.46739081 13.050805   12.46739081 13.050805  ]
 [12.47520876 13.06373743 12.47520876 13.06373743]
 [12.48303094 13.07659638 12.48303094 13.07659638]
 [12.49085637 13.08938305 12.49085637 13.08938305]
 [12.49868413 13.1020986  12.49868413 13.1020986 ]
 [12.50651333 13.11474414 12.50651333 13.11474414]
 [12.51434315 13.12732073 12.51434315 13.12732073]
 [12.52217278 13.13982939 12.52217278 13.13982939]
 [12.53000149 13.15227111 12.53000149 13.15227111]
 [12.53782854 13.16464682 12.53782854 13.16464682]
 [12.54565325 13.17695744 12.54565325 13.17695744]
 [12.55347498 13.18920384 12.55347498 13.18920384]
 [12.5612931  13.20138688 12.5612931  13.20138688]
 [12.56910703 13.21350735 12.56910703 13.21350735]
 [12.57691619 13.22556606 12.57691619 13.22556606]
 [12.58472004 13.23756375 12.58472004 13.23756375]
 [12.59251808 13.24950118 12.59251808 13.24950118]
 [12.60030981 13.26137905 12.60030981 13.26137905]
 [12.60809476 13.27319806 12.60809476 13.27319806]
 [12.61587247 13.28495887 12.61587247 13.28495887]
 [12.62364251 13.29666213 12.62364251 13.29666213]
 [12.63140448 13.30830848 12.63140448 13.30830848]
 [12.63915797 13.31989852 12.63915797 13.31989852]
 [12.64690261 13.33143286 12.64690261 13.33143286]
 [12.65463804 13.34291207 12.65463804 13.34291207]
 [12.6623639  13.35433671 12.6623639  13.35433671]
 [12.67007986 13.36570734 12.67007986 13.36570734]
 [12.6777856  13.37702448 12.6777856  13.37702448]
 [12.68548082 13.38828866 12.68548082 13.38828866]
 [12.69316521 13.39950038 12.69316521 13.39950038]
 [12.7008385  13.41066013 12.7008385  13.41066013]
 [12.70850042 13.42176841 12.70850042 13.42176841]
 [12.7161507  13.43282567 12.7161507  13.43282567]
 [12.72378909 13.44383239 12.72378909 13.44383239]
 [12.73141536 13.454789   12.73141536 13.454789  ]
 [12.73902927 13.46569595 12.73902927 13.46569595]
 [12.74663059 13.47655367 12.74663059 13.47655367]
 [12.75421912 13.48736257 12.75421912 13.48736257]
 [12.76179466 13.49812306 12.76179466 13.49812306]
 [12.76935699 13.50883556 12.76935699 13.50883556]
 [12.77690594 13.51950045 12.77690594 13.51950045]
 [12.78444133 13.53011811 12.78444133 13.53011811]
 [12.79196297 13.54068893 12.79196297 13.54068893]
 [12.7994707  13.55121328 12.7994707  13.55121328]
 [12.80696436 13.56169152 12.80696436 13.56169152]
 [12.81444379 13.572124   12.81444379 13.572124  ]
 [12.82190885 13.58251107 12.82190885 13.58251107]
 [12.82935938 13.59285308 12.82935938 13.59285308]
 [12.83679526 13.60315036 12.83679526 13.60315036]
 [12.84421635 13.61340325 12.84421635 13.61340325]
 [12.85162252 13.62361206 12.85162252 13.62361206]
 [12.85901366 13.63377713 12.85901366 13.63377713]
 [12.86638964 13.64389875 12.86638964 13.64389875]]
plt.plot(tme_train, y_train, label = 'Training Data')
plt.plot(tme_test, np.exp(fcast_mean), label = 'Forecast', color = 'black') #Note the exponentiation
plt.plot(tme_test, y_test, color = 'red',  label = 'Actual future values')
plt.plot(tme_test, np.exp(predlower), color = 'green', label = 'Prediction lower bound')
plt.plot(tme_test, np.exp(predupper), color = 'green', label = 'Prediction upper bound')
plt.axvline(x=n_train, color='gray', linestyle='--')
plt.legend()
plt.show()
<Figure size 640x480 with 1 Axes>

How are the Prediction Uncertainties actually calculated?

We now calculate the prediction standard errors directly using the method described in class. The first step is to fit the AR(p) model and then to obtain estimates of $\phi_0, \dots, \phi_p$ as well as $\sigma$. There are two estimates of $\sigma$ (the only difference being the denominators: $n-p$ vs $n-2p-1$).

yreg = ylog_train[p:] #these are the response values in the autoregression
Xmat = np.ones((n_train-p, 1)) #this will be the design matrix (X) in the autoregression
for j in range(1, p+1):
    col = ylog_train[p-j : n_train-j]
    Xmat = np.column_stack([Xmat, col])

armod = sm.OLS(yreg, Xmat).fit()
sighat = np.sqrt(np.mean(armod.resid ** 2)) #note that this mean is taken over n-p observations (this is the MLE for sigma)
resdf = n_train - 2*p - 1
sigols = np.sqrt((np.sum(armod.resid ** 2))/resdf)
print(sighat, sigols)
0.023188223680961746 0.023345433777514888

The two estimates of $\sigma$ lead to two values for the standard errors.

covmat = (sighat ** 2) * np.linalg.inv(np.dot(Xmat.T, Xmat))
covmat_ols = (sigols ** 2) * np.linalg.inv(np.dot(Xmat.T, Xmat))
print(np.sqrt(np.diag(covmat)))
print(armod_sm.bse)
print(np.sqrt(np.diag(covmat_ols)))
print(armod.bse)
[0.02770428 0.00248065]
[0.02770428 0.00248065]
[0.02789211 0.00249746]
[0.02789211 0.00249746]

The following code computes the prediction standard errors. It uses the matrix recursion method described in class.
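In symbols, the recursion implemented below (with Gamhat playing the role of $\hat{\Gamma}$ and vkp the role of $v$) can be written as follows. Let $e_1, \dots, e_i$ denote the prediction errors at horizons $1, \dots, i$, with estimated covariance matrix $\hat{\Gamma}_i$. The next error satisfies $e_{i+1} = v^T (e_1, \dots, e_i)^T + \epsilon$, where $\epsilon$ is the new innovation with variance $\hat{\sigma}^2$ and $v$ puts $\hat{\phi}_1$ on the most recent error, $\hat{\phi}_2$ on the one before it, and so on, with zeros on errors more than $p$ steps back. The covariance matrix therefore grows one horizon at a time:

$$
\hat{\Gamma}_{i+1} = \begin{pmatrix} \hat{\Gamma}_i & \hat{\Gamma}_i v \\ v^T \hat{\Gamma}_i & \hat{\sigma}^2 + v^T \hat{\Gamma}_i v \end{pmatrix}, \qquad \hat{\Gamma}_1 = \begin{pmatrix} \hat{\sigma}^2 \end{pmatrix},
$$

and the prediction standard errors are the square roots of the diagonal entries of the final $\hat{\Gamma}_k$.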

sigest = sighat
Gamhat = np.array([[sigest ** 2]])
vkp = np.array([[armod.params[1]]])

for i in range(1, k):
    covterm = Gamhat @ vkp 
    varterm = sigest**2 + (vkp.T @ (Gamhat @ vkp)) 
    Gamhat = np.block([
        [Gamhat,    covterm],
        [covterm.T, varterm ]
    ])
    if i < p:
        new_coef = armod.params[i+1]
        vkp = np.vstack([ [new_coef], vkp ])
    else:
        vkp = np.vstack([ [0], vkp ])
predsd = np.sqrt(np.diag(Gamhat))

#Check that this method produces the same standard errors as those computed by .se_mean:
print(np.column_stack([predsd, fcast_se]))
[[0.02318822 0.02318822]
 [0.03273842 0.03273842]
 [0.04002943 0.04002943]
 [0.04614511 0.04614511]
 [0.05150606 0.05150606]
 [0.05632841 0.05632841]
 [0.06074075 0.06074075]
 [0.06482705 0.06482705]
 [0.06864572 0.06864572]
 [0.07223936 0.07223936]
 [0.07564022 0.07564022]
 [0.07887337 0.07887337]
 [0.08195881 0.08195881]
 [0.08491276 0.08491276]
 [0.08774863 0.08774863]
 [0.09047764 0.09047764]
 [0.09310931 0.09310931]
 [0.09565179 0.09565179]
 [0.09811212 0.09811212]
 [0.10049644 0.10049644]
 [0.10281014 0.10281014]
 [0.10505799 0.10505799]
 [0.10724422 0.10724422]
 [0.10937263 0.10937263]
 [0.11144661 0.11144661]
 [0.11346926 0.11346926]
 [0.11544334 0.11544334]
 [0.1173714  0.1173714 ]
 [0.11925576 0.11925576]
 [0.12109854 0.12109854]
 [0.12290168 0.12290168]
 [0.12466699 0.12466699]
 [0.12639613 0.12639613]
 [0.12809063 0.12809063]
 [0.12975194 0.12975194]
 [0.13138138 0.13138138]
 [0.1329802  0.1329802 ]
 [0.13454956 0.13454956]
 [0.13609054 0.13609054]
 [0.13760417 0.13760417]
 [0.1390914  0.1390914 ]
 [0.14055315 0.14055315]
 [0.14199025 0.14199025]
 [0.14340353 0.14340353]
 [0.14479373 0.14479373]
 [0.14616157 0.14616157]
 [0.14750774 0.14750774]
 [0.14883289 0.14883289]
 [0.15013762 0.15013762]
 [0.15142254 0.15142254]
 [0.15268818 0.15268818]
 [0.15393509 0.15393509]
 [0.15516377 0.15516377]
 [0.1563747  0.1563747 ]
 [0.15756836 0.15756836]
 [0.15874517 0.15874517]
 [0.15990556 0.15990556]
 [0.16104995 0.16104995]
 [0.16217871 0.16217871]
 [0.16329223 0.16329223]
 [0.16439086 0.16439086]
 [0.16547495 0.16547495]
 [0.16654482 0.16654482]
 [0.16760081 0.16760081]
 [0.16864321 0.16864321]
 [0.16967233 0.16967233]
 [0.17068844 0.17068844]
 [0.17169183 0.17169183]
 [0.17268276 0.17268276]
 [0.1736615  0.1736615 ]
 [0.17462827 0.17462827]
 [0.17558334 0.17558334]
 [0.17652692 0.17652692]
 [0.17745925 0.17745925]
 [0.17838054 0.17838054]
 [0.17929101 0.17929101]
 [0.18019085 0.18019085]
 [0.18108027 0.18108027]
 [0.18195946 0.18195946]
 [0.18282861 0.18282861]
 [0.18368789 0.18368789]
 [0.18453748 0.18453748]
 [0.18537756 0.18537756]
 [0.18620829 0.18620829]
 [0.18702983 0.18702983]
 [0.18784233 0.18784233]
 [0.18864596 0.18864596]
 [0.18944085 0.18944085]
 [0.19022717 0.19022717]
 [0.19100503 0.19100503]
 [0.19177459 0.19177459]
 [0.19253598 0.19253598]
 [0.19328932 0.19328932]
 [0.19403474 0.19403474]
 [0.19477238 0.19477238]
 [0.19550234 0.19550234]
 [0.19622475 0.19622475]
 [0.19693973 0.19693973]
 [0.19764737 0.19764737]
 [0.19834781 0.19834781]]