import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
We will fit autoregressive (AR) models to some time series datasets, and obtain predictions for future values of the time series.
Dataset One: Liquor Sales Data from FRED
#The following is FRED data on retail sales (in millions of dollars) for beer, wine and liquor stores (https://fred.stlouisfed.org/series/MRTSSM4453USN)
beersales = pd.read_csv('MRTSSM4453USN_October2025.csv')
print(beersales.head())
y = beersales['MRTSSM4453USN'].to_numpy()
plt.figure(figsize = (12, 6))
plt.plot(y)
plt.xlabel('Year')
plt.ylabel('Millions of Dollars')
plt.title('Retail Sales: Beer, wine and liquor stores')
plt.show()
  observation_date  MRTSSM4453USN
0 1992-01-01 1414
1 1992-02-01 1444
2 1992-03-01 1496
3 1992-04-01 1569
4 1992-05-01 1707

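The x-axis above is the observation index rather than calendar time. A minimal variant that plots against actual dates (assuming the observation_date column parses cleanly with pd.to_datetime):
#Sketch: plot the series against parsed dates instead of the row index
dates = pd.to_datetime(beersales['observation_date'])
plt.figure(figsize = (12, 6))
plt.plot(dates, y)
plt.xlabel('Year')
plt.ylabel('Millions of Dollars')
plt.title('Retail Sales: Beer, wine and liquor stores')
plt.show()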
Autoregressive (AR) models are a simple way of obtaining forecasts (predictions) of future values of a time series.
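Concretely, an AR($p$) model regresses each value of the series on its $p$ most recent values:
$$y_t = \beta_0 + \beta_1 y_{t-1} + \beta_2 y_{t-2} + \dots + \beta_p y_{t-p} + \epsilon_t,$$
where the errors $\epsilon_t$ are i.i.d. noise, so fitting it by least squares is ordinary linear regression.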
p = 12 #this is the order of the AR model
n = len(y)
print(n)
print(y)
403
[1414 1444 1496 1569 1707 1663 1792 1744 1658 1763 1715 2354 1512 1433
1573 1606 1683 1679 1828 1665 1600 1647 1670 2287 1450 1410 1607 1648
1715 1741 1858 1742 1710 1693 1731 2416 1459 1407 1619 1594 1696 1747
1769 1745 1731 1678 1762 2437 1575 1549 1723 1687 1836 1837 1908 1936
1672 1748 1871 2395 1605 1526 1745 1712 1940 1886 1976 1960 1783 1932
1962 2657 1740 1665 1783 1848 2015 1938 2079 1993 1910 2033 2026 2857
1755 1700 1867 1974 2078 2017 2200 2013 2000 2088 2140 3076 1810 1863
2066 2011 2225 2278 2326 2288 2228 2223 2409 3192 1995 1941 2201 2110
2337 2389 2349 2382 2191 2279 2495 3349 1999 1994 2236 2174 2429 2349
2427 2451 2170 2267 2456 3294 2003 1931 2145 2202 2431 2297 2466 2511
2323 2495 2514 3456 2169 2101 2270 2418 2543 2505 2716 2485 2481 2586
2619 3604 2104 2194 2398 2501 2568 2637 2771 2645 2639 2656 2782 3925
2303 2394 2616 2653 2870 2906 2966 2878 2852 2773 2987 4067 2445 2478
2862 2730 3107 3203 3163 3136 2911 2942 3142 4195 2551 2677 2852 2861
3264 3132 3359 3316 3010 3201 3235 4260 2782 2657 2888 2990 3312 3160
3397 3249 3119 3274 3209 4422 2754 2790 3076 3168 3319 3292 3503 3307
3222 3349 3374 4484 2765 2858 3111 3221 3325 3402 3584 3428 3373 3372
3499 4696 2877 3077 3413 3278 3640 3668 3589 3606 3365 3482 3714 4917
3021 3088 3465 3334 3771 3636 3848 3908 3459 3605 3863 4961 3213 3143
3466 3511 3947 3822 4014 3988 3601 3857 3865 5245 3359 3273 3625 3631
4070 3957 4252 3995 3818 4004 3995 5444 3376 3528 3786 3844 4114 4179
4436 4182 4136 4138 4359 5645 3474 3503 3957 3960 4328 4353 4466 4324
4221 4218 4506 5801 3684 3686 4314 4028 4574 4611 4632 4569 4296 4432
4797 5969 3837 3818 4331 4252 4741 4647 4876 4902 4349 4569 4933 6163
4066 4184 5081 4757 5737 5572 5905 5634 5410 5590 5631 7142 4960 4831
5566 5560 5940 5877 6044 5601 5468 5647 5893 7474 4812 4920 5441 5544
5727 5833 6003 5755 5621 5744 6017 7651 4983 4972 5535 5527 6067 6040
6096 5834 5783 5736 6166 7863 4907 5248 5765 5539 6264 6155 6256 6203
5658 5987 6453 7854 5055 5037 5613 5626 6310 5950 6225]
AR models work just like regression. The response vector and design matrix in the regression are set up as follows.
yreg = y[p:] #these are the response values in the autoregression
Xmat = np.ones((n-p, 1)) #this will be the design matrix (X) in the autoregression
for j in range(1, p+1):
    col = y[p-j : n-j]
    Xmat = np.column_stack([Xmat, col])
print(Xmat.shape)
print(Xmat)
(391, 13)
[[1.000e+00 2.354e+03 1.715e+03 ... 1.496e+03 1.444e+03 1.414e+03]
[1.000e+00 1.512e+03 2.354e+03 ... 1.569e+03 1.496e+03 1.444e+03]
[1.000e+00 1.433e+03 1.512e+03 ... 1.707e+03 1.569e+03 1.496e+03]
...
[1.000e+00 5.626e+03 5.613e+03 ... 6.256e+03 6.155e+03 6.264e+03]
[1.000e+00 6.310e+03 5.626e+03 ... 6.203e+03 6.256e+03 6.155e+03]
[1.000e+00 5.950e+03 6.310e+03 ... 5.658e+03 6.203e+03 6.256e+03]]
armod = sm.OLS(yreg, Xmat).fit()
print(armod.summary())
                            OLS Regression Results
==============================================================================
Dep. Variable: y R-squared: 0.988
Model: OLS Adj. R-squared: 0.988
Method: Least Squares F-statistic: 2597.
Date: Thu, 23 Oct 2025 Prob (F-statistic): 0.00
Time: 16:14:31 Log-Likelihood: -2529.9
No. Observations: 391 AIC: 5086.
Df Residuals: 378 BIC: 5137.
Df Model: 12
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 6.7819 21.711 0.312 0.755 -35.907 49.471
x1 0.0439 0.018 2.448 0.015 0.009 0.079
x2 0.0428 0.018 2.379 0.018 0.007 0.078
x3 0.0521 0.018 2.913 0.004 0.017 0.087
x4 0.0336 0.018 1.870 0.062 -0.002 0.069
x5 0.0523 0.018 2.898 0.004 0.017 0.088
x6 0.0002 0.018 0.009 0.993 -0.036 0.036
x7 -0.0128 0.018 -0.697 0.486 -0.049 0.023
x8 -0.0304 0.019 -1.633 0.103 -0.067 0.006
x9 -0.0440 0.019 -2.368 0.018 -0.081 -0.007
x10 -0.0621 0.019 -3.352 0.001 -0.099 -0.026
x11 -0.0163 0.019 -0.875 0.382 -0.053 0.020
x12 0.9739 0.019 52.184 0.000 0.937 1.011
==============================================================================
Omnibus: 172.369 Durbin-Watson: 0.812
Prob(Omnibus): 0.000 Jarque-Bera (JB): 1196.724
Skew: 1.726 Prob(JB): 1.36e-260
Kurtosis: 10.845 Cond. No. 3.36e+04
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 3.36e+04. This might indicate that there are
strong multicollinearity or other numerical problems.
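As a cross-check, the same coefficients can be recovered from statsmodels' built-in AR fitter; a minimal sketch, assuming a reasonably recent statsmodels version (AutoReg uses the same conditional least-squares setup, so its estimates should match the sm.OLS fit above):
#Sketch: the built-in AutoReg fitter should reproduce the OLS coefficients
from statsmodels.tsa.ar_model import AutoReg
armod_builtin = AutoReg(y, lags = p, trend = 'c').fit()
print(armod_builtin.params) #intercept followed by the lag-1 through lag-p coefficients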
arfitted = armod.fittedvalues
#Plotting the fitted values against response values
plt.xlabel('Fitted Values')
plt.ylabel('Response Values')
plt.title('AR(12) Model Fitted Values vs Response Values')
plt.plot(arfitted, yreg, 'o')
plt.show()
#Generate k-step ahead forecasts:
k = 100
yhat = np.concatenate([y.astype(float), np.full(k, np.nan)]) #extend data by k placeholders (float, so forecasts are not truncated to integers)
for i in range(1, k+1):
    ans = armod.params[0]
    for j in range(1, p+1):
        ans += armod.params[j] * yhat[n+i-j-1]
    yhat[n+i-1] = ans
predvalues = yhat[n:]
#Plotting the series with forecasts:
plt.figure(figsize=(12, 6))
time_all = np.arange(1, n + k + 1)
plt.plot(time_all, yhat, color='C0')
plt.plot(range(1, n + 1), y, label='Original Data', color='C1')
plt.plot(range(n + 1, n + k + 1), predvalues, label='Forecasts', color='blue')
plt.axvline(x=n, color='black', linestyle='--', label='Forecast Start')
#plt.axhline(y=np.mean(y), color='gray', linestyle=':', label='Mean of Original Data')
plt.xlabel('Time')
plt.ylabel('Data')
plt.title('Time Series + AR(p) Forecasts')
plt.legend()
plt.show()
The predictions depend crucially on the order $p$ of the model. If $p = 12$, we get reasonable predictions. However, for smaller values of $p$, the predictions look very unnatural. Go back and repeat the code above for other values of $p$.
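One rough way to compare orders (a sketch, not part of the analysis above; Pmax and the candidate orders are arbitrary choices) is to refit on a common sample and inspect AIC, where smaller is better:
#Sketch: compare AR orders by AIC on a common sample (all models predict y[Pmax:])
Pmax = 24
for p_try in [1, 2, 6, 12, 24]:
    Xp = np.ones((n - Pmax, 1))
    for j in range(1, p_try + 1):
        Xp = np.column_stack([Xp, y[Pmax - j : n - j]])
    fit = sm.OLS(y[Pmax:], Xp).fit()
    print(p_try, fit.aic)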
Dataset Two: House Price Data from FRED
#The following is FRED data on Average Sales Price of Houses Sold for the US (https://fred.stlouisfed.org/series/ASPUS)
hprice = pd.read_csv('ASPUS_October2025.csv')
print(hprice.head())
y = hprice['ASPUS'].to_numpy()
plt.figure(figsize = (12, 6))
plt.plot(y)
plt.xlabel('Year')
plt.ylabel('Dollars')
plt.title('Average Sales Price of Houses Sold in the US')
plt.show()
  observation_date  ASPUS
0 1963-01-01 19300
1 1963-04-01 19400
2 1963-07-01 19200
3 1963-10-01 19600
4 1964-01-01 19600

p = 30 #this is the order of the AR model
n = len(y)
print(n)
250
yreg = y[p:] #these are the response values in the autoregression
Xmat = np.ones((n-p, 1)) #this will be the design matrix (X) in the autoregression
for j in range(1, p+1):
    col = y[p-j : n-j].reshape(-1, 1)
    Xmat = np.column_stack([Xmat, col])
print(Xmat.shape)
print(y)
(220, 31)
[ 19300 19400 19200 19600 19600 20200 20500 20900 21500 21000
21600 21700 22700 23200 23000 22800 24000 24200 23900 24400
25400 26700 26600 27000 27600 28100 28100 27100 27000 27300
26000 26300 27300 28600 28300 28200 29000 29400 30300 31600
32800 35100 35900 36600 38000 38600 39000 39300 40900 42600
42200 44400 46000 47800 48100 50300 51600 54300 54000 57500
59300 61600 63500 66400 68300 72400 74200 72700 73600 74400
77500 80000 80900 84300 83800 83700 81200 85700 83900 84600
86700 89100 92500 90800 94700 99200 98500 97800 98500 100500
100500 103800 106300 112000 114400 115600 120800 126100 129900 133500
137900 134800 141500 140400 144300 146800 150200 151200 149500 151200
145500 150100 151100 148200 145400 144400 144500 145300 141700 147200
144700 148900 148000 148300 153600 154200 152800 156100 153500 158900
157700 160900 161100 166000 164000 171000 172200 177200 174700 175400
180000 178800 184300 181500 189100 191800 193000 204800 202900 202400
204100 212100 211000 211200 207800 214200 227600 227600 219100 232500
233100 241000 248100 256000 262900 265300 274000 286300 288500 287800
294600 294200 305300 302600 308100 299600 322100 310100 301200 305800
290400 304200 285100 276600 257000 273400 274100 272900 275300 268800
266000 278000 268100 267600 263000 259700 278000 282700 294500 297700
307400 320400 324400 334400 331400 340600 340400 369400 348000 339700
347400 366700 357000 357900 358800 364900 374800 376900 373200 399700
374600 378400 392900 384000 375500 376700 382700 384600 383000 371100
400600 396900 417400 428600 468000 496700 499300 525100 520300 521000
505300 503000 521900 498300 519700 502200 498700 510900 514200 512800]
armod = sm.OLS(yreg, Xmat).fit()
print(armod.summary())
                            OLS Regression Results
==============================================================================
Dep. Variable: y R-squared: 0.998
Model: OLS Adj. R-squared: 0.997
Method: Least Squares F-statistic: 2885.
Date: Thu, 23 Oct 2025 Prob (F-statistic): 4.29e-235
Time: 16:22:56 Log-Likelihood: -2239.4
No. Observations: 220 AIC: 4541.
Df Residuals: 189 BIC: 4646.
Df Model: 30
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 1252.3043 895.312 1.399 0.164 -513.784 3018.393
x1 0.7836 0.073 10.799 0.000 0.640 0.927
x2 0.5246 0.093 5.623 0.000 0.341 0.709
x3 -0.2076 0.100 -2.077 0.039 -0.405 -0.010
x4 0.0044 0.099 0.045 0.964 -0.191 0.200
x5 0.0292 0.099 0.296 0.768 -0.166 0.224
x6 -0.3315 0.098 -3.382 0.001 -0.525 -0.138
x7 -0.0284 0.101 -0.282 0.778 -0.227 0.170
x8 0.1678 0.102 1.646 0.101 -0.033 0.369
x9 0.1665 0.103 1.621 0.107 -0.036 0.369
x10 -0.2485 0.103 -2.413 0.017 -0.452 -0.045
x11 0.0576 0.106 0.546 0.586 -0.151 0.266
x12 0.3131 0.104 3.004 0.003 0.107 0.519
x13 -0.4220 0.108 -3.910 0.000 -0.635 -0.209
x14 -0.1454 0.110 -1.316 0.190 -0.363 0.073
x15 0.5566 0.116 4.807 0.000 0.328 0.785
x16 -0.0623 0.118 -0.527 0.599 -0.295 0.171
x17 -0.3085 0.123 -2.515 0.013 -0.550 -0.066
x18 0.1707 0.119 1.432 0.154 -0.064 0.406
x19 0.0860 0.118 0.728 0.468 -0.147 0.319
x20 -0.0764 0.118 -0.649 0.517 -0.309 0.156
x21 -0.2422 0.119 -2.043 0.042 -0.476 -0.008
x22 0.1030 0.119 0.867 0.387 -0.131 0.337
x23 0.0116 0.120 0.097 0.923 -0.225 0.248
x24 -0.2901 0.118 -2.468 0.014 -0.522 -0.058
x25 0.5416 0.117 4.641 0.000 0.311 0.772
x26 0.1503 0.123 1.219 0.224 -0.093 0.394
x27 -0.5970 0.123 -4.840 0.000 -0.840 -0.354
x28 0.2527 0.131 1.930 0.055 -0.006 0.511
x29 0.1942 0.121 1.610 0.109 -0.044 0.432
x30 -0.1401 0.103 -1.365 0.174 -0.343 0.062
==============================================================================
Omnibus: 33.340 Durbin-Watson: 1.992
Prob(Omnibus): 0.000 Jarque-Bera (JB): 114.033
Skew: 0.550 Prob(JB): 1.73e-25
Kurtosis: 6.351 Cond. No. 2.29e+06
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 2.29e+06. This might indicate that there are
strong multicollinearity or other numerical problems.
arfitted = armod.fittedvalues
#Plotting the fitted values against response values
plt.xlabel('Fitted Values')
plt.ylabel('Response Values')
plt.title('AR(30) Model Fitted Values vs Response Values')
#plotting with reduced size points for better visibility
plt.plot(arfitted, yreg, 'o', markersize=4)
plt.show()
#Generate k-step ahead forecasts:
k = 100
yhat = np.concatenate([y.astype(float), np.full(k, np.nan)]) #extend data by k placeholders (float, so forecasts are not truncated to integers)
for i in range(1, k+1):
    ans = armod.params[0]
    for j in range(1, p+1):
        ans += armod.params[j] * yhat[n+i-j-1]
    yhat[n+i-1] = ans
predvalues = yhat[n:]
#Plotting the series with forecasts:
time_all = np.arange(1, n + k + 1)
plt.plot(time_all, yhat, color='C0')
plt.plot(range(1, n + 1), y, label='Original Data', color='C1')
plt.plot(range(n + 1, n + k + 1), predvalues, label='Forecasts', color='blue')
plt.axvline(x=n, color='black', linestyle='--', label='Forecast Start')
#plt.axhline(y=np.mean(y), color='gray', linestyle=':', label='Mean of Original Data')
plt.xlabel('Time')
plt.ylabel('Data')
plt.title('Time Series + AR(p) Forecasts')
plt.legend()
plt.show()
Repeat the above code for different values of $p$ to see how the predictions change.
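A hypothetical holdout check makes this concrete (the holdout length h below is an arbitrary choice): fit the AR model on all but the last h quarters, forecast the held-out stretch with the same recursion, and compute the error.
#Sketch: holdout check of forecast accuracy for the current order p
h = 20
m = n - h
ytrain = y[:m].astype(float)
Xtr = np.ones((m - p, 1))
for j in range(1, p + 1):
    Xtr = np.column_stack([Xtr, ytrain[p - j : m - j]])
fit = sm.OLS(ytrain[p:], Xtr).fit()
yext = np.concatenate([ytrain, np.full(h, np.nan)])
for i in range(h):
    lags = yext[m + i - 1 : m + i - p - 1 : -1] #most recent p values, newest first
    yext[m + i] = fit.params[0] + np.dot(fit.params[1:], lags)
print(np.sqrt(np.mean((yext[m:] - y[m:]) ** 2))) #RMSE on the held-out quarters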
We will now apply AR models to the logarithms of $y$.
ylog = np.log(y)
plt.plot(ylog)
plt.xlabel('Year')
plt.ylabel('Log Dollars')
plt.title('Log of Average Sales Price of Houses Sold in the US')
plt.show()
p = 30 #this is the order of the AR model
n = len(ylog)
print(n)
250
yreg = ylog[p:] #these are the response values in the autoregression
Xmat = np.ones((n-p, 1)) #this will be the design matrix (X) in the autoregression
for j in range(1, p+1):
    col = ylog[p-j : n-j].reshape(-1, 1)
    Xmat = np.column_stack([Xmat, col])
armod_log = sm.OLS(yreg, Xmat).fit()
print(armod_log.summary())
                            OLS Regression Results
==============================================================================
Dep. Variable: y R-squared: 0.999
Model: OLS Adj. R-squared: 0.999
Method: Least Squares F-statistic: 7003.
Date: Thu, 23 Oct 2025 Prob (F-statistic): 1.96e-271
Time: 16:22:59 Log-Likelihood: 509.83
No. Observations: 220 AIC: -957.7
Df Residuals: 189 BIC: -852.5
Df Model: 30
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 0.1013 0.033 3.049 0.003 0.036 0.167
x1 0.8024 0.073 11.057 0.000 0.659 0.946
x2 0.4279 0.092 4.628 0.000 0.246 0.610
x3 -0.1316 0.097 -1.356 0.177 -0.323 0.060
x4 0.0597 0.095 0.627 0.531 -0.128 0.247
x5 -0.0348 0.095 -0.366 0.715 -0.222 0.153
x6 -0.2380 0.094 -2.544 0.012 -0.423 -0.053
x7 0.0186 0.095 0.195 0.845 -0.169 0.207
x8 0.0834 0.095 0.874 0.383 -0.105 0.272
x9 0.0662 0.095 0.694 0.488 -0.122 0.254
x10 -0.1635 0.095 -1.727 0.086 -0.350 0.023
x11 0.0860 0.095 0.905 0.367 -0.101 0.273
x12 0.2016 0.094 2.135 0.034 0.015 0.388
x13 -0.2995 0.096 -3.128 0.002 -0.488 -0.111
x14 -0.0542 0.097 -0.558 0.577 -0.246 0.137
x15 0.3066 0.097 3.161 0.002 0.115 0.498
x16 -0.1182 0.097 -1.214 0.226 -0.310 0.074
x17 -0.1742 0.098 -1.781 0.077 -0.367 0.019
x18 0.0710 0.097 0.735 0.463 -0.119 0.261
x19 0.1175 0.096 1.227 0.221 -0.071 0.306
x20 0.1415 0.096 1.473 0.143 -0.048 0.331
x21 -0.2030 0.096 -2.105 0.037 -0.393 -0.013
x22 0.0006 0.097 0.007 0.995 -0.191 0.192
x23 -0.0754 0.097 -0.780 0.437 -0.266 0.115
x24 -0.1470 0.097 -1.517 0.131 -0.338 0.044
x25 0.3098 0.096 3.228 0.001 0.120 0.499
x26 0.1047 0.098 1.065 0.288 -0.089 0.299
x27 -0.3197 0.098 -3.256 0.001 -0.513 -0.126
x28 0.1514 0.101 1.503 0.135 -0.047 0.350
x29 0.0733 0.097 0.756 0.451 -0.118 0.264
x30 -0.0703 0.075 -0.938 0.349 -0.218 0.078
==============================================================================
Omnibus: 0.073 Durbin-Watson: 1.980
Prob(Omnibus): 0.964 Jarque-Bera (JB): 0.002
Skew: 0.001 Prob(JB): 0.999
Kurtosis: 3.013 Cond. No. 6.47e+03
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 6.47e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
arfitted_log = armod_log.fittedvalues
#Plotting the fitted values against response values
plt.xlabel('Fitted Values')
plt.ylabel('Response Values')
plt.title('AR(30) Model Fitted Values vs Response Values')
#plotting with reduced size points for better visibility
plt.plot(arfitted_log, yreg, 'o', markersize=4)
plt.show()
#Generate k-step ahead forecasts:
k = 100
yhat = np.concatenate([ylog, np.full(k, np.nan)]) #extend data by k placeholder values
for i in range(1, k+1):
    ans = armod_log.params[0]
    for j in range(1, p+1):
        ans += armod_log.params[j] * yhat[n+i-j-1]
    yhat[n+i-1] = ans
predvalues_log = yhat[n:]
#Plotting the series with forecasts:
time_all = np.arange(1, n + k + 1)
plt.plot(time_all, yhat, color='C0')
plt.plot(range(1, n + 1), ylog, label='Original Data (log scale)', color='C1')
plt.plot(range(n + 1, n + k + 1), predvalues_log, label='Forecasts', color='blue')
plt.axvline(x=n, color='black', linestyle='--', label='Forecast Start')
#plt.axhline(y=np.mean(y), color='gray', linestyle=':', label='Mean of Original Data')
plt.xlabel('Time')
plt.ylabel('Data')
plt.title('Time Series + AR(p) Forecasts')
plt.legend()
plt.show()
We now plot the predictions on the original data.
#Plotting the series with forecasts:
time_all = np.arange(1, n + k + 1)
plt.plot(range(1, n + 1), y, label='Original Data', color='C1')
plt.plot(range(n + 1, n + k + 1), np.exp(predvalues_log), label='Forecasts (AR on log)', color='blue')
plt.plot(range(n + 1, n + k + 1), predvalues, label='Forecasts (AR original data)', color='green')
plt.axvline(x=n, color='black', linestyle='--', label='Forecast Start')
#plt.axhline(y=np.mean(y), color='gray', linestyle=':', label='Mean of Original Data')
plt.xlabel('Time')
plt.ylabel('Data')
plt.title('Time Series + AR(p) Forecasts')
plt.legend()
plt.show()
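One caveat about exponentiating log-scale forecasts: under Gaussian errors on the log scale, np.exp(predvalues_log) targets the median of the original series rather than its mean. A sketch of the standard lognormal adjustment, hedged because it is exact only for the one-step-ahead error variance (multi-step forecast variances are larger):
#Sketch: lognormal mean adjustment for forecasts produced on the log scale
sig2_log = np.mean(armod_log.resid ** 2) #residual variance of the log-scale fit
predvalues_adj = np.exp(predvalues_log + sig2_log / 2)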
Sunspots Data
Autoregressive models were invented in the context of the sunspots data by Yule in 1927.
sunspots = pd.read_csv('SN_y_tot_V2.0.csv', header = None, sep = ';')
y = sunspots.iloc[:,1].values
n = len(y)
plt.figure(figsize = (12, 6))
plt.plot(y)
plt.show()
The most basic model for this dataset is the single sinusoid model:
$$y_t = \beta_0 + \beta_1 \cos(2\pi f t) + \beta_2 \sin(2\pi f t) + \epsilon_t$$
with $\epsilon_t \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$. We previously used this model. For estimating $f$, we calculate $RSS(f)$ for a bunch of $f$ values, and then minimize it over $f$.
#rss function:
def rss(f):
    n = len(y)
    x = np.arange(1, n+1)
    xcos = np.cos(2 * np.pi * f * x)
    xsin = np.sin(2 * np.pi * f * x)
    X = np.column_stack([np.ones(n), xcos, xsin])
    md = sm.OLS(y, X).fit()
    ans = np.sum(md.resid ** 2)
    return ans
ngrid = 10000
fvals = np.linspace(0, 0.5, ngrid)
rssvals = np.array([rss(f) for f in fvals])
plt.plot(fvals, rssvals) #plot rss and find f which minimizes rss
plt.show()
fhat = fvals[np.argmin(rssvals)]
print(fhat)
print(1/fhat) #period corresponding to fhat
0.09090909090909091
11.0
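The grid has spacing 0.5/ngrid, so fhat is only resolved to about 5e-05. If finer accuracy were needed, the minimizer could be refined locally; a sketch, assuming scipy is available:
#Sketch: refine the grid minimizer of rss in a small bracket around fhat
from scipy.optimize import minimize_scalar
res = minimize_scalar(rss, bounds = (fhat - 1e-3, fhat + 1e-3), method = 'bounded')
print(res.x, 1 / res.x)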
The MLE of $f$ is basically $1/11$ (corresponding to the 11-year solar cycle). After estimating $f$, the other parameters are estimated as in usual linear regression as follows.
#Estimates of beta and sigma:
x = np.arange(1, n+1)
f = fhat
xcos = np.cos(2 * np.pi * f * x)
xsin = np.sin(2 * np.pi * f * x)
X = np.column_stack([np.ones(n), xcos, xsin])
md = sm.OLS(y, X).fit()
print(md.params) #this gives estimates of beta_0, beta_1, beta_2
rss_fhat = np.sum(md.resid ** 2)
#there are two estimates for sigma (which usually give similar values)
sigma_mle = np.sqrt(rss_fhat/n)
sigma_unbiased = np.sqrt((rss_fhat)/(n-3))
print(np.array([sigma_mle, sigma_unbiased]))
[ 78.87599158 -38.28231622 -29.2786631 ]
[51.62376442 51.86369026]
Yule did not like the simple single sinusoid model for the sunspots data, mainly because data simulated from it look much noisier and more irregular than the actual sunspots series, which has a smooth appearance. We can check this by simulating synthetic data from the single sinusoid model and comparing the synthetic data with the actual sunspots data. We take the parameters to be those estimated from the sunspots dataset so that the plots are comparable.
rng = np.random.default_rng(seed = 42)
errorsamples = rng.normal(loc = 0, scale = sigma_mle, size = n)
sim_data = md.fittedvalues + errorsamples
plt.figure(figsize = (10, 6))
plt.plot(sim_data)
plt.show()
To facilitate comparison with the actual sunspots dataset, let us plot a bunch of these simulated datasets along with the real sunspots data (just to see if the sunspots dataset can be spotted as the “odd one out” from these plots).
fig, axes = plt.subplots(3, 3, figsize = (12, 4))
axes = axes.flatten()
for i in range(5):
    errorsamples = rng.normal(loc = 0, scale = sigma_mle, size = n)
    sim_data = md.fittedvalues + errorsamples
    axes[i].plot(sim_data)
axes[5].plot(y)
for idx in range(6, 9):
    errorsamples = rng.normal(loc = 0, scale = sigma_mle, size = n)
    sim_data = md.fittedvalues + errorsamples
    axes[idx].plot(sim_data)
plt.tight_layout()
plt.show()
It is clear from the above plot-grid that the sunspots dataset sticks out as the odd one out. This shows that the simulated datasets (which are too wiggly and do not have well-defined peaks) are generated from a model that ignores many aspects of the sunspots dataset.
Yule Model
The Yule model is given by:
$$y_t = \beta_0 + \beta_1 y_{t-1} - y_{t-2} + \epsilon_t.$$
Note that the coefficient $\beta_2$ of $y_{t-2}$ in the usual AR(2) model is set to $-1$ above. Also, $\beta_1$ is given by $\beta_1 = 2\cos(2\pi f)$. The above model can be rewritten as
$$y_t + y_{t-2} = \beta_0 + \beta_1 y_{t-1} + \epsilon_t,$$
so we can estimate $\beta_0$ and $\beta_1$ by regressing $y_t + y_{t-2}$ on $y_{t-1}$.
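The relation $\beta_1 = 2\cos(2\pi f)$ comes from the fact that, without noise, the recursion $y_t = \beta_1 y_{t-1} - y_{t-2}$ is solved exactly by sinusoids of frequency $f$ (via the identity $\cos(a+b) + \cos(a-b) = 2\cos a \cos b$). A quick numeric check of this claim:
#Numeric check: with beta1 = 2*cos(2*pi*f), a cosine of frequency f solves the recursion
f_check = 1/11
b1 = 2 * np.cos(2 * np.pi * f_check)
w = np.cos(2 * np.pi * f_check * np.arange(50))
print(np.max(np.abs(w[2:] - (b1 * w[1:-1] - w[:-2])))) #essentially zero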
p = 2
yreg = y[p:]
x1 = y[1:-1]
x2 = y[:-2]
Xmat = np.column_stack([np.ones(len(yreg)), x1])
print(Xmat.shape)
(323, 2)
y_adjusted = yreg + x2
yulemod = sm.OLS(y_adjusted, Xmat).fit()
print(yulemod.summary())
print(yulemod.params)
slope_est = yulemod.params[1]
                            OLS Regression Results
==============================================================================
Dep. Variable: y R-squared: 0.930
Model: OLS Adj. R-squared: 0.930
Method: Least Squares F-statistic: 4254.
Date: Thu, 23 Oct 2025 Prob (F-statistic): 2.90e-187
Time: 16:51:37 Log-Likelihood: -1532.1
No. Observations: 323 AIC: 3068.
Df Residuals: 321 BIC: 3076.
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 28.6834 2.511 11.421 0.000 23.742 33.624
x1 1.6365 0.025 65.224 0.000 1.587 1.686
==============================================================================
Omnibus: 11.119 Durbin-Watson: 2.493
Prob(Omnibus): 0.004 Jarque-Bera (JB): 17.202
Skew: 0.225 Prob(JB): 0.000184
Kurtosis: 4.037 Cond. No. 162.
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[28.6833743 1.63648935]
From the estimate for $\beta_1$, the frequency parameter can be estimated using the relation $\beta_1 = 2\cos(2\pi f)$, i.e., $\hat f = \arccos(\hat\beta_1/2)/(2\pi)$.
yulefhat = np.arccos(slope_est/2) / (2 * np.pi)
print(1/yulefhat)
10.259176143439632
The period corresponding to this frequency is 10.259, which is smaller than the period of 11 that we get by fitting the single sinusoid model.
#Generate k-step ahead forecasts:
k = 100
yhat = np.concatenate([y, np.full(k, np.nan)]) #extend data by k placeholder values
for i in range(1, k+1):
    ans = yulemod.params[0] - yhat[n+i-3]
    ans += yulemod.params[1] * yhat[n+i-2]
    yhat[n+i-1] = ans
predvalues = yhat[n:]
print(predvalues)
print(yhat)
[ 1.56348277e+02  1.29845664e+02  8.48261433e+01  3.76547907e+01
5.47889486e+00 -5.26328291e-03 2.31958661e+01 6.66484255e+01
1.14556947e+02 1.49506172e+02 1.58791685e+02 1.39038104e+02
9.74260659e+01 4.90819890e+01 1.15794607e+01 -1.44895065e+00
1.47327213e+01 5.42422665e+01 1.02717544e+02 1.42537275e+02
1.59226562e+02 1.46718673e+02 1.09560357e+02 6.12590590e+01
1.93728148e+01 -8.72279635e-01 7.88308317e+00 4.24562356e+01
9.02794685e+01 1.33968527e+02 1.57641974e+02 1.52694258e+02
1.20923928e+02 7.38798357e+01 2.86630109e+01 1.71025067e+00
2.81917042e+00 3.15866660e+01 7.75554463e+01 1.24015370e+02
1.54077760e+02 1.56814618e+02 1.31231066e+02 8.66269979e+01
3.92164679e+01 6.23370836e+00 -3.31696240e-01 2.19068486e+01
6.48653949e+01 1.12928054e+02 1.48623536e+02 1.58976155e+02
1.40222622e+02 9.91800471e+01 5.07678428e+01 1.25843613e+01
-1.49029536e+00 1.36601606e+01 5.25283769e+01 1.00985343e+02
1.41416436e+02 1.59124522e+02 1.47672524e+02 1.11223365e+02
6.30267024e+01 2.06025365e+01 -6.27496582e-01 7.05394632e+00
4.08545789e+01 8.84875112e+01 1.32637665e+02 1.57255989e+02
1.53393461e+02 1.22454150e+02 7.56848253e+01 3.00866351e+01
2.23500697e+00 2.25430426e+00 3.01375122e+01 7.57487878e+01
1.22507947e+02 1.53417536e+02 1.57241592e+02 1.32590028e+02
8.84239513e+01 4.07982008e+01 7.02524409e+00 -6.18089412e-01
2.06466335e+01 6.30894595e+01 1.11281969e+02 1.47705672e+02
1.59120165e+02 1.41376156e+02 1.00923784e+02 5.24679154e+01
1.36227750e+01 -1.49101484e+00 1.26205694e+01 5.08278165e+01]
[ 8.30000000e+00 1.83000000e+01 2.67000000e+01 3.83000000e+01
6.00000000e+01 9.67000000e+01 4.83000000e+01 3.33000000e+01
1.67000000e+01 1.33000000e+01 5.00000000e+00 0.00000000e+00
0.00000000e+00 3.30000000e+00 1.83000000e+01 4.50000000e+01
7.83000000e+01 1.05000000e+02 1.00000000e+02 6.50000000e+01
4.67000000e+01 4.33000000e+01 3.67000000e+01 1.83000000e+01
3.50000000e+01 6.67000000e+01 1.30000000e+02 2.03300000e+02
1.71700000e+02 1.21700000e+02 7.83000000e+01 5.83000000e+01
1.83000000e+01 8.30000000e+00 2.67000000e+01 5.67000000e+01
1.16700000e+02 1.35000000e+02 1.85000000e+02 1.68300000e+02
1.21700000e+02 6.67000000e+01 3.33000000e+01 2.67000000e+01
8.30000000e+00 1.83000000e+01 3.67000000e+01 6.67000000e+01
1.00000000e+02 1.34800000e+02 1.39000000e+02 7.95000000e+01
7.97000000e+01 5.12000000e+01 2.03000000e+01 1.60000000e+01
1.70000000e+01 5.40000000e+01 7.93000000e+01 9.00000000e+01
1.04800000e+02 1.43200000e+02 1.02000000e+02 7.52000000e+01
6.07000000e+01 3.48000000e+01 1.90000000e+01 6.30000000e+01
1.16300000e+02 1.76800000e+02 1.68000000e+02 1.36000000e+02
1.10800000e+02 5.80000000e+01 5.10000000e+01 1.17000000e+01
3.30000000e+01 1.54200000e+02 2.57300000e+02 2.09800000e+02
1.41300000e+02 1.13500000e+02 6.42000000e+01 3.80000000e+01
1.70000000e+01 4.02000000e+01 1.38200000e+02 2.20000000e+02
2.18200000e+02 1.96800000e+02 1.49800000e+02 1.11000000e+02
1.00000000e+02 7.82000000e+01 6.83000000e+01 3.55000000e+01
2.67000000e+01 1.07000000e+01 6.80000000e+00 1.13000000e+01
2.42000000e+01 5.67000000e+01 7.50000000e+01 7.18000000e+01
7.92000000e+01 7.03000000e+01 4.68000000e+01 1.68000000e+01
1.35000000e+01 4.20000000e+00 0.00000000e+00 2.30000000e+00
8.30000000e+00 2.03000000e+01 2.32000000e+01 5.90000000e+01
7.63000000e+01 6.83000000e+01 5.29000000e+01 3.85000000e+01
2.42000000e+01 9.20000000e+00 6.30000000e+00 2.20000000e+00
1.14000000e+01 2.82000000e+01 5.99000000e+01 8.30000000e+01
1.08500000e+02 1.15200000e+02 1.17400000e+02 8.08000000e+01
4.43000000e+01 1.34000000e+01 1.95000000e+01 8.58000000e+01
1.92700000e+02 2.27300000e+02 1.68700000e+02 1.43000000e+02
1.05500000e+02 6.33000000e+01 4.03000000e+01 1.81000000e+01
2.51000000e+01 6.58000000e+01 1.02700000e+02 1.66300000e+02
2.08300000e+02 1.82500000e+02 1.26300000e+02 1.22000000e+02
1.02700000e+02 7.41000000e+01 3.90000000e+01 1.27000000e+01
8.20000000e+00 4.34000000e+01 1.04400000e+02 1.78300000e+02
1.82200000e+02 1.46600000e+02 1.12100000e+02 8.35000000e+01
8.92000000e+01 5.78000000e+01 3.07000000e+01 1.39000000e+01
6.28000000e+01 1.23600000e+02 2.32000000e+02 1.85300000e+02
1.69200000e+02 1.10100000e+02 7.45000000e+01 2.83000000e+01
1.89000000e+01 2.07000000e+01 5.70000000e+00 1.00000000e+01
5.37000000e+01 9.05000000e+01 9.90000000e+01 1.06100000e+02
1.05800000e+02 8.63000000e+01 4.24000000e+01 2.18000000e+01
1.12000000e+01 1.04000000e+01 1.18000000e+01 5.95000000e+01
1.21700000e+02 1.42000000e+02 1.30000000e+02 1.06600000e+02
6.94000000e+01 4.38000000e+01 4.44000000e+01 2.02000000e+01
1.57000000e+01 4.60000000e+00 8.50000000e+00 4.08000000e+01
7.01000000e+01 1.05500000e+02 9.01000000e+01 1.02800000e+02
8.09000000e+01 7.32000000e+01 3.09000000e+01 9.50000000e+00
6.00000000e+00 2.40000000e+00 1.61000000e+01 7.90000000e+01
9.50000000e+01 1.73600000e+02 1.34600000e+02 1.05700000e+02
6.27000000e+01 4.35000000e+01 2.37000000e+01 9.70000000e+00
2.79000000e+01 7.40000000e+01 1.06500000e+02 1.14700000e+02
1.29700000e+02 1.08200000e+02 5.94000000e+01 3.51000000e+01
1.86000000e+01 9.20000000e+00 1.46000000e+01 6.02000000e+01
1.32800000e+02 1.90600000e+02 1.82600000e+02 1.48000000e+02
1.13000000e+02 7.92000000e+01 5.08000000e+01 2.71000000e+01
1.61000000e+01 5.53000000e+01 1.54300000e+02 2.14700000e+02
1.93000000e+02 1.90700000e+02 1.18900000e+02 9.83000000e+01
4.50000000e+01 2.01000000e+01 6.60000000e+00 5.42000000e+01
2.00700000e+02 2.69300000e+02 2.61700000e+02 2.25100000e+02
1.59000000e+02 7.64000000e+01 5.34000000e+01 3.99000000e+01
1.50000000e+01 2.20000000e+01 6.68000000e+01 1.32900000e+02
1.50000000e+02 1.49400000e+02 1.48000000e+02 9.44000000e+01
9.76000000e+01 5.41000000e+01 4.92000000e+01 2.25000000e+01
1.84000000e+01 3.93000000e+01 1.31000000e+02 2.20100000e+02
2.18900000e+02 1.98900000e+02 1.62400000e+02 9.10000000e+01
6.05000000e+01 2.06000000e+01 1.48000000e+01 3.39000000e+01
1.23000000e+02 2.11100000e+02 1.91800000e+02 2.03300000e+02
1.33000000e+02 7.61000000e+01 4.49000000e+01 2.51000000e+01
1.16000000e+01 2.89000000e+01 8.83000000e+01 1.36300000e+02
1.73900000e+02 1.70400000e+02 1.63600000e+02 9.93000000e+01
6.53000000e+01 4.58000000e+01 2.47000000e+01 1.26000000e+01
4.20000000e+00 4.80000000e+00 2.49000000e+01 8.08000000e+01
8.45000000e+01 9.40000000e+01 1.13300000e+02 6.98000000e+01
3.98000000e+01 2.17000000e+01 7.00000000e+00 3.60000000e+00
8.80000000e+00 2.96000000e+01 8.32000000e+01 1.25500000e+02
1.54700000e+02 1.56348277e+02 1.29845664e+02 8.48261433e+01
3.76547907e+01 5.47889486e+00 -5.26328291e-03 2.31958661e+01
6.66484255e+01 1.14556947e+02 1.49506172e+02 1.58791685e+02
1.39038104e+02 9.74260659e+01 4.90819890e+01 1.15794607e+01
-1.44895065e+00 1.47327213e+01 5.42422665e+01 1.02717544e+02
1.42537275e+02 1.59226562e+02 1.46718673e+02 1.09560357e+02
6.12590590e+01 1.93728148e+01 -8.72279635e-01 7.88308317e+00
4.24562356e+01 9.02794685e+01 1.33968527e+02 1.57641974e+02
1.52694258e+02 1.20923928e+02 7.38798357e+01 2.86630109e+01
1.71025067e+00 2.81917042e+00 3.15866660e+01 7.75554463e+01
1.24015370e+02 1.54077760e+02 1.56814618e+02 1.31231066e+02
8.66269979e+01 3.92164679e+01 6.23370836e+00 -3.31696240e-01
2.19068486e+01 6.48653949e+01 1.12928054e+02 1.48623536e+02
1.58976155e+02 1.40222622e+02 9.91800471e+01 5.07678428e+01
1.25843613e+01 -1.49029536e+00 1.36601606e+01 5.25283769e+01
1.00985343e+02 1.41416436e+02 1.59124522e+02 1.47672524e+02
1.11223365e+02 6.30267024e+01 2.06025365e+01 -6.27496582e-01
7.05394632e+00 4.08545789e+01 8.84875112e+01 1.32637665e+02
1.57255989e+02 1.53393461e+02 1.22454150e+02 7.56848253e+01
3.00866351e+01 2.23500697e+00 2.25430426e+00 3.01375122e+01
7.57487878e+01 1.22507947e+02 1.53417536e+02 1.57241592e+02
1.32590028e+02 8.84239513e+01 4.07982008e+01 7.02524409e+00
-6.18089412e-01 2.06466335e+01 6.30894595e+01 1.11281969e+02
1.47705672e+02 1.59120165e+02 1.41376156e+02 1.00923784e+02
5.24679154e+01 1.36227750e+01 -1.49101484e+00 1.26205694e+01
5.08278165e+01]
#Plotting the series with forecasts:
plt.figure(figsize=(12, 6))
time_all = np.arange(1, n + k + 1)
plt.plot(time_all, yhat, label='Extended Series (yhat)', color='C0')
plt.plot(range(1, n + 1), y, label='Original Data', color='C1')
plt.plot(range(n + 1, n + k + 1), predvalues, label='Forecasts', color='blue')
plt.axvline(x=n, color='black', linestyle='--', label='Forecast Start')
#plt.axhline(y=np.mean(y), color='gray', linestyle=':', label='Mean of Original Data')
plt.xlabel('Time')
plt.ylabel('Data')
plt.title('Time Series + AR(p) Forecasts')
plt.legend()
plt.show()
The predictions above are perfectly sinusoidal. Because the cycles in the sunspots data are irregular and can have different periods, there is a danger that these perfectly sinusoidal predictions will sometimes get out of phase, leading to a loss of prediction accuracy.
Below, we simulate datasets using the Yule model.
#Simulating from the Yule Model:
rng = np.random.default_rng(seed = 42)
sighat = np.sqrt(np.mean(yulemod.resid ** 2))
print(sighat)
fig, axes = plt.subplots(3, 3, figsize = (12, 6))
axes = axes.flatten()
for j in range(1):
    ysim = y.copy()
    for i in range(2, n):
        err = rng.normal(loc = 0, scale = sighat, size = 1)
        ysim[i] = yulemod.params[0] + yulemod.params[1] * ysim[i-1] - ysim[i-2] + err[0]
    axes[j].plot(ysim)
axes[1].plot(y)
for j in range(2, 9):
    ysim = y.copy()
    for i in range(2, n):
        err = rng.normal(loc = 0, scale = sighat, size = 1)
        ysim[i] = yulemod.params[0] + yulemod.params[1] * ysim[i-1] - ysim[i-2] + err[0]
    axes[j].plot(ysim)
plt.show()
27.778317980024667

The sunspots dataset still probably sticks out compared to these synthetic datasets. However, all of the synthetic datasets are now quite smooth, just like the actual sunspots data. This plot shows that the two “sinusoid + noise” models described in lecture are fundamentally different.
AR(2) Model
The Yule model is a special case of the AR(2) model. Specifically, the AR(2) model is:
$$y_t = \beta_0 + \beta_1 y_{t-1} + \beta_2 y_{t-2} + \epsilon_t,$$
and the Yule model is the special case obtained by taking $\beta_2 = -1$. Below we fit the AR(2) model to the sunspots data (i.e., we estimate all the parameters $\beta_0, \beta_1, \beta_2$).
p = 2 #this is the order of the AR model
n = len(y)
print(n)
325
yreg = y[p:] #these are the response values in the autoregression
Xmat = np.ones((n-p, 1)) #this will be the design matrix (X) in the autoregression
for j in range(1, p+1):
    col = y[p-j : n-j].reshape(-1, 1)
    Xmat = np.column_stack([Xmat, col])
armod = sm.OLS(yreg, Xmat).fit()
print(armod.params)
print(armod.summary())
sighat = np.sqrt(np.mean(armod.resid ** 2))
print(sighat)
[24.45610705  1.38803272 -0.69646032]
OLS Regression Results
==============================================================================
Dep. Variable: y R-squared: 0.829
Model: OLS Adj. R-squared: 0.828
Method: Least Squares F-statistic: 774.7
Date: Thu, 23 Oct 2025 Prob (F-statistic): 2.25e-123
Time: 16:56:04 Log-Likelihood: -1505.5
No. Observations: 323 AIC: 3017.
Df Residuals: 320 BIC: 3028.
Df Model: 2
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 24.4561 2.384 10.260 0.000 19.767 29.146
x1 1.3880 0.040 34.524 0.000 1.309 1.467
x2 -0.6965 0.040 -17.342 0.000 -0.775 -0.617
==============================================================================
Omnibus: 33.173 Durbin-Watson: 2.200
Prob(Omnibus): 0.000 Jarque-Bera (JB): 50.565
Skew: 0.663 Prob(JB): 1.05e-11
Kurtosis: 4.414 Cond. No. 231.
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
25.588082517109108
#Simulations from the fitted AR(2) model:
fig, axes = plt.subplots(3, 3, figsize = (12, 6))
axes = axes.flatten()
for j in range(3):
    ysim = y.copy()
    for i in range(2, n):
        err = rng.normal(loc = 0, scale = sighat, size = 1)
        ysim[i] = armod.params[0] + armod.params[1] * ysim[i-1] + armod.params[2] * ysim[i-2] + err[0]
    axes[j].plot(ysim)
axes[3].plot(y)
for j in range(4, 9):
    ysim = y.copy()
    for i in range(2, n):
        err = rng.normal(loc = 0, scale = sighat, size = 1)
        ysim[i] = armod.params[0] + armod.params[1] * ysim[i-1] + armod.params[2] * ysim[i-2] + err[0]
    axes[j].plot(ysim)
plt.show()
These datasets are still smooth (much smoother than observations simulated from the single sinusoid model).
Below, we obtain predictions using the AR(2) model.
#Generate k-step ahead forecasts:
k = 100
yhat = np.concatenate([y, np.full(k, np.nan)]) #extend data by k placeholder values
for i in range(1, k+1):
    ans = armod.params[0]
    for j in range(1, p+1):
        ans += armod.params[j] * yhat[n+i-j-1]
    yhat[n+i-1] = ans
predvalues = yhat[n:]
#Plotting the series with forecasts:
plt.figure(figsize=(12, 6))
time_all = np.arange(1, n + k + 1)
plt.plot(time_all, yhat, color='C0')
plt.plot(range(1, n + 1), y, label='Original Data', color='C1')
plt.plot(range(n + 1, n + k + 1), predvalues, label='Forecasts', color='blue')
plt.axvline(x=n, color='black', linestyle='--', label='Forecast Start')
#plt.axhline(y=np.mean(y), color='gray', linestyle=':', label='Mean of Original Data')
plt.xlabel('Time')
plt.ylabel('Data')
plt.title('Time Series + AR(p) Forecasts')
plt.legend()
plt.show()
The predictions above are quite different from those obtained from the Yule model. They are actually given by a damped sinusoid (as we shall see in the coming lectures) and become essentially flat after a few time points. When the cycles are irregular with varying periods, constant predictions might do well compared to sinusoidal predictions, which can go out of phase in places.
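The damped-sinusoid shape can be anticipated from the fitted coefficients: the noiseless recursion $y_t = \beta_1 y_{t-1} + \beta_2 y_{t-2}$ is governed by the roots of $z^2 - \beta_1 z - \beta_2 = 0$, and complex roots with modulus below one give oscillations that decay geometrically. A sketch of this calculation for the fit above:
#Sketch: damping and period implied by the fitted AR(2) coefficients
roots = np.roots([1, -armod.params[1], -armod.params[2]])
print(np.abs(roots)) #both moduli are below 1, so the oscillation damps out
theta = np.abs(np.angle(roots[0]))
print(2 * np.pi / theta) #implied period in years, close to the solar cycle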