Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

We shall fit simple linear regression models to the Consumer Price Index (CPI) Data (downloaded from https://fred.stlouisfed.org/series/CPIAUCSL). Changes in the CPI are used to measure Inflation.

## Inflation (Consumer Price Index)
cpi = pd.read_csv('CPIAUCSL_01September2025.csv')
print(cpi)
    observation_date  CPIAUCSL
0         1947-01-01    21.480
1         1947-02-01    21.620
2         1947-03-01    22.000
3         1947-04-01    22.000
4         1947-05-01    21.950
..               ...       ...
938       2025-03-01   319.615
939       2025-04-01   320.321
940       2025-05-01   320.580
941       2025-06-01   321.500
942       2025-07-01   322.132

[943 rows x 2 columns]
cpi['observation_date'] = pd.to_datetime(cpi['observation_date'])
cpi.set_index('observation_date', inplace = True)
print(cpi)
                  CPIAUCSL
observation_date          
1947-01-01          21.480
1947-02-01          21.620
1947-03-01          22.000
1947-04-01          22.000
1947-05-01          21.950
...                    ...
2025-03-01         319.615
2025-04-01         320.321
2025-05-01         320.580
2025-06-01         321.500
2025-07-01         322.132

[943 rows x 1 columns]
plt.figure(figsize=(8,6))
plt.plot(cpi.index, cpi['CPIAUCSL'], label='CPI')
plt.xlabel('Year')
plt.ylabel('CPI')
plt.title('Consumer Price Index (CPI)')
plt.show()
<Figure size 800x600 with 1 Axes>

CPI Regression with Time as Covariate

We will fit a linear regression model to the CPI data, with time as the covariate. The simplest model is:

yi=β0+β1xi+ϵiy_i = \beta_0 + \beta_1 x_i + \epsilon_i

where yiy_i denotes the CPI for time index ii, and xi=ix_i = i. The interpretation of β1\beta_1 is that it represents the numerical change in CPI for one month to the next. This is not a good model for CPI, because the relationship between CPI and data does not appear to be linear. Also CPI increases are usually reported in percentages (e.g., inflation is now at 2.8% etc.), and not in raw numerical increase. A better linear regression model here would use yiy_i as the logarithm of CPI for time index ii:

yi=log(CPIi)=β0+β1xi+ϵiy_i = \log(\text{CPI}_i) = \beta_0 + \beta_1 x_i + \epsilon_i

where xi=ix_i = i as before. Now the interpretation of β1\beta_1 is that it represents the numerical change in logCPIi\log \text{CPI}_i for one month to the next. Note that

logCPIilogCPIi1=logCPIiCPIi1CPIiCPIi11=CPIiCPIi1CPIi1\log \text{CPI}_i - \log \text{CPI}_{i-1} = \log \frac{\text{CPI}_i}{\text{CPI}_{i-1}} \approx \frac{\text{CPI}_i}{\text{CPI}_{i-1}} - 1 = \frac{\text{CPI}_i - \text{CPI}_{i-1}}{\text{CPI}_{i-1}}

In other words, 100×β1100 \times \beta_1 represents the percent change in CPI from one month to the next. If we want the percent change for one year (as opposed to one month), we can look at 12×100×β112 \times 100 \times \beta_1. This number can be taken to represent an estimate of the historical inflation rate.

plt.figure(figsize=(6,4))
plt.plot(cpi.index, np.log(cpi['CPIAUCSL']), label='Logarithm of CPI')
plt.xlabel('Year')
plt.ylabel('Log(CPI)')
plt.title('Logarithm of Consumer Price Index (CPI)')
plt.show()
<Figure size 600x400 with 1 Axes>

The process of using the observed data to obtain estimates of the unknown parameters (in this case β0\beta_0 and β1\beta_1) is known as ** Model Fitting **. For the linear regression model, model fitting will be done using the OLS function from the statsmodels library.

import statsmodels.api as sm

We need to input the observed response values y1,,yny_1, \dots, y_n as well as the covariate values x1,,xnx_1, \dots, x_n (remember that we are using xi=ix_i = i).

yvec = np.log(np.array(cpi['CPIAUCSL']))
n = cpi.shape[0]
xvec = np.arange(1, n+1)
linreg = sm.OLS(yvec, xvec).fit()
print(linreg.summary())
                                 OLS Regression Results                                
=======================================================================================
Dep. Variable:                      y   R-squared (uncentered):                   0.892
Model:                            OLS   Adj. R-squared (uncentered):              0.892
Method:                 Least Squares   F-statistic:                              7771.
Date:                Tue, 02 Sep 2025   Prob (F-statistic):                        0.00
Time:                        16:08:51   Log-Likelihood:                         -1720.7
No. Observations:                 943   AIC:                                      3443.
Df Residuals:                     942   BIC:                                      3448.
Df Model:                           1                                                  
Covariance Type:            nonrobust                                                  
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1             0.0079   8.97e-05     88.151      0.000       0.008       0.008
==============================================================================
Omnibus:                      148.417   Durbin-Watson:                   0.000
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               37.391
Skew:                          -0.138   Prob(JB):                     7.60e-09
Kurtosis:                       2.064   Cond. No.                         1.00
==============================================================================

Notes:
[1] R² is computed without centering (uncentered) since the model does not contain a constant.
[2] Standard Errors assume that the covariance matrix of the errors is correctly specified.

In the context of the linear regression model, the parameters β0\beta_0 and β1\beta_1 are also known as the coefficients. The table in the output above reports only one coefficient value. Actually the function sm.OLS(yvec, xvec).fit() is not fitting our intended model yi=β0+β1xi+ϵiy_i = \beta_0 + \beta_1 x_i + \epsilon_i but it is fitting instead yi=β1xi+ϵiy_i = \beta_1 x_i + \epsilon_i (i.e., it is not using β0\beta_0). To enable β0\beta_0, we need to do the following:

X = sm.add_constant(xvec)
linreg2 = sm.OLS(yvec, X).fit()
print(linreg2.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.966
Model:                            OLS   Adj. R-squared:                  0.966
Method:                 Least Squares   F-statistic:                 2.695e+04
Date:                Tue, 02 Sep 2025   Prob (F-statistic):               0.00
Time:                        16:10:36   Log-Likelihood:                 385.22
No. Observations:                 943   AIC:                            -766.4
Df Residuals:                     941   BIC:                            -756.7
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          2.9859      0.010    284.536      0.000       2.965       3.006
x1             0.0032   1.93e-05    164.171      0.000       0.003       0.003
==============================================================================
Omnibus:                     8570.174   Durbin-Watson:                   0.000
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               72.093
Skew:                          -0.026   Prob(JB):                     2.21e-16
Kurtosis:                       1.646   Cond. No.                     1.09e+03
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.09e+03. This might indicate that there are
strong multicollinearity or other numerical problems.

From this table, we see that the estimate of β0\beta_0 is 2.9859 and the estimate of β1\beta_1 is 0.0032. As per our interpretation, the month-to-month inflation rate is 100×β1100 \times \beta_1 which is 0.32%0.32 \%. If we want the annual inflation rate, we would have to multiply this by 12 which gives:

historical_inflation_rate = 12 * 100 * linreg2.params[1]
print(f'Historical annual inflation rate: {historical_inflation_rate:.3f}%')
Historical annual inflation rate: 3.794%

The regression output also gives an indication of the uncertainty associated with this inflation estimate:

conf_intervals_linreg2 = linreg2.conf_int(alpha=0.05)
print(conf_intervals_linreg2)
print(f'95% confidence interval for historical annual inflation rate (Linear Regression model): [{12*100*conf_intervals_linreg2[1,0]:.3f}%, {   12*100*conf_intervals_linreg2[1,1]:.3f}%]')
[[2.96527443 3.00646247]
 [0.003124   0.00319959]]
95% confidence interval for historical annual inflation rate (Linear Regression model): [3.749%, 3.840%]
plt.figure(figsize=(6,4))
plt.plot(cpi.index, cpi['CPIAUCSL'], label='CPI')
plt.xlabel('Year')
plt.plot(cpi.index, np.exp(linreg2.fittedvalues))
plt.show()
<Figure size 600x400 with 1 Axes>

From this plot, it is clear that this regression model may not do a good job in predicting the values of CPI for the next few months.

plt.figure(figsize=(6,4))
plt.plot(cpi.index, np.log(cpi['CPIAUCSL']), label='Logarithm of CPI')
plt.xlabel('Year')
plt.plot(cpi.index, linreg2.fittedvalues)
plt.show()
<Figure size 600x400 with 1 Axes>

Lagged Regression for CPI

Next let us look at lagged regression, where the covariate is chosen to be xi=yi1x_i = y_{i-1} (i.e., the covariate value for time index i equals the response value for the previous time index).

yreg = yvec[1:]
xreg = yvec[:-1]
Xmat = sm.add_constant(xreg)
armod = sm.OLS(yreg, Xmat).fit()
print(armod.summary())
print(armod.params)
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                  1.000
Method:                 Least Squares   F-statistic:                 6.245e+07
Date:                Tue, 02 Sep 2025   Prob (F-statistic):               0.00
Time:                        21:31:13   Log-Likelihood:                 4019.3
No. Observations:                 942   AIC:                            -8035.
Df Residuals:                     940   BIC:                            -8025.
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0039      0.001      6.707      0.000       0.003       0.005
x1             0.9998      0.000   7902.303      0.000       1.000       1.000
==============================================================================
Omnibus:                      129.091   Durbin-Watson:                   0.855
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              640.198
Skew:                           0.519   Prob(JB):                    9.61e-140
Kurtosis:                       6.903   Cond. No.                         24.9
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[0.00387065 0.9997775 ]

Note that the estimated value of β\beta is 0.9998 which is fairly close to 1. Because of this, this AutoRegression model can be treated as approximately trying to fit a constant to the month-to-month inflation rate, as shown below.

The regression model corresponding to the output above is:

logCPIt=β0+β1logCPIt1+ϵt\log \text{CPI}_t = \beta_0 + \beta_1 \log \text{CPI}_{t-1} + \epsilon_t

This fitted coefficients are β^0=0.0039\hat{\beta}_0 = 0.0039 and β^1=0.99981\hat{\beta}_1 = 0.9998 \approx 1. Thus the fitted model is approximately:

logCPIt0.0039+logCPIt1+ϵt,\log \text{CPI}_t \approx 0.0039 + \log \text{CPI}_{t-1} + \epsilon_t,

or equivalently

logCPItlogCPIt10.0039+ϵt,\log \text{CPI}_t - \log \text{CPI}_{t-1} \approx 0.0039 + \epsilon_t,

which is again the same as

100×logCPItCPIt10.39+error100 \times \log \frac{\text{CPI}_t}{\text{CPI}_{t-1}} \approx 0.39 + \text{error}

So this model is giving a historical inflation rate estimate (month-to-month) of 0.39%0.39\% which is equivalent to a historic annual inflation rate estimate of 12×0.3912 \times 0.39:

historical_inflation_rate_armodel = 12 * 100 * armod.params[0] #armod.params[0] is the estimate of beta_0
print(f'Historical annual inflation rate (AR(1) model): {historical_inflation_rate_armodel:.3f}%')
Historical annual inflation rate (AR(1) model): 4.645%

This estimate of 4.645%4.645\% is somewhat higher than the estimate 3.794%3.794\% obtained by the previous regression (where the covariate was time). This increase is the estimate is because of the approximation 0.9998 vs 1. Note that the correct model (without the approximation 0.9998 vs 1) is:

logCPIt0.9998logCPIt1=0.0039+ϵt,\log \text{CPI}_t - 0.9998 * \log \text{CPI}_{t-1} = 0.0039 + \epsilon_t,

which is equivalent to

logCPItCPIt1=0.00390.0002logCPIt1+ϵt,\log \frac{\text{CPI}_t}{\text{CPI}_{t-1}} = 0.0039 - 0.0002 * \log \text{CPI}_{t-1} + \epsilon_t,

The variable 0.0002logCPIt10.0002 * \log \text{CPI}_{t-1} is not exactly negligble; it has a mean of about 0.0008 (see below) which will have an effect of reducing 0.0039 by some amount.

np.mean(0.0002 * yvec[:-1])
0.0008953722086746689

The uncertainty interval associated with the estimate of β0\beta_0 is given by:

# 95% confidence intervals
conf_intervals = armod.conf_int(alpha=0.05)
print(conf_intervals)
[[0.00273806 0.00500324]
 [0.99952921 1.00002579]]

Multiplying the interval above for β0\beta_0 by 1210012 * 100, we get an estimate of the historical inflation rate (with again the caveat that this will be slightly inflated because of the approximation 0.999810.9998 \approx 1)

#So the uncertainty interval for the estimated historical annual inflation rate is: 
print(f'95% confidence interval for historical annual inflation rate (AR(1) model): [{12*100*conf_intervals[0,0]:.3f}%, {12*100*conf_intervals[0,1]:.3f}%]')
95% confidence interval for historical annual inflation rate (AR(1) model): [3.286%, 6.004%]

Compare this interval [3.286,6.004]%[3.286, 6.004] \% (which is much wider) with the previous one [3.749,3.84]%[3.749, 3.84]\%.

To check how well this second model (AutoRegression) fits the data, we can plot the values of 100×logCPItCPIt1100 \times \log \frac{\text{CPI}_t}{\text{CPI}_{t-1}} with time:

#Plot the values of 12 * 100 * log(CPI_t / CPI_{t-1}) with time
inflation_rates = 12 * 100 * (yvec[1:] - yvec[:-1])
dates = cpi.index[1:]   
plt.figure(figsize=(12,7))
plt.plot(dates, inflation_rates, label='Annual Inflation Rate')
plt.xlabel('Year')
plt.ylabel('Inflation Rate (%)')
plt.title('Annual Inflation Rate based on CPI')
plt.axhline(y=historical_inflation_rate_armodel, color='r', linestyle='--', label='Estimated Historical Inflation Rate (AR(1) model)')
plt.legend()
plt.show()
<Figure size 1200x700 with 1 Axes>

The individual inflation rates vary quite widely, so it is natural for the uncertainty interval for the inflation rate to be somewhat wide (unlike the interval that we got from the regression with time model).

Let us plot the mean of the inflation rates. It is below the estimate that we got from the AutoRegression because of the approximation 0.999810.9998 \approx 1.

#Plot the values of 12 * 100 * log(CPI_t / CPI_{t-1}) with time
inflation_rates = 12 * 100 * (np.exp(yvec[1:]) - np.exp(yvec[:-1]))/np.exp(yvec[:-1])
dates = cpi.index[1:]   
plt.figure(figsize=(12,7))
plt.plot(dates, inflation_rates, label='Annual Inflation Rate')
plt.xlabel('Year')
plt.ylabel('Inflation Rate (%)')
plt.title('Annual Inflation Rate based on CPI')
plt.axhline(y=historical_inflation_rate_armodel, color='r', linestyle='--', label='Estimated Historical Inflation Rate (AR(1) model)')
plt.axhline(y=np.mean(inflation_rates), color='b', linestyle='--', label='Mean Historical Inflation Rate')
plt.legend()
plt.show()
<Figure size 1200x700 with 1 Axes>

This was just an illustration of how regressions can be used for time series data. We shall study AutoRegressions in much more detail later.