import numpy as np
import pandas as pd
import matplotlib.pyplot as pltWe shall fit simple linear regression models to the Consumer Price Index (CPI) Data (downloaded from https://
## Inflation (Consumer Price Index)
cpi = pd.read_csv('CPIAUCSL_01September2025.csv')
print(cpi) observation_date CPIAUCSL
0 1947-01-01 21.480
1 1947-02-01 21.620
2 1947-03-01 22.000
3 1947-04-01 22.000
4 1947-05-01 21.950
.. ... ...
938 2025-03-01 319.615
939 2025-04-01 320.321
940 2025-05-01 320.580
941 2025-06-01 321.500
942 2025-07-01 322.132
[943 rows x 2 columns]
cpi['observation_date'] = pd.to_datetime(cpi['observation_date'])
cpi.set_index('observation_date', inplace = True)
print(cpi) CPIAUCSL
observation_date
1947-01-01 21.480
1947-02-01 21.620
1947-03-01 22.000
1947-04-01 22.000
1947-05-01 21.950
... ...
2025-03-01 319.615
2025-04-01 320.321
2025-05-01 320.580
2025-06-01 321.500
2025-07-01 322.132
[943 rows x 1 columns]
plt.figure(figsize=(8,6))
plt.plot(cpi.index, cpi['CPIAUCSL'], label='CPI')
plt.xlabel('Year')
plt.ylabel('CPI')
plt.title('Consumer Price Index (CPI)')
plt.show()
CPI Regression with Time as Covariate¶
We will fit a linear regression model to the CPI data, with time as the covariate. The simplest model is:
where denotes the CPI for time index , and . The interpretation of is that it represents the numerical change in CPI for one month to the next. This is not a good model for CPI, because the relationship between CPI and data does not appear to be linear. Also CPI increases are usually reported in percentages (e.g., inflation is now at 2.8% etc.), and not in raw numerical increase. A better linear regression model here would use as the logarithm of CPI for time index :
where as before. Now the interpretation of is that it represents the numerical change in for one month to the next. Note that
In other words, represents the percent change in CPI from one month to the next. If we want the percent change for one year (as opposed to one month), we can look at . This number can be taken to represent an estimate of the historical inflation rate.
plt.figure(figsize=(6,4))
plt.plot(cpi.index, np.log(cpi['CPIAUCSL']), label='Logarithm of CPI')
plt.xlabel('Year')
plt.ylabel('Log(CPI)')
plt.title('Logarithm of Consumer Price Index (CPI)')
plt.show()
The process of using the observed data to obtain estimates of the unknown parameters (in this case and ) is known as ** Model Fitting **. For the linear regression model, model fitting will be done using the OLS function from the statsmodels library.
import statsmodels.api as smWe need to input the observed response values as well as the covariate values (remember that we are using ).
yvec = np.log(np.array(cpi['CPIAUCSL']))
n = cpi.shape[0]
xvec = np.arange(1, n+1)
linreg = sm.OLS(yvec, xvec).fit()
print(linreg.summary()) OLS Regression Results
=======================================================================================
Dep. Variable: y R-squared (uncentered): 0.892
Model: OLS Adj. R-squared (uncentered): 0.892
Method: Least Squares F-statistic: 7771.
Date: Tue, 02 Sep 2025 Prob (F-statistic): 0.00
Time: 16:08:51 Log-Likelihood: -1720.7
No. Observations: 943 AIC: 3443.
Df Residuals: 942 BIC: 3448.
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
x1 0.0079 8.97e-05 88.151 0.000 0.008 0.008
==============================================================================
Omnibus: 148.417 Durbin-Watson: 0.000
Prob(Omnibus): 0.000 Jarque-Bera (JB): 37.391
Skew: -0.138 Prob(JB): 7.60e-09
Kurtosis: 2.064 Cond. No. 1.00
==============================================================================
Notes:
[1] R² is computed without centering (uncentered) since the model does not contain a constant.
[2] Standard Errors assume that the covariance matrix of the errors is correctly specified.
In the context of the linear regression model, the parameters and are also known as the coefficients. The table in the output above reports only one coefficient value. Actually the function sm.OLS(yvec, xvec).fit() is not fitting our intended model but it is fitting instead (i.e., it is not using ). To enable , we need to do the following:
X = sm.add_constant(xvec)
linreg2 = sm.OLS(yvec, X).fit()
print(linreg2.summary()) OLS Regression Results
==============================================================================
Dep. Variable: y R-squared: 0.966
Model: OLS Adj. R-squared: 0.966
Method: Least Squares F-statistic: 2.695e+04
Date: Tue, 02 Sep 2025 Prob (F-statistic): 0.00
Time: 16:10:36 Log-Likelihood: 385.22
No. Observations: 943 AIC: -766.4
Df Residuals: 941 BIC: -756.7
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 2.9859 0.010 284.536 0.000 2.965 3.006
x1 0.0032 1.93e-05 164.171 0.000 0.003 0.003
==============================================================================
Omnibus: 8570.174 Durbin-Watson: 0.000
Prob(Omnibus): 0.000 Jarque-Bera (JB): 72.093
Skew: -0.026 Prob(JB): 2.21e-16
Kurtosis: 1.646 Cond. No. 1.09e+03
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.09e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
From this table, we see that the estimate of is 2.9859 and the estimate of is 0.0032. As per our interpretation, the month-to-month inflation rate is which is . If we want the annual inflation rate, we would have to multiply this by 12 which gives:
historical_inflation_rate = 12 * 100 * linreg2.params[1]
print(f'Historical annual inflation rate: {historical_inflation_rate:.3f}%')Historical annual inflation rate: 3.794%
The regression output also gives an indication of the uncertainty associated with this inflation estimate:
conf_intervals_linreg2 = linreg2.conf_int(alpha=0.05)
print(conf_intervals_linreg2)
print(f'95% confidence interval for historical annual inflation rate (Linear Regression model): [{12*100*conf_intervals_linreg2[1,0]:.3f}%, { 12*100*conf_intervals_linreg2[1,1]:.3f}%]')[[2.96527443 3.00646247]
[0.003124 0.00319959]]
95% confidence interval for historical annual inflation rate (Linear Regression model): [3.749%, 3.840%]
plt.figure(figsize=(6,4))
plt.plot(cpi.index, cpi['CPIAUCSL'], label='CPI')
plt.xlabel('Year')
plt.plot(cpi.index, np.exp(linreg2.fittedvalues))
plt.show()
From this plot, it is clear that this regression model may not do a good job in predicting the values of CPI for the next few months.
plt.figure(figsize=(6,4))
plt.plot(cpi.index, np.log(cpi['CPIAUCSL']), label='Logarithm of CPI')
plt.xlabel('Year')
plt.plot(cpi.index, linreg2.fittedvalues)
plt.show()
Lagged Regression for CPI¶
Next let us look at lagged regression, where the covariate is chosen to be (i.e., the covariate value for time index i equals the response value for the previous time index).
yreg = yvec[1:]
xreg = yvec[:-1]
Xmat = sm.add_constant(xreg)
armod = sm.OLS(yreg, Xmat).fit()
print(armod.summary())
print(armod.params) OLS Regression Results
==============================================================================
Dep. Variable: y R-squared: 1.000
Model: OLS Adj. R-squared: 1.000
Method: Least Squares F-statistic: 6.245e+07
Date: Tue, 02 Sep 2025 Prob (F-statistic): 0.00
Time: 21:31:13 Log-Likelihood: 4019.3
No. Observations: 942 AIC: -8035.
Df Residuals: 940 BIC: -8025.
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 0.0039 0.001 6.707 0.000 0.003 0.005
x1 0.9998 0.000 7902.303 0.000 1.000 1.000
==============================================================================
Omnibus: 129.091 Durbin-Watson: 0.855
Prob(Omnibus): 0.000 Jarque-Bera (JB): 640.198
Skew: 0.519 Prob(JB): 9.61e-140
Kurtosis: 6.903 Cond. No. 24.9
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[0.00387065 0.9997775 ]
Note that the estimated value of is 0.9998 which is fairly close to 1. Because of this, this AutoRegression model can be treated as approximately trying to fit a constant to the month-to-month inflation rate, as shown below.
The regression model corresponding to the output above is:
This fitted coefficients are and . Thus the fitted model is approximately:
or equivalently
which is again the same as
So this model is giving a historical inflation rate estimate (month-to-month) of which is equivalent to a historic annual inflation rate estimate of :
historical_inflation_rate_armodel = 12 * 100 * armod.params[0] #armod.params[0] is the estimate of beta_0
print(f'Historical annual inflation rate (AR(1) model): {historical_inflation_rate_armodel:.3f}%')Historical annual inflation rate (AR(1) model): 4.645%
This estimate of is somewhat higher than the estimate obtained by the previous regression (where the covariate was time). This increase is the estimate is because of the approximation 0.9998 vs 1. Note that the correct model (without the approximation 0.9998 vs 1) is:
which is equivalent to
The variable is not exactly negligble; it has a mean of about 0.0008 (see below) which will have an effect of reducing 0.0039 by some amount.
np.mean(0.0002 * yvec[:-1])0.0008953722086746689The uncertainty interval associated with the estimate of is given by:
# 95% confidence intervals
conf_intervals = armod.conf_int(alpha=0.05)
print(conf_intervals)[[0.00273806 0.00500324]
[0.99952921 1.00002579]]
Multiplying the interval above for by , we get an estimate of the historical inflation rate (with again the caveat that this will be slightly inflated because of the approximation )
#So the uncertainty interval for the estimated historical annual inflation rate is:
print(f'95% confidence interval for historical annual inflation rate (AR(1) model): [{12*100*conf_intervals[0,0]:.3f}%, {12*100*conf_intervals[0,1]:.3f}%]')95% confidence interval for historical annual inflation rate (AR(1) model): [3.286%, 6.004%]
Compare this interval (which is much wider) with the previous one .
To check how well this second model (AutoRegression) fits the data, we can plot the values of with time:
#Plot the values of 12 * 100 * log(CPI_t / CPI_{t-1}) with time
inflation_rates = 12 * 100 * (yvec[1:] - yvec[:-1])
dates = cpi.index[1:]
plt.figure(figsize=(12,7))
plt.plot(dates, inflation_rates, label='Annual Inflation Rate')
plt.xlabel('Year')
plt.ylabel('Inflation Rate (%)')
plt.title('Annual Inflation Rate based on CPI')
plt.axhline(y=historical_inflation_rate_armodel, color='r', linestyle='--', label='Estimated Historical Inflation Rate (AR(1) model)')
plt.legend()
plt.show()
The individual inflation rates vary quite widely, so it is natural for the uncertainty interval for the inflation rate to be somewhat wide (unlike the interval that we got from the regression with time model).
Let us plot the mean of the inflation rates. It is below the estimate that we got from the AutoRegression because of the approximation .
#Plot the values of 12 * 100 * log(CPI_t / CPI_{t-1}) with time
inflation_rates = 12 * 100 * (np.exp(yvec[1:]) - np.exp(yvec[:-1]))/np.exp(yvec[:-1])
dates = cpi.index[1:]
plt.figure(figsize=(12,7))
plt.plot(dates, inflation_rates, label='Annual Inflation Rate')
plt.xlabel('Year')
plt.ylabel('Inflation Rate (%)')
plt.title('Annual Inflation Rate based on CPI')
plt.axhline(y=historical_inflation_rate_armodel, color='r', linestyle='--', label='Estimated Historical Inflation Rate (AR(1) model)')
plt.axhline(y=np.mean(inflation_rates), color='b', linestyle='--', label='Mean Historical Inflation Rate')
plt.legend()
plt.show()
This was just an illustration of how regressions can be used for time series data. We shall study AutoRegressions in much more detail later.