
import pandas as pd
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
import cvxpy as cp

Let us consider the temperature anomalies dataset that we used in the last couple of lectures.

temp_jan = pd.read_csv('TempAnomalies_January.csv', skiprows=4)
print(temp_jan.head())
y = temp_jan['Anomaly']
plt.plot(y)
plt.xlabel('year')
plt.ylabel('Celsius')
plt.title('Temperature anomalies (from 1901-2000 average) for January')
plt.show()
   Year  Anomaly
0  1850    -0.46
1  1851    -0.17
2  1852    -0.02
3  1853    -0.12
4  1854    -0.28

We will fit our high-dimensional regression model:

\begin{align*} y_t = \beta_0 + \beta_1 (t - 1) + \beta_2 (t - 2)_+ + \dots + \beta_{n-1}(t - (n-1))_+ + \epsilon_t \end{align*}

to this dataset. This can be written in the usual regression form as $y = X \beta + \epsilon$ with the $X$ matrix calculated as follows.

n = len(y)
x = np.arange(1, n+1)
X = np.column_stack([np.ones(n), x-1])
for i in range(n-2):
    c = i+2
    xc = ((x > c).astype(float))*(x-c)
    X = np.column_stack([X, xc])
print(X)
[[  1.   0.  -0. ...  -0.  -0.  -0.]
 [  1.   1.   0. ...  -0.  -0.  -0.]
 [  1.   2.   1. ...  -0.  -0.  -0.]
 ...
 [  1. 173. 172. ...   1.   0.  -0.]
 [  1. 174. 173. ...   2.   1.   0.]
 [  1. 175. 174. ...   3.   2.   1.]]
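A quick dimension check confirms that the design matrix is square: there are $2 + (n-2) = n$ columns (an intercept, a linear term, and $n-2$ hinge terms), one per coefficient $\beta_0, \dots, \beta_{n-1}$, so the model has as many parameters as observations. This is what makes the regression high-dimensional.

#The design matrix has one column per coefficient, so it is n x n
print(X.shape)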

Quick Recap: Ridge Regularization (Lecture 11)

The estimator is given by the minimizer of:

\begin{align*} \|y - X \beta\|^2 + \lambda \sum_{j=2}^{n-1} \beta_j^2 \end{align*}

where $\lambda$ is the tuning parameter. Small $\lambda$ leads to fitted values close to the data (overfitting) and large $\lambda$ leads to fitted values coming from a straight line (underfitting). We used the following code for computing the ridge estimator for fixed $\lambda$ (note that, instead of using this code, we can also use the formula derived in Lecture 12, $(X^T X + \lambda J)^{-1} X^T y$, to compute the ridge estimator).

def solve_ridge(X, y, lambda_val, penalty_start=2):
    n, p = X.shape
    
    # Define variable
    beta = cp.Variable(p)
    
    # Define objective
    loss = cp.sum_squares(X @ beta - y)
    reg = lambda_val * cp.sum_squares(beta[penalty_start:])
    objective = cp.Minimize(loss + reg)
    
    # Solve problem
    prob = cp.Problem(objective)
    prob.solve()
    
    return beta.value
b_ridge = solve_ridge(X, y, lambda_val = 1e4) #also try 1e-4, 1e8 etc.
plt.plot(y)
ridge_fitted = np.dot(X, b_ridge)
plt.plot(ridge_fitted, color = 'red')
plt.show()
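As a quick sanity check, the ridge estimator can also be computed directly from the closed-form formula $(X^T X + \lambda J)^{-1} X^T y$ mentioned above; here is a minimal sketch, where $J$ is taken to be the diagonal matrix with entries $0, 0, 1, \dots, 1$ so that $\beta_0$ and $\beta_1$ are left unpenalized.

#Closed-form ridge estimator: (X^T X + lambda J)^{-1} X^T y
lam = 1e4
J = np.diag(np.concatenate([[0.0, 0.0], np.ones(n - 2)]))
b_closed = np.linalg.solve(X.T @ X + lam * J, np.dot(X.T, y))
print(np.max(np.abs(b_closed - b_ridge))) #difference should be small (up to solver tolerance)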

To choose $\lambda$, we use cross-validation (CV). Here is the code for CV.

def ridge_cv(X, y, lambda_candidates):
    n = len(y)
    k = 5
    fold_size = n // k
    folds = []

    for i in range(k):
        start = i * fold_size
        end = (i + 1) * fold_size if i < k - 1 else n  # last fold includes remainder
        test_indices = np.arange(start, end)
        train_indices = np.concatenate([np.arange(0, start), np.arange(end, n)])
        folds.append((train_indices, test_indices))

    cv_errors = {lamb: 0 for lamb in lambda_candidates}

    for train_index, test_index in folds:
        X_train = X[train_index]
        X_test = X[test_index]
        y_train = y[train_index]
        y_test = y[test_index]

        for lamb in lambda_candidates:
            beta = solve_ridge(X_train, y_train, lambda_val=lamb)
            y_pred = np.dot(X_test, beta)
            squared_errors = (y_test - y_pred) ** 2
            cv_errors[lamb] += np.sum(squared_errors)

    for lamb in lambda_candidates:
        cv_errors[lamb] /= n

    best_lambda = min(cv_errors, key=cv_errors.get)
    return best_lambda, cv_errors
lambda_candidates = np.array([1e-2, 1e-1, 1, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8])

best_lambda, cv_errors = ridge_cv(X, y, lambda_candidates)
print(best_lambda)
for lamb, error in sorted(cv_errors.items()):
    print(f"Lambda = {lamb:.2f}, CV Error = {error:.6f}")
1000.0
Lambda = 0.01, CV Error = 20.473188
Lambda = 0.10, CV Error = 11.850955
Lambda = 1.00, CV Error = 2.261767
Lambda = 10.00, CV Error = 0.513626
Lambda = 100.00, CV Error = 0.126082
Lambda = 1000.00, CV Error = 0.044625
Lambda = 10000.00, CV Error = 0.048169
Lambda = 100000.00, CV Error = 0.073446
Lambda = 1000000.00, CV Error = 0.105209
Lambda = 10000000.00, CV Error = 0.155146
Lambda = 100000000.00, CV Error = 0.167589
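It is often helpful to visualize the CV errors computed above as a function of $\lambda$; here is a small sketch on a log-log scale.

#Plot CV error against lambda on a log-log scale
lams = sorted(cv_errors.keys())
errs = [cv_errors[lam] for lam in lams]
plt.loglog(lams, errs, marker = 'o')
plt.axvline(best_lambda, color = 'red', linestyle = '--', label = 'best lambda from CV')
plt.xlabel('lambda')
plt.ylabel('CV error')
plt.legend()
plt.show()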

Below we calculate the ridge estimator for $\lambda$ chosen to be the best $\lambda$ given by CV:

b_ridge = solve_ridge(X, y, lambda_val = best_lambda) 
plt.plot(y)
ridge_fitted = np.dot(X, b_ridge)
plt.plot(ridge_fitted, color = 'red')
plt.show()

Bayesian regularization (from Lecture 12)

In the last lecture, we worked with the following prior (as a Bayesian method for regularization):

\begin{align*} \beta_0, \beta_1 \overset{\text{i.i.d}}{\sim} \text{unif}(-C, C) ~~~ \beta_2, \dots, \beta_{n-1} \overset{\text{i.i.d}}{\sim} N(0, \tau^2). \end{align*}

We also treated $\tau$ (and $\sigma$) as unknown parameters and used the prior:

\begin{align*} \log \tau, \log \sigma \overset{\text{i.i.d}}{\sim} \text{unif}(-C, C). \end{align*}

With this prior, the posterior for $\beta, \tau, \sigma$ can be described as follows. The posterior of $\beta$ conditional on $\sigma$ and $\tau$ is given by:

\begin{align*} \beta \mid \text{data}, \sigma, \tau \sim N \left(\left(\frac{X^T X}{\sigma^2} + Q^{-1} \right)^{-1} \frac{X^T y}{\sigma^2}, \left(\frac{X^T X}{\sigma^2} + Q^{-1} \right)^{-1}\right), \end{align*}

where $Q$ is the diagonal matrix with diagonal entries $C, C, \tau^2, \dots, \tau^2$. Further, the posterior of $\tau$ and $\sigma$ is given by:

\begin{align*} & f_{\tau, \sigma \mid \text{data}}(\tau, \sigma) \\ &\propto \frac{\sigma^{-n-1} \tau^{-1}}{\sqrt{\det Q}} \sqrt{\det \left(\frac{X^T X}{\sigma^2} + Q^{-1} \right)^{-1}} \exp \left(-\frac{y^T y}{2 \sigma^2} \right)\exp \left(\frac{y^T X}{2\sigma^2} \left(\frac{X^T X}{\sigma^2} + Q^{-1} \right)^{-1} \frac{X^T y}{\sigma^2} \right). \end{align*}

We can simplify this formula slightly, in order to avoid specifying $C$ explicitly, using

\begin{align*} \det Q = C \times C \times \tau^2 \times \dots \times \tau^2 = C^2 \tau^{2(n-2)} \propto \tau^{2(n-2)}. \end{align*}

Also

\begin{align*} Q^{-1} = \text{diag}(1/C, 1/C, 1/\tau^2, \dots, 1/\tau^2) \approx \text{diag}(0, 0, 1/\tau^2, \dots, 1/\tau^2) = \frac{J}{\tau^2}, \end{align*}

where $J$ is the diagonal matrix with diagonal entries $0, 0, 1, \dots, 1$.

Let $Q^{-1}_{\text{approx}}$ denote the diagonal matrix above, with diagonal entries $0, 0, 1/\tau^2, \dots, 1/\tau^2$. We shall use $Q^{-1}_{\text{approx}}$ as our proxy for $Q^{-1}$; note that $Q^{-1}_{\text{approx}}$ does not involve any specification of $C$. Combining this with $\det Q \propto \tau^{2(n-2)}$ (so that $\tau^{-1}/\sqrt{\det Q} \propto \tau^{-n+1}$), we rewrite the posterior of $\tau, \sigma$ as:

\begin{align*} & f_{\tau, \sigma \mid \text{data}}(\tau, \sigma) \\ &\propto \sigma^{-n-1} \tau^{-n+1} \sqrt{\det \left(\frac{X^T X}{\sigma^2} + \frac{J}{\tau^2} \right)^{-1}} \exp \left(-\frac{y^T y}{2 \sigma^2} \right)\exp \left(\frac{y^T X}{2\sigma^2} \left(\frac{X^T X}{\sigma^2} + \frac{J}{\tau^2} \right)^{-1} \frac{X^T y}{\sigma^2} \right). \end{align*}

Below we compute this posterior on the log scale for each value in a grid chosen for $\tau$ and $\sigma$.

tau_gr = np.logspace(np.log10(0.0001), np.log10(1), 100)
sig_gr = np.logspace(np.log10(0.1), np.log10(1), 100)

t, s = np.meshgrid(tau_gr, sig_gr)

g = pd.DataFrame({'tau': t.flatten(), 'sig': s.flatten()})

#Evaluate the log-posterior of (tau, sigma) at each grid point
for i in range(len(g)):
    tau = g.loc[i, 'tau']
    sig = g.loc[i, 'sig']
    J_by_tausq = np.diag(np.concatenate([[0, 0], np.repeat(tau**(-2), n-2)]))
    Mat = J_by_tausq + (X.T @ X)/(sig ** 2)
    Matinv = np.linalg.inv(Mat)
    sgn, logcovdet = np.linalg.slogdet(Matinv)
    g.loc[i, 'logpost'] = (-n-1)*np.log(sig) + (-n+1)*np.log(tau) + 0.5 * logcovdet - ((np.sum(y ** 2))/(2*(sig ** 2))) + (y.T @ X @ Matinv @ X.T @ y)/(2 * (sig ** 4))

The posterior samples for all the parameters can be generated as follows.

g['post'] = np.exp(g['logpost'] - np.max(g['logpost']))
g['post'] = g['post']/np.sum(g['post'])
N = 1000
samples = g.sample(N, weights = g['post'], replace = True)
tau_samples = np.array(samples.iloc[:,0])
sig_samples = np.array(samples.iloc[:,1])
betahats = np.zeros((n, N))
muhats = np.zeros((n, N))
for i in range(N):
    tau = tau_samples[i]
    sig = sig_samples[i]
    J_by_tausq = np.diag(np.concatenate([[0, 0], np.repeat(tau**(-2), n-2)]))
    XTX = np.dot(X.T, X)
    TempMat = np.linalg.inv(J_by_tausq + (XTX/(sig ** 2)))
    XTy = np.dot(X.T, y)
    #generate betahat from the normal distribution with mean: 
    norm_mean = np.dot(TempMat, XTy/(sig ** 2)) 
    #and covariance matrix: 
    norm_cov = TempMat  
    betahat = np.random.multivariate_normal(norm_mean, norm_cov)
    muhat = np.dot(X, betahat)
    betahats[:,i] = betahat
    muhats[:,i] = muhat
beta_est = np.mean(betahats, axis = 1)
mu_est = np.mean(muhats, axis = 1) #these are the fitted values

plt.plot(y, label = 'Data')
plt.plot(mu_est, color = 'black', label = 'Posterior Mean')
plt.legend()
plt.show()
#Plotting all the posterior fitted values: 
plt.plot(y, label = 'Data')
for i in range(N):
    plt.plot(muhats[:,i], color = 'red')
plt.plot(mu_est, color = 'black', label = 'Posterior Mean')
plt.legend()
plt.show()
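The posterior samples of the fitted values also give pointwise uncertainty bands. Here is a small sketch using the 2.5% and 97.5% quantiles of the posterior draws at each time point.

#Pointwise 95% posterior credible band for the fitted values
lower = np.percentile(muhats, 2.5, axis = 1)
upper = np.percentile(muhats, 97.5, axis = 1)
plt.plot(y, label = 'Data')
plt.fill_between(np.arange(n), lower, upper, color = 'lightgray', label = '95% credible band')
plt.plot(mu_est, color = 'black', label = 'Posterior Mean')
plt.legend()
plt.show()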

The posterior samples of $\tau$ and $\sigma$ can be visualized by their histograms.

# First histogram: tau_samples
plt.subplot(1, 2, 1)
plt.hist(tau_samples, bins=30, color='skyblue', edgecolor='black')
plt.title('Histogram of tau samples')
plt.xlabel('tau')
plt.ylabel('Frequency')

# Second histogram: sig_samples
plt.subplot(1, 2, 2)
plt.hist(sig_samples, bins=30, color='lightcoral', edgecolor='black')
plt.title('Histogram of sigma samples')
plt.xlabel('sigma')
plt.ylabel('Frequency')

# Adjust layout and show both side by side
plt.tight_layout()
plt.show()

Summary statistics of the $\tau$ and $\sigma$ samples can be obtained as follows.

df_tau_sigma = pd.DataFrame({
    'tau_samples': tau_samples,
    'sig_samples': sig_samples
})

# Summary statistics
print(df_tau_sigma.describe())
       tau_samples  sig_samples
count  1000.000000  1000.000000
mean      0.002407     0.173061
std       0.000984     0.009609
min       0.000705     0.148497
25%       0.001789     0.166810
50%       0.002154     0.170735
75%       0.002848     0.178865
max       0.010476     0.205651
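Recall that the ridge tuning parameter corresponds to $\lambda = \sigma^2/\tau^2$ (Lecture 12). As a rough cross-check against the CV choice above, we can look at the $\lambda$ values implied by these posterior samples; a small sketch:

#Ridge tuning parameter implied by each posterior draw: lambda = sigma^2/tau^2
implied_lambda = (sig_samples ** 2) / (tau_samples ** 2)
print('Posterior mean of implied lambda:', np.mean(implied_lambda))
print('Posterior median of implied lambda:', np.median(implied_lambda))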

Bayesian Regularization (with a slightly different prior)

Now we work with the following prior:

\begin{align*} \beta_0, \beta_1 \overset{\text{i.i.d}}{\sim} \text{unif}(-C, C) ~~~ \beta_2, \dots, \beta_{n-1} \overset{\text{i.i.d}}{\sim} N(0, \gamma^2 \sigma^2). \end{align*}

We also treat $\gamma$ and $\sigma$ as unknown parameters and assign the prior:

\begin{align*} \log \gamma, \log \sigma \overset{\text{i.i.d}}{\sim} \text{unif}(-C, C). \end{align*}

The only difference from the previous prior is that we are now parametrizing in terms of $(\gamma, \sigma)$ as opposed to $(\tau, \sigma)$ (the connection is $\tau = \sigma \gamma$). This is, however, strictly a different prior, because under the assumption $\log \gamma, \log \sigma \overset{\text{i.i.d}}{\sim} \text{unif}(-C, C)$, the parameters $\tau = \gamma \sigma$ and $\sigma$ become dependent (under the prior), whereas the previous model assumes prior independence of $\tau$ and $\sigma$.
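To see the dependence concretely, note that $\log \tau = \log \gamma + \log \sigma$, so under this prior

\begin{align*} \text{Cov}(\log \tau, \log \sigma) = \text{Cov}(\log \gamma + \log \sigma, \log \sigma) = \text{Var}(\log \sigma) = \frac{(2C)^2}{12} = \frac{C^2}{3} \neq 0, \end{align*}

which means that $\tau$ and $\sigma$ cannot be independent under the new prior.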

The posterior of $\gamma, \sigma, \beta$ can be described as follows.

The posterior of $\beta$ conditional on $\sigma$ and $\gamma$ is given by:

\begin{align*} \beta \mid \text{data}, \sigma, \gamma \sim N \left(\left(\frac{X^T X}{\sigma^2} + Q^{-1} \right)^{-1} \frac{X^T y}{\sigma^2}, \left(\frac{X^T X}{\sigma^2} + Q^{-1} \right)^{-1}\right), \end{align*}

where $Q$ is the diagonal matrix with diagonal entries $C, C, \gamma^2\sigma^2, \dots, \gamma^2\sigma^2$. This is the same as before. We shall again use the approximation $Q^{-1} \approx J/\tau^2 = J/(\gamma \sigma)^2$.

We next describe the posterior of $\gamma$ and $\sigma$. Unlike the case of the previous prior, the posterior of $\sigma$ conditional on $\gamma$ can be described in closed form as:

\begin{align*} \frac{1}{\sigma^2} \mid \text{data}, \gamma \sim \text{Gamma} \left(\frac{n}{2} - 1, \frac{y^T y - y^T X (X^T X + \gamma^{-2} J)^{-1} X^T y}{2} \right). \end{align*}

Finally, the posterior of $\gamma$ is given by:

\begin{align*} f_{\gamma \mid \text{data}}(\gamma) \propto \gamma^{-n+1} \sqrt{\det (X^T X + \gamma^{-2} J)^{-1}} \left(\frac{1}{y^T y - y^T X (X^T X + \gamma^{-2} J)^{-1} X^T y} \right)^{(n/2) - 1}. \end{align*}

We can compute this posterior on a grid of $\gamma$ values. We now only need a one-dimensional grid for $\gamma$ (unlike the previous model, where we needed grids for both $\tau$ and $\sigma$).

gamma_gr = np.logspace(np.log10(1e-6), np.log10(1e4), 1000)
logpost_gamma = np.zeros(len(gamma_gr))

for i in range(len(gamma_gr)):
    gamma = gamma_gr[i]
    J_by_gammasq = np.diag(np.concatenate([[0, 0], np.repeat(gamma**(-2), n-2)])) 
    Mat =  X.T @ X + J_by_gammasq
    Matinv = np.linalg.inv(Mat)
    sgn, logcovdet = np.linalg.slogdet(Matinv)
    logpost_gamma[i] =(-n+1)*np.log(gamma) + 0.5 * logcovdet - (n/2 - 1)*np.log(y.T @ y - y.T @ X @ Matinv @ X.T @ y)
post_gamma  = np.exp(logpost_gamma - np.max(logpost_gamma))
post_gamma = post_gamma/np.sum(post_gamma)
postmean_gamma = np.sum(gamma_gr * post_gamma)
print(postmean_gamma)
print(1/(postmean_gamma ** 2))
0.013881161336822876
5189.773404602882

The parameter $\gamma$ is nicely connected to the tuning parameter $\lambda$ used in ridge regression. The connection is given by $\gamma = 1/\sqrt{\lambda}$ or $\lambda = 1/\gamma^2$. This is because $\lambda = \sigma^2/\tau^2$ (as we saw in Lecture 12) and $\tau = \gamma \sigma$.

b_ridge = solve_ridge(X, y, lambda_val = 1/(postmean_gamma ** 2))
plt.plot(y)
ridge_fitted = np.dot(X, b_ridge)
plt.plot(ridge_fitted, color = 'red')
plt.show()

Let us draw posterior samples for all the parameters $\gamma, \sigma, \beta$.

N = 1000
gamma_samples = np.random.choice(gamma_gr, size=N, p=post_gamma, replace=True)
plt.hist(gamma_samples, bins=30, color='lightgreen', edgecolor='black')
plt.title('Histogram of gamma samples')
plt.xlabel('gamma')
plt.ylabel('Frequency')
plt.show()

Because of the connection $\lambda = 1/\gamma^2$ between $\gamma$ and $\lambda$, we can also look at the histogram of the corresponding $\lambda$ values:

lambda_samples = 1/(gamma_samples ** 2)
plt.hist(lambda_samples, bins=100, color='lightgreen', edgecolor='black')
plt.title('Histogram of lambda samples')
plt.xlabel('lambda')
plt.ylabel('Frequency')
plt.show()
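For comparison with cross-validation, we can also compute posterior summaries of $\lambda$ from these samples and place them next to the CV-selected value from earlier; a small sketch:

#Posterior summaries of lambda = 1/gamma^2 versus the CV-selected lambda
print('Posterior mean of lambda:', np.mean(lambda_samples))
print('Posterior median of lambda:', np.median(lambda_samples))
print('CV-selected lambda:', best_lambda)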

Here is the code for generating the samples of the other parameters.

#Reset the arrays that will hold the posterior draws under this model
sig_samples = np.zeros(N)
betahats = np.zeros((n, N))
muhats = np.zeros((n, N))
for i in range(N):
    gamma = gamma_samples[i]
    J_by_gammasq = np.diag(np.concatenate([[0, 0], np.repeat(gamma**(-2), n-2)])) 
    Mat =  X.T @ X + J_by_gammasq
    Matinv = np.linalg.inv(Mat)
    gamma_dist_lambda_parameter = (y.T @ y - y.T @ X @ Matinv @ X.T @ y)/2
    gamma_dist_alpha_parameter = n/2 - 1
    sig = np.sqrt(1/np.random.gamma(gamma_dist_alpha_parameter, 1/gamma_dist_lambda_parameter))
    sig_samples[i] = sig
    XTX = np.dot(X.T, X)
    TempMat = np.linalg.inv((J_by_gammasq/(sig ** 2)) + (XTX/(sig ** 2)))
    XTy = np.dot(X.T, y)
    #generate betahat from the normal distribution with mean: 
    norm_mean = np.dot(TempMat, XTy/(sig ** 2)) 
    #and covariance matrix: 
    norm_cov = TempMat  
    betahat = np.random.multivariate_normal(norm_mean, norm_cov)
    muhat = np.dot(X, betahat)
    betahats[:,i] = betahat
    muhats[:,i] = muhat
    
beta_est = np.mean(betahats, axis = 1)
mu_est = np.mean(muhats, axis = 1) #these are the fitted values

plt.plot(y, label = 'Data')
plt.plot(mu_est, color = 'black', label = 'Posterior Mean')
plt.legend()
plt.show()
#Plotting all the posterior fitted values: 
plt.plot(y, label = 'Data')
for i in range(N):
    plt.plot(muhats[:,i], color = 'lightcoral')
plt.plot(mu_est, color = 'black', label = 'Posterior Mean')
plt.legend()
plt.show()

The following is the histogram of the $\sigma$ values.

plt.hist(sig_samples, bins=30, color='lightgreen', edgecolor='black')
plt.title('Histogram of sigma samples')
plt.xlabel('sigma')
plt.ylabel('Frequency')
plt.show()
df_gamma_sigma = pd.DataFrame({
    'gamma_samples': gamma_samples,
    'sig_samples': sig_samples
})

# Summary statistics
print(df_gamma_sigma.describe())

Variance Models

Next we shall study variance models (mainly in the context of spectral analysis). To illustrate the main ideas, we first consider the following simple model:

\begin{align*} y_t \overset{\text{independent}}{\sim} N(0, \tau_t^2). \end{align*}

We will discuss estimation of $\tau_1, \dots, \tau_n$, under the assumption that they are smooth in some sense, next week. For now, let us simulate some data from this model to see what they look like.

Below are two simulation settings for this variance model.

Simulation 1

#Simulate data from this variance model:
rng = np.random.default_rng() #random number generator used in the simulations below
n = 400
tvals = np.arange(1, n+1)
th = -0.8
tau_t = np.sqrt((1 + (th ** 2) + 2*th*np.cos(2 * np.pi * (tvals)/n))) #this is a smooth function of t
y = rng.normal(loc = 0, scale = tau_t)
plt.figure(figsize = (12, 6))
plt.plot(y)
plt.show()

Simulation 2

Here is the second simulation setting for this model. We take the following smooth function $\alpha_t$ and then generate $\tau_t$ as $\exp(\alpha_t)$.

def smoothfun(x):
    ans = np.sin(15*x) + np.exp(-(x ** 2)/2) + 0.5*((x - 0.5) ** 2) + 2*np.log(x + 0.1)
    return ans

n = 2000
xx = np.linspace(0, 1, n)
alpha_true = np.array([smoothfun(x) for x in xx])
plt.figure(figsize = (12, 6))
plt.plot(alpha_true)
plt.show()
tau_t = np.exp(alpha_true)
y = rng.normal(loc = 0, scale = tau_t)
plt.figure(figsize = (12, 6))
plt.plot(y)
plt.show()

Here are two common features of these simulated datasets:

  1. The data $y_1, \dots, y_n$ oscillate around zero, with clusters of small values when $\tau_t$ is small and bursts of large values when $\tau_t$ is large.

  2. Because $\log \tau_t = \alpha_t$ is smooth, the standard deviation $\tau_t$ changes gradually, not abruptly -- so the data exhibit smooth heteroscedasticity: periods of calm and periods of volatility, but with slow transitions (see the rolling-standard-deviation sketch below).
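One simple way to see the second feature in the simulated data is to compare the true $\tau_t$ with a crude rolling estimate of the standard deviation. Here is a minimal sketch for the most recent simulated dataset; the window length of 50 is an arbitrary choice.

#Compare the true tau_t with a rolling standard deviation of the simulated series
window = 50
rolling_sd = pd.Series(y).rolling(window, center = True).std()
plt.figure(figsize = (12, 6))
plt.plot(tau_t, label = 'true tau_t')
plt.plot(rolling_sd, label = 'rolling SD (window = 50)')
plt.legend()
plt.show()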

There exist real datasets (particularly from finance) which also display these characteristics. Below is one example.

A real dataset from finance for which this variance model is applicable

Next, we examine a real dataset where this variance model may be applicable. The dataset consists of daily price data for the S&P 500 index, downloaded from Yahoo Finance using the yfinance library.

import yfinance as yf
# Download S&P 500 data
sp500 = yf.download('^GSPC', start='2000-01-01', end='2024-01-01', auto_adjust=True)

sp500_closeprice = sp500['Close'].to_numpy().flatten() #this is the daily closing price of the S&P 500 index

plt.figure(figsize=(12,6))
plt.plot(sp500_closeprice, label="Price")
plt.xlabel("Date")
plt.title("S&P 500 daily closing price")
plt.ylabel("Price (dollar)")
plt.legend()
plt.show()

Instead of working with the prices directly, we work with percentage daily returns.

log_prices = np.log(sp500_closeprice)
y = 100 * np.diff(log_prices) #these are the percentage daily returns
plt.figure(figsize = (12, 6))
plt.plot(y)
plt.title('S&P 500 Daily Percentage Returns')
plt.show()

This clearly shares features with the simulated datasets.
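As a quick check, a rolling standard deviation of the returns shows the kind of slowly varying volatility that the variance model is meant to capture. Here is a small sketch; the 60-day window is an arbitrary choice.

#Rolling standard deviation of the daily returns
window = 60
rolling_sd = pd.Series(y).rolling(window).std()
plt.figure(figsize = (12, 6))
plt.plot(rolling_sd)
plt.title('Rolling standard deviation of daily returns')
plt.show()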

We shall study estimation of $\tau_t$ from these datasets in the next lecture.