import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
import statsmodels.api as sm
from statsmodels.tsa.ar_model import AutoReg
from statsmodels.tsa.arima.model import ARIMA
The LSTM Unit¶
Before fitting LSTM models, let us first see how the basic LSTM unit in PyTorch (nn.LSTM) works (see the PyTorch documentation for nn.LSTM).
lstm_net = nn.LSTM(input_size = 1, hidden_size = 5, batch_first = True)
The above code line will create an LSTM unit and will randomly initialize all its parameters (the weight matrices $W$ and the biases $b$). Given a sequence $x_1, \dots, x_T$ for some $T$ (here each $x_t$ needs to be of dimension input_size), the LSTM unit will then use the LSTM formulae to compute $h_1, \dots, h_T$ and $c_1, \dots, c_T$. It will output $h_1, \dots, h_T$ as well as $(h_T, c_T)$.
If we create two input sequences $x^{(1)}_1, \dots, x^{(1)}_T$ and $x^{(2)}_1, \dots, x^{(2)}_T$, then the LSTM unit will use its formulae to output $h^{(1)}_1, \dots, h^{(1)}_T$ (as well as $(h^{(1)}_T, c^{(1)}_T)$) corresponding to the first sequence, and $h^{(2)}_1, \dots, h^{(2)}_T$ (as well as $(h^{(2)}_T, c^{(2)}_T)$) corresponding to the second. In general, if we send in $B$ input sequences (i.e., a batch of $B$ input sequences), each sequence having length $T$ (and each $x_t$ in each sequence having dimension input_size), the output of the LSTM will correspond to $B$ sequences of length $T$ (each element of each output sequence has dimension equal to hidden_size). The inputs and outputs can therefore both be treated as tensors. The ‘batch_first = True’ in the specification of lstm_net above indicates that the input tensor should have shape (B, T, input_size) and the output tensor will have shape (B, T, hidden_size). We will only use $B = 1$.
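As a quick check of these shapes, here is a small sketch (extra to the original notebook) that passes a batch of B = 2 sequences of length T = 4 through lstm_net and inspects the dimensions of what comes out.
# Extra sketch (not in the original notebook): a batch of B = 2 sequences of length T = 4
batch_in = torch.arange(8.0).view(2, 4, 1)     # shape (B, T, input_size); fixed values, no randomness
batch_out, (h_last, c_last) = lstm_net(batch_in)
print(batch_out.shape)                         # torch.Size([2, 4, 5]) = (B, T, hidden_size)
print(h_last.shape, c_last.shape)              # each torch.Size([1, 2, 5]) = (num_layers, B, hidden_size)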
#Let us create an input tensor for lstm_net:
input = torch.randn(1, 10, 1)
#this input has one batch, which is a sequence x_1, \dots, x_10 of length 10. Each x_t is a scalar (input_size = 1).
print(input)
output, (hn, cn) = lstm_net(input)
#output will have shape (1, 10, 5). It is simply the sequence h_1, \dots, h_10 where each h_t is of dimension hidden_size = 5.
print(output.shape)
print(hn.shape) #h_n is simply the hidden vector h_t corresponding to the last time (here t = 10). You can check that hn is identical to the last element of the output
print(cn.shape) #c_n is the cell state corresponding to the last output
print(output[:, 9, :])
print(hn)
#check that hn and output[:, 9, :] are identical
tensor([[[-0.6010],
[ 0.5618],
[-1.5536],
[-0.8574],
[ 0.0943],
[ 0.6305],
[ 0.8702],
[ 0.6343],
[-0.4735],
[ 1.1954]]])
torch.Size([1, 10, 5])
torch.Size([1, 1, 5])
torch.Size([1, 1, 5])
tensor([[ 0.0266, -0.0024, -0.1292, 0.0224, 0.0120]],
grad_fn=<SliceBackward0>)
tensor([[[ 0.0266, -0.0024, -0.1292, 0.0224, 0.0120]]],
grad_fn=<StackBackward0>)
The weight matrices $W_{ii}, W_{if}, W_{ig}, W_{io}$ as well as $W_{hi}, W_{hf}, W_{hg}, W_{ho}$, and the biases $b_{ii}, b_{if}, b_{ig}, b_{io}$ as well as $b_{hi}, b_{hf}, b_{hg}, b_{ho}$, can be accessed as follows.
print(lstm_net.weight_ih_l0.shape)
print(lstm_net.weight_ih_l0) #this contains the four weight matrices W_{ii}, W_{if}, W_{ig}, W_{io}
print(lstm_net.weight_hh_l0.shape)
print(lstm_net.weight_hh_l0) #this contains the four weight matrices W_{hi}, W_{hf}, W_{hg}, W_{ho}
print(lstm_net.bias_ih_l0.shape)
print(lstm_net.bias_ih_l0) #this contains the four biases b_{ii}, b_{if}, b_{ig}, b_{io}
print(lstm_net.bias_hh_l0.shape)
print(lstm_net.bias_hh_l0) #this contains the four biases b_{hi}, b_{hf}, b_{hg}, b_{ho}
#l0 here refers to the fact that there is a single LSTM layer. It is common to stack multiple LSTM layers, in which case there will be weights and biases for each layer. We will only deal with a single LSTM layer
torch.Size([20, 1])
Parameter containing:
tensor([[ 0.2603],
[-0.2236],
[ 0.2664],
[ 0.1512],
[-0.2186],
[-0.1205],
[-0.1810],
[ 0.1973],
[-0.1793],
[ 0.1518],
[ 0.0319],
[-0.3571],
[-0.2811],
[-0.3101],
[ 0.2343],
[ 0.0969],
[-0.0112],
[-0.2663],
[-0.4431],
[-0.1583]], requires_grad=True)
torch.Size([20, 5])
Parameter containing:
tensor([[-0.3423, -0.4448, -0.4265, -0.2921, -0.4348],
[ 0.1394, 0.1673, 0.1803, -0.4452, 0.2765],
[ 0.0872, 0.2287, -0.1099, 0.2148, -0.4241],
[ 0.1523, 0.3546, 0.1880, -0.4215, -0.1255],
[-0.0555, 0.2343, -0.1692, 0.0643, -0.1570],
[-0.0710, -0.0089, -0.2568, -0.3104, 0.3692],
[ 0.3757, -0.0135, 0.3871, 0.3446, -0.2817],
[-0.1777, -0.3072, 0.1446, 0.0113, -0.0613],
[ 0.3320, -0.1748, -0.2448, -0.3179, -0.2275],
[ 0.4310, -0.2174, -0.1231, 0.2016, -0.3435],
[-0.2714, 0.3292, -0.1715, -0.2321, -0.0579],
[-0.0637, 0.0817, 0.1640, -0.3448, 0.4064],
[-0.4252, 0.1515, -0.1262, -0.1889, 0.4183],
[-0.4310, 0.0305, 0.2866, 0.3998, -0.3781],
[ 0.0539, 0.4425, -0.4381, 0.1092, -0.2802],
[ 0.0726, -0.4045, 0.2200, -0.0992, 0.2673],
[-0.0067, -0.0637, -0.1086, -0.4095, -0.1900],
[ 0.0149, -0.3460, 0.3925, -0.2909, -0.1742],
[-0.0598, -0.3623, -0.0035, -0.2154, -0.3743],
[ 0.0110, 0.3959, 0.1101, 0.3689, -0.0044]], requires_grad=True)
torch.Size([20])
Parameter containing:
tensor([-0.1561, 0.3604, -0.2381, -0.4137, 0.0979, 0.2229, -0.2673, 0.1574,
-0.1587, -0.0543, -0.0990, -0.0957, -0.3639, 0.1096, -0.1176, 0.3420,
0.0746, -0.3657, -0.3402, -0.2331], requires_grad=True)
torch.Size([20])
Parameter containing:
tensor([ 0.2142, -0.2887, -0.2046, 0.0812, -0.3101, -0.2202, 0.4055, -0.2364,
-0.1490, 0.3198, 0.1002, 0.3510, -0.3940, 0.2235, -0.0684, -0.2865,
-0.1915, -0.4326, 0.3094, -0.0858], requires_grad=True)
The values that we see above for the weights and biases are randomly chosen initial values. Given some data, they will be trained so as to minimize some loss function.
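To make the layout of these parameters concrete, here is a small sketch (not from the original notebook) that reproduces a single LSTM step by hand from the parameters above, following the gate ordering (input, forget, cell, output) that PyTorch uses in these stacked weight matrices, and compares the result with nn.LSTM.
# Extra sketch (not in the original notebook): one LSTM step computed by hand
with torch.no_grad():
    x0 = torch.tensor([0.5])                               # one scalar input (input_size = 1)
    h0, c0 = torch.zeros(5), torch.zeros(5)                # initial hidden and cell states
    gates = (lstm_net.weight_ih_l0 @ x0 + lstm_net.bias_ih_l0
             + lstm_net.weight_hh_l0 @ h0 + lstm_net.bias_hh_l0)   # shape (20,)
    i, f, g, o = gates.chunk(4)                            # the four gates, each of dimension 5
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
    g = torch.tanh(g)
    c1 = f * c0 + i * g                                    # cell state update
    h1 = o * torch.tanh(c1)                                # hidden state update
    out, (hn, cn) = lstm_net(x0.view(1, 1, 1))             # the same single step via nn.LSTM
    print(torch.allclose(h1, hn.flatten(), atol=1e-6))     # should print True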
Simulated Dataset One¶
Consider the following simple dataset.
n = 1500
truelag = 85
rng = np.random.default_rng(seed = 0)
sig_noise = 0.2
eps = rng.normal(loc = 0, scale = sig_noise, size = n)
y_sim = np.full(shape = n, fill_value = -999.0)
y_sim[0:truelag] = np.linspace(-1, 1, truelag)
for i in range(truelag, n):
y_sim[i] = y_sim[i - truelag]
y_sim = y_sim + eps
plt.figure(figsize = (12, 6))
plt.plot(y_sim)
plt.show()
We first fit the AR($p$) model. If $p$ is taken to be smaller than the true lag which generated the data, the predictions will be quite poor. But the predictions are quite good when $p$ is exactly equal to the true lag.
#Let us fit AR(p) with the truelag
#p = 60
p = truelag
ar = AutoReg(y_sim, lags = p).fit()
n = len(y_sim)
tme = range(1, n+1)
n_future = 1000
tme_future = range(n+1, n+n_future+1)
fcast = ar.get_prediction(start = n, end = n+n_future-1).predicted_mean
plt.figure(figsize = (12, 7))
plt.plot(tme, y_sim, label = 'Data')
plt.plot(tme_future, fcast, label = 'Forecast (AR(p))', color = 'green')
plt.axvline(x=n, color='gray', linestyle='--')
plt.legend()
plt.show()
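To illustrate the sensitivity to the choice of lag (this check is not in the original notebook), one can compare the in-sample fit of AR($p$) for a few candidate values of $p$, for example a lag that is too small versus the true lag.
# Extra sketch (not in the original notebook): in-sample fit for a few candidate lags
for p_try in [20, 60, truelag]:
    fit_try = AutoReg(y_sim, lags=p_try).fit()
    print(f"p = {p_try:3d} | residual std = {np.std(fit_try.resid):.4f} | AIC = {fit_try.aic:.1f}")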
Next we apply LSTM (and RNN). The first step is to create the input and output tensors. We scale the data before fitting models.
mu, sig = y_sim.mean(), y_sim.std()
y_std = (y_sim - mu) / sig
X = torch.tensor(y_std[:-1], dtype=torch.float32)
Y = torch.tensor(y_std[1: ], dtype=torch.float32)
X = X.unsqueeze(0).unsqueeze(-1) # shape (1, seq_len, 1)
Y = Y.unsqueeze(0).unsqueeze(-1) # shape (1, seq_len, 1)
#unsqueeze(0) adds a dimension at the front (the batch dimension)
#unsqueeze(-1) adds a dimension at the end (the input dimension of x_t)
seq_len = X.size(1)
print(seq_len)
1499
We create the LSTM model class below. It consists of the LSTM unit which outputs $h_t$ (or $r_t$ in our notation), followed by a fully connected layer which outputs $\mu_t$. We take the hidden size (below nh) to be 32. Increasing nh will make the parameter estimation process slower.
class lstm_net(nn.Module):
def __init__(self, nh):
super().__init__()
self.rnn = nn.LSTM(input_size=1, hidden_size=nh,
batch_first=True)
self.fc = nn.Linear(nh, 1)
def forward(self, x, hc=None):
out, hc = self.rnn(x, hc)
out = self.fc(out)
return out, hc
#lstm_net defined above has an LSTM unit followed by a linear unit ('fc' stands for fully-connected) which converts the output of the LSTM to a scalar (this scalar is mu_t in our notation)
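One way to see why a larger nh slows down parameter estimation (a sketch added here, not in the original notebook) is to count the trainable parameters, which grow roughly quadratically in nh.
# Extra sketch (not in the original notebook): parameter counts for a few hidden sizes
# for nn.LSTM(1, nh) followed by nn.Linear(nh, 1), the total is 4*nh*(nh + 3) + nh + 1
for nh_try in [8, 32, 128]:
    m = lstm_net(nh_try)
    print(nh_try, sum(p.numel() for p in m.parameters()))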
torch.manual_seed(0)
np.random.seed(0)
nh = 32
model = lstm_net(nh)
criterion = nn.MSELoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
Training is done as follows.
n_epochs = 1000
for epoch in range(1, n_epochs + 1):
model.train()
opt.zero_grad()
pred, _ = model(X)
loss = criterion(pred, Y)
loss.backward()
opt.step()
if epoch % 100 == 0:
print(f"epoch {epoch:4d}/{n_epochs} | loss = {loss.item():.6f}")epoch 100/1000 | loss = 0.291070
epoch 200/1000 | loss = 0.196253
epoch 300/1000 | loss = 0.177559
epoch 400/1000 | loss = 0.173790
epoch 500/1000 | loss = 0.169772
epoch 600/1000 | loss = 0.165050
epoch 700/1000 | loss = 0.163995
epoch 800/1000 | loss = 0.158396
epoch 900/1000 | loss = 0.156533
epoch 1000/1000 | loss = 0.165788
Predictions are obtained as follows.
model.eval()
with torch.no_grad():
_, hc = model(X)
preds = np.zeros(n_future, dtype=np.float32)
last_in = torch.tensor([[y_std[-1]]], dtype=torch.float32)
for t in range(n_future):
out, hc = model(last_in.view(1, 1, 1), hc) # reuse hidden state
next_val = out.squeeze().item()
preds[t] = next_val
last_in = torch.tensor([[next_val]], dtype=torch.float32)
Predictions are plotted below.
lstm_preds_orig = preds * sig + mu
tme_pred_axis = np.arange(n, n + n_future)
plt.figure(figsize=(12,6))
plt.plot(np.arange(n), y_sim, lw=2, label="Data")
plt.plot(tme_pred_axis, lstm_preds_orig, lw=2, color="r", label="forecast (LSTM)")
plt.xlabel("Time"); plt.ylabel("Data")
for t in range(0, n + n_future, truelag):
plt.axvline(x=t, linestyle='--', color='gray', linewidth=1)
plt.axhline(y=1, linestyle='--', color='gray', linewidth=1)
plt.axhline(y=-1, linestyle='--', color='gray', linewidth=1)
plt.title("Data and Forecast")
plt.legend()
plt.tight_layout()
plt.show()
Let us now fit the RNN.
class RNNReg(nn.Module):
def __init__(self, nh):
super().__init__()
self.rnn = nn.RNN(1, nh, nonlinearity="tanh", batch_first=True)
self.fc = nn.Linear(nh, 1)
def forward(self, x, h=None):
out, h = self.rnn(x, h)
return self.fc(out), h
nh = 32
model = RNNReg(nh)
criterion = nn.MSELoss()
opt = torch.optim.Adam(model.parameters(),lr = 1e-3)
n_epochs = 900
for epoch in range(1, n_epochs + 1):
model.train()
opt.zero_grad()
pred, _ = model(X)
loss = criterion(pred, Y)
loss.backward()
opt.step()
if epoch % 100 == 0:
print(f"epoch {epoch:4d}/{n_epochs} | loss = {loss.item():.6f}")epoch 100/900 | loss = 0.293481
epoch 200/900 | loss = 0.271486
epoch 300/900 | loss = 0.263299
epoch 400/900 | loss = 0.222172
epoch 500/900 | loss = 0.199036
epoch 600/900 | loss = 0.189352
epoch 700/900 | loss = 0.188009
epoch 800/900 | loss = 0.179842
epoch 900/900 | loss = 0.176493
Below we obtain predictions for the RNN model.
model.eval()
with torch.no_grad():
_, h = model(X)
preds = np.zeros(n_future, np.float32)
last = torch.tensor([[y_std[-1]]], dtype=torch.float32)
for t in range(n_future):
out, h = model(last.view(1,1,1), h)
preds[t] = out.item()
last = torch.tensor([[preds[t]]], dtype=torch.float32)
rnn_preds_orig = preds * sig + mu
tme_pred_axis = np.arange(n, n + n_future)
plt.figure(figsize=(12,6))
plt.plot(np.arange(n), y_sim, lw=2, label="Data")
plt.plot(tme_pred_axis, lstm_preds_orig, lw=2, color="red", label="forecast (LSTM)")
plt.plot(tme_pred_axis, rnn_preds_orig, lw=2, color="black", label="forecast (RNN)")
#plt.plot(tme_pred_axis, fcast, lw=2, color="green", label="forecast (AR)")
for t in range(0, n + n_future, truelag):
plt.axvline(x=t, linestyle='--', color='gray', linewidth=1)
plt.axhline(y=1, linestyle='--', color='gray', linewidth=1)
plt.axhline(y=-1, linestyle='--', color='gray', linewidth=1)
plt.xlabel("Time"); plt.ylabel("Data")
plt.title("Data and Forecast")
plt.legend()
plt.tight_layout()
plt.show()
RNN gives predictions that are similar to the LSTM predictions.
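As a rough quantitative check (this sketch is not part of the original analysis), we can compare both sets of forecasts against the noiseless periodic pattern used to simulate the data.
# Extra sketch (not in the original notebook): compare forecasts to the noiseless periodic signal
pattern = np.linspace(-1, 1, truelag)
clean = np.tile(pattern, (n + n_future) // truelag + 1)[n:n + n_future]
print("RMSE (LSTM vs clean signal):", np.sqrt(np.mean((lstm_preds_orig - clean) ** 2)))
print("RMSE (RNN  vs clean signal):", np.sqrt(np.mean((rnn_preds_orig - clean) ** 2)))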
Simulated Dataset Two¶
We now make two changes to the first simulation above. We increase the true lag, and we add the noise slightly differently: the noise is now added inside the recursion, $y_t = y_{t-L} + \epsilon_t$ for $t > L$ (where $L$ is the true lag), rather than to the whole series at the end.
n = 1500
truelag = 344
rng = np.random.default_rng(seed = 0)
sig_noise = 0.2
eps = rng.normal(loc = 0, scale = sig_noise, size = n)
y_sim = np.full(shape = n, fill_value = -999.0)
y_sim[0:truelag] = np.linspace(-1, 1, truelag)
for i in range(truelag, n):
y_sim[i] = y_sim[i - truelag] + eps[i]
#y_sim = y_sim + eps
plt.figure(figsize = (12, 6))
plt.plot(y_sim)
plt.show()
This is a long range prediction problem. In this example, AR($p$) with $p$ taken to be the true lag gives very noisy predictions (as shown below).
#Let us fit AR(p)
p = truelag
ar = AutoReg(y_sim, lags = p).fit()
n = len(y_sim)
tme = range(1, n+1)
n_future = 1000
tme_future = range(n+1, n+n_future+1)
fcast = ar.get_prediction(start = n, end = n+n_future-1).predicted_mean
plt.figure(figsize = (12, 7))
plt.plot(tme, y_sim, label = 'Data')
plt.plot(tme_future, fcast, label = 'Forecast (AR(p))', color = 'green')
plt.axvline(x=n, color='gray', linestyle='--')
plt.legend()
plt.show()
Let us see how the LSTM and RNN do.
mu, sig = y_sim.mean(), y_sim.std()
y_std = (y_sim - mu) / sig
X = torch.tensor(y_std[:-1], dtype=torch.float32)
Y = torch.tensor(y_std[1: ], dtype=torch.float32)
X = X.unsqueeze(0).unsqueeze(-1) # shape (1, seq_len, 1)
Y = Y.unsqueeze(0).unsqueeze(-1) # shape (1, seq_len, 1)
seq_len = X.size(1)
print(seq_len)
1499
Below is the LSTM again, with hidden size 32.
class lstm_net(nn.Module):
def __init__(self, nh):
super().__init__()
self.rnn = nn.LSTM(input_size=1, hidden_size=nh,
batch_first=True)
self.fc = nn.Linear(nh, 1)
def forward(self, x, hc=None):
out, hc = self.rnn(x, hc)
out = self.fc(out)
return out, hc
torch.manual_seed(0)
np.random.seed(0)
nh = 32
model = lstm_net(nh)
criterion = nn.MSELoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
Parameter estimation (training) is done below.
n_epochs = 1000
for epoch in range(1, n_epochs + 1):
model.train()
opt.zero_grad()
pred, _ = model(X)
loss = criterion(pred, Y)
loss.backward()
opt.step()
if epoch % 100 == 0:
print(f"epoch {epoch:4d}/{n_epochs} | loss = {loss.item():.6f}")epoch 100/1000 | loss = 0.243756
epoch 200/1000 | loss = 0.227583
epoch 300/1000 | loss = 0.218399
epoch 400/1000 | loss = 0.209345
epoch 500/1000 | loss = 0.188405
epoch 600/1000 | loss = 0.184548
epoch 700/1000 | loss = 0.181477
epoch 800/1000 | loss = 0.179199
epoch 900/1000 | loss = 0.177259
epoch 1000/1000 | loss = 0.175526
Below we obtain predictions.
model.eval()
with torch.no_grad():
_, hc = model(X)
preds = np.zeros(n_future, dtype=np.float32)
last_in = torch.tensor([[y_std[-1]]], dtype=torch.float32) # (1, 1, 1) after view
for t in range(n_future):
out, hc = model(last_in.view(1, 1, 1), hc) # reuse hidden state
next_val = out.squeeze().item()
preds[t] = next_val
last_in = torch.tensor([[next_val]], dtype=torch.float32)
lstm_preds_orig = preds * sig + mu
tme_pred_axis = np.arange(n, n + n_future)
plt.figure(figsize=(12,6))
plt.plot(np.arange(n), y_sim, lw=2, label="Data")
plt.plot(tme_pred_axis, lstm_preds_orig, lw=2, color="r", label="forecast (LSTM)")
plt.xlabel("Time"); plt.ylabel("Data")
for t in range(0, n + n_future, truelag):
plt.axvline(x=t, linestyle='--', color='gray', linewidth=1)
plt.axhline(y=1, linestyle='--', color='gray', linewidth=1)
plt.axhline(y=-1, linestyle='--', color='gray', linewidth=1)
plt.title("Data and Forecast")
plt.legend()
plt.tight_layout()
plt.show()
The predictions are decent. Note that the nature of the predictions (which look quite clean) is quite different from the predictions given by the AR model (which were very noisy).
The RNN does not seem to work well for this prediction problem, as shown below (this is because of its limited ability to capture long-range dependencies).
class RNNReg(nn.Module):
def __init__(self, nh):
super().__init__()
self.rnn = nn.RNN(1, nh, nonlinearity="tanh", batch_first=True)
self.fc = nn.Linear(nh, 1)
def forward(self, x, h=None):
out, h = self.rnn(x, h)
return self.fc(out), h
torch.manual_seed(0)
np.random.seed(0)
nh = 50 #I could not find any value of nh for which the RNN is giving good predictions for this data
model = RNNReg(nh)
criterion = nn.MSELoss()
opt = torch.optim.Adam(model.parameters(),lr = 1e-3)
n_epochs = 1000
for epoch in range(1, n_epochs + 1):
model.train()
opt.zero_grad()
pred, _ = model(X)
loss = criterion(pred, Y)
loss.backward()
opt.step()
if epoch % 100 == 0:
print(f"epoch {epoch:4d}/{n_epochs} | loss = {loss.item():.6f}")epoch 100/1000 | loss = 0.236083
epoch 200/1000 | loss = 0.228475
epoch 300/1000 | loss = 0.221912
epoch 400/1000 | loss = 0.209255
epoch 500/1000 | loss = 0.197350
epoch 600/1000 | loss = 0.200900
epoch 700/1000 | loss = 0.196046
epoch 800/1000 | loss = 0.204768
epoch 900/1000 | loss = 0.205714
epoch 1000/1000 | loss = 0.182275
model.eval()
with torch.no_grad():
_, h = model(X)
preds = np.zeros(n_future, np.float32)
last = torch.tensor([[y_std[-1]]], dtype=torch.float32)
for t in range(n_future):
out, h = model(last.view(1,1,1), h)
preds[t] = out.item()
last = torch.tensor([[preds[t]]], dtype=torch.float32)
rnn_preds_orig = preds * sig + mu
tme_pred_axis = np.arange(n, n + n_future)
plt.figure(figsize=(12,6))
plt.plot(np.arange(n), y_sim, lw=2, label="Data")
plt.plot(tme_pred_axis, lstm_preds_orig, lw=2, color="red", label="forecast (LSTM)")
plt.plot(tme_pred_axis, rnn_preds_orig, lw=2, color="black", label="forecast (RNN)")
for t in range(0, n + n_future, truelag):
plt.axvline(x=t, linestyle='--', color='gray', linewidth=1)
plt.axhline(y=1, linestyle='--', color='gray', linewidth=1)
plt.axhline(y=-1, linestyle='--', color='gray', linewidth=1)
plt.xlabel("Time"); plt.ylabel("Data")
plt.title("Data and Forecast")
plt.legend()
plt.tight_layout()
plt.show()
These RNN predictions seem decent initially but they deviate considerably from the LSTM predictions as we go further into the future.
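To quantify this deviation (a sketch, not in the original notebook), we can extend the observed series with the noiseless recursion $y_t = y_{t-L}$ and measure how far each forecast strays from it.
# Extra sketch (not in the original notebook): noiseless recursion as a rough benchmark
y_ext = np.concatenate([y_sim, np.empty(n_future)])
for t in range(n, n + n_future):
    y_ext[t] = y_ext[t - truelag]
bench = y_ext[n:]
print("RMSE (LSTM vs recursion):", np.sqrt(np.mean((lstm_preds_orig - bench) ** 2)))
print("RMSE (RNN  vs recursion):", np.sqrt(np.mean((rnn_preds_orig - bench) ** 2)))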
Sunspots Data¶
Below we apply LSTM, RNN and GRU to obtain predictions for the sunspots dataset.
sunspots = pd.read_csv('SN_y_tot_V2.0.csv', header = None, sep = ';')
print(sunspots.head())
y = sunspots.iloc[:,1].values
n = len(y)
plt.figure(figsize = (12, 6))
plt.plot(y)
plt.show()
print(n)
n_future = 300
        0     1    2  3  4
0  1700.5   8.3 -1.0 -1  1
1  1701.5  18.3 -1.0 -1  1
2  1702.5  26.7 -1.0 -1  1
3  1703.5  38.3 -1.0 -1  1
4  1704.5  60.0 -1.0 -1  1

325
First we prepare the input and output tensors $X$ and $Y$.
mu, sig = y.mean(), y.std()
y_std = (y - mu) / sig
X = torch.tensor(y_std[:-1], dtype=torch.float32)
Y = torch.tensor(y_std[1: ], dtype=torch.float32)
X = X.unsqueeze(0).unsqueeze(-1) # shape (1, seq_len, 1)
Y = Y.unsqueeze(0).unsqueeze(-1) # shape (1, seq_len, 1)
seq_len = X.size(1)
print(seq_len)
324
We fit the LSTM with 64 hidden units.
LSTM for Sunspots¶
class lstm_net(nn.Module):
def __init__(self, nh):
super().__init__()
self.rnn = nn.LSTM(input_size=1, hidden_size=nh,
batch_first=True)
self.fc = nn.Linear(nh, 1)
def forward(self, x, hc=None):
out, hc = self.rnn(x, hc)
out = self.fc(out)
return out, hc
torch.manual_seed(0)
np.random.seed(0)
nh = 64
model = lstm_net(nh)
criterion = nn.MSELoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
n_epochs = 1000
for epoch in range(1, n_epochs + 1):
model.train()
opt.zero_grad()
pred, _ = model(X)
loss = criterion(pred, Y)
loss.backward()
opt.step()
if epoch % 100 == 0:
print(f"epoch {epoch:4d}/{n_epochs} | loss = {loss.item():.6f}")epoch 100/1000 | loss = 0.178550
epoch 200/1000 | loss = 0.089431
epoch 300/1000 | loss = 0.063866
epoch 400/1000 | loss = 0.042808
epoch 500/1000 | loss = 0.036359
epoch 600/1000 | loss = 0.030837
epoch 700/1000 | loss = 0.025321
epoch 800/1000 | loss = 0.021516
epoch 900/1000 | loss = 0.017578
epoch 1000/1000 | loss = 0.015462
model.eval()
with torch.no_grad():
_, hc = model(X)
preds = np.zeros(n_future, dtype=np.float32)
last_in = torch.tensor([[y_std[-1]]], dtype=torch.float32) # (1, 1, 1) after view
for t in range(n_future):
out, hc = model(last_in.view(1, 1, 1), hc) # reuse hidden state
next_val = out.squeeze().item()
preds[t] = next_val
last_in = torch.tensor([[next_val]], dtype=torch.float32)
lstm_preds_orig = preds * sig + mu
tme_pred_axis = np.arange(n, n + n_future)
plt.figure(figsize=(12,6))
plt.plot(np.arange(n), y, lw=2, label="Data")
plt.plot(tme_pred_axis, lstm_preds_orig, lw=2, color="r", label="forecast (LSTM)")
plt.xlabel("Time"); plt.ylabel("Data")
plt.title("Data and Forecast")
plt.legend()
plt.tight_layout()
plt.show()
The predictions for the LSTM look quite realistic. They are quite different in character from the predictions generated by an AR($p$) model: the AR($p$) predictions oscillate for a short while before settling to a constant value, whereas the LSTM predictions keep oscillating well into the future in a way that is visually similar to the true dataset.
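For reference, here is a sketch of such an AR fit for the sunspots data (not part of the original notebook; the lag p = 9 below is a hypothetical choice for illustration only).
# Extra sketch (not in the original notebook): AR forecast for the sunspots data
p_sun = 9   # hypothetical illustrative lag choice
ar_sun = AutoReg(y, lags=p_sun).fit()
ar_fcast = ar_sun.get_prediction(start=n, end=n + n_future - 1).predicted_mean
plt.figure(figsize=(12, 6))
plt.plot(np.arange(n), y, lw=2, label="Data")
plt.plot(np.arange(n, n + n_future), ar_fcast, lw=2, color="green", label="forecast (AR)")
plt.xlabel("Time"); plt.ylabel("Data")
plt.title("Sunspots: AR forecast")
plt.legend()
plt.show()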
RNN for Sunspots¶
Next we apply the RNN (again with 64 hidden units).
torch.manual_seed(0)
np.random.seed(0)
nh = 64
class RNNReg(nn.Module):
def __init__(self):
super().__init__()
self.rnn = nn.RNN(1, nh, nonlinearity="tanh", batch_first=True)
self.fc = nn.Linear(nh, 1)
def forward(self, x, h=None):
out, h = self.rnn(x, h)
return self.fc(out), h
model = RNNReg()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
lossf = nn.MSELoss()
for epoch in range(1000):
opt.zero_grad()
pred,_ = model(X)
loss = lossf(pred, Y)
loss.backward(); opt.step()
if epoch % 100 == 0:
print(f"epoch {epoch:4d}/{n_epochs} | loss = {loss.item():.6f}")epoch 0/1000 | loss = 0.904697
epoch 100/1000 | loss = 0.177153
epoch 200/1000 | loss = 0.113939
epoch 300/1000 | loss = 0.086803
epoch 400/1000 | loss = 0.066403
epoch 500/1000 | loss = 0.062139
epoch 600/1000 | loss = 0.050468
epoch 700/1000 | loss = 0.035208
epoch 800/1000 | loss = 0.030020
epoch 900/1000 | loss = 0.031938
model.eval()
with torch.no_grad():
_, h = model(X)
preds = np.zeros(n_future, np.float32)
last = torch.tensor([[y_std[-1]]], dtype=torch.float32)
for t in range(n_future):
out, h = model(last.view(1,1,1), h)
preds[t] = out.item()
last = torch.tensor([[preds[t]]], dtype=torch.float32)
rnn_preds_orig = preds*sig + mu
tme_pred_axis = np.arange(n, n + n_future)
plt.figure(figsize=(12,6))
plt.plot(np.arange(n), y, lw=2, label="Data")
plt.plot(tme_pred_axis, rnn_preds_orig, lw=2, color="r", label="forecast (RNN)")
plt.xlabel("Time"); plt.ylabel("Data")
plt.title("Data and Forecast")
plt.legend()
plt.tight_layout()
plt.show()
These predictions do not appear as realistic as the LSTM forecasts.
GRU for Sunspots¶
Below we fit the GRU again with 64 hidden units. The code works in exactly the same way as LSTM and RNN.
torch.manual_seed(0)
np.random.seed(0)
nh = 64
class GRUReg(nn.Module):
def __init__(self):
super().__init__()
self.gru = nn.GRU(1, nh, batch_first=True)
self.fc = nn.Linear(nh,1)
def forward(self,x,h=None):
out,h = self.gru(x,h)
return self.fc(out),h
model = GRUReg()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
lossf = nn.MSELoss()
for epoch in range(1000):
opt.zero_grad()
pred,_ = model(X)
loss = lossf(pred, Y)
loss.backward(); opt.step()
if epoch % 100 == 0:
print(f"epoch {epoch:4d}/{n_epochs} | loss = {loss.item():.6f}")epoch 0/1000 | loss = 1.017522
epoch 100/1000 | loss = 0.193518
epoch 200/1000 | loss = 0.111449
epoch 300/1000 | loss = 0.082548
epoch 400/1000 | loss = 0.063553
epoch 500/1000 | loss = 0.054251
epoch 600/1000 | loss = 0.041851
epoch 700/1000 | loss = 0.056074
epoch 800/1000 | loss = 0.028392
epoch 900/1000 | loss = 0.046378
model.eval()
with torch.no_grad():
_,h = model(X)
preds = np.zeros(n_future,np.float32)
last = torch.tensor([[y_std[-1]]], dtype=torch.float32)
for t in range(n_future):
out,h = model(last.view(1,1,1),h)
preds[t]=out.item()
last = torch.tensor([[preds[t]]], dtype=torch.float32)
gru_preds_orig = preds*sig+mu
plt.figure(figsize=(12,6))
plt.plot(np.arange(len(y)),y,lw=2,label="Data")
plt.plot(np.arange(len(y),len(y)+n_future),gru_preds_orig, lw=2,label="Forecast (GRU)",color="orange")
plt.legend()
plt.title("Data and Forecast"); plt.legend(); plt.tight_layout(); plt.show()
Again the predictions do not look as realistic as those produced by the LSTM.
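For a direct visual comparison (extra to the original notebook), the three forecasts can be overlaid on one plot, assuming lstm_preds_orig, rnn_preds_orig and gru_preds_orig from the cells above are still in memory.
# Extra sketch (not in the original notebook): overlay the LSTM, RNN and GRU forecasts
tme_pred_axis = np.arange(n, n + n_future)
plt.figure(figsize=(12, 6))
plt.plot(np.arange(n), y, lw=2, label="Data")
plt.plot(tme_pred_axis, lstm_preds_orig, lw=2, color="red", label="forecast (LSTM)")
plt.plot(tme_pred_axis, rnn_preds_orig, lw=2, color="black", label="forecast (RNN)")
plt.plot(tme_pred_axis, gru_preds_orig, lw=2, color="orange", label="forecast (GRU)")
plt.title("Sunspots: LSTM vs RNN vs GRU forecasts")
plt.legend(); plt.tight_layout(); plt.show()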
Simulation Three¶
The following dataset is simulated according to the model
$$y_t = 0.9 \sin\!\left(\frac{\pi}{2}\, y_{t-L}\right) + \epsilon_t$$
for some fixed lag $L$. This equation is used for $t > L$. For $t \le L$, we take $y_t$ to be linear in $t$. We take $n$ to be a large value.
n = 1500
truelag = 85
rng = np.random.default_rng(seed=0)
#sig_noise = 0.05
sig_noise = 0.01
eps = rng.normal(loc=0, scale=sig_noise, size=n)
y_sim = np.full(shape=n, fill_value=-999.0)
y_sim[0:truelag] = np.linspace(-1, 1, truelag)
for i in range(truelag, n):
y_sim[i] = 0.9 * np.sin(0.5 * np.pi * y_sim[i - truelag]) + eps[i]
#y_sim = y_sim + eps
plt.figure(figsize = (12, 6))
plt.plot(y_sim)
plt.show()
Let us obtain predictions for future values using the actual function from which the data were generated. These predictions (which we can refer to as Oracle Predictions) will form the benchmark which we can use to compare the predictions of various methods (AR, LSTM, RNN, GRU).
# Oracle Predictions
n_future = 1000
y_extended = np.concatenate([y_sim, np.empty(n_future, dtype=float)])
for t in range(n, n+n_future):
y_extended[t] = 0.9 * np.sin(0.5 * np.pi * y_extended[t - truelag])
oracle_preds = y_extended[n:]
n = len(y_sim)
tme = range(1, n+1)
tme_future = range(n+1, n+n_future+1)
plt.figure(figsize = (12, 6))
plt.plot(tme, y_sim, label = 'Data')
plt.plot(tme_future, oracle_preds, label = 'Forecast (Oracle)', color = 'green')
plt.axvline(x=n, color='gray', linestyle='--')
plt.legend()
plt.show()
Let us now fit AR($p$) to this data. Even if $p$ is taken to be the correct value (the true lag), this model will not work well (because the actual function which generated the data is nonlinear).
# Let us fit AR(p)
p = truelag
#p = 50
ar = AutoReg(y_sim, lags=p).fit()
n = len(y_sim)
tme = range(1, n+1)
tme_future = range(n+1, n+n_future+1)
fcast = ar.get_prediction(start=n, end=n+n_future-1).predicted_mean
plt.figure(figsize = (12, 7))
plt.plot(tme, y_sim, label = 'Data')
plt.plot(tme_future, oracle_preds, label = 'Forecast (Oracle)', color = 'green')
plt.plot(tme_future, fcast, label = 'Forecast (AR(p))', color = 'black')
plt.axvline(x=n, color='gray', linestyle='--')
plt.legend()
plt.show()
The AR predictions seem to be “expanding” compared to the Oracle predictions.
Below we calculate the root mean squared error and the median absolute error of the discrepancy between the AR forecasts and the Oracle forecasts (i.e., the forecasts obtained by using the true function $f(u) = 0.9 \sin(\pi u / 2)$).
rmse_ar = np.sqrt(np.mean((fcast - oracle_preds)**2))
median_abs_error_ar = np.median(np.abs(fcast - oracle_preds))
print(rmse_ar)
print(median_abs_error_ar)
0.18216756914528376
0.14333472168608313
mu, sig = y_sim.mean(), y_sim.std()
y_std = (y_sim - mu) / sig
X = torch.tensor(y_std[:-1], dtype=torch.float32)
Y = torch.tensor(y_std[1: ], dtype=torch.float32)
X = X.unsqueeze(0).unsqueeze(-1) # shape (1, seq_len, 1)
Y = Y.unsqueeze(0).unsqueeze(-1) # shape (1, seq_len, 1)
seq_len = X.size(1)
print(seq_len)
1499
The code below fits the LSTM, and uses it to obtain predictions.
class lstm_net(nn.Module):
def __init__(self, nh):
super().__init__()
self.rnn = nn.LSTM(input_size=1, hidden_size=nh, batch_first=True)
# input_size: the number of features in the input x
# hidden_size: the number of features in the hidden state h(=r in our notation)
# batch_first: If True, then the input and output tensors are provided
# as (batch, seq_len, feature) instead of (seq_len, batch, feature).
# In our case batch = 1, feature = 1
self.fc = nn.Linear(nh, 1) # many-to-many
def forward(self, x, hc=None):
out, hc = self.rnn(x, hc)
# out: (batch, seq_len, hidden_size)
# hc = (h_n, c_n)
# the final hidden state h_n(=r_n) and c_n(=s_n in our notation)
# h_n: (1, hidden_size), c_n: (1, hidden_size)
out = self.fc(out) # (batch, seq_len, 1)
return out, hc
torch.manual_seed(0)
np.random.seed(0)
nh = 32
model = lstm_net(nh)
criterion = nn.MSELoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
n_epochs = 1000
for epoch in range(1, n_epochs + 1):
#model.train() # Set the model to training mode
opt.zero_grad()
pred, _ = model(X) # hidden state auto-reset every epoch
loss = criterion(pred, Y)
loss.backward()
opt.step()
if epoch % 100 == 0:
print(f"epoch {epoch:4d}/{n_epochs} | loss = {loss.item():.6f}")epoch 100/1000 | loss = 0.127903
epoch 200/1000 | loss = 0.090235
epoch 300/1000 | loss = 0.071659
epoch 400/1000 | loss = 0.063751
epoch 500/1000 | loss = 0.067751
epoch 600/1000 | loss = 0.050256
epoch 700/1000 | loss = 0.042608
epoch 800/1000 | loss = 0.038524
epoch 900/1000 | loss = 0.034826
epoch 1000/1000 | loss = 0.031501
#model.eval()
with torch.no_grad():
_, hc = model(X)
preds = np.zeros(n_future, dtype=np.float32)
last_in = torch.tensor([[y_std[-1]]], dtype=torch.float32) # (1, 1, 1) after view
for t in range(n_future):
out, hc = model(last_in.view(1, 1, 1), hc) # reuse hidden state
next_val = out.squeeze().item()
preds[t] = next_val
last_in = torch.tensor([[next_val]], dtype=torch.float32)
lstm_preds_orig = preds * sig + mu
tme_pred_axis = np.arange(n, n+n_future)
plt.figure(figsize=(12,6))
plt.plot(np.arange(n), y_sim, lw = 2, label = "Data")
plt.plot(tme_pred_axis, oracle_preds, color = 'green', label = 'Forecast (oracle)')
plt.plot(tme_pred_axis, lstm_preds_orig, lw = 2, color="r", label = "Forecast (LSTM)")
plt.plot(tme_pred_axis, fcast, lw = 2, color = "black", label = "Forecast (AR)")
plt.xlabel("Time"); plt.ylabel("Data")
plt.title("Data and Forecast")
plt.legend()
plt.tight_layout()
plt.show()
Visually, the LSTM predictions clearly look better than the AR predictions. Below we quantify each method's discrepancy from the Oracle predictions.
rmse_lstm = np.sqrt(np.mean((lstm_preds_orig - oracle_preds)**2))
median_abs_error_lstm = np.median(np.abs(lstm_preds_orig - oracle_preds))
print(rmse_ar, rmse_lstm)
print(median_abs_error_ar, median_abs_error_lstm)
0.18216756914528376 0.21048515649384567
0.14333472168608313 0.05856568465625367
Even though the LSTM predictions are clearly better than the AR predictions, as we can see from the plot above, the AR model produces a smaller RMSE than the LSTM. This is because the underlying function has many sharp transitions, and RMSE is sensitive to outlier errors. Even if the LSTM generally performs well over the domain, a few large errors around the transition points can inflate its RMSE. To address this issue, we can use more robust evaluation metrics, for example the median of absolute errors, as we did above.
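As a small illustration of this point (a sketch, not part of the original notebook), one can recompute the RMSE after discarding, say, the largest 5% of absolute errors, so that the few transition points do not dominate the comparison.
# Extra sketch (not in the original notebook): RMSE after trimming the largest 5% of absolute errors
def trimmed_rmse(pred, target, trim=0.05):
    err = np.abs(pred - target)
    keep = err <= np.quantile(err, 1 - trim)    # drop the largest absolute errors
    return np.sqrt(np.mean(err[keep] ** 2))

print("trimmed RMSE (AR):  ", trimmed_rmse(fcast, oracle_preds))
print("trimmed RMSE (LSTM):", trimmed_rmse(lstm_preds_orig, oracle_preds))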
Please note that, even if you copy this code exactly and set the same random seeds, you may still get slightly different results on your own computer. This is because different machines use different versions of NumPy, PyTorch, and statsmodels, and these libraries can change how random numbers are generated, how models are fit, or how LSTMs are trained. Small numerical differences—often invisible—can grow during forecasting, especially for neural networks. So the overall behavior will most likely be the same, but the exact numbers may not match perfectly across computers.