For this lab, we will give some practical examples of concepts related to time series characteristics discussed in Lectures 1 and 2. This will include the following:

- Loading data from the `astsa` library (from your Time Series book)
- Generating white noise
- Computing moving averages
- Generating data from an autoregressive process
- Random walks and random walks with drift
import numpy as np
from matplotlib import pyplot as plt
import pandas as pd
!pip install astsa # Only need to do this if you don't have it installed already
import astsa
# Set the random seed, this is so you will generate the same answers
# each time (for example, when generating white noise)
np.random.seed(42)
Data from Time Series Analysis and Its Applications¶
First, we will use the astsa library to load and plot some of the data from Chapter 1. You can use this library yourself if you are interested in looking further at any of the examples.
As we go on, you can think about how to fit models to these data or test assumptions about these data.
# Let's print all the possible datasets we could load from the book
# These are functions named `load_X`
dir(astsa.datasets)
['__builtins__',
'__cached__',
'__doc__',
'__file__',
'__loader__',
'__name__',
'__package__',
'__spec__',
'load_EQ5',
'load_EXP6',
'load_Hare',
'load_Lynx',
'load_chicken',
'load_djia',
'load_fmri1',
'load_gtemp_land',
'load_gtemp_ocean',
'load_jj',
'load_rec',
'load_soi',
'load_speech',
'load_sunspotz',
 'utils']
Dow Jones Industrial Average Data¶
import matplotlib.dates as mdates
locator = mdates.AutoDateLocator(minticks=7, maxticks=10)
djia_data = astsa.load_djia()
# Calculate the return
djia_return = np.diff(np.log(djia_data['Close']))
plt.figure()
plt.subplot(2,1,1)
plt.plot(djia_data['Date'],djia_data['Close'])
plt.gca().xaxis.set_major_locator(locator)
plt.gca().set_xticklabels([]) # Hide labels since they're the same for both subplots
plt.ylabel('Close')
plt.gca().grid(True)
plt.subplot(2,1,2)
plt.plot(djia_data['Date'][1:],djia_return)
plt.gca().xaxis.set_major_locator(locator)
plt.gca().tick_params(axis='x', labelrotation=45)
plt.ylabel('Returns')
plt.gca().grid()
plt.tight_layout()
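As a side note on the return calculation above: `np.diff(np.log(prices))` gives the log return, $\log(P_t / P_{t-1})$, which for small day-to-day moves is close to the simple percent change. A small sketch with made-up prices:

```python
import numpy as np

# Hypothetical closing prices over four days
prices = np.array([100.0, 101.0, 99.0, 103.0])

# Log return: difference of log prices = log of the price ratio
log_returns = np.diff(np.log(prices))

# Simple return: percent change relative to the previous close
simple_returns = np.diff(prices) / prices[:-1]

# For small moves the two are nearly identical
print(log_returns.round(4))
print(simple_returns.round(4))
```

The log form is convenient because multi-period log returns add, whereas simple returns compound.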
fMRI data¶
fmri_data = astsa.load_fmri1()
print(fmri_data)
plt.subplot(3,1,1)
plt.plot(fmri_data['cort1'])
plt.plot(fmri_data['cort2'])
plt.ylabel('BOLD')
plt.subplot(3,1,2)
plt.plot(fmri_data['thal1'])
plt.plot(fmri_data['thal2'])
plt.ylabel('BOLD')
plt.subplot(3,1,3)
plt.plot(fmri_data['cere1'])
plt.plot(fmri_data['cere2'])
plt.xlabel('Time (s)')
plt.ylabel('BOLD')
plt.tight_layout()
time cort1 cort2 cort3 cort4 thal1 thal2 cere1 cere2
0 1 -0.336 -0.088 -0.579 -0.221 -0.222 -0.046 -0.354 -0.028
1 2 -0.192 -0.359 -0.475 -0.058 0.072 -0.039 -0.346 -0.032
2 3 0.062 0.062 0.063 0.192 0.145 -0.256 -0.337 0.272
3 4 0.128 0.221 0.234 -0.004 -0.104 -0.030 0.149 0.042
4 5 0.358 0.199 0.388 0.255 0.035 -0.081 0.311 -0.080
.. ... ... ... ... ... ... ... ... ...
123 124 -0.500 -0.306 -0.279 -0.040 -0.166 0.211 -0.006 -0.191
124 125 -0.443 -0.213 -0.456 -0.103 -0.230 0.156 -0.124 -0.103
125 126 -0.497 -0.526 -0.457 0.376 -0.170 0.126 -0.087 -0.253
126 127 -0.401 -0.081 -0.294 -0.016 -0.186 0.047 -0.081 -0.398
127 128 -0.419 -0.199 -0.394 0.222 -0.044 0.049 -0.031 -0.367
[128 rows x 9 columns]

White Noise¶
In Lecture 2, we talked about white noise, which is a special case of a time series generated from uncorrelated random variables with mean 0 and finite variance $\sigma_w^2$. The name comes from the analogy with white light, indicating that all possible periodic oscillations are present with equal strength (we will see later how this looks in power spectral analysis).
Let’s create a function that returns `nt` samples of independent/iid Gaussian white noise, that is, $w_t \sim \mathrm{iid}\ N(0, \sigma_w^2)$.
def white_noise(nt, var=1):
    '''
    Generate a time series of uncorrelated random variables, `w_t`, with mean 0 and finite variance `var`.
    If var=1 this is the standard normal distribution.
    Inputs:
        nt [int] : number of time points
        var [float] : variance
    Output:
        w [np.array] : array of length nt
    '''
    w = np.sqrt(var) * np.random.randn(nt)
    return w
Now let’s plot some white noise data for 250 time points, as in Fig. 1.9 (SS).
nt = 250
w = white_noise(nt)
plt.plot(w)
plt.xlabel('Time')
plt.ylabel('w')
plt.title('White noise');
Since we are drawing the data randomly at each time point, we will get a different time series if we repeat this process to generate another white noise sample. Next, let’s create a matrix of nt by nsamps, where nsamps = 100 and nt=250. Then plot these time series on top of one another. What do you notice about the expected mean and variance over time?
nsamps = 100
w_matrix = np.zeros((nt, nsamps))
for n in np.arange(nsamps):
w_matrix[:,n] = white_noise(nt)
plt.plot(w_matrix, color='gray', linewidth=0.1);
plt.plot(w_matrix.mean(1), color='y', linewidth=2)
plt.plot(w_matrix.var(1), color='k', linewidth=2);
You see here that, as expected, the mean and variance don’t change over time, and are pretty close to 0 and 1, respectively.
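As an aside, the sample-by-sample loop above can be vectorized: a single call to NumPy's random generator fills the whole matrix at once. A sketch, using the newer `default_rng` interface rather than the global seed set earlier:

```python
import numpy as np

rng = np.random.default_rng(42)
nt, nsamps = 250, 100

# One call draws the full (nt, nsamps) matrix of iid N(0, 1) values;
# each column plays the role of one white noise sample path
w_matrix = rng.standard_normal((nt, nsamps))

# Sample mean and variance across the 100 draws at each time point
# should hover near the theoretical values 0 and 1
print(w_matrix.mean(axis=1).round(2))
print(w_matrix.var(axis=1).round(2))
```

This gives the same kind of matrix as the loop, just in one shot, which matters when `nsamps` gets large.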
Moving average and filtering¶
In the examples given by Shumway and Stoffer in Chapter 1, the signals vary in terms of their smoothness. Sometimes this is due to underlying characteristics of the data, but sometimes this is due to sampling (in fact, undersampling a time series, such as a sound, can lead to unwanted “spikiness” in the data and distortions called “aliasing”).
So what if we want to smooth our data? One way is to replace the white noise series by a moving average, e.g.:

$$v_t = \tfrac{1}{3}\left(w_{t-1} + w_t + w_{t+1}\right)$$

This uses both the current data point and its past and future neighbors, which can also be described as lags [-1, 0, 1]. This is sometimes called an “acausal” moving average filter because it requires data from both the past and the future. On the other hand, a “causal” moving average filter would use only points in the present and past (negative or 0 lags only), for example:

$$v_t = \tfrac{1}{3}\left(w_{t-2} + w_{t-1} + w_t\right)$$
Let’s compare what these do to our time series, starting with the white noise we made above.
def moving_average(w, n=3):
    '''
    Create an acausal (centered) moving average of the time series `w` using an `n`-point moving average (default 3)
    '''
    v = np.zeros((len(w),)) * np.nan
    half_win = n // 2
    for t in np.arange(half_win, len(w) - half_win):
        v[t] = np.mean(w[t-half_win : t+half_win+1])
    return v
# Plot the white noise and the moving average of the white noise on top
plt.plot(w)
plt.plot(moving_average(w))
# How does this change with the number of points included in the average?
# What happens? Try some other values of n
plt.plot(w)
plt.plot(moving_average(w))
plt.plot(moving_average(w, n=10)) # FILL IN, try several...
Below let’s create the moving average function assuming negative lags (a causal filter)
def moving_average_causal(w, n=3):
    '''
    Create a causal moving average of the time series `w` using an `n`-point moving average (default 3)
    Inputs:
        w (np.array) : original time series
        n (int) : number of points incorporated in the moving average
    Output:
        v (np.array) : smoothed time series
    '''
    v = np.zeros((len(w),)) * np.nan
    # Average the current point and the n-1 points before it (lags 0 to -(n-1))
    for t in np.arange(n - 1, len(w)):
        v[t] = np.mean(w[t-n+1 : t+1])
    return v
Now let’s look at how this causal filtering compares to the original centered n-point moving average. What do you notice?
n = 3
plt.figure(figsize=(12,3))
#plt.plot(w, label='original')
plt.plot(moving_average(w, n), label='centered n-point MA')
plt.plot(moving_average_causal(w, n), label='causal n-point MA')
plt.legend()
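For reference, the centered moving average can also be computed with `np.convolve` instead of an explicit loop. This is a sketch: with `mode='same'` the output keeps the input length, but the first and last `n//2` points are computed with implicit zero padding rather than being set to NaN as in `moving_average` above, so only the interior points match:

```python
import numpy as np

def moving_average_conv(w, n=3):
    # An n-point boxcar kernel; convolving with it averages n neighbors
    kernel = np.ones(n) / n
    return np.convolve(w, kernel, mode='same')

w = np.random.randn(250)
v = moving_average_conv(w, n=3)
# Interior values equal the hand-rolled centered average;
# the edge values differ because of the zero padding
```

The convolution view also makes it clear that a moving average is just one choice of filter kernel, a theme we will return to.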
Autoregression¶
The smoothed moving average data we generated here still doesn’t have a lot of the quasiperiodic characteristics of time series like the speech series and fMRI series examples from lecture (Fig. 1.3 and 1.7 in SS). Instead, we can generate time series with oscillatory behavior in a few ways, one of which is to use the concept of autoregression, where we predict the current value of a time series as a function of past values, e.g.

$$x_t = 1.5 x_{t-1} - 0.75 x_{t-2} + w_t$$
We will initialize the first two values of $x$ with the white noise itself, so that $x_0 = w_0$ and $x_1 = w_1$.
def generate_autoreg(w):
    '''
    Generate some autocorrelated data starting with white noise
    time series `w`.
    Inputs:
        w [np.array] : white noise time series
    Output:
        x [np.array] : autocorrelated time series
    '''
    nt = len(w)
    x = np.zeros((nt,))
    x[0] = w[0]
    x[1] = w[1]
    # Start at t=2 so that x[t-2] never wraps around to the end of the array
    for t in np.arange(2, nt):
        x[t] = 1.5*x[t-1] - 0.75*x[t-2] + w[t]
    return x
nt = 250
x = generate_autoreg(white_noise(nt))
plt.plot(x)
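If you have SciPy installed, the same recursion can also be written as an IIR filter applied to the noise: rearranging the model as $x_t - 1.5 x_{t-1} + 0.75 x_{t-2} = w_t$ gives the denominator coefficients for `scipy.signal.lfilter`. A sketch (note that `lfilter` uses zero initial conditions, so the first couple of samples differ slightly from seeding $x_0, x_1$ with the noise as above):

```python
import numpy as np
from scipy.signal import lfilter

w = np.random.randn(250)

# a = [1, -1.5, 0.75] encodes x_t = 1.5 x_{t-1} - 0.75 x_{t-2} + w_t;
# b = [1] passes the noise input straight through
x = lfilter([1.0], [1.0, -1.5, 0.75], w)
```

This is often faster than a Python loop and connects autoregression to the filtering ideas above.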
Data as signal plus white noise¶
Many datasets can be modeled as $x_t = s_t + w_t$, where $s_t$ is some signal of interest and $w_t$ is white noise. We will talk later about how we might find $s_t$ or other components of the signal. But for now, let’s look at a few examples generating these types of data.
Random walk¶
A random walk is a time series in which the value of the series at time $t$ is the value at time $t-1$ plus a random step determined by $w_t$ (white noise):

$$x_t = x_{t-1} + w_t$$

This is a classic example of a nonstationary process (which we’ll get into later). Let’s first write a function for this:
def random_walk(nt, x0=0):
    '''
    Generate a random walk `x` for `nt` time points using initial value `x0`
    Inputs:
        nt (int) : number of time points
        x0 (float) : initial value
    '''
    x = np.zeros((nt,))
    x[0] = x0
    for n in np.arange(1, nt):
        x[n] = x[n-1] + white_noise(1)[0]
    return x
# Let's plot the random walk for `nt=250` time points
nt = 250
x0 = 0
x = random_walk(nt, x0=x0)
plt.plot(x)
# What do you notice about the data? Try running this cell a few times..
# Let's now look at what happens if we simulate 100 random walks for
# 5000 time points and plot them on top of one another.
# What do you see about how the mean and variance
# of the signals change over time? How does this differ from white noise?
nwalks = 100
nt = 5000
all_r = np.zeros((nt, nwalks))
for n in np.arange(nwalks):
all_r[:,n]=random_walk(nt)
plt.plot(all_r[:,n])
plt.xlabel('Time')
plt.figure()
plt.plot(all_r.mean(1))
plt.ylabel('Mean')
plt.figure()
plt.plot(all_r.var(1))
plt.ylabel('Variance')
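Since $x_t = \sum_{j=1}^{t} w_j$ (with $x_0 = 0$), the whole matrix of walks can also be generated in one shot with a cumulative sum, which makes the fan-out of the variance easy to see. A sketch:

```python
import numpy as np

rng = np.random.default_rng(42)
nt, nwalks = 5000, 100

# Each column is the cumulative sum of one column of white noise,
# i.e. one random walk
walks = np.cumsum(rng.standard_normal((nt, nwalks)), axis=0)

# Theory: Var(x_t) = t * sigma_w^2 grows linearly in t, which is
# why the sample variance across walks climbs over time
sample_var = walks.var(axis=1)
```

Compare `sample_var` to a straight line with slope 1: it is noisy with only 100 walks, but the linear trend should be clear.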



Random Walk with Drift¶
A random walk with drift is a modification of the random walk model in which we add a constant $\delta$ to our random walk at each step:

$$x_t = \delta + x_{t-1} + w_t$$

which can also be written:

$$x_t = \delta t + \sum_{j=1}^{t} w_j$$
Let’s now write a function for this:
def random_walk_drift(nt, drift, x0=0):
    '''
    Create a random walk with drift for `nt` time points, drift `drift`, and
    initial value `x0`.
    Inputs:
        nt (int) : number of time points
        drift (float) : constant drift
        x0 (float) : initial value
    '''
    x = np.zeros((nt,))
    x[0] = x0
    for n in np.arange(1, nt):
        # [0] extracts the scalar, avoiding a NumPy deprecation warning
        x[n] = x[n-1] + white_noise(1)[0] + drift
    return x
# Now let's plot a single example of the random walk. Try running a few times
# with different parameters. How does the `drift` parameter influence what
# you get? What about the properties of the white noise itself (if you go
# back...?)
nt = 250
drift = 0.1
plt.figure()
plt.plot(random_walk_drift(nt, drift))

# Let's compare the mean and variance of the random
# walk with drift to our example above. Let's create
# a matrix of random walks with drift - `nwalks` samples
# and `nt` time points for drift `drift`.
nwalks = 1000
nt = 1500
drift = 0.1
all_r_drift = np.zeros((nt, nwalks))
for n in np.arange(nwalks):
all_r_drift[:,n]=random_walk_drift(nt, drift)
plt.plot(all_r_drift[:,n])
plt.xlabel('Time')

# Now plot the mean and variance over time. How do these compare
# to the other graphs you created?
plt.figure()
plt.plot(all_r_drift.mean(1))
plt.figure()
plt.plot(all_r_drift.var(1))
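As a check on the second form of the model, the loop-based walk with drift should match the closed form $x_t = \delta t + \sum_{j=1}^{t} w_j$ when both use the same noise. A sketch, assuming $x_0 = 0$ and that (as in `random_walk_drift`) the noise enters starting at $t = 1$:

```python
import numpy as np

rng = np.random.default_rng(42)
nt, drift = 250, 0.1
w = rng.standard_normal(nt)

# Recursive form: x_t = drift + x_{t-1} + w_t, starting from x_0 = 0
x_loop = np.zeros(nt)
for n in range(1, nt):
    x_loop[n] = x_loop[n-1] + w[n] + drift

# Closed form: linear trend drift*t plus the cumulative sum of the same noise
t = np.arange(nt)
x_closed = drift * t + np.concatenate(([0.0], np.cumsum(w[1:])))

print(np.allclose(x_loop, x_closed))  # True
```

This equivalence is why the mean of many drifting walks traces the line $\delta t$ while the variance still grows like a plain random walk.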

Load some data from Shumway and Stoffer examples¶
# Try applying the moving average to the DJIA data
# (where each sample is from one day). What happens
# when you apply the average over a month, a year, etc?
# Does this change how you might interpret the data?
n = 30
plt.plot(djia_data['Close'])
plt.plot(moving_average(djia_data['Close'], n=n))
# Let's do the same thing with the speech data
speech_data = astsa.load_speech()
speech_data
plt.plot(speech_data['Value'])
# Plot the smoothed speech data
plt.plot(moving_average(speech_data['Value'],130))
# Try adding white noise to some of the datasets. If we just
# add the white noise with the default variance, it may not affect our signal
# that much, so try changing the `var` parameter in our `white_noise`
# function
noise_variance = np.var(speech_data['Value'])
plt.figure()
plt.subplot(2,1,1)
plt.plot(speech_data['Value'])
plt.subplot(2,1,2)
plt.plot(speech_data['Value'] + white_noise(len(speech_data['Value']), var=noise_variance*2))
# Extra: Try adding some drift to any of these datasets
speech_drift = speech_data['Value'] + random_walk_drift(len(speech_data['Value']), drift=10)
plt.plot(speech_drift)