
For this lab, we will work through practical examples of the time series concepts discussed in Lectures 1 and 2, including:

  1. Loading data from the astsa library (from your Time Series book)

  2. Generating white noise

  3. Computing moving averages

  4. Generating data from an autoregressive process

  5. Random walks and random walks + drift

import numpy as np
from matplotlib import pyplot as plt
import pandas as pd
!pip install astsa # Only need to do this if you don't have it installed already
import astsa

# Set the random seed, this is so you will generate the same answers
# each time (for example, when generating white noise)
np.random.seed(42) 

Data from Time Series Analysis and Its Applications

First, we will use the astsa library to load and plot some of the data from Chapter 1. You can use this library yourself if you are interested in looking further at any of the examples.

As we go on, you can think about how to fit models to these data or test assumptions about these data.

# Let's print all the possible datasets we could load from the book
# These are functions named `load_X`

dir(astsa.datasets)
['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'load_EQ5', 'load_EXP6', 'load_Hare', 'load_Lynx', 'load_chicken', 'load_djia', 'load_fmri1', 'load_gtemp_land', 'load_gtemp_ocean', 'load_jj', 'load_rec', 'load_soi', 'load_speech', 'load_sunspotz', 'utils']

Dow Jones Industrial Average Data

import matplotlib.dates as mdates
locator = mdates.AutoDateLocator(minticks=7, maxticks=10)

djia_data = astsa.load_djia()
# Calculate the return
djia_return = np.diff(np.log(djia_data['Close']))

plt.figure()
plt.subplot(2,1,1)
plt.plot(djia_data['Date'],djia_data['Close'])
plt.gca().xaxis.set_major_locator(locator)
plt.gca().set_xticklabels([]) # Hide labels since they're the same for both subplots
plt.ylabel('Close')
plt.gca().grid(True)

plt.subplot(2,1,2)
plt.plot(djia_data['Date'][1:],djia_return)
plt.gca().xaxis.set_major_locator(locator)
plt.gca().tick_params(axis='x', labelrotation=45)
plt.ylabel('Returns')
plt.gca().grid()

plt.tight_layout()
<Figure size 640x480 with 2 Axes>
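As an aside on the return computation above: the log difference is used because, for small day-to-day changes, it is nearly equal to the simple percent change. A quick sketch with made-up prices (not the real DJIA values):

```python
import numpy as np

# Hypothetical closing prices, for illustration only
close = np.array([100.0, 101.0, 100.5, 102.0])

log_return = np.diff(np.log(close))          # r_t = log(P_t) - log(P_{t-1})
simple_return = np.diff(close) / close[:-1]  # (P_t - P_{t-1}) / P_{t-1}

# For small moves the two nearly agree, since log(1 + x) ~ x
print(np.max(np.abs(log_return - simple_return)))
```

Log returns have the added convenience that they sum over time: the return over a week is the sum of the daily log returns.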

fMRI data

fmri_data = astsa.load_fmri1()

print(fmri_data)

plt.subplot(3,1,1)
plt.plot(fmri_data['cort1'])
plt.plot(fmri_data['cort2'])
plt.ylabel('BOLD')

plt.subplot(3,1,2)
plt.plot(fmri_data['thal1'])
plt.plot(fmri_data['thal2'])
plt.ylabel('BOLD')

plt.subplot(3,1,3)
plt.plot(fmri_data['cere1'])
plt.plot(fmri_data['cere2'])
plt.xlabel('Time (s)')
plt.ylabel('BOLD')

plt.tight_layout()
     time  cort1  cort2  cort3  cort4  thal1  thal2  cere1  cere2
0       1 -0.336 -0.088 -0.579 -0.221 -0.222 -0.046 -0.354 -0.028
1       2 -0.192 -0.359 -0.475 -0.058  0.072 -0.039 -0.346 -0.032
2       3  0.062  0.062  0.063  0.192  0.145 -0.256 -0.337  0.272
3       4  0.128  0.221  0.234 -0.004 -0.104 -0.030  0.149  0.042
4       5  0.358  0.199  0.388  0.255  0.035 -0.081  0.311 -0.080
..    ...    ...    ...    ...    ...    ...    ...    ...    ...
123   124 -0.500 -0.306 -0.279 -0.040 -0.166  0.211 -0.006 -0.191
124   125 -0.443 -0.213 -0.456 -0.103 -0.230  0.156 -0.124 -0.103
125   126 -0.497 -0.526 -0.457  0.376 -0.170  0.126 -0.087 -0.253
126   127 -0.401 -0.081 -0.294 -0.016 -0.186  0.047 -0.081 -0.398
127   128 -0.419 -0.199 -0.394  0.222 -0.044  0.049 -0.031 -0.367

[128 rows x 9 columns]
<Figure size 640x480 with 3 Axes>

White Noise

In Lecture 2, we talked about white noise, which is a special case of a time series generated from uncorrelated random variables $w_t$ with mean 0 and finite variance $\sigma^2_w$. The name comes from the analogy with white light, indicating that all possible periodic oscillations are present with equal strength (we will see later how this looks in power spectral analysis).

Let’s create a function that returns `nt` samples of independent (iid) Gaussian white noise, that is, $w_t \sim \mbox{iid } \mathcal{N}(0,\sigma^2_w)$.

def white_noise(nt, var=1):
    '''
    Generate a time series with uncorrelated random variables, `w_t`, with mean 0 and finite variance `var`.
    If var=1 this is the standard normal distribution.
    Inputs:
        nt [int] : number of time points
        var [float] : variance
    Output:
        w [np.array] = array of length nt
    '''
    w = np.sqrt(var) * np.random.randn(nt,) 
    return w

Now let’s plot some white noise data for 250 time points, as in Fig. 1.9 (SS)

nt = 250
w = white_noise(nt)
plt.plot(w)
plt.xlabel('Time')
plt.ylabel('w')
plt.title('White noise');
<Figure size 640x480 with 1 Axes>

Since we are drawing the data randomly at each time point, we will get a different time series if we repeat this process to generate another white noise sample. Next, let’s create a matrix of nt by nsamps, where nsamps = 100 and nt=250. Then plot these time series on top of one another. What do you notice about the expected mean and variance over time?

nsamps = 100
w_matrix = np.zeros((nt, nsamps))
for n in np.arange(nsamps):
    w_matrix[:,n] = white_noise(nt)

plt.plot(w_matrix, color='gray', linewidth=0.1);
plt.plot(w_matrix.mean(1), color='y', linewidth=2)
plt.plot(w_matrix.var(1), color='k', linewidth=2);
<Figure size 640x480 with 1 Axes>

You see here that, as expected, the mean and variance don’t change over time, and are pretty close to 0 and 1, respectively.
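As an aside, the column-by-column loop above can be replaced by a single draw; a sketch using NumPy's newer Generator API (the seed here is an arbitrary choice, not from the lab):

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility

nt, nsamps = 250, 100
w_matrix = rng.standard_normal((nt, nsamps))  # one call replaces the loop

# Pooled over all samples, the mean should be near 0 and the variance near 1
print(w_matrix.mean(), w_matrix.var())
```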

Moving average and filtering

In the examples given by Shumway and Stoffer in Chapter 1, the signals vary in terms of their smoothness. Sometimes this is due to underlying characteristics of the data, but sometimes it is due to sampling (in fact, undersampling a time series, such as a sound, can lead to unwanted “spikiness” in the data and distortions called “aliasing”).

So what if we want to smooth our data? One way is to replace the white noise series $w_t$ by a moving average, e.g.:

$v_t = \frac{1}{3} (w_{t-1}+w_t+w_{t+1})$

This uses the current data point together with its past and future neighbors, i.e., lags [-1, 0, 1]. This is sometimes called an “acausal” moving average filter because it requires data from both the past and the future. By contrast, a “causal” moving average filter considers only the nearest points in the present and past (zero or negative lags only), for example:

$v_t = \frac{1}{3} (w_{t-2}+w_{t-1}+w_t)$

Let’s compare what these do to our time series, starting with the white noise we made above.

def moving_average(w, n=3):
    '''
    Create an acausal (centered) moving average of the time series `w` using an
    `n`-point window (default 3). Note that for even `n`, the window is widened
    to the next odd size, 2*(n//2) + 1, so that it stays centered.
    Inputs:
        w (np.array) : original time series
        n (int) : number of points incorporated in the moving average
    Output:
        v (np.array) : smoothed time series (NaN at the edges)
    '''
    v = np.full(len(w), np.nan)
    half_win = n // 2
    for t in np.arange(half_win, len(w) - half_win):
        v[t] = np.mean(w[t-half_win : t+half_win+1])
    return v
# Plot the white noise and the moving average of the white noise on top
plt.plot(w)
plt.plot(moving_average(w))
<Figure size 640x480 with 1 Axes>
# How does this depend on the number of points included in the average? What happens?
# Try some other values of n
plt.plot(w)
plt.plot(moving_average(w))
plt.plot(moving_average(w, n=10)) # FILL IN, try several...
<Figure size 640x480 with 1 Axes>
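The `moving_average` loop can equivalently be written with `np.convolve`; a sketch (this vectorized variant is not part of the lab, and assumes an odd window size):

```python
import numpy as np

def moving_average_conv(w, n=3):
    '''Centered n-point moving average via convolution (odd n assumed);
    edges set to NaN to match the loop-based moving_average above.'''
    v = np.convolve(w, np.ones(n) / n, mode='same')
    half_win = n // 2
    v[:half_win] = np.nan           # 'same' mode zero-pads the edges,
    v[len(w) - half_win:] = np.nan  # so mask them to match the loop version
    return v

w_demo = np.random.randn(250)
v_demo = moving_average_conv(w_demo, n=3)
```

For long series this is much faster than the explicit loop, since the convolution is done in compiled code.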

Below, let’s write the moving average function assuming only zero or negative lags (a causal filter): $v_t = \frac{1}{3} (w_{t-2}+w_{t-1}+w_t)$

def moving_average_causal(w, n=3):
    '''
    Create a causal moving average of the time series `w` using an `n`-point
    moving average (default 3), using only the current point and the past.
    Inputs:
        w (np.array) : original time series
        n (int) : number of points incorporated in the moving average
    Output: 
        v (np.array) : smoothed time series
    '''
    v = np.full(len(w), np.nan)
    for t in np.arange(n - 1, len(w)):
        v[t] = np.mean(w[t-n+1 : t+1])  # average of w[t-n+1], ..., w[t]
    return v

Now let’s look at how this causal filtering compares to the original centered n-point moving average. What do you notice?

n = 3
plt.figure(figsize=(12,3))
#plt.plot(w, label='original')
plt.plot(moving_average(w, n), label='centered n-point MA')
plt.plot(moving_average_causal(w, n), label='causal n-point MA')
plt.legend()
<Figure size 1200x300 with 1 Axes>

Autoregression

The smoothed moving average data we generated here still doesn’t have the quasiperiodic character of time series like the speech and fMRI examples from lecture (Figs. 1.3 and 1.7 in SS). We can generate time series with oscillatory behavior in a few ways, one of which is to use the concept of autoregression, where we predict the current value $x_t$ of a time series as a function of past values, e.g.

$x_t = 1.5x_{t-1} - 0.75x_{t-2} + w_t$

We will start by setting the first two values of $x$ to zero, i.e., $x_0 = x_1 = 0$.

def generate_autoreg(w):
    '''
    Generate autocorrelated data from the white noise time series `w`
    using the AR(2) recursion x_t = 1.5 x_{t-1} - 0.75 x_{t-2} + w_t.
    Inputs:
        w [np.array] : white noise time series
    Output:
        x [np.array] : autocorrelated time series
    '''
    nt = len(w)
    x = np.zeros((nt,))

    # Initial conditions: x[0] = x[1] = 0
    for t in np.arange(2, nt):
        x[t] = 1.5*x[t-1] - 0.75*x[t-2] + w[t]
    return x
nt = 250
x = generate_autoreg(white_noise(nt))
plt.plot(x)
<Figure size 640x480 with 1 Axes>
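The quasiperiodic look of this series can be read directly off the AR(2) coefficients. As a sketch (this diagnostic is not part of the lab): the characteristic polynomial $\phi(z) = 1 - 1.5z + 0.75z^2$ has complex roots, and the angle of those roots sets the cycle length, about 12 samples here.

```python
import numpy as np

# Roots of phi(z) = 1 - 1.5 z + 0.75 z^2 (coefficients listed highest power first)
roots = np.roots([0.75, -1.5, 1.0])

modulus = np.abs(roots[0])       # ~1.155 > 1, so the process is stationary
theta = np.angle(roots[0])       # argument of the complex root
period = 2 * np.pi / abs(theta)  # implied quasiperiod, ~12 samples

print(modulus, period)
```

This matches what you see in the plot: irregular oscillations with roughly a 12-sample cycle, damped because the root modulus exceeds 1.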

Data as signal plus white noise

Many datasets can be modeled as $v_t = s_t + w_t$, where $s_t$ is some signal of interest and $w_t$ is white noise. We will talk later about how we might find $s_t$ or other components of the signal. But for now, let’s look at a few examples generating these types of data.

Random walk

A random walk is a time series in which the value at time $t$ is the value at time $t-1$ plus a random step determined by $w_t$ (white noise). This is a classic example of a nonstationary process (which we’ll get into later). Let’s first write a function for this:

def random_walk(nt, x0=0):
    '''
    Generate a random walk `x` for `nt` time points with initial value `x0`
    Inputs:
        nt (int) : number of time points
        x0 (float) : initial value of the walk
    '''
    x = np.zeros((nt,))
    x[0] = x0
    for n in np.arange(1, nt):
        x[n] = x[n-1] + white_noise(1)[0]
    return x
# Let's plot the random walk for `nt=250` time points
nt = 250
x0 = 0
x = random_walk(nt, x0=x0)
plt.plot(x)

# What do you notice about the data? Try running this cell a few times..
<Figure size 640x480 with 1 Axes>
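Since the random walk is just a running sum of the noise, the loop above can be replaced by `np.cumsum`; a sketch (this vectorized variant is an aside, not part of the lab):

```python
import numpy as np

def random_walk_fast(nt, x0=0):
    '''Random walk via cumulative sum; matches the loop-based random_walk,
    with index 0 holding the initial value rather than a noise step.'''
    steps = np.random.randn(nt)
    steps[0] = 0.0              # x[0] is the initial value, not a step
    return x0 + np.cumsum(steps)

x_fast = random_walk_fast(250)
```

This is much faster for long walks, and makes the structure explicit: $x_t = x_0 + \sum_{j=1}^t w_j$.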
# Let's now look at what happens if we simulate 100 random walks for
# 5000 time points and plot them on top of one another. 
# What do you see about how the mean and variance
# of the signals change over time? How does this differ from white noise?

nwalks = 100
nt = 5000
all_r = np.zeros((nt, nwalks))
for n in np.arange(nwalks):
    all_r[:,n]=random_walk(nt)
    plt.plot(all_r[:,n])
    plt.xlabel('Time')

plt.figure()
plt.plot(all_r.mean(1))
plt.ylabel('Mean')
plt.figure()
plt.plot(all_r.var(1))
plt.ylabel('Variance')
<Figure size 640x480 with 1 Axes>
<Figure size 640x480 with 1 Axes>
<Figure size 640x480 with 1 Axes>

Random Walk with Drift

A random walk with drift is a specific modification of the random walk model in which we add a constant $\delta > 0$ to our random walk:

$x_t = \delta + x_{t-1} + w_t$

which (with $x_0 = 0$) can also be written:

$x_t = \delta t + \displaystyle\sum_{j=1}^t w_j$

Let’s now write a function for this:

def random_walk_drift(nt, drift, x0=0):
    '''
    Create a random walk with constant drift `drift` for `nt` time points,
    with initial value `x0`.
    Inputs:
        nt (int) : number of time points
        drift (float) : constant drift
        x0 (float) : initial value
    '''
    x = np.zeros((nt,))
    x[0] = x0
    for n in np.arange(1, nt):
        x[n] = x[n-1] + white_noise(1)[0] + drift
    return x
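The closed form $x_t = \delta t + \sum_{j=1}^t w_j$ also suggests a vectorized version; a sketch (the seed and the 200-walk check below are illustrative choices, not from the lab):

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, for reproducibility

def random_walk_drift_fast(nt, drift, x0=0):
    '''Random walk with drift via cumulative sum:
    x_t = x0 + drift * t + sum of noise up to t, matching the loop version.'''
    steps = drift + np.random.randn(nt)
    steps[0] = 0.0                  # index 0 holds the initial value x0
    return x0 + np.cumsum(steps)

# Averaging many walks: the sample mean should hug the deterministic line drift * t
walks = np.column_stack([random_walk_drift_fast(500, 0.1) for _ in range(200)])
```

Because the noise averages out across walks, the slope of the mean trajectory recovers the drift, which you can compare against the plots below.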
# Now let's plot a single example of the random walk. Try running a few times
# with different parameters. How does the `drift` parameter influence what
# you get? What about the properties of the white noise itself (if you go 
# back...?)
nt = 250
drift = 0.1
plt.figure()
plt.plot(random_walk_drift(nt, drift))
<Figure size 640x480 with 1 Axes>
# Let's compare the mean and variance of the random
# walk with drift to our example above. Let's create
# a matrix of random walks with drift - `nwalks` samples
# and `nt` time points for drift `drift`.
nwalks = 1000
nt = 1500
drift = 0.1
all_r_drift = np.zeros((nt, nwalks))
for n in np.arange(nwalks):
    all_r_drift[:,n]=random_walk_drift(nt, drift)
    plt.plot(all_r_drift[:,n])
    plt.xlabel('Time')
<Figure size 640x480 with 1 Axes>
# Now plot the mean and variance over time. How do these compare
# to the other graphs you created?

plt.figure()
plt.plot(all_r_drift.mean(1))

plt.figure()
plt.plot(all_r_drift.var(1))
<Figure size 640x480 with 1 Axes>
<Figure size 640x480 with 1 Axes>

Load some data from Shumway and Stoffer examples

# Try applying the moving average to the DJIA data 
# (where each sample is from one day). What happens 
# when you apply the average over a month, a year, etc?
# Does this change how you might interpret the data?
n = 30
plt.plot(djia_data['Close'])
plt.plot(moving_average(djia_data['Close'], n=n))
<Figure size 640x480 with 1 Axes>
# Let's do the same thing with the speech data
speech_data = astsa.load_speech()
speech_data
plt.plot(speech_data['Value'])
# Plot the smoothed speech data
plt.plot(moving_average(speech_data['Value'],130))
<Figure size 640x480 with 1 Axes>
# Try adding white noise to some of the datasets. If we just
# add the white noise with the default variance, it may not affect our signal 
# that much, so try changing the `var` parameter in our `white_noise` 
# function

noise_variance = np.var(speech_data['Value']) 
plt.figure()
plt.subplot(2,1,1)
plt.plot(speech_data['Value'])
plt.subplot(2,1,2)
plt.plot(speech_data['Value'] + white_noise(len(speech_data['Value']), var=noise_variance*2))
<Figure size 640x480 with 2 Axes>
# Extra: Try adding some drift to any of these datasets
speech_drift=speech_data['Value']+random_walk_drift(len(speech_data['Value']), drift=10)
plt.plot(speech_drift)
<Figure size 640x480 with 1 Axes>