
In this lecture, we will discuss a specific application of time-lagged regression models to understand speech processing in the brain. Here, we are relating our input sound stimulus $x$ at different time lags to our output neural time series $y$. This is typically written as:

$$y_t = \sum_{\tau} \beta_\tau x_{t-\tau}$$

where $\tau$ is the time delay.
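Concretely, this sum is just a causal filter applied to the stimulus. A minimal sketch with made-up numbers (three lags, a short stimulus) shows two equivalent ways to compute it:

```python
import numpy as np

# Toy example of y_t = sum_tau beta_tau * x_{t-tau}, with made-up values:
# three lags (tau = 0, 1, 2) and a short stimulus x.
beta = np.array([0.5, 0.3, 0.1])          # weight at each lag tau
x = np.array([1.0, 0.0, 2.0, 0.0, 0.0])   # stimulus over time

# Direct evaluation of the sum (x is treated as zero before t = 0)
y = np.zeros_like(x)
for t in range(len(x)):
    for tau, b in enumerate(beta):
        if t - tau >= 0:
            y[t] += b * x[t - tau]

# The same operation is a causal convolution of x with the lag weights
y_conv = np.convolve(x, beta)[:len(x)]
print(y)  # [0.5 0.3 1.1 0.6 0.2]
```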

These models have been used to show which stimulus features may drive neural activity in different brain areas - for example, frequency features from a spectrogram, phoneme features, word onsets, visual features, and more.

In the case of using a spectrogram as the input $x$, this is sometimes also called a spectrotemporal receptive field (STRF) model. In neuroscience, the “STRF” itself refers to the $\beta$ coefficients of this regression model, which can be thought of as a linear filter that describes which spectrotemporal features of a stimulus will increase or decrease activity in a given brain area. This technique has been applied widely to various types of brain-related time series data, including scalp electroencephalography (EEG), intracranial EEG, magnetoencephalography (MEG), and functional magnetic resonance imaging (fMRI). Other terms used for the $\beta$ coefficients are “TRF weights” or “mTRF weights” (for multivariate temporal receptive field or multivariate temporal response function).


The STRF as a time-lagged regression

For a single time point $t$, a single output, $F$ frequency features, and $D$ delays:

$$y_t = \sum_{f=1}^{F} \sum_{d=0}^{D-1} \beta_{f,d}\, x_{f,\, t-d}$$

The response at time $t$ is a weighted sum over all (frequency, delay) pairs of the spectrogram values at the corresponding earlier time points. The weight $\beta_{f,d}$ is the STRF: it tells you how much frequency $f$ at delay $d$ contributes to the response.

Predicting one response time point

Collect the weights into a matrix $B \in \mathbb{R}^{F \times D}$ and the relevant slice of stimulus history into a matrix $X_t \in \mathbb{R}^{F \times D}$:

$$B = \begin{bmatrix} \beta_{1,0} & \beta_{1,1} & \cdots & \beta_{1,D-1} \\ \beta_{2,0} & \beta_{2,1} & \cdots & \beta_{2,D-1} \\ \vdots & & & \vdots \\ \beta_{F,0} & \beta_{F,1} & \cdots & \beta_{F,D-1} \end{bmatrix}, \quad X_t = \begin{bmatrix} x_{1,t} & x_{1,t-1} & \cdots & x_{1,t-D+1} \\ x_{2,t} & x_{2,t-1} & \cdots & x_{2,t-D+1} \\ \vdots & & & \vdots \\ x_{F,t} & x_{F,t-1} & \cdots & x_{F,t-D+1} \end{bmatrix}$$

Then:

$$y_t = \sum_{f,d} B_{f,d}\, (X_t)_{f,d} = \langle B, X_t \rangle_F$$

where $\langle \cdot, \cdot \rangle_F$ is the Frobenius inner product (sum of elementwise products). This is the “STRF as a 2D filter” view: the response is the inner product of the STRF kernel with the recent stimulus history.
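As a quick numerical check (random toy values, purely for illustration), the double sum and the Frobenius inner product give the same answer:

```python
import numpy as np

# F = 4 frequencies, D = 3 delays; random values for illustration only.
rng = np.random.default_rng(0)
F, D = 4, 3
B = rng.standard_normal((F, D))    # STRF weights beta_{f,d}
X_t = rng.standard_normal((F, D))  # stimulus history slice ending at time t

# Double sum over (frequency, delay) pairs...
y_t_loops = sum(B[f, d] * X_t[f, d] for f in range(F) for d in range(D))
# ...equals the Frobenius inner product <B, X_t>_F
y_t_frob = np.sum(B * X_t)
print(np.isclose(y_t_loops, y_t_frob))  # True
```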

Vectorize and stack across time

Flatten $B$ into a column vector $\beta \in \mathbb{R}^{FD}$ and flatten each $X_t$ into a row vector.

$$\beta = \mathrm{vec}(B) = \begin{bmatrix} \beta_{1,0} \\ \beta_{1,1} \\ \vdots \\ \beta_{1,D-1} \\ \beta_{2,0} \\ \vdots \\ \beta_{F,D-1} \end{bmatrix}, \quad \mathbf{x}_t^T = \begin{bmatrix} x_{1,t} & x_{1,t-1} & \cdots & x_{1,t-D+1} & x_{2,t} & \cdots & x_{F,t-D+1} \end{bmatrix}$$

Then for each time point:

$$y_t = \mathbf{x}_t^T \beta$$

Stack the $\mathbf{x}_t^T$ as rows of a design matrix $\mathbf{X} \in \mathbb{R}^{T \times FD}$:

$$\mathbf{X} = \begin{bmatrix} \mathbf{x}_1^T \\ \mathbf{x}_2^T \\ \vdots \\ \mathbf{x}_T^T \end{bmatrix}$$

and you get the standard linear regression form:

$$\mathbf{y} = \mathbf{X} \beta$$

with $\mathbf{y} \in \mathbb{R}^T$, $\mathbf{X} \in \mathbb{R}^{T \times FD}$, and $\beta \in \mathbb{R}^{FD}$.
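A small sketch (toy sizes, random values) confirming that building the design matrix this way reproduces the per-time-point inner product computation:

```python
import numpy as np

# Toy sizes: F frequencies, D delays, T time points; random data for illustration.
rng = np.random.default_rng(1)
F, D, T = 3, 4, 50
B = rng.standard_normal((F, D))     # STRF weights
stim = rng.standard_normal((F, T))  # spectrogram, [freq x time]

beta = B.reshape(-1)                # vec(B), length F*D (row-major flatten)
X = np.zeros((T, F * D))            # design matrix [T x FD]
y = np.zeros(T)
for t in range(T):
    # Stimulus history slice: columns t, t-1, ..., t-D+1 (zero-padded before t=0)
    X_t = np.zeros((F, D))
    for d in range(D):
        if t - d >= 0:
            X_t[:, d] = stim[:, t - d]
    X[t] = X_t.reshape(-1)          # flatten each slice into a row
    y[t] = np.sum(B * X_t)          # Frobenius inner product view

print(np.allclose(y, X @ beta))  # True
```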

STRFs and linear regression

A STRF is fundamentally a linear regression. The “spectrotemporal” part is for interpretation, and just involves reshaping the $\beta$ coefficients into a (delay × frequency) matrix so we can better interpret them in terms of the multiple features (frequencies in a spectrogram) and time delays, even though we actually estimate the regression with everything flattened.

Multi-electrode version

In many real-world problems, we have multiple electrodes from the brain that we are trying to fit. If you have $E$ electrodes, replace $\mathbf{y}$ with $\mathbf{Y} \in \mathbb{R}^{T \times E}$ and $\beta$ with $\mathbf{B} \in \mathbb{R}^{FD \times E}$:

$$\mathbf{Y} = \mathbf{X} \mathbf{B}$$

Same design matrix, just solving $E$ regressions in parallel.
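For example (toy shapes, random data), `np.linalg.lstsq` accepts a multi-column target directly, solving all $E$ regressions with one call:

```python
import numpy as np

# Toy shapes: T time points, FD flattened features, E electrodes.
rng = np.random.default_rng(2)
T, FD, E = 200, 12, 3
X = rng.standard_normal((T, FD))       # shared design matrix
B_true = rng.standard_normal((FD, E))  # one weight column per electrode
Y = X @ B_true                         # noiseless toy responses

# One lstsq call fits all E electrodes at once
B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(B_hat.shape)  # (12, 3)
```

Because the toy responses are noiseless and T > FD, the recovered weights match the true ones exactly (up to numerical precision).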

import numpy as np
import h5py
from matplotlib import pyplot as plt
from matplotlib import cm
plt.ion()

from ridge.utils import make_delayed
from ridge.ridge import cv_ridge

import logging
logging.basicConfig(level=logging.INFO)

# Define a z-scoring function (column-wise: mean 0, std 1 per feature)
zs = lambda x: (x - x.mean(0)) / x.std(0)
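A quick sanity check on the z-scoring convention (random data, for illustration): `zs` operates column-wise, so each feature ends up with mean 0 and standard deviation 1:

```python
import numpy as np

# The same z-scoring function as above
zs = lambda x: (x - x.mean(0)) / x.std(0)

# Random data with nonzero mean and non-unit scale, for illustration
data = np.random.default_rng(3).normal(loc=5.0, scale=2.0, size=(1000, 4))
z = zs(data)
print(np.allclose(z.mean(0), 0), np.allclose(z.std(0), 1))  # True True
```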

Get the data!

Here we will load in the data from our experiment. We have neural data ytrain and ytest, which are matrices of dimension [time points x electrodes], recorded while a patient with epilepsy listened to a set of movie clips. Here we will look at data from 3 example electrodes from one such patient.

(figure: electrode locations in the brain for this dataset)
data_file = 'MTdata_TCH28.hf5'

# Here we will read in the contents of the file
xtrain = {}
xtest = {}
with h5py.File(data_file, 'r') as hf:
    # This is neural data [time points x electrodes]
    ytrain = hf['ytrain'][:]
    ytest = hf['ytest'][:]
    
    # This is the spectrogram of the sounds the person heard
    xtrain['spec'] = hf['xtrain_spec'][:]
    xtest['spec'] = hf['xtest_spec'][:]
    
    # These are the phoneme features for the words the person heard
    xtrain['phn'] = hf['xtrain_phn'][:]
    xtest['phn'] = hf['xtest_phn'][:]

    # Sampling rate of the data
    fs = hf.attrs['fs']
print(ytrain.shape, ytest.shape)

# Show whether the data have been z-scored
print(ytrain.mean(), ytrain.std())
print(ytest.mean(), ytest.std())
print(xtrain['spec'].mean(), xtrain['spec'].std())
print(xtest['spec'].mean(), xtest['spec'].std())
(161034, 3) (13563, 3)
1.8826142342050505e-18 0.9999999999999961
1.1176174172048075e-17 1.0000000000000016
2.188539047263371e-15 1.0
-1.2874952646199382e-15 0.9999999999999996
# Let's look at what we're using for our xtrain matrices

stim_types = ['spec', 'phn']

for stim_type in stim_types:
    ntimes, nfeats = xtest[stim_type].shape
    print(f'{ntimes} time points, {nfeats} {stim_type} features')
    nsec_to_show = 20
    
    plt.figure(figsize=(10,3))
    plt.imshow(xtest[stim_type].T, aspect='auto', cmap=cm.magma, interpolation='nearest')
    plt.gca().invert_yaxis()
    ticks = np.arange(0, ntimes, fs*10)
    plt.gca().set_xticks(ticks)
    plt.gca().set_xticklabels((ticks / fs).astype(int))
    plt.gca().set_xlim([0,nsec_to_show*fs])
    plt.title(f'xtest - {stim_type}')
    plt.colorbar()
    
    plt.xlabel('Time (s)')
    plt.ylabel('Feature (bin)')
13563 time points, 80 spec features
13563 time points, 14 phn features
# Let's show the response data that goes with this test set:

ntimes,nelecs = ytest.shape
print(f'{ntimes} time points, {nelecs} electrodes')

plt.figure(figsize=(10,3))
for ch in np.arange(nelecs):
    plt.subplot(nelecs,1,ch+1)
    plt.plot(ytest[:,ch]+ch)
    ticks = np.arange(0, ntimes, fs*10)
    plt.gca().set_xticks(ticks)
    plt.gca().set_xlim([0,nsec_to_show*fs])
    #plt.colorbar(label='z-score')
    if ch == nelecs-1:
        plt.xlabel('Time (s)')
        plt.ylabel('Z')
        plt.gca().set_xticklabels((ticks / fs).astype(int))
    else:
        plt.gca().set_xticklabels([])

13563 time points, 3 electrodes

How do we choose the lags?

If we want to fit a lagged regression model, we first have to choose what lags to consider in our model. Usually, we do this based on prior knowledge about a particular brain area (say, if we know that most responses occur within 500 milliseconds, we probably don’t need to search beyond that). However, we can also use the cross-correlation function to look at where the peak lead-lag relationships between $x$ and $y$ lie. We’ll do that here, using a maximum possible lag between the sound features and the neural recordings of 0.6 seconds, which is reasonable based on prior literature.
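Before computing the CCF on real data, it may help to sanity-check the lag convention of `scipy.signal.correlate` on a synthetic signal where the true delay is known (5 samples here, chosen arbitrarily):

```python
import numpy as np
from scipy.signal import correlate

# y is a copy of x delayed by 5 samples, so the CCF should peak at lag +5
# (positive lag = x leads y, matching the convention used below).
N, true_delay = 100, 5
x = np.zeros(N)
x[20] = 1.0
y = np.roll(x, true_delay)  # y[t] = x[t - 5]

full = correlate(y, x, mode='full')
mid = len(full) // 2        # index corresponding to zero lag
peak_lag = np.argmax(full) - mid
print(peak_lag)  # 5
```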

from scipy.signal import correlate

stim_type = 'spec'

ntimes, nelecs = ytrain.shape
ntimes, nfeats = xtrain[stim_type].shape

max_lag_sec = 0.6 # seconds
max_lag = int(max_lag_sec * fs) 

lags = np.arange(-max_lag, max_lag + 1)
ccf = np.zeros((nelecs, nfeats, len(lags)))

for e in range(nelecs):
    for f in range(nfeats):
        full = correlate(ytrain[:, e], xtrain[stim_type][:, f], mode='full') / ntimes
        mid = len(full) // 2
        ccf[e, f] = full[mid - max_lag : mid + max_lag + 1]

Plotting lag relationships

Now we will plot the average cross-correlation function across features to determine how we want to structure our lags for the regression.

lags_s = lags / fs

# Average |CCF| across features, then show per-electrode traces + mean
ccf_abs = np.abs(ccf).mean(axis=1)  # [n_elec, n_lags]

fig, ax = plt.subplots(figsize=(7, 4))
ax.plot(lags_s, ccf_abs.T, color='gray', alpha=0.3, lw=0.8)
ax.plot(lags_s, ccf_abs.mean(0), color='C0', lw=2, label='mean across elecs')
ax.axvline(0, color='k', ls='--', lw=0.8)
ax.set_xlabel('lag (s)  [positive = spectrogram leads neural]')
ax.set_ylabel('|CCF|, averaged across spec features')
ax.set_title('Stimulus–response cross-correlation envelope')
ax.legend()
plt.tight_layout()
plt.show()

Create delay matrices

We now have the prerequisite matrices to perform our regression (ytrain and xtrain, plus the held-out test set ytest and xtest). To include time delays, we can set up a stacked matrix of our stimulus at different time delays. This structure actually has a special name -- it’s called a Toeplitz matrix. As a toy example, say we have a spectrogram with $n$ time points and 3 frequencies.

$$\begin{bmatrix} x_{1,1} & x_{1,2} & x_{1,3} \\ x_{2,1} & x_{2,2} & x_{2,3} \\ x_{3,1} & x_{3,2} & x_{3,3} \\ \vdots & \vdots & \vdots \\ x_{n,1} & x_{n,2} & x_{n,3} \\ \end{bmatrix}$$

Our stacked delay matrix would look something like this:

$$\begin{bmatrix} x_{1,1} & x_{1,2} & x_{1,3} & 0 & 0 & 0 & 0 & 0 & 0 & \ldots & 0 & 0 & 0 \\ x_{2,1} & x_{2,2} & x_{2,3} & x_{1,1} & x_{1,2} & x_{1,3} & 0 & 0 & 0 & \ldots & 0 & 0 & 0 \\ x_{3,1} & x_{3,2} & x_{3,3} & x_{2,1} & x_{2,2} & x_{2,3} & x_{1,1} & x_{1,2} & x_{1,3} & \ldots & 0 & 0 & 0 \\ x_{4,1} & x_{4,2} & x_{4,3} & x_{3,1} & x_{3,2} & x_{3,3} & x_{2,1} & x_{2,2} & x_{2,3} & \ldots & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ x_{n,1} & x_{n,2} & x_{n,3} & x_{n-1,1} & x_{n-1,2} & x_{n-1,3} & x_{n-2,1} & x_{n-2,2} & x_{n-2,3} & \ldots & x_{n-d+1,1} & x_{n-d+1,2} & x_{n-d+1,3} \\ \end{bmatrix}$$
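For intuition, here is a minimal sketch of what a function like `make_delayed` could do to build this matrix (the actual `ridge.utils` implementation may differ in details such as edge handling and the delay sign convention):

```python
import numpy as np

def make_delayed_sketch(stim, delays):
    """Stack copies of stim shifted by each delay (zero-padded at the edges).
    stim is [time x features]; output is [time x (features * len(delays))]."""
    T, F = stim.shape
    out = np.zeros((T, F * len(delays)))
    for i, d in enumerate(delays):
        if d == 0:
            out[:, i*F:(i+1)*F] = stim
        elif d > 0:
            out[d:, i*F:(i+1)*F] = stim[:-d]   # stimulus from d samples ago
        else:
            out[:d, i*F:(i+1)*F] = stim[-d:]   # future stimulus (negative delay)
    return out

# Toy check against the block structure shown above: 4 time points, 3 features
stim = np.arange(1, 13, dtype=float).reshape(4, 3)
Xd = make_delayed_sketch(stim, [0, 1, 2])
print(Xd[1])  # [4. 5. 6. 1. 2. 3. 0. 0. 0.]
```

Each row holds the current stimulus frame followed by progressively older frames, exactly the structure of the matrix above.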
# Create the delayed matrices

delay_min = 0 # Could have this be a negative number
delay_max = 0.4  # positive number for sound leading the neural response
delays = np.arange(np.floor(delay_min*fs), np.ceil(delay_max*fs), dtype=int) 
print(delays)

print('Creating delayed training set matrix')
xtraind = make_delayed(xtrain[stim_type], delays)

print('Creating delayed test set matrix')
xtestd = make_delayed(xtest[stim_type], delays)

print(f'Training set is {xtrain[stim_type].shape[0]} time points by {xtraind.shape[1]} features')
print(f'Test set is {xtest[stim_type].shape[0]} time points by {xtestd.shape[1]} features')

print(f'{xtraind.shape[1]} features should be # original features {xtrain[stim_type].shape[1]} x # delays {len(delays)}')
print(xtraind.shape[1] == xtrain[stim_type].shape[1]*len(delays))
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39]
Creating delayed training set matrix
Creating delayed test set matrix
Training set is 161034 time points by 3200 features
Test set is 13563 time points by 3200 features
3200 features should be # original features 80 x # delays 40
True

Look at delayed stimulus matrix

Here I’ll plot only a subset of the delayed matrix so you can see its structure. Again, this is transposed so that time is on the x axis. White grid lines are shown so you can appreciate the small shifts of the matrices as a function of delay.

fig, axes = plt.subplots(figsize=(10,6))
plt.imshow(xtraind[0:4000,:].T, cmap = cm.magma, aspect='auto', interpolation='nearest')
plt.gca().xaxis.grid(color='w')
plt.xlabel('Time bin')
plt.ylabel('Feature x delay')

Covariance matrix

In most STRF analyses, we must normalize by autocorrelations in the stimulus (frequencies that always appear together, or temporal correlations that occur as a result of smoothly varying signals). We do this by calculating the covariance of the delayed stimulus. This will tell us which frequencies/features covary with one another in our stimulus, and how they covary across time.
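A small illustration (synthetic data) of why this matrix is informative: for z-scored columns, $X^\top X / n$ is just the feature correlation matrix, so features that covary show up as large off-diagonal entries:

```python
import numpy as np

# Three synthetic features: the first two strongly covary, the third is independent.
rng = np.random.default_rng(4)
n = 5000
a = rng.standard_normal(n)
X = np.column_stack([a,
                     0.9 * a + 0.45 * rng.standard_normal(n),
                     rng.standard_normal(n)])
X = (X - X.mean(0)) / X.std(0)  # z-score each column

C = X.T @ X / n  # correlation matrix for z-scored columns
print(np.round(C, 2))
```

The (0, 1) entry is large because those two features were built to covary, while the entries involving the independent third feature are near zero.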

# Calculate the (unnormalized) covariance matrix for the training data
dtype = np.single
covmat = np.dot(xtraind.astype(dtype).T, xtraind.astype(dtype))
# Show covariance matrix
plt.figure(figsize=(5,5))
plt.imshow(covmat[:80,:80], cmap=cm.Reds)
plt.colorbar()
# add an intercept term

xtraind_int = np.hstack((np.ones((xtraind.shape[0],1)), xtraind))
xtestd_int = np.hstack((np.ones((xtestd.shape[0],1)), xtestd))

OLS

Let’s first look at fitting the model using OLS. Here we’ll use numpy least squares for simplicity, though you can use any OLS solver.

print(xtraind_int.shape)
print(ytrain.shape)
(161034, 3201)
(161034, 3)
# Fit OLS
beta, SSE, rank, s = np.linalg.lstsq(xtraind_int, ytrain, rcond=None)

Cross validation performance

Let’s look at performance on held out data by computing predictions from our fitted beta values on our new xtestd and compare these predictions to the true held out data ytest.

# Let's look at performance on predicted versus actual data
ytest_hat = np.dot(xtestd_int, beta)

# Calculate correlation between ytest_hat and ytest for an example electrode
elec = 2
r_test = np.corrcoef(ytest_hat[:,elec], ytest[:,elec])[0,1]

ytrain_hat = np.dot(xtraind_int, beta)
plt.figure(figsize=(12,4))
plt.subplot(2,1,1)
plt.imshow(xtestd[:,:nfeats].T, aspect='auto', cmap=cm.magma, interpolation='nearest')
plt.gca().set_xlim([2000,4000])
plt.ylabel('Feature bin')
plt.gca().invert_yaxis()

plt.subplot(2,1,2)
plt.plot(ytest[:,elec], label='ytest')
plt.plot(ytest_hat[:,elec], label='ytest_hat')
plt.xlabel('Time bin')
plt.ylabel('Z')
plt.gca().set_xlim([2000,4000])
plt.title(r_test)
plt.legend()
plt.tight_layout()

Residuals plot

For completeness, we’ll plot the residuals and their autocorrelation function. Note that for this type of model, it is very common that the stimulus does not drive the majority of the variance in the data; ongoing oscillatory processes that we cannot remove simply through regression with stimulus features will often dominate the signal. Even so, if we can use our models to predict new data, we consider them useful, even if we are not modeling all possible sources of variance.

resid = ytest-ytest_hat
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
plot_acf(resid[:,2]);

Plotting the beta coefficients (the STRF)

Now we’ll plot the beta coefficients, showing how reshaping them according to the number of delays and features helps us with interpretability.

print(beta.shape)
plt.plot(beta[:,0])
(3201, 3)
fig = plt.figure(figsize=(10,3))

for c in np.arange(nelecs):
    ax = fig.add_subplot(1,3,c+1)
    strf = beta[1:,c].reshape(len(delays),-1)
    smax = np.abs(strf).max()
    plt.imshow(strf.T, vmin=-smax, vmax=smax, cmap = cm.RdBu_r, aspect='auto', interpolation='nearest')
    plt.title(f'elec {c}: r={np.corrcoef(ytest_hat[:,c], ytest[:,c])[0,1]:.2f}')
    if c==0:
        plt.xlabel('Time (s)')
        plt.ylabel('Freq.')
    ax.set_ylim(ax.get_ylim()[::-1]) # This just reverses the y axis so low frequency is at the bottom
    ax.xaxis.set_ticks([0,len(delays)])
    ax.xaxis.set_ticklabels([0, -len(delays)/fs])
    plt.gca().invert_xaxis() # Invert the x-axis so Time (s) is on the x rather than Time delay (s)

    ax.yaxis.set_ticks([])

plt.tight_layout();

Problems with OLS

In looking at these weight matrices for each of the three electrodes, we can see that our linear model is performing decently well on held-out data (correlations up to 0.5, which for neural data is very good considering underlying measurement noise and physiological noise). However, the weight matrices themselves are not very interpretable. Ideally, we should be able to read these and interpret large positive values as features in the past that drive a strong neural response at time 0. That’s not so obvious here.

One big issue is that our stimulus features are highly correlated! Sounds (and spectrograms) don’t represent independent samples, they evolve relatively slowly over time. So one fix here is to use regularization -- in this case, we’ll use ridge regression! Autocorrelation in the stimulus produces highly correlated columns in the delay-embedded design matrix, and ridge stabilizes estimation when columns are correlated.
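To see what ridge buys us, here is a sketch using the closed-form solution $\beta = (X^\top X + \alpha I)^{-1} X^\top y$ on toy data with two nearly duplicate columns, the situation that delay embedding creates (all values synthetic; this is not the solver `cv_ridge` uses internally):

```python
import numpy as np

# Two nearly identical columns, as delay embedding of a smooth stimulus produces.
rng = np.random.default_rng(5)
n = 500
base = rng.standard_normal(n)
X = np.column_stack([base, base + 0.01 * rng.standard_normal(n)])
y = X[:, 0] + 0.1 * rng.standard_normal(n)  # true weight vector: (1, 0)

def ridge_fit(X, y, alpha):
    # Closed-form ridge solution: (X^T X + alpha I)^{-1} X^T y
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

b_ols = ridge_fit(X, y, 0.0)     # OLS: weight can be split unstably across the pair
b_ridge = ridge_fit(X, y, 10.0)  # ridge: smaller, shared weights
print(b_ols, b_ridge)
```

With correlated columns, OLS can trade weight between the near-duplicates almost freely; ridge penalizes the weight norm, so it spreads a stable, smaller weight across them while preserving the total contribution.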

# For logging compute times, debug messages
import logging
logging.basicConfig(level=logging.DEBUG) 

# Regularization parameters (alphas - also sometimes called lambda)
alphas = np.logspace(2,8,15) # Gives you 15 values between 10^2 and 10^8
nalphas = len(alphas)

use_corr = True # Use correlation between predicted and validation set as metric for goodness of fit
single_alpha = False # If False, use best alpha for each electrode. Usually this is what you should do.
nfolds = 3 # How many folds of cross-validation we want to do to find the ridge parameter (this is a pretty low number)
chunklen = int(len(delays)*4) # We will randomize the data in chunks - we should not randomly choose 
                              # time points as that will remove correlation structure that we need to keep
nchunks = np.floor(0.2*xtraind.shape[0]/chunklen).astype('int')

nchans = ytrain.shape[1] # Number of electrodes/sensors


wt, corrs, valphas, allRcorrs, valinds, pred, Pstim = cv_ridge(xtraind_int, ytrain, xtestd_int, ytest, 
                                                                      alphas, nfolds, chunklen, nchunks, 
                                                                      use_corr=use_corr,  single_alpha = single_alpha, 
                                                                      use_svd=False)
INFO:ridge_corr:Selecting held-out test set..
INFO:ridge_corr:Doing Eigenvalue decomposition...
Cmode = False
Number of time points is greater than the number of features
Rstim shape (not cmode): 
(128874, 3201)
Covmat shape: 
(3201, 3201)
(3201, 3201) (3201, 128874) (128874, 3)
INFO:ridge_corr:Training: alpha=100.000, mean corr=0.29279, max corr=0.35569, over-under(0.20)=3
INFO:ridge_corr:Training: alpha=268.270, mean corr=0.29445, max corr=0.35680, over-under(0.20)=3
INFO:ridge_corr:Training: alpha=719.686, mean corr=0.29756, max corr=0.35885, over-under(0.20)=3
INFO:ridge_corr:Training: alpha=1930.698, mean corr=0.30211, max corr=0.36185, over-under(0.20)=3
INFO:ridge_corr:Training: alpha=5179.475, mean corr=0.30715, max corr=0.36514, over-under(0.20)=3
INFO:ridge_corr:Training: alpha=13894.955, mean corr=0.31126, max corr=0.36732, over-under(0.20)=3
INFO:ridge_corr:Training: alpha=37275.937, mean corr=0.31354, max corr=0.36703, over-under(0.20)=3
INFO:ridge_corr:Training: alpha=100000.000, mean corr=0.31332, max corr=0.36366, over-under(0.20)=3
INFO:ridge_corr:Training: alpha=268269.580, mean corr=0.30987, max corr=0.35686, over-under(0.20)=3
INFO:ridge_corr:Training: alpha=719685.673, mean corr=0.30193, max corr=0.34511, over-under(0.20)=3
INFO:ridge_corr:Training: alpha=1930697.729, mean corr=0.28603, max corr=0.32362, over-under(0.20)=3
INFO:ridge_corr:Training: alpha=5179474.679, mean corr=0.25800, max corr=0.28702, over-under(0.20)=3
INFO:ridge_corr:Training: alpha=13894954.944, mean corr=0.21857, max corr=0.23619, over-under(0.20)=2
INFO:ridge_corr:Training: alpha=37275937.203, mean corr=0.17604, max corr=0.18820, over-under(0.20)=0
INFO:ridge_corr:Training: alpha=100000000.000, mean corr=0.14619, max corr=0.16419, over-under(0.20)=0
INFO:counter:1/3 items complete (19.49 seconds/item, 00:00:38 remaining)
INFO:ridge_corr:Selecting held-out test set..
INFO:ridge_corr:Doing Eigenvalue decomposition...
Cmode = False
Number of time points is greater than the number of features
Rstim shape (not cmode): 
(128874, 3201)
Covmat shape: 
(3201, 3201)
(3201, 3201) (3201, 128874) (128874, 3)
INFO:ridge_corr:Training: alpha=100.000, mean corr=0.30239, max corr=0.37006, over-under(0.20)=3
INFO:ridge_corr:Training: alpha=268.270, mean corr=0.30412, max corr=0.37146, over-under(0.20)=3
INFO:ridge_corr:Training: alpha=719.686, mean corr=0.30726, max corr=0.37402, over-under(0.20)=3
INFO:ridge_corr:Training: alpha=1930.698, mean corr=0.31170, max corr=0.37765, over-under(0.20)=3
INFO:ridge_corr:Training: alpha=5179.475, mean corr=0.31654, max corr=0.38161, over-under(0.20)=3
INFO:ridge_corr:Training: alpha=13894.955, mean corr=0.32087, max corr=0.38515, over-under(0.20)=3
INFO:ridge_corr:Training: alpha=37275.937, mean corr=0.32443, max corr=0.38816, over-under(0.20)=3
INFO:ridge_corr:Training: alpha=100000.000, mean corr=0.32685, max corr=0.39055, over-under(0.20)=3
INFO:ridge_corr:Training: alpha=268269.580, mean corr=0.32640, max corr=0.39120, over-under(0.20)=3
INFO:ridge_corr:Training: alpha=719685.673, mean corr=0.31960, max corr=0.38626, over-under(0.20)=3
INFO:ridge_corr:Training: alpha=1930697.729, mean corr=0.30126, max corr=0.36817, over-under(0.20)=3
INFO:ridge_corr:Training: alpha=5179474.679, mean corr=0.26836, max corr=0.33084, over-under(0.20)=3
INFO:ridge_corr:Training: alpha=13894954.944, mean corr=0.22180, max corr=0.27328, over-under(0.20)=2
INFO:ridge_corr:Training: alpha=37275937.203, mean corr=0.16681, max corr=0.20258, over-under(0.20)=1
INFO:ridge_corr:Training: alpha=100000000.000, mean corr=0.12503, max corr=0.15051, over-under(0.20)=0
INFO:counter:2/3 items complete (18.87 seconds/item, 00:00:18 remaining)
INFO:ridge_corr:Selecting held-out test set..
INFO:ridge_corr:Doing Eigenvalue decomposition...
Cmode = False
Number of time points is greater than the number of features
Rstim shape (not cmode): 
(128874, 3201)
Covmat shape: 
(3201, 3201)
(3201, 3201) (3201, 128874) (128874, 3)
INFO:ridge_corr:Training: alpha=100.000, mean corr=0.30072, max corr=0.35952, over-under(0.20)=3
INFO:ridge_corr:Training: alpha=268.270, mean corr=0.30229, max corr=0.36052, over-under(0.20)=3
INFO:ridge_corr:Training: alpha=719.686, mean corr=0.30511, max corr=0.36222, over-under(0.20)=3
INFO:ridge_corr:Training: alpha=1930.698, mean corr=0.30893, max corr=0.36429, over-under(0.20)=3
INFO:ridge_corr:Training: alpha=5179.475, mean corr=0.31264, max corr=0.36598, over-under(0.20)=3
INFO:ridge_corr:Training: alpha=13894.955, mean corr=0.31504, max corr=0.36702, over-under(0.20)=3
INFO:ridge_corr:Training: alpha=37275.937, mean corr=0.31588, max corr=0.36804, over-under(0.20)=3
INFO:ridge_corr:Training: alpha=100000.000, mean corr=0.31509, max corr=0.36930, over-under(0.20)=3
INFO:ridge_corr:Training: alpha=268269.580, mean corr=0.31181, max corr=0.36921, over-under(0.20)=3
INFO:ridge_corr:Training: alpha=719685.673, mean corr=0.30464, max corr=0.36482, over-under(0.20)=3
INFO:ridge_corr:Training: alpha=1930697.729, mean corr=0.29028, max corr=0.35112, over-under(0.20)=3
INFO:ridge_corr:Training: alpha=5179474.679, mean corr=0.26313, max corr=0.32107, over-under(0.20)=3
INFO:ridge_corr:Training: alpha=13894954.944, mean corr=0.22231, max corr=0.27313, over-under(0.20)=2
INFO:ridge_corr:Training: alpha=37275937.203, mean corr=0.17508, max corr=0.21549, over-under(0.20)=1
INFO:ridge_corr:Training: alpha=100000000.000, mean corr=0.14009, max corr=0.17190, over-under(0.20)=0
INFO:counter:3/3 items complete (17.57 seconds/item, 00:00:00 remaining)
INFO:ridge_corr:Finding best alpha for each electrode..
INFO:ridge_corr:Computing weights for each response using entire training set..
INFO:ridge_corr:Doing Eigenvalue decomposition on the full stimulus matrix...
Cmode = False
Number of time points is greater than the number of features
stim shape (not cmode): 
(161034, 3201)
Covmat shape: 
(3201, 3201)
INFO:ridge_corr:Computing weights
INFO:ridge_corr:Predicting responses for predictions set..
(3201, 3201) (3201, 161034) (161034, 3)

Check alpha values

We can check what the best regularization parameter $\alpha$ is for each electrode by plotting the cross-validated correlations against the alphas. Below, each electrode is shown as a curve of correlation vs. alpha. For most “good” electrodes there will be a peak value in the middle of the range of alphas. If your model is consistently showing that the average best $\alpha$ is the smallest or largest value, you should widen your range appropriately and re-run the model.

fig = plt.figure()
plt.semilogx(alphas, allRcorrs.mean(2))  # Log space alphas
# Plot a line at the maximum alpha. This should be in the middle
plt.axvline(alphas[allRcorrs.mean(2).mean(1).argmax()])
plt.xlabel('Alpha')
plt.ylabel('Correlation')
plt.show()

Show predictions

Now we will show the predicted versus actual activity for all electrodes. If these are well-modeled by the STRF, you should see that the predicted and actual activity look fairly similar. If a larger regularization value alpha was chosen, the predicted activity will tend to look smoother.

# How much time to show?
ntimes = int(30*fs) # Show 30 seconds of time
ntimes_start = int(5*fs)  # Start 5 seconds in since there is a lot of silence at the beginning
times = np.arange(ntimes_start, ntimes_start+ntimes)/fs

# Plot predictions vs. actual response
print("Prediction matrix shape: ", pred.shape)
print("Response matrix shape: ", ytest.shape)

fig = plt.figure(figsize=(10,5))
plt.subplot(nelecs+1,1,1)
plt.imshow(xtestd[ntimes_start:ntimes_start+ntimes,:nfeats].T, aspect='auto', cmap=cm.magma, interpolation='nearest')
xticks = np.arange(0, len(times), step=500)
plt.gca().set_xticks(xticks)
plt.gca().set_xticklabels([int(times[x]) for x in xticks])
plt.gca().invert_yaxis()

# Loop through the best channels
for i in np.arange(nelecs):
    # Get the predicted neural response 
    prediction = pred[ntimes_start:ntimes_start+ntimes, i]
    prediction = prediction/prediction.max() # Rescale to max
    
    actual_resp = ytest[ntimes_start:ntimes_start+ntimes,i]
    actual_resp = actual_resp/actual_resp.max(0) # Rescale to max
    
    plt.subplot(nelecs+1,1,i+2)
    plt.plot(times, actual_resp, color='k', label='actual')
    plt.plot(times, prediction.T, color='r',label='pred')
    
    plt.title(f'Channel {i}, r={corrs[i]:.2f}')
    plt.gca().set_xlim([ntimes_start/fs,(ntimes_start+ntimes)/fs])
    plt.legend(loc='upper right')
plt.tight_layout()
Prediction matrix shape:  (13563, 3)
Response matrix shape:  (13563, 3)

Visualizing the STRF filters

Here we will show the STRF filters we’ve derived for each channel. These filters show which spectrotemporal features of the stimulus best predict an increase or decrease in the observed neural activity.

# Plot all of the STRFs, using a separate regularization parameter for each (whichever gives the best performance)

fig = plt.figure(figsize=(10,3))
print(wt.shape)

for c in np.arange(nelecs):
    ax = fig.add_subplot(1,nelecs,c+1)
    strf = wt[1:,c].reshape(len(delays),-1)
    smax = np.abs(strf).max()
    plt.imshow(strf.T, vmin=-smax, vmax=smax, cmap = cm.RdBu_r, aspect='auto', interpolation='nearest')
    plt.title(f'elec {c}: r={corrs[c]:.2f}')
    if c==0:
        plt.xlabel('Time (s)')
        plt.ylabel('Freq.')
    ax.set_ylim(ax.get_ylim()[::-1]) # This just reverses the y axis so low frequency is at the bottom
    ax.xaxis.set_ticks([0,len(delays)])
    ax.xaxis.set_ticklabels([0, -len(delays)/fs])
    plt.gca().invert_xaxis() # Invert the x-axis so Time (s) is on the x rather than Time delay (s)

    ax.yaxis.set_ticks([])
    #title('alpha=%3.3g'%(alphas[best_alphas_indiv[c]]))
    #colorbar()

plt.tight_layout();
(3201, 3)

How does the selection of regularization parameter $\alpha$ affect the observed coefficients?

It is important to choose a range of $\alpha$ values and determine which yield the best predictions on held-out data, since the regularization parameter itself can affect the structure of your STRF.

One of the things we did above was just to choose the best regularization parameter for each electrode separately. However, this choice does affect what the STRF filters look like, so here we will do an exercise where we fit all possible alphas and show the weights. This is just for educational purposes and is not typically needed in an analysis.
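The effect being demonstrated can be sketched directly with the ridge closed form on toy data: as $\alpha$ grows, the norm of the weights shrinks monotonically (synthetic data, for illustration only):

```python
import numpy as np

# Synthetic regression problem; refit the ridge closed form for every alpha.
rng = np.random.default_rng(6)
X = rng.standard_normal((300, 8))
y = X @ rng.standard_normal(8) + 0.5 * rng.standard_normal(300)

alpha_grid = np.logspace(0, 6, 7)
norms = [np.linalg.norm(np.linalg.solve(X.T @ X + a * np.eye(8), X.T @ y))
         for a in alpha_grid]
print(np.round(norms, 3))  # weight norm shrinks as alpha grows
```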

from ridge.ridge import eigridge

# Get weights for all alphas 
all_alpha_wts = []
for alpha in alphas:
    print("Fitting model for alpha=%.3g"%(alpha))
    tmp_wt = eigridge(xtraind, ytrain, alpha)
    all_alpha_wts.append(tmp_wt)

awt = np.array(all_alpha_wts)
INFO:ridge_corr:Doing Eigenvalue decomposition on the full stimulus matrix...
Fitting model for alpha=100
Cmode = False
Number of time points is greater than the number of features
stim shape (not cmode): 
(161034, 3200)
Covmat shape: 
(3200, 3200)
INFO:ridge_corr:Computing weights
INFO:ridge_corr:Doing Eigenvalue decomposition on the full stimulus matrix...
(3200, 3200) (3200, 161034) (161034, 3)
...
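The log messages above mention an eigenvalue decomposition of the full stimulus matrix. This decomposition is what makes sweeping many alpha values cheap: once $X^\top X = V \Lambda V^\top$ is computed, the ridge solution for each alpha only costs a couple of matrix multiplies. Below is a minimal sketch of that trick on toy data (the actual `eigridge` implementation may differ in its details):

```python
import numpy as np

def eigridge_sketch(X, Y, alphas):
    """Ridge weights for many alphas from one eigendecomposition.

    Solves beta = (X^T X + alpha*I)^-1 X^T Y for each alpha, reusing
    X^T X = V diag(L) V^T so each additional alpha is inexpensive.
    """
    XtX = X.T @ X                 # feature covariance (F x F)
    L, V = np.linalg.eigh(XtX)    # eigenvalues L, eigenvectors V
    XtY = X.T @ Y
    wts = []
    for alpha in alphas:
        # (XtX + alpha*I)^-1 = V diag(1 / (L + alpha)) V^T
        inv = V @ np.diag(1.0 / (L + alpha)) @ V.T
        wts.append(inv @ XtY)
    return np.stack(wts)          # (n_alphas, F, n_outputs)

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 20))
Y = X @ rng.standard_normal((20, 3)) + 0.1 * rng.standard_normal((500, 3))
W = eigridge_sketch(X, Y, [1.0, 10.0, 100.0])
print(W.shape)  # (3, 20, 3)
```

Note that the expensive step (the eigendecomposition) happens once, outside the alpha loop, which is exactly why it appears only once per fit in the log output.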

Calculating performance on the validation set for each alpha

Next, we need to calculate the predicted response to our validation set for our assessment of model performance. Normally we would only do this for the best alpha found in the previous step, but here we will calculate all STRFs for all channels and all alphas so we can compare the correlations later.

The predictions are normally returned by cv_ridge, but here we can also calculate them directly by taking the dot product between the delayed stimulus matrix and the STRF weights. This is exactly calculating $\hat{Y} = X\beta$.

print("Calculating predicted response to validation set")
wt_array = np.dstack(awt)
print(wt_array.shape)
vPred_alpha = [ [ np.dot(xtestd, wt_array[:,ch,alph]) for ch in np.arange(nchans)] for alph in np.arange(nalphas)]
vPred_alpha = np.array(vPred_alpha)
Calculating predicted response to validation set
(3200, 3, 15)

Get model performance for each alpha value

Since we didn’t get them for free, we will now also manually calculate the correlations between each prediction and the actual held-out data vResp.

print("Calculating correlations on validation set")
vcorr = [[np.corrcoef(vPred_alpha[alph][ch], ytest[:, ch])[0, 1] for ch in np.arange(nchans)] for alph in np.arange(nalphas)]
vcorr = np.array(vcorr)
print("Done calculating correlations")
print("Correlation matrix shape: ", vcorr.shape)
Calculating correlations on validation set
Done calculating correlations
Correlation matrix shape:  (15, 3)

How does the choice of $\alpha$ affect the STRF structure?

Here we will plot the STRF weight matrices for each channel and each alpha regularization value. Each row is an electrode, each column is an alpha value. You will see that as the alpha value increases, the STRF weights become broader and smoother. The r-value should peak somewhere in the middle of the range.

We will indicate the best weight matrix and alpha for each channel with red titles. These are the weights and correlations returned by bootstrap_ridge.
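The shrinkage effect described above can also be demonstrated directly on synthetic data: as alpha grows, ridge pulls the fitted weights toward zero, which is why the STRFs look smoother at high regularization. This is a minimal illustration, separate from the lecture's pipeline:

```python
import numpy as np

# Demonstrate ridge shrinkage: as alpha grows, the fitted weights
# shrink toward zero (which makes the STRFs look smoother).
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 10))
y = X @ rng.standard_normal(10) + rng.standard_normal(200)

norms = []
for alpha in [0.1, 10.0, 1000.0]:
    # Closed-form ridge solution: (X^T X + alpha*I)^-1 X^T y
    beta = np.linalg.solve(X.T @ X + alpha * np.eye(10), X.T @ y)
    norms.append(np.linalg.norm(beta))

# The weight norm decreases monotonically with alpha
print(norms)
```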

# Show how the regularization parameter changes the STRFs
fig = plt.figure(figsize=(20, 10))
fig.clf()
axes = [fig.add_subplot(nelecs, len(alphas), ii + 1) for ii in range(len(alphas) * nelecs)]
p = 0
for ch in np.arange(nelecs):  # loop through all electrodes
    for a in np.arange(len(alphas)):  # loop through the alpha regularization parameters
        strf = wt_array[:, ch, a].reshape(len(delays), -1)
        smax = np.abs(strf).max()
        axes[p].imshow(strf.T, vmin=-smax, vmax=smax, cmap=cm.RdBu_r,
                       aspect='auto', interpolation='nearest')
        axes[p].xaxis.set_ticks([])
        axes[p].yaxis.set_ticks([])
        axes[p].set_ylim(axes[p].get_ylim()[::-1])  # reverse the y-axis so low frequency is at the bottom
        axes[p].set_xlim(axes[p].get_xlim()[::-1])  # invert the x-axis so it reads as Time (s) rather than Time delay (s)
        # Highlight the best alpha for this channel with a red title
        best = (a == vcorr[:, ch].argmax())
        axes[p].set_title(f'a={alphas[a]:.2f}\nr={vcorr[a,ch]:.3f}',
                          color='r' if best else 'k')
        p += 1

plt.tight_layout()
#fig.subplots_adjust(hspace=.5, wspace=.8)
        
<Figure size 2000x1000 with 45 Axes>

Putting it together

You’ve now seen a real-world example of a time-lagged regression that incorporates all of the following topics we’ve discussed so far in class:

  1. Spectral analysis (using the spectrogram of the sound as a covariate in regression)

  2. Multiple linear regression with time lags

  3. Cross-validation

  4. Dealing with auto-correlated predictors

  5. OLS vs. ridge
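As a recap, the steps above can be sketched end-to-end on synthetic data. The names here (`build_delayed`, the toy dimensions, the alpha grid) are illustrative, not the course code:

```python
import numpy as np

def build_delayed(X, delays):
    """Stack time-delayed copies of X (time x features) column-wise."""
    T, F = X.shape
    out = np.zeros((T, F * len(delays)))
    for i, d in enumerate(delays):
        out[d:, i * F:(i + 1) * F] = X[:T - d]
    return out

rng = np.random.default_rng(2)
T, F, delays = 1000, 4, range(5)
X = rng.standard_normal((T, F))
true_strf = rng.standard_normal(F * len(delays))
Xd = build_delayed(X, delays)
y = Xd @ true_strf + 0.5 * rng.standard_normal(T)

# Split into training and validation sets, fit ridge per alpha,
# and score each alpha by its validation-set correlation
Xtr, Xva, ytr, yva = Xd[:800], Xd[800:], y[:800], y[800:]
corrs = []
for alpha in [1e-2, 1.0, 1e2, 1e4]:
    beta = np.linalg.solve(Xtr.T @ Xtr + alpha * np.eye(Xtr.shape[1]),
                           Xtr.T @ ytr)
    corrs.append(np.corrcoef(Xva @ beta, yva)[0, 1])

best = int(np.argmax(corrs))
print("best alpha index:", best)
```

This mirrors the real analysis: build the delayed design matrix, fit ridge across an alpha grid, and pick the alpha that maximizes held-out correlation.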

References
  1. Holdgraf, C. R., Rieger, J. W., Micheli, C., Martin, S., Knight, R. T., & Theunissen, F. E. (2017). Encoding and Decoding Models in Cognitive Electrophysiology. Frontiers in Systems Neuroscience, 11. 10.3389/fnsys.2017.00061