
Lecture 2 Notes - Characteristics of Time Series

Review / basic concepts

Stochastic process

White noise

Many real-world time series are a combination of an underlying signal $s_t$ plus noise $w_t$. In some (nice) cases, $w_t$ is white noise.

White noise is a special case of uncorrelated variables in sequence: $w_t$ for $t = 1, 2, 3, \ldots$, with mean 0 and variance $\sigma^2_w$. Unlike most real time series, white noise has no temporal structure to exploit. One very useful case is iid white noise from a Gaussian distribution, where we can write: $w_t \sim \mbox{iid } \mathcal{N}(0,\sigma^2_w)$.

If all time series could be described in this way, classical statistics would suffice.

What does white noise look like?

[Figure: a white noise time series]

Moving average

One way of smoothing a time series (including white noise) is to average the value at a time point $t$ with its neighbors $t-1$ and $t+1$ (or an even larger window from $t-p$ to $t+p$). For example:

$$v_t = \frac{1}{3}(w_{t-1} + w_t + w_{t+1})$$

or more generally

$$v_t = \frac{1}{n} \sum_{k=-\frac{n-1}{2}}^{\frac{n-1}{2}} w_{t+k} \quad \text{for odd } n$$

This is inherently a low-pass filter: it lets low-frequency signals pass and attenuates high-frequency ones. It preserves trends that evolve over more than $\sim n$ samples and suppresses oscillations with period $\lesssim n$ samples.

When do we use this?

We might use this to reveal trends in noisy time series data, remove fluctuations we consider “noise”, or do simple online smoothing (especially if we choose the window to include only data in the past). However, this is the most basic form of smoothing and is typically replaced by more complex methods such as exponential moving averages, Kalman filters, median filter, etc.
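As a concrete sketch (assuming Python with numpy, which is not specified in the notes), a moving average is just a convolution with a uniform kernel of weights $1/n$:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(0, 1, size=500)            # Gaussian white noise

n = 3                                      # odd window length
kernel = np.ones(n) / n                    # uniform weights 1/n
v = np.convolve(w, kernel, mode="valid")   # v_t = (w_{t-1} + w_t + w_{t+1}) / 3

# Averaging n uncorrelated points shrinks the variance by roughly 1/n,
# so the smoothed series fluctuates less than the raw noise.
print(w.var(), v.var())
```

With `mode="valid"` the output drops the $(n-1)/2$ edge points on each side where the full window is unavailable; other boundary conventions (padding, shrinking windows) are possible design choices.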

Autoregression

Another flavor of dataset we might see is data that comes from an autoregressive process. Autoregression is regression (prediction) of a time series on past values of that same series, hence "auto".

This might look something like:

$$x_t = 1.5 x_{t-1} - 0.75 x_{t-2} + w_t$$

You will see what this looks like in Lab 1. Because the value at $t$ relies on $t-1$ and $t-2$ (the prior two data points), this is an AR(2) process. Generating data in this way can result in oscillatory behavior.
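The AR(2) recursion above can be simulated directly (again a Python/numpy sketch; the coefficients are the ones from the equation, not special values):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 200
sigma_w = 1.0
x = np.zeros(T)                            # start from x_0 = x_1 = 0

# AR(2): each value depends on the previous two values plus white noise
for t in range(2, T):
    x[t] = 1.5 * x[t - 1] - 0.75 * x[t - 2] + rng.normal(0, sigma_w)
```

For these coefficients the characteristic roots are complex with modulus below 1, which is why the simulated series oscillates quasi-periodically instead of exploding.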

Random Walk

Another simple but important extension of white noise is the random walk. The simple random walk is defined as:

$$x_t = x_{t-1} + w_t$$ with initial condition $x_0 = 0$.

Equivalently, $x_t = \sum_{j=1}^t w_j$.

The expected value at any time point is $\mathbb{E}[x_t] = \mathbb{E}[x_0] = 0$, so the mean does not vary over time.

The variance of a random walk, on the other hand, is additive and grows linearly with time:

$$\operatorname{Var}(x_t) = \operatorname{Var}\left(\sum_{j=1}^t w_j\right) = t \sigma^2_w$$

Are random walks iid? No! The time points are not independent: each depends on the previous one. They're also not identically distributed, since the variance grows with time.
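We can check the linear growth of the variance empirically by simulating many independent walks and taking the variance across them at each time point (a Python/numpy sketch with $\sigma_w = 1$, so $\operatorname{Var}(x_t) = t$):

```python
import numpy as np

rng = np.random.default_rng(2)
T, n_walks = 100, 5000
w = rng.normal(0, 1, size=(n_walks, T))    # white noise, sigma_w = 1
x = np.cumsum(w, axis=1)                   # x_t = sum_{j<=t} w_j for each walk

# Empirical variance across walks at time t should be close to t
emp_var = x.var(axis=0)                    # emp_var[t-1] ~ t
```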

What are some real examples?

Random Walk with Drift

More frequently, we see the concept of the random walk with drift $\delta$. This extends the idea of the random walk:

$$x_t = \delta + x_{t-1} + w_t$$ with initial condition $x_0 = 0$.

Equivalently, $x_t = \delta t + \sum_{j=1}^t w_j$.

[Figure: random walks with and without drift]

Here, the expected value is related directly to the drift term: $\mathbb{E}[x_t] = \delta t + x_0$. However, if we know the drift and can condition on a prior observation $x_s$ (with $s < t$), we get a conditional expectation: $\mathbb{E}[x_t \mid x_s] = x_s + \delta (t-s)$.

Again, the variance scales with the number of time points, since noise accumulates step by step.
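A quick simulation (Python/numpy sketch; $\delta = 0.5$ is an arbitrary illustrative value) confirms that the mean path follows $\delta t$ while the noise still accumulates around it:

```python
import numpy as np

rng = np.random.default_rng(3)
T, n_walks, delta = 100, 5000, 0.5
w = rng.normal(0, 1, size=(n_walks, T))
# x_t = delta * t + sum_{j<=t} w_j for each walk
x = delta * np.arange(1, T + 1) + np.cumsum(w, axis=1)

mean_path = x.mean(axis=0)    # should track delta * t (with x_0 = 0)
```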

What are some real examples?

Drift diffusion models

Drift diffusion models are cognitive models explaining how people accumulate evidence to make decisions. In these models, $x_t$ is a decision variable, and the drift $\delta$ represents the mean evidence gained per unit time. Usually we also set boundaries for a binary choice: $-1$ for an incorrect choice or $+1$ for a correct choice. We can then calculate the reaction time for the decision, which is the first time at which $x_t$ hits either of the choice boundaries, at which point the random walk stops. With no drift, we expect a long reaction time and a 50/50 probability of a correct or incorrect choice. High confidence / high information can be represented by a large value of $\delta$. For example, your $\delta$ may be lower if you are doing a visual discrimination task under a lot of noise (uncertainty); $\delta$ could also be increased by motivation or by higher-certainty information.
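The description above can be sketched as a stopped random walk with drift (Python/numpy; the function name `simulate_ddm` and its parameter values are illustrative choices, not from the notes):

```python
import numpy as np

def simulate_ddm(delta, sigma_w=0.1, bound=1.0, max_t=10000, rng=None):
    """Random walk with drift, stopped at the first boundary crossing.

    Returns (choice, reaction_time): choice is +1 (correct) or -1
    (incorrect), or (0, max_t) if neither boundary is reached in time.
    """
    rng = rng or np.random.default_rng()
    x = 0.0
    for t in range(1, max_t + 1):
        x += delta + rng.normal(0, sigma_w)   # accumulate drift + noise
        if x >= bound:
            return 1, t
        if x <= -bound:
            return -1, t
    return 0, max_t

rng = np.random.default_rng(4)
trials = [simulate_ddm(delta=0.02, rng=rng) for _ in range(500)]
choices = [c for c, _ in trials]
acc = sum(c == 1 for c in choices) / len(choices)
# Positive drift biases the walk toward the +1 ("correct") boundary,
# so accuracy should be well above chance.
```

Setting `delta=0` instead would give roughly 50/50 choices and much longer reaction times, matching the no-drift intuition in the text.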

An example of neurons performing something that looks like evidence accumulation is shown in Gold & Shadlen (2007). This comes from a decision-making task in which a monkey watches movies of moving dots, where a certain percentage of the dots move coherently, making the task either very easy (all the dots moving the same way) or hard (very few coherent dots).

[Figure: neurons during a dot-motion task show drift-diffusion-like characteristics]

Signal in noise

More generally, we can see other examples of periodic signals contaminated by white noise. For example:

$$x_t = A\cos(2\pi\omega t + \phi) + w_t$$

where $A$ is the amplitude of the signal, $\omega$ is the frequency of the oscillation, and $\phi$ is a phase shift.

The ratio of the signal amplitude to the standard deviation of the noise determines the signal-to-noise ratio (SNR). The larger the SNR, the easier it is to recover the signal.
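Generating such a signal-in-noise series is straightforward (Python/numpy sketch; the particular amplitude, frequency, and phase values are arbitrary, and defining SNR as $A / \sigma_w$ follows the convention stated above):

```python
import numpy as np

rng = np.random.default_rng(5)
t = np.arange(500)
A, omega, phi = 2.0, 1 / 50, 0.6 * np.pi   # amplitude, frequency, phase
sigma_w = 1.0

signal = A * np.cos(2 * np.pi * omega * t + phi)
x = signal + rng.normal(0, sigma_w, size=t.size)   # signal + white noise

snr = A / sigma_w    # amplitude over noise standard deviation
```

Raising `sigma_w` while holding `A` fixed lowers the SNR and buries the oscillation in noise, which is exactly the regime where the regression methods mentioned next become useful.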

Later, we will use various forms of regression to try to recover these signals!

Next week: