
State Space Models

Now we’re going to discuss state space models, which are an extension of the ideas we’ve talked about so far in class. With state space models, we measure some noisy outputs $y_t$ that are generated by hidden states $x_t$ (also called latent variables) that evolve over time.

*Diagram of a state-space model, with states $x_t$ and observations $y_t$.*

In ordinary regression, we have $y_t = X\beta + \epsilon$, which assumes $\beta$ is a fixed unknown constant (that we will learn through some fitting procedure). But what if $\beta$ drifts or changes over time? For example, what if we have $\beta_t = \beta_{t-1} + \eta_t$ and $y_t = X\beta_t + \epsilon$? This is now a state space model, where the “state” is the time-varying coefficient $\beta_t$.
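As a quick illustration, here is a simulated version of this time-varying coefficient model, with the coefficient following a random walk (all parameter values below are made up for illustration):

```python
import numpy as np

# Time-varying coefficient regression as a state space model:
#   state:       beta_t = beta_{t-1} + eta_t
#   observation: y_t    = x_t * beta_t + eps_t
# All variances and sizes are illustrative choices.
rng = np.random.default_rng(1)
n = 100
x = rng.normal(size=n)                   # regressor values
beta = np.cumsum(rng.normal(0, 0.1, n))  # random-walk coefficient beta_t
y = x * beta + rng.normal(0, 0.5, n)     # noisy observations
```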

For ARIMA models, we discussed how an AR(1) process is given by $x_t = \phi x_{t-1} + w_t$. Suppose we don’t observe $x_t$ directly, but only observe $y_t = x_t + v_t$, which is our signal plus some measurement noise. This could be described by a state space model with state equation $x_t = \phi x_{t-1} + w_t$ and observation equation $y_t = x_t + v_t$.
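We can simulate this signal-plus-noise setup directly (the parameter values below are arbitrary choices):

```python
import numpy as np

# Hidden AR(1) state observed with measurement noise:
#   state:       x_t = phi * x_{t-1} + w_t,  w_t ~ N(0, sigma_w^2)
#   observation: y_t = x_t + v_t,            v_t ~ N(0, sigma_v^2)
rng = np.random.default_rng(0)
n, phi, sigma_w, sigma_v = 200, 0.8, 1.0, 0.5

x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.normal(0, sigma_w)  # latent state

y = x + rng.normal(0, sigma_v, size=n)  # what we actually get to see
```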

Overall, state-space models are characterized by two principles:

  1. There is a hidden or latent process $x_t$ called the state process. This is assumed to be a Markov process, meaning the future $\{x_s : s > t\}$ and past $\{x_s : s < t\}$ are independent conditional on the present, $x_t$.

  2. The observations $y_t$ are independent given the states $x_t$. That is, the dependence among observations is generated by the states.

Extension of AR model to VAR

Before we get into state space models, we should briefly mention an extension of the AR model to multiple dimensions, which is the VAR model (vector autoregressive model). Everything we’ve talked about so far with AR has been for a single series. Sometimes, however, we may have $k$ multiple series that influence each other. For the VAR model, we write:

$$x_t = \alpha + \Phi x_{t-1} + w_t,$$

where $x_t$ is now a vector and $\Phi$ is a $(k \times k)$ transition matrix that expresses the dependence of $x_t$ on $x_{t-1}$. You can read more about this in Chapter 5.5 of Shumway and Stoffer. State space models are a more general extension of this.
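A short simulation sketch of a $k = 2$ VAR(1), with an illustrative $\Phi$ chosen to have eigenvalues inside the unit circle so the process is stable:

```python
import numpy as np

# Simulate a k = 2 VAR(1): x_t = alpha + Phi x_{t-1} + w_t.
# This Phi has eigenvalues 0.6 and 0.3 (inside the unit circle),
# so the process is stable; all numbers are illustrative.
rng = np.random.default_rng(2)
alpha = np.array([0.1, -0.2])
Phi = np.array([[0.5, 0.2],
                [0.1, 0.4]])

n = 300
x = np.zeros((n, 2))
for t in range(1, n):
    x[t] = alpha + Phi @ x[t - 1] + rng.normal(0, 1.0, size=2)
```

Each row of `Phi` is one regression: the first series depends on lagged values of both series through $\phi_{11}$ and $\phi_{12}$.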

Linear Gaussian Model

We can write the basic form of a linear Gaussian state-space model, also called the dynamic linear model (DLM), with the following state equation:

$$x_t = \Phi x_{t-1} + \Upsilon u_t + w_t$$

This is an order-one, $p$-dimensional vector autoregression, where the $w_t$ are $p \times 1$ Gaussian white noise vectors, $w_t \overset{iid}{\sim} N_p(0, Q)$. At $t = 0$, we start with a normal vector $x_0 \sim N_p(\mu_0, \Sigma_0)$.

We do not observe this state vector $x_t$ directly; instead, we see a linearly transformed version of it with noise added.

Observation equation:

$$y_t = A_t x_t + \Gamma u_t + v_t$$

Here $y_t$ is a $q$-dimensional observation and $A_t$ is a $q \times p$ measurement matrix. External inputs can influence $y_t$ through either $\Upsilon u_t$ (through the state equation) or $\Gamma u_t$ (through the observation equation). As a concrete example, say $x_t$ is a person’s true blood pressure, and $y_t$ is the blood pressure measured by a blood pressure cuff. If a drug was administered at time $t$, this would change the state equation, because it modifies the actual blood pressure. On the other hand, an input like “which nurse took the reading” or “which blood pressure cuff was used” belongs in the observation equation, since it affects what the meter reports but not the underlying physiological cause. Commonly $\Gamma = 0$ is used, so inputs only drive the state; conversely, $\Upsilon = 0$ might show up when you’re trying to model known measurement artifacts.
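To make the roles of $\Upsilon$ and $\Gamma$ concrete, here is a hypothetical simulation of the blood-pressure example. All numbers (the baseline, the drug effect, the noise levels) are made up; with $\Gamma = 0$, the input acts only through the state:

```python
import numpy as np

# Sketch of the blood-pressure example as a scalar DLM (numbers hypothetical).
# x_t = deviation of true blood pressure from a 120 mmHg baseline;
# u_t = drug indicator. The drug enters the STATE equation (Upsilon)
# because it changes the true pressure; Gamma = 0 means inputs do not
# directly affect the cuff reading.
rng = np.random.default_rng(3)
n, phi, upsilon, gamma = 50, 0.9, -5.0, 0.0
u = np.zeros(n)
u[25:] = 1.0                             # drug administered at t = 25

x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + upsilon * u[t] + rng.normal(0, 1.0)  # state

y = 120.0 + x + gamma * u + rng.normal(0, 2.0, n)  # observed cuff reading
```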

Bone marrow transplant example

As an example, we can look at changes in different biomedical markers when a cancer patient undergoes a bone marrow transplant. We have three variables:

  1. log(WBC), the log white blood cell count

  2. log(platelet), the log platelet count

  3. HCT, the hematocrit

These three variables measure distinct aspects of bone marrow function. A transplant is successful when the new marrow is incorporated and starts producing all three of these.

Unfortunately, as is the case with many real-world datasets (especially those with longitudinal follow-up), many data points are missing - approximately 40% in this case. The missing values mostly occur after the 35th day. We can use a state space approach to model these three variables and estimate the missing values. Prior work has shown that platelet count at 100 days post-transplant is a good indicator of subsequent long-term survival, so we may also want to look at this.

We can model these three variables using the state equation:

$$\begin{pmatrix} x_{t1} \\ x_{t2} \\ x_{t3} \end{pmatrix} = \begin{pmatrix} \phi_{11} & \phi_{12} & \phi_{13} \\ \phi_{21} & \phi_{22} & \phi_{23} \\ \phi_{31} & \phi_{32} & \phi_{33} \end{pmatrix} \begin{pmatrix} x_{t-1,1} \\ x_{t-1,2} \\ x_{t-1,3} \end{pmatrix} + \begin{pmatrix} w_{t1} \\ w_{t2} \\ w_{t3} \end{pmatrix}$$

The diagonal values of the $\Phi$ matrix tell you how much each marker’s own recent value predicts its next value. For example, $\phi_{11}$ near 1 means WBC is highly persistent and changes slowly from day to day. On the other hand, a value near 0 would indicate that today’s value has little to do with yesterday’s (which would be strange for blood markers and might signal a problem with the data collection).

The off-diagonal entries $\phi_{ij}$ are coefficients that show how much yesterday’s value of marker $j$ predicts today’s value of marker $i$. The matrix equation thus represents three stacked regressions, for example:

$$x_{t1} = \phi_{11} x_{t-1,1} + \phi_{12} x_{t-1,2} + \phi_{13} x_{t-1,3} + w_{t1}$$

That is, today’s log(WBC) value is a weighted sum of yesterday’s WBC, platelet, and hematocrit values, plus some noise.

We then have the observation equation $y_t = A_t x_t + v_t$, where the $3 \times 3$ matrix $A_t$ is either the identity matrix or the zero matrix, depending on whether a blood sample was taken on that day.

For such a model, we would fit the unknown parameters through expectation maximization (EM) or other maximum likelihood methods.

In Python, we can use `statsmodels.tsa.statespace` to fit these models.

Filtering, smoothing, and forecasting

For state space models, we want to estimate our underlying unobserved signal $x_t$ given the data $y_{1:s} = \{y_1, \dots, y_s\}$ up to time $s$. In practice, the main tasks for state space models are:

  1. Filtering, where we estimate $x_t$ using measurements up through time $t$ (here $s = t$)

  2. Prediction/forecasting, where $s < t$ and we want to estimate the state or observations beyond the data we have

  3. Smoothing, where s>ts>t. This allows us to estimate xtx_t using the entire dataset, including observations after tt. This can be used to better estimate missing values.
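A minimal scalar Kalman filter for the local level model makes the filtering step concrete (this is a sketch, not an optimized implementation; the variances `q` and `r` are assumed known here):

```python
import numpy as np

# Scalar Kalman filter for the local level model
#   x_t = x_{t-1} + w_t,  w_t ~ N(0, q);   y_t = x_t + v_t,  v_t ~ N(0, r)
# Filtering uses only data up through time t (s = t).
def kalman_filter(y, q, r, m0=0.0, p0=1e6):
    m, p = m0, p0                    # current state mean and variance
    means = []
    for obs in y:
        p_pred = p + q               # predict: var of x_t | y_1..y_{t-1}
        k = p_pred / (p_pred + r)    # Kalman gain
        m = m + k * (obs - m)        # update mean with the innovation
        p = (1 - k) * p_pred         # update variance
        means.append(m)
    return np.array(means)

rng = np.random.default_rng(5)
x = np.cumsum(rng.normal(0, 0.5, 100))       # latent random walk
y = x + rng.normal(0, 1.0, 100)              # noisy observations
filtered = kalman_filter(y, q=0.25, r=1.0)   # E[x_t | y_1..y_t]
```

The filtered estimates track the latent state more closely than the raw observations do, because each step pools the new measurement with everything seen so far.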

Other examples

Advantages of state space models

What are the advantages of state space models as opposed to other models we’ve discussed in this class?

  1. They deal well with missing data and don’t require every time point to be observed. You can still get estimates of the latent state for those missing time points.

  2. We can separate process noise from measurement noise. In ARIMA models, we have just one noise term (innovations/shocks). For real scientific applications, we may have noisy sensors where the measurement reading’s error is distinct from underlying noise in the true signal.

  3. We can have time-varying parameters (unlike fixed β\beta in a regression model).