
Lecture 22 Notes - State Space Models Part 2

Continuation of State Space Models

Examples

As a reminder, last time we discussed some examples of where state space models might be useful.

Advantages of state space models

What are the advantages of state space models as opposed to other models we’ve discussed in this class?

  1. They deal well with missing data and don’t require every time point to be observed. You can still get estimates of the latent state for those missing time points.

  2. We can separate process noise from measurement noise. In ARIMA models, we have just one noise term (innovations/shocks). For real scientific applications, we may have noisy sensors where the measurement reading’s error is distinct from underlying noise in the true signal.

  3. We can have time-varying parameters (unlike the fixed $\beta$ in a regression model).
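To make the second advantage concrete, here is a small simulation sketch (my own illustration, not from the lecture) of a random-walk-plus-noise model, where the process noise $w_t$ driving the hidden state is separate from the measurement noise $v_t$ corrupting the observations:

```python
# Sketch: simulate a random-walk-plus-noise state space model with
# separate process noise (variance q) and measurement noise (variance r).
import numpy as np

rng = np.random.default_rng(0)
n = 100
q, r = 0.1, 1.0                     # process and measurement noise variances

w = rng.normal(0, np.sqrt(q), n)    # process noise w_t
v = rng.normal(0, np.sqrt(r), n)    # measurement noise v_t

x = np.cumsum(w)                    # hidden state: x_t = x_{t-1} + w_t
y = x + v                           # observation:  y_t = x_t + v_t
```

An ARIMA model fit to $y_t$ would lump both noise sources into a single innovation sequence; the state space formulation keeps them distinct.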

Filtering, smoothing, and forecasting

For state space models, we want to estimate the underlying unobserved signal $x_t$ given the data $y_{1:s} = \{y_1, \dots, y_s\}$ up to time $s$. In practice, the main tasks for state space models include:

  1. Filtering, where we estimate $x_t$ using measurements up through time $t$ (here $s = t$)

  2. Prediction/forecasting, where $s < t$ and we want to estimate future values

  3. Smoothing, where $s > t$. This allows us to estimate $x_t$ using the entire dataset, including observations after $t$. This can be used to better estimate missing values.

We can write out the unobserved signal $x_t$ using the convention:

$$x_t^s = E(x_t \mid y_{1:s})$$

which again is $x_t$ given $y$ from time 1 to time $s$, where $s$ can be $=t$ (filtering), $<t$ (prediction), or $>t$ (smoothing).

We also have the prediction error covariance (using the notation from Shumway and Stoffer and the original Kalman 1960 paper):

$$P_{t_1,t_2}^s = E\left\{(x_{t_1}-x_{t_1}^s)(x_{t_2}-x_{t_2}^s)^\prime\right\}$$

From this, we will look at the Kalman filter, which gives filtering and forecasting equations (Kalman 1960). As it turns out, improvements to this original algorithm have been used from the Apollo 8 mission through every subsequent mission to the Moon, most recently in the Artemis-II mission to send a crewed flight into lunar orbit and back to Earth, which uses four navigation Extended Kalman Filters (EKFs) as part of the navigation system. From a technical report related to Artemis-1:

The Artemis Program is NASA’s campaign to explore the Moon and beyond. Artemis-1, the uncrewed exoLEO test flight of the Orion spacecraft, was completed in 2022. There are four navigation Extended Kalman Filters (EKFs) that are part of the Orion navigation system. The Atmospheric Extended Kalman Filter (ATMEKF) estimates the vehicle position, velocity, and attitude (referred to as the vehicle state) during the ascent and entry phases of flight. Once Orion is outside of Earth’s atmosphere, the Earth Orbit Extended Kalman Filter (EOEKF) and Cislunar Extended Kalman Filter (CLEKF) estimate the translational states, depending on the phase of flight, while the Attitude Extended Kalman Filter (ATTEKF) estimates the rotational state of the vehicle. The Kalman filters propagate the vehicle state forward in time using a combination of dynamics models and the output data from the Inertial Measurement Unit (IMU). The filters update the vehicle states and associated uncertainties, in the form of the covariance matrix, using pseudorange measurements from GPS (in ATMEKF/EOEKF), optical navigation measurements of the Earth or Moon (in CLEKF), and star tracker measurements (in ATTEKF). Simultaneously, the Kalman filters estimate error sources in the sensors, which are included in the state vectors as Exponentially Correlated Random Variables (ECRVs).

One note here: the Kalman filter we'll show an example of applies to a linear system, but the EKF extends this idea to systems with nonlinear dynamics.

The Kalman Filter

For the state space model with:

(Hidden) State equation:

$$x_t = \Phi x_{t-1} + \Upsilon u_t + w_t$$

and

Observation equation:

$$y_t = A_t x_t + \Gamma u_t + v_t$$

with initial conditions $x_0^0 = \mu_0$ and $P_0^0 = \sigma_0$, for $t = 1, \dots, n$:

$$\begin{aligned} x_t^{t-1} &= \Phi x_{t-1}^{t-1} + \Upsilon u_t,\\ P_t^{t-1} &= \Phi P_{t-1}^{t-1}\Phi^\prime + Q, \end{aligned}$$

with

$$\begin{aligned} x_t^t &= x_t^{t-1} + K_t(y_t - A_t x_t^{t-1} - \Gamma u_t),\\ P_t^t &= [I - K_t A_t]P_t^{t-1}, \end{aligned}$$

where

$$K_t = P_t^{t-1} A_t^\prime [A_t P_t^{t-1} A_t^\prime + R]^{-1}$$

is called the Kalman gain. The Kalman gain weighs how much we should trust the measurement vs. the prediction. Noisy measurements (with big $R$) will shrink this gain, while uncertain predictions (with big $P_t^{t-1}$) will increase it, so the update leans more heavily on the new measurement.
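To build intuition, here is a hypothetical scalar example (my own, taking $A_t = 1$), where the gain reduces to $K = P / (P + R)$:

```python
# Scalar illustration (A_t = 1): the Kalman gain K = P / (P + R)
# goes toward 0 for noisy measurements (large R, trust the prediction)
# and toward 1 for uncertain predictions (large P, trust the data).
def kalman_gain(P_pred, R):
    return P_pred / (P_pred + R)

print(kalman_gain(1.0, 100.0))   # noisy sensor: gain near 0
print(kalman_gain(100.0, 1.0))   # uncertain prediction: gain near 1
```

In both extremes the gain stays between 0 and 1, interpolating between the prediction and the measurement.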

Running the Kalman Filter

Given a model (Φ,A,Q,R)(\Phi, A, Q,R) and initial conditions (μ0,σ0)(\mu_0, \sigma_0), the filter moves forward through the data one time step at a time.

At each tt we do two things:

  1. Prediction: propagate the previous estimate forward using the dynamics, $x_{t-1}^{t-1} \rightarrow x_t^{t-1}$ (uncertainty grows)

  2. Update: correct the prediction using the new measurement $y_t$ through the Kalman gain (uncertainty shrinks)
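The two steps above can be sketched as a minimal univariate implementation of the filter equations, assuming no exogenous input ($\Upsilon = \Gamma = 0$); the function name and signature are my own, not the notebook's:

```python
import numpy as np

def kalman_filter(y, Phi, A, Q, R, mu0, sigma0):
    """Univariate Kalman filter with no exogenous input.

    Returns the filtered means x_t^t and variances P_t^t."""
    n = len(y)
    xf = np.empty(n)   # filtered means x_t^t
    Pf = np.empty(n)   # filtered variances P_t^t
    x_prev, P_prev = mu0, sigma0
    for t in range(n):
        # Prediction: propagate forward; uncertainty grows by Q
        x_pred = Phi * x_prev
        P_pred = Phi * P_prev * Phi + Q
        # Update: correct with y_t via the Kalman gain; uncertainty shrinks
        K = P_pred * A / (A * P_pred * A + R)
        x_prev = x_pred + K * (y[t] - A * x_pred)
        P_prev = (1 - K * A) * P_pred
        xf[t], Pf[t] = x_prev, P_prev
    return xf, Pf
```

The multivariate case replaces these scalar products with the matrix operations in the equations above (transposes and a matrix inverse in the gain).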

The Kalman Smoother

The Kalman smoother (implemented in the accompanying notebook using the Rauch-Tung-Striebel/RTS algorithm, published in 1965) can be used post hoc to refine state estimates retrospectively, using all of the data, including future observations!

In practice, this runs using a forward pass (the Kalman filter), followed by a backward pass, which modifies earlier estimates based on the filter’s stored outputs.
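The backward pass can be sketched in the same scalar setting as follows, consuming the filtered means and variances stored by the forward pass; this follows the RTS recursions, with variable names of my own choosing:

```python
import numpy as np

def rts_smoother(xf, Pf, Phi, Q):
    """Scalar Rauch-Tung-Striebel backward pass.

    xf, Pf are the filtered means x_t^t and variances P_t^t
    from a forward Kalman filter pass."""
    n = len(xf)
    xs = xf.copy()   # smoothed means x_t^n, initialized with x_n^n
    Ps = Pf.copy()   # smoothed variances P_t^n
    for t in range(n - 2, -1, -1):
        # One-step-ahead prediction from the filtered state at t
        x_pred = Phi * xf[t]
        P_pred = Phi * Pf[t] * Phi + Q
        J = Pf[t] * Phi / P_pred                     # smoother gain
        xs[t] = xf[t] + J * (xs[t + 1] - x_pred)     # pull toward future info
        Ps[t] = Pf[t] + J * J * (Ps[t + 1] - P_pred)
    return xs, Ps
```

Because each smoothed estimate also uses future data, the smoothed variances are never larger than the filtered ones.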

When to use the filter vs. smoother

It’s helpful to use the Kalman filter for real-time estimates where you can’t use future data because it doesn’t exist yet. Computationally, the Kalman filter is also very cheap, because you only need data from the previous time step for your estimates.

On the other hand, the Kalman smoother is helpful for post hoc analyses where getting the best possible estimate at each time point is your goal. For example, in the bone marrow transplant data, if we are trying to look at platelet, WBC, and hematocrit data after the fact to make some claims about patient outcomes, we may want to use the smoother to get better estimates.

Examples

Next, we’ll show an example using the Kalman filter and smoother for estimating 2D trajectory data in the accompanying Lecture22 notebook.