Topics: Derivation and properties of Kalman filter; Extended Kalman filter

10.1 Updating LLSE

In many situations, one keeps making observations and wishes to update the estimate accordingly, preferably without recomputing everything from scratch. That is, one hopes for a method that computes L[X|Y, Z] from L[X|Y] and Z.

The key idea is in the following result.

Theorem 10.1 (LLSE Update—Orthogonal Additional Observation)

Assume that X, Y, and Z are zero-mean and that Y and Z are orthogonal. Then

$$\displaystyle \begin{aligned} L[\mathbf{X} | \mathbf{Y}, \mathbf{Z}] = L[\mathbf{X}|\mathbf{Y}] + L[\mathbf{X}|\mathbf{Z}]. \end{aligned} $$
(10.1)

\({\blacksquare }\)

Proof

Figure 10.1 shows why the result holds. To be convinced mathematically, we need to show that the error

$$\displaystyle \begin{aligned} \mathbf{X} - (L[\mathbf{X}|\mathbf{Y}] + L[\mathbf{X}|\mathbf{Z}]) \end{aligned}$$

is orthogonal to Y and to Z. To see why it is orthogonal to Y, note that the error is

$$\displaystyle \begin{aligned} (\mathbf{X} - L[\mathbf{X}|\mathbf{Y}]) - L[\mathbf{X} | \mathbf{Z}]. \end{aligned}$$

Now, the term between parentheses is orthogonal to Y, by the projection property of L[X|Y]. Also, the second term is linear in Z, and is therefore orthogonal to Y since Z is orthogonal to Y. One shows that the error is orthogonal to Z in the same way. □

Fig. 10.1 The LLSE is easy to update after an additional orthogonal observation

A simple consequence of this result is the following fact.

Theorem 10.2 (LLSE Update—General Additional Observation)

Assume that X, Y, and Z are zero-mean. Then

$$\displaystyle \begin{aligned} L[\mathbf{X} | \mathbf{Y}, \mathbf{Z}] = L[\mathbf{X}|\mathbf{Y}] + L[\mathbf{X}|\mathbf{Z} - L[\mathbf{Z} | \mathbf{Y}]]. \end{aligned} $$
(10.2)

\({\blacksquare }\)

Proof

The idea here is that one considers the innovation \(\tilde {\mathbf {Z}} := \mathbf {Z} - L[\mathbf {Z} | \mathbf {Y}]\), which is the information in the new observation Z that is orthogonal to Y.

To see why the result holds, note that any linear combination of Y and Z can be written as a linear combination of Y and \(\tilde {\mathbf {Z}}\). For instance, if L[Z|Y] = C Y, then

$$\displaystyle \begin{aligned} A \mathbf{Y} + B \mathbf{Z} = A \mathbf{Y} + B (\mathbf{Z} - C \mathbf{Y}) + B C \mathbf{Y} = (A + BC) \mathbf{Y} + B \tilde{\mathbf{Z}}. \end{aligned}$$

Thus, the set of linear functions of Y and Z is the same as the set of linear functions of Y and \(\tilde {\mathbf {Z}}\), so that

$$\displaystyle \begin{aligned} L[\mathbf{X} | \mathbf{Y}, \mathbf{Z}] = L[\mathbf{X} | \mathbf{Y}, \tilde{\mathbf{Z}}]. \end{aligned}$$

Thus, (10.2) follows from Theorem 10.1 since Y and \(\tilde {\mathbf {Z}}\) are orthogonal. □
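For a quick numerical sanity check of (10.2), the sketch below (in Python/NumPy; the dimensions and the random joint covariance are hypothetical, chosen only for illustration) builds the coefficient matrix of L[X | Y, Z] directly and compares it with the right-hand side constructed from the innovation Z − L[Z|Y].

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, used only for this sanity check.
dx, dy, dz = 2, 3, 2
idx = np.arange(dx)                        # indices of X in the stacked vector
idy = np.arange(dx, dx + dy)               # indices of Y
idz = np.arange(dx + dy, dx + dy + dz)     # indices of Z
idyz = np.concatenate([idy, idz])          # indices of (Y, Z)

# A random positive definite covariance of the zero-mean vector (X, Y, Z).
M = rng.standard_normal((dx + dy + dz, dx + dy + dz))
S = M @ M.T + np.eye(dx + dy + dz)

def gain(a, b):
    """Coefficient matrix of L[A|B] = cov(A, B) cov(B)^{-1} B (zero-mean case)."""
    return S[np.ix_(a, b)] @ np.linalg.inv(S[np.ix_(b, b)])

# Left-hand side of (10.2): L[X | Y, Z] as a linear map of the stacked (Y, Z).
G_lhs = gain(idx, idyz)

# Right-hand side: L[X|Y] + L[X | Z - L[Z|Y]], also expressed as a map of (Y, Z).
C = gain(idz, idy)                                         # L[Z|Y] = C Y, as in the proof
cov_innov = S[np.ix_(idz, idz)] - C @ S[np.ix_(idy, idz)]  # cov(Z - C Y)
cov_X_innov = S[np.ix_(idx, idz)] - S[np.ix_(idx, idy)] @ C.T
K = cov_X_innov @ np.linalg.inv(cov_innov)
G_rhs = np.hstack([gain(idx, idy) - K @ C, K])             # coefficients of Y, then of Z

print(np.allclose(G_lhs, G_rhs))   # True: both sides define the same linear map
```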

10.2 Derivation of Kalman Filter

We derive the equations for the Kalman filter, as stated in Theorem 9.8. For convenience, we repeat those equations here:

$$\displaystyle \begin{aligned} & \hat X(n) = A \hat X(n-1) + K_n [Y(n) - CA \hat X(n-1)] {} \end{aligned} $$
(10.16)
$$\displaystyle \begin{aligned} & K_n = S_n C'[CS_nC' + \varSigma_W]^{-1} {} \end{aligned} $$
(10.17)
$$\displaystyle \begin{aligned} & S_n = A \varSigma_{n-1}A' + \varSigma_V {} \end{aligned} $$
(10.18)
$$\displaystyle \begin{aligned} & \varSigma_n = (I - K_nC)S_n {} \end{aligned} $$
(10.19)

and

$$\displaystyle \begin{aligned} S_n = \mbox{cov}(X(n) - A \hat X(n-1)) \mbox{ and } \varSigma_n = \mbox{cov}(X(n) - \hat X(n)). \end{aligned} $$
(10.20)
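Before deriving these equations, here is a minimal sketch of how the recursion (10.16)–(10.19) runs in Python/NumPy; the variable names mirror the symbols above, and the system in the short usage loop is purely illustrative.

```python
import numpy as np

def kalman_step(x_hat, Sigma, y, A, C, Sigma_V, Sigma_W):
    """One pass of (10.16)-(10.19): maps (x_hat(n-1), Sigma_{n-1}) and Y(n)
    to (x_hat(n), Sigma_n)."""
    S = A @ Sigma @ A.T + Sigma_V                        # (10.18)
    K = S @ C.T @ np.linalg.inv(C @ S @ C.T + Sigma_W)   # (10.17)
    x_pred = A @ x_hat                                   # L[X(n) | Y^{n-1}]
    x_new = x_pred + K @ (y - C @ x_pred)                # (10.16)
    Sigma_new = (np.eye(len(x_hat)) - K @ C) @ S         # (10.19)
    return x_new, Sigma_new

# Illustrative use on a two-dimensional state observed through its first component.
rng = np.random.default_rng(0)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
C = np.array([[1.0, 0.0]])
Sigma_V, Sigma_W = 0.01 * np.eye(2), np.array([[0.25]])

x, x_hat, Sigma = np.zeros(2), np.zeros(2), np.eye(2)
for n in range(50):
    x = A @ x + rng.multivariate_normal(np.zeros(2), Sigma_V)   # X(n) = A X(n-1) + V(n-1)
    y = C @ x + rng.multivariate_normal(np.zeros(1), Sigma_W)   # Y(n) = C X(n) + W(n)
    x_hat, Sigma = kalman_step(x_hat, Sigma, y, A, C, Sigma_V, Sigma_W)
```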

In the algebra, we repeatedly use the fact that

$$\displaystyle \begin{aligned} \mbox{cov}(BV, DW) = B~ \mbox{cov}(V, W)D' \end{aligned}$$

and also that if V  and W are orthogonal, then

$$\displaystyle \begin{aligned} \mbox{cov}(V + W) = \mbox{cov}(V) + \mbox{cov}(W). \end{aligned}$$

The algebra is a bit tedious, but the key steps are worth noting.

Let

$$\displaystyle \begin{aligned} Y^n = (Y(0), \ldots, Y(n)). \end{aligned}$$

Note that, since V(n − 1) is orthogonal to \(Y^{n-1}\),

$$\displaystyle \begin{aligned} L\left[X(n) | Y^{n-1}\right] = L\left[AX(n-1) + V(n-1) | Y^{n-1}\right] = A \hat X(n-1). \end{aligned}$$

Hence,

$$\displaystyle \begin{aligned} L\left[Y(n)|Y^{n-1}\right] = L\left[ CX(n) + W(n)|Y^{n-1}\right] = CL\left[X(n)|Y^{n-1}\right] = CA\hat X(n-1), \end{aligned}$$

so that the innovation is

$$\displaystyle \begin{aligned} Y(n) - L\left[Y(n)|Y^{n-1}\right] = Y(n) - CA\hat X(n-1). \end{aligned}$$

Thus, by Theorem 10.2,

$$\displaystyle \begin{aligned} \hat X(n) &= L[X(n) | Y^{n}] = L\left[X(n) | Y^{n-1}\right] + L\left[X(n) | Y(n) - L\left[Y(n)|Y^{n-1}\right]\right]\\ & = A \hat X(n-1) + K_n\left[Y(n) - CA\hat X(n-1)\right]. \end{aligned} $$

This derivation shows that (10.16) is a fairly direct consequence of the formula in Theorem 10.2 for updating the LLSE.

The calculation of the gain \(K_n\) is a bit more complex. Let

$$\displaystyle \begin{aligned} \tilde Y(n) = Y(n) - L\left[Y(n)|Y^{n-1}\right] = Y(n) - CA\hat X(n-1). \end{aligned}$$

Then

$$\displaystyle \begin{aligned} K_n = \mbox{cov}\left(X(n), \tilde Y(n)\right) \mbox{cov}\left(\tilde Y(n)\right)^{-1}. \end{aligned}$$

Now,

$$\displaystyle \begin{aligned} \mbox{cov}\left(X(n), \tilde Y(n)\right) = \mbox{cov}\left(X(n) - L\left[X(n) | Y^{n-1}\right], \tilde Y(n)\right), \end{aligned}$$

because \(\tilde Y(n)\) is orthogonal to \(Y^{n-1}\). Also,

$$\displaystyle \begin{aligned} & \mbox{cov}(X(n) - L\left[X(n) | Y^{n-1}\right], \tilde Y(n)) \\ & \quad = \mbox{cov}(X(n) - A \hat X(n-1), Y(n) - CA\hat X(n-1)) \\ & \quad = \mbox{cov}(X(n) - A \hat X(n-1), CX(n) + W(n) - CA\hat X(n-1)) \\ & \quad = S_n C', \end{aligned} $$

by (10.20).

To calculate \(\mbox{cov}(\tilde Y(n))\), we note that

$$\displaystyle \begin{aligned} \mbox{cov}(\tilde Y(n)) = \mbox{cov}\left(CX(n) + W(n) - C L\left[X(n)|Y^{n-1}\right]\right) = CS_nC' + \varSigma_W. \end{aligned}$$

Thus,

$$\displaystyle \begin{aligned} K_n = S_n C'\left[CS_nC' + \varSigma_W\right]^{-1}. \end{aligned}$$

To show (10.18), we note that

$$\displaystyle \begin{aligned} & S_n = \mbox{cov}\left(X(n) - L\left[X(n)|Y^{n-1}\right]\right) \\ &~~~ = \mbox{cov}\left(AX(n-1) + V(n-1) - A \hat X(n-1)\right) \\ &~~~ = A \varSigma_{n-1} A' + \varSigma_V. \end{aligned} $$

Finally, to derive (10.19), we calculate

$$\displaystyle \begin{aligned} \varSigma_n = \mbox{cov}\left(X(n) - \hat X(n)\right). \end{aligned}$$

We observe that

$$\displaystyle \begin{aligned} & X(n) - L[X(n)|Y^n] = X(n) - A \hat X(n-1) - K_n \left[Y(n) - CA \hat X(n-1)\right] \\ & \quad = X(n) - A\hat X(n-1) - K_n\left[C X(n) + W(n) - CA \hat X(n-1)\right] \\ & \quad = [I - K_nC]\left[X(n) - A \hat X(n-1)\right] - K_n W(n), \end{aligned} $$

so that

$$\displaystyle \begin{aligned} & \varSigma_n = [I - K_nC] S_n [I - K_n C]' + K_n \varSigma_W K_n^{\prime}\\ & \quad = S_n - 2 K_nCS_n + K_n \left[C S_n C' + \varSigma_W\right] K_n^{\prime} \\ & \quad = S_n - 2 K_nCS_n + K_n \left[CS_nC' + \varSigma_W\right] \left[C S_n C' + \varSigma_W\right]^{-1}CS_n \mbox{ by (10.17)}\\ & \quad = S_n - K_nCS_n, \end{aligned} $$

as we wanted to show.

\({\square }\)

10.3 Properties of Kalman Filter

The goal of this section is to explain and justify the following result. The terms observable and reachable are defined after the statement of the theorem.

Theorem 10.3 (Properties of the Kalman Filter)

  (a)

    If (A, C) is observable, then \(\varSigma_n\) is bounded. Moreover, if \(\varSigma_0 = 0\), then

    $$\displaystyle \begin{aligned} \varSigma_n \rightarrow \varSigma \mathit{\mbox{ and }} K_n \rightarrow K, \end{aligned} $$
    (10.37)

    where Σ is a finite matrix.

  (b)

    If, in addition, \((A, \varSigma _V^{1/2})\) is reachable, then the filter with the constant gain \(K_n = K\) is such that the covariance of the error also converges to \(\varSigma\).

\({\blacksquare }\)

We explain these properties in the subsequent sections. Let us first make a few comments.

  • For some systems, the errors grow without bound. For instance, if one does not observe anything (e.g., C = 0) and if the system is unstable (e.g., X(n) = 2X(n − 1) + V(n)), then \(\varSigma_n\) goes to infinity. However, (a) says that “if the observations are rich enough,” this does not happen: one can track X(n) with an error that has a bounded covariance.

  • Part (b) of the theorem says that in some cases one can run the filter with a constant gain K without increasing the asymptotic error. This is very convenient, since one does not have to compute a new gain at each step; a small iteration that computes this constant gain is sketched below.
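Here is a minimal sketch of that computation: since the gains do not depend on the observations, one can iterate (10.17)–(10.19) offline from \(\varSigma_0 = 0\) until \(\varSigma_n\) stops changing and then freeze the resulting K.

```python
import numpy as np

def steady_state_gain(A, C, Sigma_V, Sigma_W, tol=1e-10, max_iter=100_000):
    """Iterate (10.17)-(10.19) from Sigma_0 = 0 until convergence and return
    the limiting gain K and error covariance Sigma (cf. Theorem 10.3)."""
    d = A.shape[0]
    Sigma = np.zeros((d, d))
    for _ in range(max_iter):
        S = A @ Sigma @ A.T + Sigma_V                        # (10.18)
        K = S @ C.T @ np.linalg.inv(C @ S @ C.T + Sigma_W)   # (10.17)
        Sigma_next = (np.eye(d) - K @ C) @ S                 # (10.19)
        if np.max(np.abs(Sigma_next - Sigma)) < tol:
            return K, Sigma_next
        Sigma = Sigma_next
    raise RuntimeError("covariance recursion did not converge")
```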

10.3.1 Observability

Are the observations good enough to track the state with a bounded error covariance? Before stating the result, we need a precise notion of good observations.

Definition 10.1 (Observability)

We say that (A, C) is observable if the null space of

$$\displaystyle \begin{aligned} \left[ \begin{array}{c} C\\ CA\\ \vdots \\ CA^d \end{array} \right] \end{aligned}$$

is {0}. Here, d is the dimension of X(n). A matrix M has null space {0} if v = 0 is the only vector such that Mv = 0. ◇
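In practice, Definition 10.1 reduces to a rank computation: the stacked matrix has null space {0} exactly when it has full column rank d. A minimal sketch in Python/NumPy (the example system is illustrative):

```python
import numpy as np

def is_observable(A, C):
    """Check Definition 10.1: [C; CA; ...; CA^d] has null space {0},
    i.e., full column rank d, where d is the state dimension."""
    d = A.shape[0]
    O = np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(d + 1)])
    return np.linalg.matrix_rank(O) == d

# Example: a position-velocity model observed through the position only.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
C = np.array([[1.0, 0.0]])
print(is_observable(A, C))   # True
```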

The key result is the following.

Lemma 10.4 (Observability Implies Bounded Error Covariance)

  (a)

    If the system is observable, then \(\varSigma_n\) is bounded.

  (b)

    If, in addition, \(\varSigma_0 = 0\), then \(\varSigma_n\) converges to some finite \(\varSigma\).

Proof

  (a)

    Observability implies that there is only one X(0) that corresponds to (Y(0), …, Y(d)) if the system has no noise. Indeed, in that case,

    $$\displaystyle \begin{aligned} X(n) = AX(n-1) \mbox{ and } Y(n) = CX(n). \end{aligned}$$

    Then,

    $$\displaystyle \begin{aligned} X(1) = AX(0), X(2) = A^2X(0), \ldots, X(d) = A^{d} X(0), \end{aligned}$$

    so that

    $$\displaystyle \begin{aligned} Y(0) = CX(0), Y(1) = CAX(0), \ldots, Y(d) = CA^{d}X(0). \end{aligned}$$

    Consequently,

    $$\displaystyle \begin{aligned} \left[ \begin{array}{c} Y(0)\\ Y(1)\\ \vdots \\ Y(d) \end{array} \right] = \left[ \begin{array}{c} C\\ CA\\ \vdots \\ CA^{d} \end{array} \right] X(0). \end{aligned}$$

    Now, imagine that there are two different initial states, say X(0) and X′(0), that give the same outputs Y(0), …, Y(d). Then

    $$\displaystyle \begin{aligned} \left[ \begin{array}{c} C\\ CA\\ \vdots \\ CA^{d} \end{array} \right] X(0) = \left[ \begin{array}{c} C\\ CA\\ \vdots \\ CA^{d} \end{array} \right] X'(0), \end{aligned}$$

    so that

    $$\displaystyle \begin{aligned} \left[ \begin{array}{c} C\\ CA\\ \vdots \\ CA^{d} \end{array} \right] \left( X(0) - X'(0) \right) = 0. \end{aligned}$$

    The observability property implies that X(0) − X′(0) = 0, i.e., the two initial states coincide.

    Thus, if (A, C) is observable, one can identify the initial condition X(0) uniquely from d + 1 noise-free observations of the output, and one can then determine X(1), X(2), … exactly. That is, when (A, C) is observable and there is no noise, one can determine the state X(n) precisely from the outputs.

    However, our system has some noise. If (A, C) is observable, we are able to identify X(0) from Y(0), …, Y(d), up to some linear function of the noise that has affected those outputs, i.e., up to a linear function of {V(0), …, V(d − 1), W(0), …, W(d)}. Consequently, we can determine X(d) from Y(0), …, Y(d), up to some linear function of {V(0), …, V(d − 1), W(0), …, W(d)}. Similarly, we can determine X(n) from Y(n − d), …, Y(n), up to some linear function of {V(n − d), …, V(n − 1), W(n − d), …, W(n)}.

    This implies that one can estimate X(n) with an error that is a linear combination of a bounded number of noise terms; since \(\hat X(n)\) is at least as good as that estimate, \(\varSigma_n\) is bounded.

  (b)

    One can show that if \(\varSigma_0 = 0\), i.e., if we know X(0), then \(\varSigma_n\) increases in the sense that \(\varSigma_n - \varSigma_{n-1}\) is nonnegative definite. Being bounded and increasing implies that \(\varSigma_n\) converges, and so does \(K_n\). □

10.3.2 Reachability

Assume that \(\varSigma_V = QQ'\). We say that (A, Q) is reachable if the matrix

$$\displaystyle \begin{aligned}{}[Q, AQ, \ldots, A^{d - 1} Q] \end{aligned}$$

has full rank d. To appreciate the meaning of this property, note that we can write the state equations as

$$\displaystyle \begin{aligned} X(n) = A X(n-1) + Q \eta_n, \end{aligned}$$

where \(\mbox{cov}(\eta_n) = I\). That is, the components of \(\eta_n\) are orthogonal. In the Gaussian case, the components of \(\eta_n\) are N(0, 1) and independent. If (A, Q) is reachable, this means that for any \(\mathbf {x} \in \Re ^d\), there is some sequence \(\eta_1, \ldots, \eta_d\) such that if X(0) = 0, then X(d) = x. Indeed,

$$\displaystyle \begin{aligned} X(d) = \sum_{k = 0}^{d-1} A^k Q \eta_{d - k} = \left[Q, AQ, \ldots, A^{d - 1} Q\right] \left[ \begin{array}{c} \eta_d \\ \eta_{d-1} \\ \vdots \\ \eta_1 \end{array} \right]. \end{aligned}$$

Since the matrix has full rank, the span of its columns is \(\Re ^d\), which means precisely that any given vector in \(\Re ^d\) can be written as a linear combination of these columns.
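Reachability is likewise a rank condition, now on the matrix [Q, AQ, …, A^{d−1}Q]. A minimal sketch, assuming \(\varSigma_V = QQ'\) with Q given (the example values are illustrative):

```python
import numpy as np

def is_reachable(A, Q):
    """Check that [Q, AQ, ..., A^{d-1} Q] has full rank d."""
    d = A.shape[0]
    R = np.hstack([np.linalg.matrix_power(A, k) @ Q for k in range(d)])
    return np.linalg.matrix_rank(R) == d

# Example: noise enters only through the second state coordinate.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
Q = np.array([[0.0], [1.0]])
print(is_reachable(A, Q))   # True
```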

The proof of part (b) of the theorem is a bit too involved for this course.

10.4 Extended Kalman Filter

The Kalman filter is often used for nonlinear systems. The idea is that if the system is almost linear over a few steps, then one may be able to use the Kalman filter locally and change the matrices A and C as the estimate of the state changes.

The model is as follows:

$$\displaystyle \begin{aligned} & X(n+1) = f(X(n)) + V(n) \\ & Y(n+1) = g(X(n+1)) + W(n+1). \end{aligned} $$

The extended Kalman filter is then

$$\displaystyle \begin{aligned} \hat X(n+1) & = f\left(\hat X(n)\right) + K_n\left[Y(n+1) - g\left(f(\hat X(n))\right)\right] \\ K_n &= S_nC_n^\prime \left[C_nS_nC_n^\prime + \varSigma_W\right]^{-1} \\ S_n & = A_n \varSigma_n A_n^\prime + \varSigma_V \\ \varSigma_{n+1} & = [I - K_n C_n] S_n, \end{aligned} $$

where

$$\displaystyle \begin{aligned}{}[A_n]_{ij} = \frac{\partial}{\partial x_j} f_i\left(\hat X(n)\right) \mbox{ and } [C_n]_{ij} = \frac{\partial}{\partial x_j} g_i \left(\hat X(n)\right). \end{aligned}$$

Thus, the idea is to linearize the system around the estimated state value and then apply the usual Kalman filter.
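The recursion above translates directly into code once A_n and C_n are available. The sketch below approximates these Jacobians by finite differences rather than analytic partial derivatives (a substitution for illustration, not part of the text); f, g, and the noise covariances are supplied by the caller.

```python
import numpy as np

def jacobian(h, x, eps=1e-6):
    """Finite-difference approximation of the Jacobian of h at x
    (approximating the partial derivatives that define A_n and C_n above)."""
    hx = np.atleast_1d(h(x))
    J = np.zeros((hx.size, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        J[:, j] = (np.atleast_1d(h(x + step)) - hx) / eps
    return J

def ekf_step(x_hat, Sigma, y, f, g, Sigma_V, Sigma_W):
    """One extended Kalman filter update, following the equations above
    (both Jacobians are evaluated at the current estimate x_hat)."""
    A_n = jacobian(f, x_hat)
    C_n = jacobian(g, x_hat)
    S = A_n @ Sigma @ A_n.T + Sigma_V
    K = S @ C_n.T @ np.linalg.inv(C_n @ S @ C_n.T + Sigma_W)
    x_pred = f(x_hat)
    x_new = x_pred + K @ (y - np.atleast_1d(g(x_pred)))
    Sigma_new = (np.eye(x_hat.size) - K @ C_n) @ S
    return x_new, Sigma_new
```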

Note that we are now in the realm of heuristics and that very little can be said about the properties of this filter. Experiments show that it works well when the nonlinearities are small, whatever this means precisely, but that it may fail miserably in other conditions.

10.4.1 Examples

Tracking a Vehicle

In this example, borrowed from Eric Feron's notes for AE6531 (Georgia Tech), the goal is to track a vehicle that moves in the plane by using noisy measurements of its distances to nine points \(p_i \in \Re ^2\). Let \(p(n) \in \Re ^2\) be the position of the vehicle and \(u(n) \in \Re ^2\) be its velocity at time n ≥ 0.

We assume that the velocity changes according to a known rule, except for some random perturbation. Specifically, we assume that

$$\displaystyle \begin{aligned} & p(n+1) = p(n) + 0.1u(n) {} \end{aligned} $$
(10.38)
$$\displaystyle \begin{aligned} & u(n+1) = \left[ \begin{array}{c c} 0.85 & 0.15 \\ -0.1 & 0.85 \end{array} \right] u(n) + w(n), {} \end{aligned} $$
(10.39)

where the w(n) are i.i.d. N(0, I). The measurements are

$$\displaystyle \begin{aligned} y_i(n) = ||p(n) - p_i|| + v_i(n), i = 1, 2, \ldots, 9, \end{aligned}$$

where the \(v_i(n)\) are i.i.d. N(0, 0.32).

Figure 10.2 shows the result of the extended Kalman filter for X(n) = (p(n), u(n)) initialized with \(\hat X(0) = 0\) and \(\varSigma_0 = I\).

Fig. 10.2 The extended Kalman filter for the system (10.38)–(10.39)
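For this example, only the measurement map is nonlinear; its Jacobian at position p has rows (p − p_i)′/‖p − p_i‖ and zeros in the velocity coordinates. The sketch below spells out the corresponding filter step; the anchor points p_i are random placeholders, not those of the original figure.

```python
import numpy as np

rng = np.random.default_rng(0)
anchors = rng.uniform(-10.0, 10.0, size=(9, 2))       # placeholder points p_i

# Linear dynamics (10.38)-(10.39) for the state X = (p, u).
A = np.block([[np.eye(2), 0.1 * np.eye(2)],
              [np.zeros((2, 2)), np.array([[0.85, 0.15], [-0.1, 0.85]])]])
Sigma_V = np.block([[np.zeros((2, 2)), np.zeros((2, 2))],
                    [np.zeros((2, 2)), np.eye(2)]])   # w(n) ~ N(0, I) drives u only
Sigma_W = 0.32 * np.eye(9)                            # variance of the v_i(n), as given in the text

def g(x):
    """Distances from the position p = x[:2] to the nine points p_i."""
    return np.linalg.norm(x[:2] - anchors, axis=1)

def C_of(x):
    """Jacobian of g at x: row i is (p - p_i)'/||p - p_i||, zero in the u coordinates."""
    diff = x[:2] - anchors
    rows = diff / np.linalg.norm(diff, axis=1, keepdims=True)
    return np.hstack([rows, np.zeros((9, 2))])

def ekf_vehicle_step(x_hat, Sigma, y):
    """One extended Kalman filter step for X = (p, u), linearized at x_hat."""
    S = A @ Sigma @ A.T + Sigma_V
    C = C_of(x_hat)
    K = S @ C.T @ np.linalg.inv(C @ S @ C.T + Sigma_W)
    x_pred = A @ x_hat
    return x_pred + K @ (y - g(x_pred)), (np.eye(4) - K @ C) @ S
```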

Tracking a Chemical Reaction

This example, borrowed from James B. Rawlings and Fernando V. Lima (U. Wisconsin, Madison), concerns estimating the state of a chemical reactor from measurements of the pressure. There are three species A, B, C in the reactions; they are modeled as shown in Fig. 10.3, where the \(k_i\) are the kinetic constants.

Fig. 10.3 The chemical reactions

Let \(C_A\), \(C_B\), \(C_C\) be the concentrations of A, B, and C, respectively. The model is

$$\displaystyle \begin{aligned} \frac{d}{dt} \left[ \begin{array}{c} C_A \\ C_B \\ C_C \end{array} \right] = \left[ \begin{array}{c c} -1 & 0 \\ 1 & - 2 \\ 1 & 1 \end{array} \right] \left[ \begin{array}{c} k_1 C_A - k_{-1} C_B C_C \\ k_2 C_B^2 - k_{-2} C_C \end{array} \right] \end{aligned}$$

and

$$\displaystyle \begin{aligned} y = RT(C_A + C_B + C_C). \end{aligned}$$

As shown in the top part of Fig. 10.4, the extended Kalman filter does not track the concentrations correctly. In fact, some of the estimated concentrations are negative!

Fig. 10.4 The top two graphs show that the extended Kalman filter does not track the concentrations correctly. The bottom two graphs show convergence after modifying the equations

The bottom graphs show that the estimates converge to the true concentrations after modifying the equations and replacing negative estimates by 0.

The point of this example is that the extended Kalman filter is not guaranteed to converge and that, sometimes, a simple modification makes it converge.
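As an illustration of such a modification, the sketch below runs one extended Kalman filter update for an Euler discretization of the reactor model and then clamps negative concentration estimates to 0; the rate constants, time step, RT, and noise covariances are placeholders, not values from the original example.

```python
import numpy as np

# Placeholder parameters (illustrative only).
k1, km1, k2, km2 = 0.5, 0.05, 0.2, 0.01
RT, dt = 1.0, 0.1
N = np.array([[-1.0, 0.0], [1.0, -2.0], [1.0, 1.0]])     # stoichiometric matrix

def f(c):
    """One Euler step of the reactor ODE for c = (C_A, C_B, C_C)."""
    r = np.array([k1 * c[0] - km1 * c[1] * c[2],
                  k2 * c[1] ** 2 - km2 * c[2]])
    return c + dt * (N @ r)

def A_of(c):
    """Jacobian of f at c (analytic, for this Euler discretization)."""
    dr = np.array([[k1, -km1 * c[2], -km1 * c[1]],
                   [0.0, 2.0 * k2 * c[1], -km2]])
    return np.eye(3) + dt * (N @ dr)

C_n = RT * np.ones((1, 3))        # Jacobian of y = RT (C_A + C_B + C_C): constant, since g is linear

def ekf_step_clamped(x_hat, Sigma, y, Sigma_V, Sigma_W):
    """Extended Kalman filter update as in Sect. 10.4, followed by the fix
    mentioned above: negative concentration estimates are replaced by 0."""
    A_n = A_of(x_hat)
    S = A_n @ Sigma @ A_n.T + Sigma_V
    K = S @ C_n.T @ np.linalg.inv(C_n @ S @ C_n.T + Sigma_W)
    x_pred = f(x_hat)
    x_new = x_pred + K @ (y - C_n @ x_pred)
    x_new = np.maximum(x_new, 0.0)                        # clamp negative estimates
    return x_new, (np.eye(3) - K @ C_n) @ S
```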

10.5 Summary

  • Updating LLSE;

  • Derivation of Kalman Filter;

  • Observability and Reachability;

  • Extended Kalman Filter.

10.5.1 Key Equations and Formulas

  • Updating LLSE (zero-mean): L[X|Y, Z] = L[X|Y] + L[X|Z − L[Z|Y]] (Theorem 10.2)

  • Observability ⇒ bounded error covariance (Lemma 10.4)

  • Observability + Reachability ⇒ the asymptotic (constant-gain) filter is good enough (Theorem 10.3)

  • Extended Kalman Filter: linearize the equations (Sect. 10.4)

10.6 References

The book by Goodwin and Sin (2009) surveys filtering and applications to control. The textbook Kumar and Varaiya (1986) is a comprehensive yet accessible presentation of control theory, filtering, and adaptive control. It is available online.