Topics: Derivation and properties of Kalman filter; Extended Kalman filter

10.1 Updating LLSE

In many situations, one keeps making observations and wishes to update the estimate accordingly, preferably without recomputing everything from scratch. That is, one hopes for a method that computes L[X|Y, Z] from L[X|Y] and Z.

The key idea is in the following result.

Theorem 10.1 (LLSE Update—Orthogonal Additional Observation)

Assume that X, Y, and Z are zero-mean and that Y and Z are orthogonal. Then

$$\displaystyle \begin{aligned} L[\mathbf{X} | \mathbf{Y}, \mathbf{Z}] = L[\mathbf{X}|\mathbf{Y}] + L[\mathbf{X}|\mathbf{Z}]. \end{aligned} $$
(10.1)

\({\blacksquare }\)

Proof

Figure 10.1 shows why the result holds. To be convinced mathematically, we need to show that the error

$$\displaystyle \begin{aligned} \mathbf{X} - (L[\mathbf{X}|\mathbf{Y}] + L[\mathbf{X}|\mathbf{Z}]) \end{aligned}$$

is orthogonal to Y and to Z. To see why it is orthogonal to Y, note that the error is

$$\displaystyle \begin{aligned} (\mathbf{X} - L[\mathbf{X}|\mathbf{Y}]) - L[\mathbf{X} | \mathbf{Z}]. \end{aligned}$$

Now, the term between parentheses is orthogonal to Y, by the projection property of L[X|Y]. Also, the second term is linear in Z, and is therefore orthogonal to Y since Z is orthogonal to Y. One shows that the error is orthogonal to Z in the same way. □

Fig. 10.1 The LLSE is easy to update after an additional orthogonal observation

A simple consequence of this result is the following fact.

Theorem 10.2 (LLSE Update—General Additional Observation)

Assume that X, Y, and Z are zero-mean. Then

$$\displaystyle \begin{aligned} L[\mathbf{X} | \mathbf{Y}, \mathbf{Z}] = L[\mathbf{X}|\mathbf{Y}] + L[\mathbf{X}|\mathbf{Z} - L[\mathbf{Z} | \mathbf{Y}]]. \end{aligned} $$
(10.2)

\({\blacksquare }\)

Proof

The idea here is that one considers the innovation \(\tilde {\mathbf {Z}} := \mathbf {Z} - L[\mathbf {Z} | \mathbf {Y}]\), which is the information in the new observation Z that is orthogonal to Y.

To see why the result holds, note that any linear combination of Y and Z can be written as a linear combination of Y and \(\tilde {\mathbf {Z}}\). For instance, if L[Z|Y] = C Y, then

$$\displaystyle \begin{aligned} A \mathbf{Y} + B \mathbf{Z} = A \mathbf{Y} + B (\mathbf{Z} - C \mathbf{Y}) + B C \mathbf{Y} = (A + BC) \mathbf{Y} + B \tilde{\mathbf{Z}}. \end{aligned}$$

Thus, the set of linear functions of Y and Z is the same as the set of linear functions of Y and \(\tilde {\mathbf {Z}}\), so that

$$\displaystyle \begin{aligned} L[\mathbf{X} | \mathbf{Y}, \mathbf{Z}] = L[\mathbf{X} | \mathbf{Y}, \tilde{\mathbf{Z}}]. \end{aligned}$$

Thus, (10.2) follows from Theorem 10.1 since Y and \(\tilde {\mathbf {Z}}\) are orthogonal. □
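For a quick numerical sanity check of (10.2), the sketch below (in Python/NumPy; the dimensions and the random joint covariance are hypothetical, chosen only for illustration) builds the coefficient matrix of L[X | Y, Z] directly and compares it with the right-hand side constructed from the innovation Z − L[Z|Y].

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, used only for this sanity check.
dx, dy, dz = 2, 3, 2
idx = np.arange(dx)                        # indices of X in the stacked vector
idy = np.arange(dx, dx + dy)               # indices of Y
idz = np.arange(dx + dy, dx + dy + dz)     # indices of Z
idyz = np.concatenate([idy, idz])          # indices of (Y, Z)

# A random positive definite covariance of the zero-mean vector (X, Y, Z).
M = rng.standard_normal((dx + dy + dz, dx + dy + dz))
S = M @ M.T + np.eye(dx + dy + dz)

def gain(a, b):
    """Coefficient matrix of L[A|B] = cov(A, B) cov(B)^{-1} B (zero-mean case)."""
    return S[np.ix_(a, b)] @ np.linalg.inv(S[np.ix_(b, b)])

# Left-hand side of (10.2): L[X | Y, Z] as a linear map of the stacked (Y, Z).
G_lhs = gain(idx, idyz)

# Right-hand side: L[X|Y] + L[X | Z - L[Z|Y]], also expressed as a map of (Y, Z).
C = gain(idz, idy)                                         # L[Z|Y] = C Y, as in the proof
cov_innov = S[np.ix_(idz, idz)] - C @ S[np.ix_(idy, idz)]  # cov(Z - C Y)
cov_X_innov = S[np.ix_(idx, idz)] - S[np.ix_(idx, idy)] @ C.T
K = cov_X_innov @ np.linalg.inv(cov_innov)
G_rhs = np.hstack([gain(idx, idy) - K @ C, K])             # coefficients of Y, then of Z

print(np.allclose(G_lhs, G_rhs))   # True: both sides define the same linear map
```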

10.2 Derivation of Kalman Filter

We derive the equations for the Kalman filter, as stated in Theorem 9.8. For convenience, we repeat those equations here:

$$\displaystyle \begin{aligned} & \hat X(n) = A \hat X(n-1) + K_n [Y(n) - CA \hat X(n-1)] {} \end{aligned} $$
(10.16)
$$\displaystyle \begin{aligned} & K_n = S_n C'[CS_nC' + \varSigma_W]^{-1} {} \end{aligned} $$
(10.17)
$$\displaystyle \begin{aligned} & S_n = A \varSigma_{n-1}A' + \varSigma_V {} \end{aligned} $$
(10.18)
$$\displaystyle \begin{aligned} & \varSigma_n = (I - K_nC)S_n {} \end{aligned} $$
(10.19)

and

$$\displaystyle \begin{aligned} S_n = \mbox{cov}(X(n) - A \hat X(n-1)) \mbox{ and } \varSigma_n = \mbox{cov}(X(n) - \hat X(n)). \end{aligned} $$
(10.20)
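Before deriving these equations, here is a minimal sketch of how the recursion (10.16)–(10.19) runs in Python/NumPy; the variable names mirror the symbols above, and the system in the short usage loop is purely illustrative.

```python
import numpy as np

def kalman_step(x_hat, Sigma, y, A, C, Sigma_V, Sigma_W):
    """One pass of (10.16)-(10.19): maps (x_hat(n-1), Sigma_{n-1}) and Y(n)
    to (x_hat(n), Sigma_n)."""
    S = A @ Sigma @ A.T + Sigma_V                        # (10.18)
    K = S @ C.T @ np.linalg.inv(C @ S @ C.T + Sigma_W)   # (10.17)
    x_pred = A @ x_hat                                   # L[X(n) | Y^{n-1}]
    x_new = x_pred + K @ (y - C @ x_pred)                # (10.16)
    Sigma_new = (np.eye(len(x_hat)) - K @ C) @ S         # (10.19)
    return x_new, Sigma_new

# Illustrative use on a two-dimensional state observed through its first component.
rng = np.random.default_rng(0)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
C = np.array([[1.0, 0.0]])
Sigma_V, Sigma_W = 0.01 * np.eye(2), np.array([[0.25]])

x, x_hat, Sigma = np.zeros(2), np.zeros(2), np.eye(2)
for n in range(50):
    x = A @ x + rng.multivariate_normal(np.zeros(2), Sigma_V)   # X(n) = A X(n-1) + V(n-1)
    y = C @ x + rng.multivariate_normal(np.zeros(1), Sigma_W)   # Y(n) = C X(n) + W(n)
    x_hat, Sigma = kalman_step(x_hat, Sigma, y, A, C, Sigma_V, Sigma_W)
```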

In the algebra, we repeatedly use the fact that

$$\displaystyle \begin{aligned} \mbox{cov}(BV, DW) = B~ \mbox{cov}(V, W)D' \end{aligned}$$

and also that if V  and W are orthogonal, then

$$\displaystyle \begin{aligned} \mbox{cov}(V + W) = \mbox{cov}(V) + \mbox{cov}(W). \end{aligned}$$

The algebra is a bit tedious, but the key steps are worth noting.

Let

$$\displaystyle \begin{aligned} Y^n = (Y(0), \ldots, Y(n)). \end{aligned}$$

Note that, since V(n − 1) is orthogonal to \(Y^{n-1}\),

$$\displaystyle \begin{aligned} L\left[X(n) | Y^{n-1}\right] = L\left[AX(n-1) + V(n-1) | Y^{n-1}\right] = A \hat X(n-1). \end{aligned}$$

Hence,

$$\displaystyle \begin{aligned} L\left[Y(n)|Y^{n-1}\right] = L\left[ CX(n) + W(n)|Y^{n-1}\right] = CL\left[X(n)|Y^{n-1}\right] = CA\hat X(n-1), \end{aligned}$$

so that the innovation is

$$\displaystyle \begin{aligned} Y(n) - L\left[Y(n)|Y^{n-1}\right] = Y(n) - CA\hat X(n-1). \end{aligned}$$

Thus, by Theorem 10.2,

$$\displaystyle \begin{aligned} \hat X(n) &= L[X(n) | Y^{n}] = L\left[X(n) | Y^{n-1}\right] + L\left[X(n) | Y(n) - L\left[Y(n)|Y^{n-1}\right]\right]\\ & = A \hat X(n-1) + K_n\left[Y(n) - CA\hat X(n-1)\right]. \end{aligned} $$

This derivation shows that (10.16) is a fairly direct consequence of the formula in Theorem 10.2 for updating the LLSE.

The calculation of the gain \(K_n\) is a bit more complex. Let

$$\displaystyle \begin{aligned} \tilde Y(n) = Y(n) - L\left[Y(n)|Y^{n-1}\right] = Y(n) - CA\hat X(n-1). \end{aligned}$$

Then

$$\displaystyle \begin{aligned} K_n = \mbox{cov}\left(X(n), \tilde Y(n)\right) \mbox{cov}\left(\tilde Y(n)\right)^{-1}. \end{aligned}$$

Now,

$$\displaystyle \begin{aligned} \mbox{cov}\left(X(n), \tilde Y(n)\right) = \mbox{cov}\left(X(n) - L\left[X(n) | Y^{n-1}\right], \tilde Y(n)\right), \end{aligned}$$

because \(\tilde Y(n)\) is orthogonal to \(Y^{n-1}\). Also,

$$\displaystyle \begin{aligned} & \mbox{cov}(X(n) - L\left[X(n) | Y^{n-1}\right], \tilde Y(n)) \\ & \quad = \mbox{cov}(X(n) - A \hat X(n-1), Y(n) - CA\hat X(n-1)) \\ & \quad = \mbox{cov}(X(n) - A \hat X(n-1), CX(n) + W(n) - CA\hat X(n-1)) \\ & \quad = S_n C', \end{aligned} $$

by (10.20).

To calculate \(\mbox{cov}(\tilde Y(n))\), we note that

$$\displaystyle \begin{aligned} \mbox{cov}(\tilde Y(n)) = \mbox{cov}\left(CX(n) + W(n) - C L\left[X(n)|Y^{n-1}\right]\right) = CS_nC' + \varSigma_W. \end{aligned}$$

Thus,

$$\displaystyle \begin{aligned} K_n = S_n C'\left[CS_nC' + \varSigma_W\right]^{-1}. \end{aligned}$$

To show (10.18), we note that

$$\displaystyle \begin{aligned} & S_n = \mbox{cov}\left(X(n) - L\left[X(n)|Y^{n-1}\right]\right) \\ &~~~ = \mbox{cov}\left(AX(n-1) + V(n-1) - A \hat X(n-1)\right) \\ &~~~ = A \varSigma_{n-1} A' + \varSigma_V. \end{aligned} $$

Finally, to derive (10.19), we calculate

$$\displaystyle \begin{aligned} \varSigma_n = \mbox{cov}\left(X(n) - \hat X(n)\right). \end{aligned}$$

We observe that

$$\displaystyle \begin{aligned} & X(n) - L[X(n)|Y^n] = X(n) - A \hat X(n-1) - K_n \left[Y(n) - CA \hat X(n-1)\right] \\ & \quad = X(n) - A\hat X(n-1) - K_n\left[C X(n) + W(n) - CA \hat X(n-1)\right] \\ & \quad = [I - K_nC]\left[X(n) - A \hat X(n-1)\right] - K_n W(n), \end{aligned} $$

so that

$$\displaystyle \begin{aligned} & \varSigma_n = [I - K_nC] S_n [I - K_n C]' + K_n \varSigma_W K_n^{\prime}\\ & \quad = S_n - 2 K_nCS_n + K_n \left[C S_n C' + \varSigma_W\right] K_n^{\prime} \\ & \quad = S_n - 2 K_nCS_n + K_n \left[CS_nC' + \varSigma_W\right] \left[C S_n C' + \varSigma_W\right]^{-1}CS_n \mbox{ by (10.17)}\\ & \quad = S_n - K_nCS_n, \end{aligned} $$

as we wanted to show.

\({\square }\)

10.3 Properties of Kalman Filter

The goal of this section is to explain and justify the following result. The terms observable and reachable are defined after the statement of the theorem.

Theorem 10.3 (Properties of the Kalman Filter)

  (a)

    If (A, C) is observable, then \(\varSigma_n\) is bounded. Moreover, if \(\varSigma_0 = 0\), then

    $$\displaystyle \begin{aligned} \varSigma_n \rightarrow \varSigma \mathit{\mbox{ and }} K_n \rightarrow K, \end{aligned} $$
    (10.37)

    where Σ is a finite matrix.

  (b)

    If, in addition, \((A, \varSigma _V^{1/2})\) is reachable, then the filter with the constant gain \(K_n = K\) is such that the covariance of the error also converges to \(\varSigma\).

\({\blacksquare }\)

We explain these properties in the subsequent sections. Let us first make a few comments.

  • For some systems, the errors grow without bound. For instance, if one does not observe anything (e.g., C = 0) and if the system is unstable (e.g., X(n) = 2X(n − 1) + V(n)), then \(\varSigma_n\) goes to infinity. However, (a) says that “if the observations are rich enough,” this does not happen: one can track X(n) with an error that has a bounded covariance.

  • Part (b) of the theorem says that in some cases one can run the filter with a constant gain K without increasing the asymptotic error. This is very convenient, since one does not have to compute a new gain at each step; a small iteration that computes this constant gain is sketched below.
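Here is a minimal sketch of that computation: since the gains do not depend on the observations, one can iterate (10.17)–(10.19) offline from \(\varSigma_0 = 0\) until \(\varSigma_n\) stops changing and then freeze the resulting K.

```python
import numpy as np

def steady_state_gain(A, C, Sigma_V, Sigma_W, tol=1e-10, max_iter=100_000):
    """Iterate (10.17)-(10.19) from Sigma_0 = 0 until convergence and return
    the limiting gain K and error covariance Sigma (cf. Theorem 10.3)."""
    d = A.shape[0]
    Sigma = np.zeros((d, d))
    for _ in range(max_iter):
        S = A @ Sigma @ A.T + Sigma_V                        # (10.18)
        K = S @ C.T @ np.linalg.inv(C @ S @ C.T + Sigma_W)   # (10.17)
        Sigma_next = (np.eye(d) - K @ C) @ S                 # (10.19)
        if np.max(np.abs(Sigma_next - Sigma)) < tol:
            return K, Sigma_next
        Sigma = Sigma_next
    raise RuntimeError("covariance recursion did not converge")
```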

10.3.1 Observability

Are the observations good enough to track the state with a bounded error covariance? Before stating the result, we need a precise notion of good observations.

Definition 10.1 (Observability)

We say that (A, C) is observable if the null space of

$$\displaystyle \begin{aligned} \left[ \begin{array}{c} C\\ CA\\ \vdots \\ CA^d \end{array} \right] \end{aligned}$$

is {0}. Here, d is the dimension of X(n). A matrix M has null space {0} if v = 0 is the only vector such that Mv = 0. ◇
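In practice, Definition 10.1 reduces to a rank computation: the stacked matrix has null space {0} exactly when it has full column rank d. A minimal sketch in Python/NumPy (the example system is illustrative):

```python
import numpy as np

def is_observable(A, C):
    """Check Definition 10.1: [C; CA; ...; CA^d] has null space {0},
    i.e., full column rank d, where d is the state dimension."""
    d = A.shape[0]
    O = np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(d + 1)])
    return np.linalg.matrix_rank(O) == d

# Example: a position-velocity model observed through the position only.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
C = np.array([[1.0, 0.0]])
print(is_observable(A, C))   # True
```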

The key result is the following.

Lemma 10.4 (Observability Implies Bounded Error Covariance)

  (a)

    If the system is observable, then \(\varSigma_n\) is bounded.

  (b)

    If, in addition, \(\varSigma_0 = 0\), then \(\varSigma_n\) converges to some finite \(\varSigma\).

Proof

  (a)

    Observability implies that there is only one X(0) that corresponds to (Y(0), …, Y(d)) if the system has no noise. Indeed, in that case,

    $$\displaystyle \begin{aligned} X(n) = AX(n-1) \mbox{ and } Y(n) = CX(n). \end{aligned}$$

    Then,

    $$\displaystyle \begin{aligned} X(1) = AX(0), X(2) = A^2X(0), \ldots, X(d) = A^{d} X(0), \end{aligned}$$

    so that

    $$\displaystyle \begin{aligned} Y(0) = CX(0), Y(1) = CAX(0), \ldots, Y(d) = CA^{d}X(0). \end{aligned}$$

    Consequently,

    $$\displaystyle \begin{aligned} \left[ \begin{array}{c} Y(0)\\ Y(1)\\ \vdots \\ Y(d) \end{array} \right] = \left[ \begin{array}{c} C\\ CA\\ \vdots \\ CA^{d} \end{array} \right] X(0). \end{aligned}$$

    Now, imagine that there are two different initial states, say X(0) and X′(0), that give the same outputs Y(0), …, Y(d). Then

    $$\displaystyle \begin{aligned} \left[ \begin{array}{c} C\\ CA\\ \vdots \\ CA^{d} \end{array} \right] X(0) = \left[ \begin{array}{c} C\\ CA\\ \vdots \\ CA^{d} \end{array} \right] X'(0), \end{aligned}$$

    so that

    $$\displaystyle \begin{aligned} \left[ \begin{array}{c} C\\ CA\\ \vdots \\ CA^{d} \end{array} \right] \left( X(0) - X'(0) \right) = 0. \end{aligned}$$

    The observability property implies that X(0) − X′(0) = 0, i.e., the two initial states coincide.

    Thus, if (A, C) is observable, one can identify the initial condition X(0) uniquely from d + 1 noise-free observations of the output, and one can then determine X(1), X(2), … exactly. That is, when (A, C) is observable and there is no noise, one can determine the state X(n) precisely from the outputs.

    However, our system has some noise. If (A, C) is observable, we are able to identify X(0) from Y(0), …, Y(d), up to some linear function of the noise that has affected those outputs, i.e., up to a linear function of {V(0), …, V(d − 1), W(0), …, W(d)}. Consequently, we can determine X(d) from Y(0), …, Y(d), up to some linear function of {V(0), …, V(d − 1), W(0), …, W(d)}. Similarly, we can determine X(n) from Y(n − d), …, Y(n), up to some linear function of {V(n − d), …, V(n − 1), W(n − d), …, W(n)}.

    This implies that one can estimate X(n) with an error that is a linear combination of a bounded number of noise terms; since \(\hat X(n)\) is at least as good as that estimate, \(\varSigma_n\) is bounded.

  (b)

    One can show that if \(\varSigma_0 = 0\), i.e., if we know X(0), then \(\varSigma_n\) increases in the sense that \(\varSigma_n - \varSigma_{n-1}\) is nonnegative definite. Being bounded and increasing implies that \(\varSigma_n\) converges, and so does \(K_n\). □

10.3.2 Reachability

Assume that \(\varSigma_V = QQ'\). We say that (A, Q) is reachable if the matrix

$$\displaystyle \begin{aligned}{}[Q, AQ, \ldots, A^{d - 1} Q] \end{aligned}$$

has full rank d. To appreciate the meaning of this property, note that we can write the state equations as

$$\displaystyle \begin{aligned} X(n) = A X(n-1) + Q \eta_n, \end{aligned}$$

where \(\mbox{cov}(\eta_n) = I\). That is, the components of \(\eta_n\) are orthogonal. In the Gaussian case, the components of \(\eta_n\) are N(0, 1) and independent. If (A, Q) is reachable, this means that for any \(\mathbf {x} \in \Re ^d\), there is some sequence \(\eta_1, \ldots, \eta_d\) such that if X(0) = 0, then X(d) = x. Indeed,

$$\displaystyle \begin{aligned} X(d) = \sum_{k = 0}^{d-1} A^k Q \eta_{d - k} = \left[Q, AQ, \ldots, A^{d - 1} Q\right] \left[ \begin{array}{c} \eta_d \\ \eta_{d-1} \\ \vdots \\ \eta_1 \end{array} \right]. \end{aligned}$$

Since the matrix has full rank, the span of its columns is \(\Re ^d\), which means precisely that any given vector in \(\Re ^d\) can be written as a linear combination of these columns.
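Reachability is likewise a rank condition, now on the matrix [Q, AQ, …, A^{d−1}Q]. A minimal sketch, assuming \(\varSigma_V = QQ'\) with Q given (the example values are illustrative):

```python
import numpy as np

def is_reachable(A, Q):
    """Check that [Q, AQ, ..., A^{d-1} Q] has full rank d."""
    d = A.shape[0]
    R = np.hstack([np.linalg.matrix_power(A, k) @ Q for k in range(d)])
    return np.linalg.matrix_rank(R) == d

# Example: noise enters only through the second state coordinate.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
Q = np.array([[0.0], [1.0]])
print(is_reachable(A, Q))   # True
```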

The proof of part (b) of the theorem is a bit too involved for this course.

10.4 Extended Kalman Filter

The Kalman filter is often used for nonlinear systems. The idea is that if the system is almost linear over a few steps, then one may be able to use the Kalman filter locally and change the matrices A and C as the estimate of the state changes.

The model is as follows:

$$\displaystyle \begin{aligned} & X(n+1) = f(X(n)) + V(n) \\ & Y(n+1) = g(X(n+1)) + W(n+1). \end{aligned} $$

The extended Kalman filter is then

$$\displaystyle \begin{aligned} \hat X(n+1) & = f\left(\hat X(n)\right) + K_n\left[Y(n+1) - g\left(f(\hat X(n))\right)\right] \\ K_n &= S_nC_n^\prime \left[C_nS_nC_n^\prime + \varSigma_W\right]^{-1} \\ S_n & = A_n \varSigma_n A_n^\prime + \varSigma_V \\ \varSigma_{n+1} & = [I - K_n C_n] S_n, \end{aligned} $$

where

$$\displaystyle \begin{aligned}{}[A_n]_{ij} = \frac{\partial}{\partial x_j} f_i\left(\hat X(n)\right) \mbox{ and } [C_n]_{ij} = \frac{\partial}{\partial x_j} g_i \left(\hat X(n)\right). \end{aligned}$$

Thus, the idea is to linearize the system around the estimated state value and then apply the usual Kalman filter.
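The recursion above translates directly into code once A_n and C_n are available. The sketch below approximates these Jacobians by finite differences rather than analytic partial derivatives (a substitution for illustration, not part of the text); f, g, and the noise covariances are supplied by the caller.

```python
import numpy as np

def jacobian(h, x, eps=1e-6):
    """Finite-difference approximation of the Jacobian of h at x
    (approximating the partial derivatives that define A_n and C_n above)."""
    hx = np.atleast_1d(h(x))
    J = np.zeros((hx.size, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        J[:, j] = (np.atleast_1d(h(x + step)) - hx) / eps
    return J

def ekf_step(x_hat, Sigma, y, f, g, Sigma_V, Sigma_W):
    """One extended Kalman filter update, following the equations above
    (both Jacobians are evaluated at the current estimate x_hat)."""
    A_n = jacobian(f, x_hat)
    C_n = jacobian(g, x_hat)
    S = A_n @ Sigma @ A_n.T + Sigma_V
    K = S @ C_n.T @ np.linalg.inv(C_n @ S @ C_n.T + Sigma_W)
    x_pred = f(x_hat)
    x_new = x_pred + K @ (y - np.atleast_1d(g(x_pred)))
    Sigma_new = (np.eye(x_hat.size) - K @ C_n) @ S
    return x_new, Sigma_new
```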

Note that we are now in the realm of heuristics and that very little can be said about the properties of this filter. Experiments show that it works well when the nonlinearities are small, whatever this means precisely, but that it may fail miserably in other conditions.

10.4.1 Examples

Tracking a Vehicle

In this example, borrowed from Eric Feron's notes for AE6531 (Georgia Tech), the goal is to track a vehicle that moves in the plane by using noisy measurements of its distances to nine points \(p_i \in \Re ^2\). Let \(p(n) \in \Re ^2\) be the position of the vehicle and \(u(n) \in \Re ^2\) be its velocity at time n ≥ 0.

We assume that the velocity changes according to a known rule, except for some random perturbation. Specifically, we assume that

$$\displaystyle \begin{aligned} & p(n+1) = p(n) + 0.1u(n) {} \end{aligned} $$
(10.38)
$$\displaystyle \begin{aligned} & u(n+1) = \left[ \begin{array}{c c} 0.85 & 0.15 \\ -0.1 & 0.85 \end{array} \right] u(n) + w(n), {} \end{aligned} $$
(10.39)

where the w(n) are i.i.d. N(0, I). The measurements are

$$\displaystyle \begin{aligned} y_i(n) = ||p(n) - p_i|| + v_i(n), i = 1, 2, \ldots, 9, \end{aligned}$$

where the \(v_i(n)\) are i.i.d. N(0, 0.32).

Figure 10.2 shows the result of the extended Kalman filter for X(n) = (p(n), u(n)) initialized with \(\hat X(0) = 0\) and \(\varSigma_0 = I\).

Fig. 10.2 The extended Kalman filter for the system (10.38)–(10.39)
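For this example, only the measurement map is nonlinear; its Jacobian at position p has rows (p − p_i)′/‖p − p_i‖ and zeros in the velocity coordinates. The sketch below spells out the corresponding filter step; the anchor points p_i are random placeholders, not those of the original figure.

```python
import numpy as np

rng = np.random.default_rng(0)
anchors = rng.uniform(-10.0, 10.0, size=(9, 2))       # placeholder points p_i

# Linear dynamics (10.38)-(10.39) for the state X = (p, u).
A = np.block([[np.eye(2), 0.1 * np.eye(2)],
              [np.zeros((2, 2)), np.array([[0.85, 0.15], [-0.1, 0.85]])]])
Sigma_V = np.block([[np.zeros((2, 2)), np.zeros((2, 2))],
                    [np.zeros((2, 2)), np.eye(2)]])   # w(n) ~ N(0, I) drives u only
Sigma_W = 0.32 * np.eye(9)                            # variance of the v_i(n), as given in the text

def g(x):
    """Distances from the position p = x[:2] to the nine points p_i."""
    return np.linalg.norm(x[:2] - anchors, axis=1)

def C_of(x):
    """Jacobian of g at x: row i is (p - p_i)'/||p - p_i||, zero in the u coordinates."""
    diff = x[:2] - anchors
    rows = diff / np.linalg.norm(diff, axis=1, keepdims=True)
    return np.hstack([rows, np.zeros((9, 2))])

def ekf_vehicle_step(x_hat, Sigma, y):
    """One extended Kalman filter step for X = (p, u), linearized at x_hat."""
    S = A @ Sigma @ A.T + Sigma_V
    C = C_of(x_hat)
    K = S @ C.T @ np.linalg.inv(C @ S @ C.T + Sigma_W)
    x_pred = A @ x_hat
    return x_pred + K @ (y - g(x_pred)), (np.eye(4) - K @ C) @ S
```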

Tracking a Chemical Reaction

This example, borrowed from James B. Rawlings and Fernando V. Lima (U. Wisconsin, Madison), concerns estimating the state of a chemical reactor from measurements of the pressure. There are three species A, B, C in the reactions; they are modeled as shown in Fig. 10.3, where the \(k_i\) are the kinetic constants.

Fig. 10.3 The chemical reactions

Let \(C_A\), \(C_B\), \(C_C\) be the concentrations of A, B, and C, respectively. The model is

$$\displaystyle \begin{aligned} \frac{d}{dt} \left[ \begin{array}{c} C_A \\ C_B \\ C_C \end{array} \right] = \left[ \begin{array}{c c} -1 & 0 \\ 1 & - 2 \\ 1 & 1 \end{array} \right] \left[ \begin{array}{c} k_1 C_A - k_{-1} C_B C_C \\ k_2 C_B^2 - k_{-2} C_C \end{array} \right] \end{aligned}$$

and

$$\displaystyle \begin{aligned} y = RT(C_A + C_B + C_C). \end{aligned}$$

As shown in the top part of Fig. 10.4, the extended Kalman filter does not track the concentrations correctly. In fact, some of the estimated concentrations are negative!

Fig. 10.4 The top two graphs show that the extended Kalman filter does not track the concentrations correctly. The bottom two graphs show convergence after modifying the equations

The bottom graphs show that the estimates converge to the true concentrations after modifying the equations and replacing negative estimates by 0.

The point of this example is that the extended Kalman filter is not guaranteed to converge and that, sometimes, a simple modification makes it converge.
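As an illustration of such a modification, the sketch below runs one extended Kalman filter update for an Euler discretization of the reactor model and then clamps negative concentration estimates to 0; the rate constants, time step, RT, and noise covariances are placeholders, not values from the original example.

```python
import numpy as np

# Placeholder parameters (illustrative only).
k1, km1, k2, km2 = 0.5, 0.05, 0.2, 0.01
RT, dt = 1.0, 0.1
N = np.array([[-1.0, 0.0], [1.0, -2.0], [1.0, 1.0]])     # stoichiometric matrix

def f(c):
    """One Euler step of the reactor ODE for c = (C_A, C_B, C_C)."""
    r = np.array([k1 * c[0] - km1 * c[1] * c[2],
                  k2 * c[1] ** 2 - km2 * c[2]])
    return c + dt * (N @ r)

def A_of(c):
    """Jacobian of f at c (analytic, for this Euler discretization)."""
    dr = np.array([[k1, -km1 * c[2], -km1 * c[1]],
                   [0.0, 2.0 * k2 * c[1], -km2]])
    return np.eye(3) + dt * (N @ dr)

C_n = RT * np.ones((1, 3))        # Jacobian of y = RT (C_A + C_B + C_C): constant, since g is linear

def ekf_step_clamped(x_hat, Sigma, y, Sigma_V, Sigma_W):
    """Extended Kalman filter update as in Sect. 10.4, followed by the fix
    mentioned above: negative concentration estimates are replaced by 0."""
    A_n = A_of(x_hat)
    S = A_n @ Sigma @ A_n.T + Sigma_V
    K = S @ C_n.T @ np.linalg.inv(C_n @ S @ C_n.T + Sigma_W)
    x_pred = f(x_hat)
    x_new = x_pred + K @ (y - C_n @ x_pred)
    x_new = np.maximum(x_new, 0.0)                        # clamp negative estimates
    return x_new, (np.eye(3) - K @ C_n) @ S
```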

10.5 Summary

  • Updating LLSE;

  • Derivation of Kalman Filter;

  • Observability and Reachability;

  • Extended Kalman Filter.

10.5.1 Key Equations and Formulas

  • Updating LLSE (zero-mean): L[X|Y, Z] = L[X|Y] + L[X|Z − L[Z|Y]] (Theorem 10.2)

  • Observability ⇒ bounded error covariance (Lemma 10.4)

  • Observability + Reachability ⇒ the asymptotic (constant-gain) filter is good enough (Theorem 10.3)

  • Extended Kalman Filter: linearize the equations (Sect. 10.4)

10.6 References

The book by Goodwin and Sin (2009) surveys filtering and applications to control. The textbook Kumar and Varaiya (1986) is a comprehensive yet accessible presentation of control theory, filtering, and adaptive control. It is available online.