Abstract
In Chapter Tracking: A, we explained the estimation of a random variable based on observations. We also described the Kalman filter and we gave a number of examples. In this chapter, we derive the Kalman filter and explain some of its properties. We also discuss the extended Kalman filter.
Section 10.1 explains how to update an estimate as one makes additional observations. Section 10.2 derives the Kalman filter. The properties of the Kalman filter are explained in Sect. 10.3. Section 10.4 shows how the Kalman filter is extended to nonlinear systems.
Topics: Derivation and properties of Kalman filter; Extended Kalman filter
10.1 Updating LLSE
In many situations, one keeps making observations and one wishes to update the estimate accordingly, hopefully without having to recompute everything from scratch. That is, one hopes for a method that makes it possible to calculate L[X|Y, Z] from L[X|Y] and Z.
The key idea is in the following result.
Theorem 10.1 (LLSE Update—Orthogonal Additional Observation)
Assume that X, Y, and Z are zero-mean and that Y and Z are orthogonal. Then
$$\displaystyle \begin{aligned} L[\mathbf{X} | \mathbf{Y}, \mathbf{Z}] = L[\mathbf{X} | \mathbf{Y}] + L[\mathbf{X} | \mathbf{Z}]. \end{aligned} $$(10.1)
\({\blacksquare }\)
Proof
Figure 10.1 shows why the result holds. To be convinced mathematically, we need to show that the error
$$\displaystyle \begin{aligned} \mathbf{X} - L[\mathbf{X} | \mathbf{Y}] - L[\mathbf{X} | \mathbf{Z}] \end{aligned}$$
is orthogonal to Y and to Z. To see why it is orthogonal to Y, note that the error is
$$\displaystyle \begin{aligned} (\mathbf{X} - L[\mathbf{X} | \mathbf{Y}]) - L[\mathbf{X} | \mathbf{Z}]. \end{aligned}$$
Now, the term between parentheses is orthogonal to Y, by the projection property of L[X|Y]. Also, the second term is linear in Z, and is therefore orthogonal to Y since Z is orthogonal to Y. One shows that the error is orthogonal to Z in the same way. □
A simple consequence of this result is the following fact.
Theorem 10.2 (LLSE Update—General Additional Observation)
Assume that X, Y, and Z are zero-mean. Then
$$\displaystyle \begin{aligned} L[\mathbf{X} | \mathbf{Y}, \mathbf{Z}] = L[\mathbf{X} | \mathbf{Y}] + L[\mathbf{X} | \tilde{\mathbf{Z}}], \mbox{ where } \tilde{\mathbf{Z}} := \mathbf{Z} - L[\mathbf{Z} | \mathbf{Y}]. \end{aligned} $$(10.2)
\({\blacksquare }\)
Proof
The idea here is that one considers the innovation \(\tilde {\mathbf {Z}} := \mathbf {Z} - L[\mathbf {Z} | \mathbf {Y}]\), which is the information in the new observation Z that is orthogonal to Y.
To see why the result holds, note that any linear combination of Y and Z can be written as a linear combination of Y and \(\tilde {\mathbf {Z}}\). For instance, if L[Z|Y] = C Y, then
$$\displaystyle \begin{aligned} A \mathbf{Y} + B \mathbf{Z} = A \mathbf{Y} + B (\tilde{\mathbf{Z}} + C \mathbf{Y}) = (A + BC) \mathbf{Y} + B \tilde{\mathbf{Z}}. \end{aligned}$$
Thus, the set of linear functions of Y and Z is the same as the set of linear functions of Y and \(\tilde {\mathbf {Z}}\), so that
$$\displaystyle \begin{aligned} L[\mathbf{X} | \mathbf{Y}, \mathbf{Z}] = L[\mathbf{X} | \mathbf{Y}, \tilde{\mathbf{Z}}]. \end{aligned}$$
Thus, (10.2) follows from Theorem 10.1 since Y and \(\tilde {\mathbf {Z}}\) are orthogonal. □
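The update formula of Theorem 10.2 is easy to check numerically. The following sketch (the covariance values are arbitrary, chosen only for illustration) computes L[X | Y, Z] for scalars both directly from the joint covariance and via the innovation, and verifies that the two sets of coefficients agree.

```python
import numpy as np

rng = np.random.default_rng(0)

# A random joint covariance for zero-mean scalars (X, Y, Z).
M = rng.normal(size=(3, 3))
Sigma = M @ M.T          # generically nonsingular

Sxy, Sxz = Sigma[0, 1], Sigma[0, 2]
Syy, Szz, Syz = Sigma[1, 1], Sigma[2, 2], Sigma[1, 2]

# Direct LLSE: L[X | Y, Z] = a Y + b Z with [a, b] solving the normal equations.
ab = np.linalg.solve(np.array([[Syy, Syz], [Syz, Szz]]),
                     np.array([Sxy, Sxz]))

# Innovation: Ztilde = Z - L[Z | Y] = Z - c Y with c = Syz / Syy.
c = Syz / Syy
var_zt = Szz - Syz**2 / Syy          # cov(Ztilde)
cov_x_zt = Sxz - Sxy * Syz / Syy     # cov(X, Ztilde)

# Theorem 10.2: L[X | Y, Z] = L[X | Y] + L[X | Ztilde]
#             = (Sxy/Syy) Y + (cov_x_zt/var_zt)(Z - c Y).
a2 = Sxy / Syy - (cov_x_zt / var_zt) * c
b2 = cov_x_zt / var_zt

assert np.allclose(ab, [a2, b2])
print("direct and innovation-based coefficients agree:", ab)
```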
10.2 Derivation of Kalman Filter
We derive the equations for the Kalman filter, as stated in Theorem 9.8. For convenience, we repeat those equations here:
$$\displaystyle \begin{aligned} \hat X(n) = A \hat X(n-1) + K_n [Y(n) - CA \hat X(n-1)] \end{aligned} $$(10.16)
$$\displaystyle \begin{aligned} K_n = S_n C' [C S_n C' + \varSigma_W]^{-1} \end{aligned} $$(10.17)
and
$$\displaystyle \begin{aligned} S_n = A \varSigma_{n-1} A' + \varSigma_V \end{aligned} $$(10.18)
$$\displaystyle \begin{aligned} \varSigma_n = (I - K_n C) S_n. \end{aligned} $$(10.19)
In the algebra, we repeatedly use the fact that
$$\displaystyle \begin{aligned} L[X | Y] = \mbox{cov}(X, Y) \mbox{cov}(Y)^{-1} Y \end{aligned}$$
and also that if V and W are orthogonal, then
$$\displaystyle \begin{aligned} \mbox{cov}(V + W) = \mbox{cov}(V) + \mbox{cov}(W). \end{aligned}$$
The algebra is a bit tedious, but the key steps are worth noting.
Let
Note that
Hence,
so that, by Theorem 10.2,
Thus,
This derivation shows that (10.16) is a fairly direct consequence of the formula in Theorem 10.2 for updating the LLSE.
The calculation of the gain K_n is a bit more complex. Let
Then
Now,
because \(\tilde Y(n)\) is orthogonal to Y n−1. Also,
by (10.20).
To calculate \(\mbox{cov}(\tilde Y(n))\), we note that
Thus,
To show (10.18), we note that
Finally, to derive (10.19), we calculate
We observe that
so that
as we wanted to show.
\({\square }\)
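As a sanity check on this derivation, here is a minimal Python sketch of one step of the resulting recursion, assuming the standard form of the updates repeated at the start of this section: predict with the state equation, then correct with the innovation. The scalar random-walk example at the end is made up for illustration.

```python
import numpy as np

def kalman_step(xhat, Sigma, y, A, C, Sigma_V, Sigma_W):
    """One Kalman filter step in the form derived above."""
    # Prediction: L[X(n) | Y^{n-1}] = A xhat(n-1), with error covariance S_n.
    x_pred = A @ xhat
    S = A @ Sigma @ A.T + Sigma_V
    # Innovation Ytilde(n) = Y(n) - C A xhat(n-1) and its covariance.
    innov = y - C @ x_pred
    S_innov = C @ S @ C.T + Sigma_W
    # Gain, corrected estimate, and updated error covariance.
    K = S @ C.T @ np.linalg.inv(S_innov)
    xhat_new = x_pred + K @ innov
    Sigma_new = (np.eye(len(xhat)) - K @ C) @ S
    return xhat_new, Sigma_new, K

# Tiny simulation: a scalar random walk observed in noise.
rng = np.random.default_rng(1)
A = np.array([[1.0]]); C = np.array([[1.0]])
Sigma_V = np.array([[0.1]]); Sigma_W = np.array([[1.0]])
x = np.zeros(1); xhat = np.zeros(1); Sigma = np.eye(1)
for n in range(200):
    x = A @ x + rng.normal(scale=np.sqrt(0.1), size=1)
    y = C @ x + rng.normal(scale=1.0, size=1)
    xhat, Sigma, K = kalman_step(xhat, Sigma, y, A, C, Sigma_V, Sigma_W)
print("final error covariance:", Sigma)
```

Note that the covariance recursion does not depend on the observations, so Σ_n converges to the same limit no matter what data the filter sees.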
10.3 Properties of Kalman Filter
The goal of this section is to explain and justify the following result. The terms observable and reachable are defined after the statement of the theorem.
Theorem 10.3 (Properties of the Kalman Filter)
(a) If (A, C) is observable, then Σ_n is bounded. Moreover, if Σ_0 = 0, then
$$\displaystyle \begin{aligned} \varSigma_n \rightarrow \varSigma \mathit{\mbox{ and }} K_n \rightarrow K, \end{aligned} $$(10.37)
where Σ is a finite matrix.
(b) If, in addition, \((A, \varSigma _V^{1/2})\) is reachable, then the filter with K_n = K is such that the covariance of the error also converges to Σ.
\({\blacksquare }\)
We explain these properties in the subsequent sections. Let us first make a few comments.
- For some systems, the errors grow without bound. For instance, if one does not observe anything (e.g., C = 0) and if the system is unstable (e.g., X(n) = 2X(n − 1) + V(n)), then Σ_n goes to infinity. However, (a) says that if the observations are "rich enough," this does not happen: one can track X(n) with an error that has a bounded covariance.
- Part (b) of the theorem says that in some cases one can use the filter with a constant gain K without a larger error, asymptotically. This is very convenient, as one does not have to compute a new gain at each step.
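The convergence in part (a) is easy to observe numerically. The sketch below iterates the covariance and gain recursions from Σ_0 = 0 for a position–velocity example (the matrices are illustrative, not from the text) and shows that successive gains become indistinguishable, so a constant gain K is available.

```python
import numpy as np

# Position-velocity model (assumed example): observe position only.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
C = np.array([[1.0, 0.0]])
Sigma_V = 0.01 * np.eye(2)
Sigma_W = np.array([[1.0]])

Sigma = np.zeros((2, 2))   # Sigma_0 = 0, as in part (a)
gains = []
for n in range(500):
    S = A @ Sigma @ A.T + Sigma_V
    K = S @ C.T @ np.linalg.inv(C @ S @ C.T + Sigma_W)
    Sigma = (np.eye(2) - K @ C) @ S
    gains.append(K)

# Successive gains become indistinguishable: K_n -> K.
print("K =", gains[-1].ravel(), "  change:", np.abs(gains[-1] - gains[-2]).max())
```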
10.3.1 Observability
Are the observations good enough to track the state with a bounded error covariance? Before stating the result, we need a precise notion of good observations.
Definition 10.1 (Observability)
We say that (A, C) is observable if the null space of
$$\displaystyle \begin{aligned} \left[ \begin{array}{c} C\\ CA\\ \vdots \\ CA^{d - 1} \end{array} \right] \end{aligned}$$
is {0}. Here, d is the dimension of X(n). A matrix M has null space {0} if v = 0 is the only vector such that Mv = 0. ◇
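This definition translates directly into a rank test: the stacked matrix has null space {0} exactly when its rank is d. The sketch below checks observability for an assumed position–velocity system; the choices of C_good and C_bad are illustrative, not from the text.

```python
import numpy as np

def observability_matrix(A, C):
    """Stack C, CA, ..., CA^{d-1}; (A, C) is observable iff this has rank d."""
    d = A.shape[0]
    return np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(d)])

A = np.array([[1.0, 1.0], [0.0, 1.0]])   # position-velocity dynamics
C_good = np.array([[1.0, 0.0]])          # observing the position: observable
C_bad = np.array([[0.0, 1.0]])           # observing the velocity only: not observable

assert np.linalg.matrix_rank(observability_matrix(A, C_good)) == 2
assert np.linalg.matrix_rank(observability_matrix(A, C_bad)) == 1
```

Observing only the velocity fails because two trajectories that differ by a constant position offset produce identical outputs.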
The key result is the following.
Lemma 10.4 (Observability Implies Bounded Error Covariance)
(a) If the system is observable, then Σ_n is bounded.

(b) If, in addition, Σ_0 = 0, then Σ_n converges to some finite Σ.
Proof
(a) Observability implies that there is only one X(0) that corresponds to (Y(0), …, Y(d)) if the system has no noise. Indeed, in that case,
$$\displaystyle \begin{aligned} X(n) = AX(n-1) \mbox{ and } Y(n) = CX(n). \end{aligned}$$Then,
$$\displaystyle \begin{aligned} X(1) = AX(0), X(2) = A^2X(0), \ldots, X(d-1) = A^{d - 1} X(0), \end{aligned}$$so that
$$\displaystyle \begin{aligned} Y(0) = CX(0), Y(1) = CAX(0), \ldots, Y(d-1) = CA^{d - 1}X(0). \end{aligned}$$Consequently,
$$\displaystyle \begin{aligned} \left[ \begin{array}{c} Y(0)\\ Y(1)\\ \vdots \\ Y(d-1) \end{array} \right] = \left[ \begin{array}{c} C\\ CA\\ \vdots \\ CA^{d - 1} \end{array} \right] X(0). \end{aligned}$$Now, imagine that there are two different initial states, say X(0) and X′(0), that give the same outputs. Then,
$$\displaystyle \begin{aligned} \left[ \begin{array}{c} C\\ CA\\ \vdots \\ CA^{d - 1} \end{array} \right] X(0) = \left[ \begin{array}{c} C\\ CA\\ \vdots \\ CA^{d - 1} \end{array} \right] X'(0), \end{aligned}$$so that
$$\displaystyle \begin{aligned} \left[ \begin{array}{c} C\\ CA\\ \vdots \\ CA^{d - 1} \end{array} \right] (X(0) - X'(0)) = 0. \end{aligned}$$The observability property implies that X(0) = X′(0).
Thus, if (A, C) is observable, one can identify the initial condition X(0) uniquely after d + 1 observations of the output, when there is no noise. Hence, when there is no noise, one can then determine X(1), X(2), … exactly. Thus, when (A, C) is observable, one can determine the state X(n) precisely from the outputs.
However, our system has some noise. If (A, C) is observable, we are able to identify X(0) from Y (0), …, Y (d), up to some linear function of the noise that has affected those outputs, i.e., up to a linear function of {V (0), …, V (d − 1), W(0), …, W(d)}. Consequently, we can determine X(d) from Y (0), …, Y (d), up to some linear function of {V (0), …, V (d − 1), W(0), …, W(d)}. Similarly, we can determine X(n) from Y (n − d), …, Y (n), up to some linear function of {V (n), …, V (n + d − 1), W(n − d), …, W(n)}.
This implies that the error between X(n) and \(\hat X(n)\) is a linear combination of d noise contributions, so that Σ_n is bounded.
(b) One can show that if Σ_0 = 0, i.e., if we know X(0), then Σ_n increases in the sense that Σ_n − Σ_{n−1} is nonnegative definite. Being bounded and increasing implies that Σ_n converges, and so does K_n.
□
10.3.2 Reachability
Assume that Σ_V = QQ′. We say that (A, Q) is reachable if the rank of
$$\displaystyle \begin{aligned} \left[ \begin{array}{cccc} Q & AQ & \cdots & A^{d-1}Q \end{array} \right] \end{aligned}$$
is full. To appreciate the meaning of this property, note that we can write the state equations as
$$\displaystyle \begin{aligned} X(n) = AX(n-1) + Q \eta_n, \end{aligned}$$
where cov(η_n) = I. That is, the components of η_n are orthogonal. In the Gaussian case, the components of η_n are N(0, 1) and independent. If (A, Q) is reachable, this means that for any \(\mathbf {x} \in \Re ^d\), there is some sequence η_1, …, η_d such that if X(0) = 0, then X(d) = x. Indeed,
$$\displaystyle \begin{aligned} X(d) = Q\eta_d + AQ\eta_{d-1} + \cdots + A^{d-1}Q\eta_1. \end{aligned}$$
Since the matrix is full rank, the span of its columns is \(\Re ^d\), which means precisely that there is a linear combination of these columns that is equal to any given vector in \(\Re ^d\).
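Like observability, reachability is a rank test on the matrix [Q, AQ, …, A^{d−1}Q]. The sketch below checks it for an assumed two-dimensional example; the choices of Q_good and Q_bad are illustrative, not from the text.

```python
import numpy as np

def reachability_matrix(A, Q):
    """[Q, AQ, ..., A^{d-1}Q]; (A, Q) is reachable iff this has rank d."""
    d = A.shape[0]
    return np.hstack([np.linalg.matrix_power(A, k) @ Q for k in range(d)])

A = np.array([[1.0, 1.0], [0.0, 1.0]])   # position-velocity dynamics
Q_good = np.array([[0.0], [1.0]])        # noise drives the velocity: reachable
Q_bad = np.array([[1.0], [0.0]])         # noise drives the position only: not reachable

assert np.linalg.matrix_rank(reachability_matrix(A, Q_good)) == 2
assert np.linalg.matrix_rank(reachability_matrix(A, Q_bad)) == 1
```

Noise entering only the velocity still reaches every state because the dynamics propagate velocity into position; noise entering only the position never moves the velocity component.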
The proof of part (b) of the theorem is a bit too involved for this course.
10.4 Extended Kalman Filter
The Kalman filter is often used for nonlinear systems. The idea is that if the system is almost linear over a few steps, then one may be able to use the Kalman filter locally and change the matrices A and C as the estimate of the state changes.
The model is as follows:
$$\displaystyle \begin{aligned} X(n) = f(X(n-1)) + V(n), \quad Y(n) = g(X(n)) + W(n). \end{aligned}$$
The extended Kalman filter is then
$$\displaystyle \begin{aligned} \hat X(n) = f(\hat X(n-1)) + K_n [Y(n) - g(f(\hat X(n-1)))], \end{aligned}$$
where K_n is computed as in the Kalman filter, but with A and C replaced by the Jacobians
$$\displaystyle \begin{aligned} A_n = \frac{\partial f}{\partial x}(\hat X(n-1)) \quad \mbox{and} \quad C_n = \frac{\partial g}{\partial x}(f(\hat X(n-1))). \end{aligned}$$
Thus, the idea is to linearize the system around the estimated state value and then apply the usual Kalman filter.
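The linearization idea can be sketched as follows, with the Jacobians computed by finite differences for convenience; the scalar system at the end is a made-up illustration, not one of the chapter's examples.

```python
import numpy as np

def jacobian(h, x, eps=1e-6):
    """Numerical Jacobian of h at x (central finite differences)."""
    d = len(x)
    J = np.zeros((len(h(x)), d))
    for i in range(d):
        dx = np.zeros(d); dx[i] = eps
        J[:, i] = (h(x + dx) - h(x - dx)) / (2 * eps)
    return J

def ekf_step(xhat, Sigma, y, f, g, Sigma_V, Sigma_W):
    """One extended Kalman filter step: linearize f and g around the
    current estimate, then apply the usual Kalman updates."""
    A = jacobian(f, xhat)
    x_pred = f(xhat)
    S = A @ Sigma @ A.T + Sigma_V
    C = jacobian(g, x_pred)
    K = S @ C.T @ np.linalg.inv(C @ S @ C.T + Sigma_W)
    xhat_new = x_pred + K @ (y - g(x_pred))
    Sigma_new = (np.eye(len(xhat)) - K @ C) @ S
    return xhat_new, Sigma_new

# Made-up mildly nonlinear scalar system: x(n) = 0.9 x + 0.1 sin(x) + v, y = x + w.
rng = np.random.default_rng(2)
f = lambda x: 0.9 * x + 0.1 * np.sin(x)
g = lambda x: x
x = np.array([1.0]); xhat = np.array([0.0]); Sigma = np.eye(1)
for n in range(100):
    x = f(x) + rng.normal(scale=0.1, size=1)
    y = g(x) + rng.normal(scale=0.3, size=1)
    xhat, Sigma = ekf_step(xhat, Sigma, y, f, g, 0.01 * np.eye(1), 0.09 * np.eye(1))
print("estimate:", xhat, "error covariance:", Sigma)
```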
Note that we are now in the realm of heuristics and that very little can be said about the properties of this filter. Experiments show that it works well when the nonlinearities are small, whatever this means precisely, but that it may fail miserably in other conditions.
10.4.1 Examples
Tracking a Vehicle
In this example, borrowed from Eric Feron's notes for AE6531 at Georgia Tech, the goal is to track a vehicle that moves in the plane using noisy measurements of the distances to 9 points \(p_i \in \Re ^2\). Let \(p(n) \in \Re ^2\) be the position of the vehicle and \(u(n) \in \Re ^2\) its velocity at time n ≥ 0.
We assume that the velocity changes according to a known rule, except for some random perturbation. Specifically, we assume that
where the w(n) are i.i.d. N(0, I). The measurements are
$$\displaystyle \begin{aligned} y_i(n) = \| p(n) - p_i \| + v_i(n), \quad i = 1, \ldots, 9, \end{aligned}$$
where the v_i(n) are i.i.d. N(0, 0.3²).
Figure 10.2 shows the result of the extended Kalman filter for X(n) = (p(n), u(n)) initialized with \(\hat x(0) = 0\) and Σ_0 = I.
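For distance measurements, the linearization takes a simple form: the row of C_n for point p_i is the unit vector pointing from p_i to the estimated position, with zeros in the velocity components. The sketch below computes this Jacobian; the beacon coordinates are made up (the example uses 9 points, whose positions are not reproduced here).

```python
import numpy as np

def range_jacobian(p, beacons):
    """Jacobian of the measurements y_i = ||p - p_i|| for state X = (p, u):
    each row is (p - p_i)'/||p - p_i|| followed by zeros for the velocity."""
    diffs = p - beacons                      # shape (m, 2)
    dists = np.linalg.norm(diffs, axis=1)    # shape (m,)
    C_pos = diffs / dists[:, None]           # unit vectors, shape (m, 2)
    return np.hstack([C_pos, np.zeros_like(C_pos)])

beacons = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])  # made-up positions
p = np.array([3.0, 4.0])
C = range_jacobian(p, beacons)
print(C)
```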
Tracking a Chemical Reaction
This example, borrowed from James B. Rawlings and Fernando V. Lima (U. Wisconsin, Madison), concerns estimating the state of a chemical reactor from measurements of the pressure. There are three components A, B, C in the reactions, and they are modeled as shown in Fig. 10.3, where the k_i are the kinetic constants.
Let C_A, C_B, C_C be the concentrations of A, B, C, respectively. The model is
and
As shown in the top part of Fig. 10.4, this filter does not track the concentrations correctly. In fact, some concentrations that the filter estimates are negative!
The bottom graphs show that the filter estimates converge to the true concentrations after one modifies the equations and replaces negative estimates by 0.
The point of this example is that the extended Kalman filter is not guaranteed to converge and that, sometimes, a simple modification makes it converge.
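A minimal sketch of the modification mentioned above, assuming it amounts to projecting each updated estimate onto the nonnegative orthant (the numbers are hypothetical):

```python
import numpy as np

# After each EKF update, replace negative concentration estimates by 0,
# since concentrations cannot be negative.
xhat = np.array([0.12, -0.03, 0.85])   # hypothetical estimate with a negative entry
xhat = np.maximum(xhat, 0.0)           # all entries are now nonnegative
print(xhat)
```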
10.5 Summary
- Updating LLSE;
- Derivation of Kalman Filter;
- Observability and Reachability;
- Extended Kalman Filter.
10.5.1 Key Equations and Formulas
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2021 The Author(s)
Walrand, J. (2021). Tracking: B. In: Probability in Electrical Engineering and Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-030-49995-2_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-49994-5
Online ISBN: 978-3-030-49995-2