Abstract
A common situation in filtering where classical Kalman filtering does not perform particularly well is tracking in the presence of propagating outliers. This calls for robustness understood in a distributional sense, i.e., we enlarge the distributional assumptions made in the ideal model to suitable neighborhoods. Based on optimality results for distributionally robust Kalman filtering from Ruckdeschel (Ansätze zur Robustifizierung des Kalman-Filters, vol 64, 2001; Optimally (distributional-)robust Kalman filtering, arXiv:1004.3393, 2010a), we propose new robust recursive filters and smoothers designed for this purpose, as well as specialized versions for non-propagating outliers. We apply these procedures in the context of a GPS problem arising in the car industry. To better understand these filters, we study their behavior at stylized outlier patterns (for which they are not designed) and compare them to other approaches for the tracking problem. Finally, in a simulation study we discuss the efficiency of our procedures in comparison to competitors.
References
Anderson BDO, Moore JB (1990) Optimal control. Linear quadratic methods. Prentice Hall, New York
Anscombe FJ (1960) Rejection of outliers. Technometrics 2:123–147
Birmiwal K, Papantoni-Kazakos P (1994) Outlier resistant prediction for stationary processes. Stat Decis 12(4):395–427
Birmiwal K, Shen J (1993) Optimal robust filtering. Stat Decis 11(2):101–119
Boncelet CG Jr, Dickinson BW (1983) An approach to robust Kalman filtering. In: Proceedings of the 22nd IEEE conference on decision and control, vol 1, pp 304–305
Boncelet CG Jr, Dickinson BW (1987) An extension to the SRIF Kalman filter. IEEE Trans Autom Control AC-32:176–179
Cipra T, Hanzak T (2011) Exponential smoothing for a time series with outliers. Kybernetika 47:165–178
Cipra T, Romera R (1991) Robust Kalman filter and its application in time series analysis. Kybernetika 27(6):481–494
Donoho D, Johnstone I (1994) Ideal spatial adaptation via wavelet shrinkage. Biometrika 81:425–455
Ershov AA, Liptser RS (1978) Robust Kalman filter in discrete time. Automat Remote Control 39:359–367
Fox AJ (1972) Outliers in time series. J R Stat Soc B 34:350–363
Franke J (1985) Minimax-robust prediction of discrete time series. Z Wahrscheinlichkeitstheor Verw Geb 68:337–364
Franke J, Poor HV (1984) Minimax-robust filtering and finite-length robust predictors. In: Robust and nonlinear time series analysis. Proceedings of workshop, Heidelberg, Germany, 1983, Nr. 26 in Lecture notes in statistics. Springer, New York
Fried R, Schettlinger K (2010) robfilter: robust time series filters. R package version 2.6.1. Available on CRAN. http://cran.r-project.org/web/packages/robfilter
Fried R, Bernholt T, Gather U (2006) Repeated median and hybrid filters. Comput Stat Data Anal 50:2313–2338
Gelper S, Fried R, Croux C (2010) Robust forecasting with exponential and Holt-Winters smoothing. J Forecast 29:285–300
Genz A, Bretz F, Miwa T, Mi X, Leisch F, Scheipl F, Hothorn T (2011) mvtnorm: multivariate normal and t distributions. R package version 0.9-9991. Available on CRAN. http://CRAN.R-project.org/package=mvtnorm
Hampel FR (1968) Contributions to the theory of robust estimation. Dissertation, University of California, Berkeley
Hampel FR, Ronchetti EM, Rousseeuw PJ, Stahel WA (1986) Robust statistics. The approach based on influence functions. Wiley, New York
Huber PJ (1964) Robust estimation of a location parameter. Ann Math Stat 35:73–101
Kalman RE (1960) A new approach to linear filtering and prediction problems. J Basic Eng Trans ASME 82:35–45
Kassam SA, Poor HV (1985) Robust techniques for signal processing: a survey. Proc IEEE 73(3):433–481
Künsch H (2001) State space models and hidden Markov models. In: Barndorff-Nielsen OE, Cox DR, Klüppelberg C (eds) Complex stochastic systems. Chapman and Hall, New York, pp 109–173
Martin RD, Yohai VJ (1986) Influence functionals for time series (with discussion). Ann Stat 14:781–818
Masreliez CJ, Martin R (1977) Robust Bayesian estimation for the linear model and robustifying the Kalman filter. IEEE Trans Autom Control AC-22:361–371
R Development Core Team (2012) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. http://www.R-project.org
Rieder H, Kohl M, Ruckdeschel P (2008) The costs of not knowing the radius. Stat Methods Appl 17(1):13–40
Ruckdeschel P (2000) Robust Kalman filtering. In: Hárdle W, Hlávka Z, Klinke S (eds) XploRe. Application guide, chap 18. Springer, New York, pp 483–516
Ruckdeschel P (2001) Ansätze zur Robustifizierung des Kalman-Filters, vol 64. Bayreuther Mathematische Schriften, Bayreuth
Ruckdeschel P (2010a) Optimally (distributional-)robust Kalman filtering. Available on arXiv: 1004.3393
Ruckdeschel P (2010b) Optimally robust Kalman filtering. Technical report 185, Fraunhofer ITWM Kaiserslautern, Kaiserslautern. http://www.itwm.fraunhofer.de/fileadmin/ITWM-Media/Zentral/Pdf/Berichte_ITWM/2010/bericht_185.pdf
Rudin W (1974) Real and complex analysis, 2nd edn. McGraw-Hill, New York
Schettlinger K (2009) Signal and variability extraction for online monitoring in intensive care. Dissertation, TU Dortmund, Dortmund. https://eldorado.tu-dortmund.de/bitstream/2003/26044/1/Dissertation-Schettlinger-Internetpublikation.pdf
Schick IC, Mitter SK (1994) Robust recursive estimation in the presence of heavy-tailed observation noise. Ann Stat 22(2):1045–1080
Shumway RH, Stoffer DS (1982) An approach to time series smoothing and forecasting using the EM algorithm. J Time Ser Anal 3:253–264
Spangl B (2008) On robust spectral density estimation. Dissertation, Department of Statistics and Probability Theory, Vienna University of Technology, Vienna
Stockinger N, Dutter R (1987) Robust time series analysis: a survey. Kybernetika 23 Supplement
Venables W, Ripley B (2002) Modern applied statistics with S, 4th edn. Springer, New York
Wan EA, van der Merwe R (2002) The unscented Kalman filter. In: Haykin S (ed) Kalman filtering and neural networks. Wiley, New York
Acknowledgments
The authors thank two anonymous referees for their valuable and helpful comments. Financial support from VW foundation in the framework of project Robust Risk Estimation for D. Pupashenko is gratefully acknowledged.
Appendix
1.1 Optimality of the classical Kalman filter
Optimality of the classical Kalman filter among all linear filters in \(L_2\)-sense and, under normality of the error and innovation distributions, among all measurable filters is a well-known fact; compare, e.g., Anderson and Moore (1990, Sect. 5.2). As we will need some of the arguments later, let us complement this fact by a generalization to arbitrary norms generated by a quadratic form and by a thorough treatment of the case of singularities in the covariances arising in the definition of the Kalman gain from (3.5). To do so, we take the orthogonal decomposition of the Hilbert space into the closed linear subspaces \( \mathrm{lin}(Y_{1:(t-1)})\oplus \mathrm{lin} (\Delta Y_{t})\) as granted and, for \(X=X_t-X_{t|t-1}\) and \(Y=\Delta Y_t\) as given in (3.5), derive \(\hat{K}_t\). To this end, for any matrix A let us denote by \(A^-\) a generalized inverse of A with the defining properties
$$\begin{aligned} A A^- A = A, \qquad A^- A A^- = A^-, \end{aligned}$$
and for D a positive semi-definite symmetric matrix in \(\mathbb{R}^{p\times p}\) and for \(x\in \mathbb{R}^p\) define the semi-norm generated by D as \(\Vert x\Vert ^2_D:=x^\tau D^{-} x\).
Lemma 8.1
Let \(p,q \in \mathbb{N}\), P some probability, and \(X\in L_2^p(P)\), \(Y\in L^q_2(P)\), \(\text{E} X=0\), \(\text{E} Y=0\), where for some \(Z\in \mathbb{R}^{q\times p}\) and some \(\varepsilon \in L_2^q(P)\) independent of X, \(Y=ZX+\varepsilon\). Let D be a positive semi-definite symmetric matrix in \(\mathbb{R}^{p\times p}\). Then
$$\begin{aligned} \hat{K}=\mathop{\text{Cov}}(X,Y)\mathop{\text{Cov}}(Y)^{-} \end{aligned}$$
solves
$$\begin{aligned} \text{E}\, \Vert X- KY\Vert _D^2=\;\min {}!,\quad K\in \mathbb{R}^{p\times q}. \end{aligned}$$(8.3)
\(\hat{K}\) is unique up to addition of some \(A \in \mathbb{R}^{p\times q}\) such that \(A\mathop{\text{Cov}} Y =0\) and some \(B \in \mathbb{R}^{p\times q}\) such that \(DB=0\). If \(\hat{K} = D D^- \hat{K}\), then \(\hat{K}\) has the smallest Frobenius norm among all solutions K to (8.3).
Proof
Denote by \(L_2^p(P,D)\) the Hilbert space generated by all \(\mathbb{R}^p\)-valued random variables U such that \(\text{E}_P \Vert U\Vert _D^2 <\infty\)—after a passage to equivalence classes of all random variables \(U, U^{\prime}\) with \(\text{E}_P \Vert U-U^{\prime}\Vert _D^2 = 0\). Let \(S=\mathop{\text{Cov}}(X)\) and \(V=\mathop{\text{Cov}}(\varepsilon)\). Then \(\mathop{\text{Cov}}(X,Y)=SZ^\tau\) and \(\mathop{\text{Cov}}(Y)=ZSZ^\tau +V\). Denote the approximation space \(\{ KY \mid K\in \mathbb{R}^{p\times q}\}\subset L_2^p(P,D)\) by \(\mathcal{K}\). \(\mathcal{K}\) is a closed linear subspace of \(L_2^p(P,D)\); hence by Rudin (1974, Thm. 4.10) there exists a unique minimizer \(\hat{X} = \hat{K} Y \in \mathcal{K}\) to problem (8.3). It is characterized by the orthogonality condition
$$\begin{aligned} \text{E}\, (X-\hat{K} Y)^\tau D^- K Y = 0\quad \text{for all } K\in \mathbb{R}^{p\times q}. \end{aligned}$$(8.4)
Plugging in \(K=D e_i \tilde{e}_j^\tau\), with \(\{e_i\}\), \(\{\tilde{e}_j\}\) the canonical bases of \(\mathbb{R}^p\) and \(\mathbb{R}^q\), respectively, we see that (8.4) is equivalent to
$$\begin{aligned} \pi \hat{K} \mathop{\text{Cov}}(Y) = \pi \mathop{\text{Cov}}(X,Y), \end{aligned}$$
where \(\pi =D^-D\) is the orthogonal projector onto the column space of D. But \(y\in \mathbb{R}^q\) can only lie in \(\ker \mathop{\text{Cov}}(Y)\) if \(y\in \ker \mathop{\text{Cov}}(X,Y)\). Hence indeed \(\hat{K} \mathop{\text{Cov}}(Y)=\mathop{\text{Cov}}(X,Y)\), and the uniqueness assertion is obvious. We write \(\pi _D\) and \(\pi _{C}\) for the orthogonal projectors onto the column spaces of D and \(\mathop{\text{Cov}}(Y)\), respectively, and \(\bar{\pi }_{C}=\mathbb{I}_q-\pi _{C}\), \(\bar{\pi }_D=\mathbb{I}_p-\pi _D\) for the corresponding complementary projectors. Then we see that \(\hat{K} = \hat{K} \pi _{C}\); for any \(A \in \mathbb{R}^{p\times q}\) with \(A\mathop{\text{Cov}} Y =0\) we have \(A=A\bar{\pi }_{C}\), and for any \(B \in \mathbb{R}^{p\times q}\) with \(DB =0\) we have \(B=\bar{\pi }_{D} B\); hence every solution to (8.3) is of the form \(\hat{K}+A+B\) with A and B as in the assertion.
\(\square \)
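As a numerical plausibility check (not part of the proof), the following Python sketch verifies that \(\hat{K}\) with \(\hat{K}\mathop{\text{Cov}}(Y)=\mathop{\text{Cov}}(X,Y)\) minimizes the empirical risk in the special case \(D=\mathbb{I}_p\). The dimensions and covariances are arbitrary illustrative choices, and the Moore-Penrose pseudoinverse stands in for the generalized inverse \(A^-\).

```python
import numpy as np

rng = np.random.default_rng(0)
p, q, n = 3, 2, 200_000

# Ideal model of Lemma 8.1: Y = Z X + eps, X and eps independent and centered.
Z = rng.standard_normal((q, p))
S = np.eye(p)            # Cov(X)
V = 0.5 * np.eye(q)      # Cov(eps)

X = rng.multivariate_normal(np.zeros(p), S, size=n)
eps = rng.multivariate_normal(np.zeros(q), V, size=n)
Y = X @ Z.T + eps

# Normal equations: K_hat Cov(Y) = Cov(X, Y) = S Z^tau, solved with
# the Moore-Penrose pseudoinverse in the role of the generalized inverse.
C = Z @ S @ Z.T + V                   # Cov(Y)
K_hat = S @ Z.T @ np.linalg.pinv(C)

def risk(K):
    """Empirical E ||X - K Y||^2 (Euclidean norm, i.e. D = I_p)."""
    R = X - Y @ K.T
    return np.mean(np.sum(R ** 2, axis=1))

# Perturbing K_hat in a random direction should never lower the risk.
Delta = 0.1 * rng.standard_normal((p, q))
assert risk(K_hat) < risk(K_hat + Delta)
```

With \(n=200{,}000\) samples the Monte-Carlo noise is far smaller than the risk increase \(\mathop{tr}(\Delta C \Delta^\tau)\) caused by the perturbation, so the comparison is reliable.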
1.2 Sketch of the optimality of the rLS.AO
The rLS filter is optimally-robust in some sense: to see this, in a first step we essentially boil down our SSM to (3.9), i.e., we have an unobservable but interesting state \(X\sim P^X(dx)\), where for technical reasons we assume that in the ideal model \(\text{E} |X|^2 <\infty\). Instead of X, for some \(Z\in \mathbb{R}^{q\times p}\), we rather observe the sum \(Y=ZX+\varepsilon\) of ZX and a stochastically independent error \(\varepsilon\). As (wide-sense) AO model, we consider the SO outlier model of (2.5), (2.6): the corresponding neighborhood \(\mathcal{U}(r)\) consists of the laws of \(Y^{\mathrm{re}}=(1-U)\,Y^{\mathrm{id}}+U\,Y^{\mathrm{di}}\), where \(Y^{\mathrm{id}}=ZX+\varepsilon\) is the ideal observation, \(Y^{\mathrm{di}}\) is an arbitrarily distributed contaminating variable, and \(U\sim \mathrm{Bin}(1,r)\) is independent of \((X,\varepsilon)\).
In this setting we may formulate two typical robust optimization problems: a minimax formulation (8.7), where the maximal MSE over the whole neighborhood is minimized, and, in the spirit of Hampel (1968, Lemma 5), a formulation (8.8) where robustness enters as a side condition, namely a bound b on the bias to be fulfilled on the whole neighborhood.
Then one can show that, setting \(D(y)=\text{E}_{\mathrm{id}}[X|Y=y]-\text{E} X\), the solution to both problems is \(\hat{f}(y)=\text{E} X +H_\rho (D(y))\) (with \(b=\rho /r\) in Problem (8.8)), and that this is just the (one-step) rLS, once \(\text{E}_{\mathrm{id}}[X|Y]\) is linear in Y. A proof of this assertion is given in Ruckdeschel (2010a, Thm. 3.2).
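For illustration, a minimal Python sketch of this one-step correction (not the authors' implementation): the ideal linear correction \(D(y)\) is computed from the classical Kalman gain and then clipped in Euclidean norm by the Huber function \(H_b(z)=z\min(1,b/|z|)\). All function names, and the use of the Moore-Penrose pseudoinverse for the generalized inverse, are our assumptions.

```python
import numpy as np

def huber_clip(z, b):
    """H_b(z) = z * min(1, b/|z|): keep small corrections, clip large ones."""
    nz = np.linalg.norm(z)
    return z if nz <= b else z * (b / nz)

def rls_ao_step(x_pred, Sigma_pred, y, Z, V, b):
    """One-step rLS.AO correction (sketch): the ideal linear correction
    D(y) = K (y - Z x_pred) is clipped in norm at b before being added
    to the predicted state."""
    C = Z @ Sigma_pred @ Z.T + V                # innovation covariance
    K = Sigma_pred @ Z.T @ np.linalg.pinv(C)    # classical Kalman gain (3.5)
    D = K @ (y - Z @ x_pred)                    # ideal conditional correction
    return x_pred + huber_clip(D, b)
```

For \(b=\infty\) the step reduces to the classical Kalman correction; the clip height relates to the outlier radius via \(b=\rho/r\) in Problem (8.8).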
Remark 8.2
-
(a)
As mentioned in Sect. 3.2, Cipra and Hanzak (2011) show an optimality similar to the one for Problem (8.8) and hence, unsurprisingly, come up with a similar procedure.
-
(b)
The ACM filter by Masreliez and Martin (1977), an early competitor to the rLS, applies Huber's (1964) minimax variance result, by analogy, to the “random location parameter X” setting of (3.9). They come up with redescenders as filter f. Hence the ACM filter is vulnerable not so much in the extreme tails but rather where the corresponding \(\psi\) function attains its maximum in absolute value. Care has to be taken, as such “inliers”, which produce the least favorable situation for the ACM, are much harder to detect by naïve data inspection, in particular in higher dimensions.
-
(c)
For exact SO-optimality of the rLS filter, linearity of the ideal conditional expectation is crucial. One can show that \(E_\mathrm{id}[\Delta X|\Delta Y]\) is linear iff \(\Delta X\) is normal; but, having used the rLS filter in the \(\Delta X\)-past, normality cannot hold, see Ruckdeschel (2010a, Props. 3.4 and 3.6).
-
(d)
Although the rLS fails to be SO-optimal for \(t>1\), it performs quite well in both simulations and on real data. To some extent this can be explained by passing to a certain extension of the original SO-neighborhoods. For details see Ruckdeschel (2010a, Thm. 3.10, Prop. 3.11).
1.3 Optimality of the rLS.IO
This section discusses (one-step) optimality of the rLS.IO in some detail. We omit time indices and write \(\Sigma\) for \(\Sigma _{t|t-1}\). To start, let us again look at the boiled-down model (3.9), where we interchange the rôles of \(\varepsilon\) and X, and note that \(X-f(Y)=-(\varepsilon -g(Y))\) for \(f(Y)=Y-g(Y)\). Hence in this simple model, the optimal reconstruction of a corrupted X, assuming that \(\varepsilon\) is still from the ideal distribution, is just \(Y-g(Y)\), with \(g(Y)\) the optimal reconstruction of \(\varepsilon\) in the same situation.
In notation, let us write \(\mathop { oP}(a|b)\) for the best linear reconstruction of a by means of b, i.e., the orthogonal projection of a onto the closed linear space generated by b.
Assuming linear conditional expectations and applying Ruckdeschel (2010a, Thm. 3.2) mutatis mutandis, the optimally-robust reconstruction of \(\varepsilon\) given Y in the sense of Problems (8.7), (8.8) is just \(H_b(\mathop{oP}(\varepsilon |Y))\)—with the same caveats as to the optimality for larger time indices as in Remark 8.2. But again, \(\mathop{oP}(\varepsilon |Y)=\mathop{oP}(Y-X|Y)=Y-\mathop{oP}(X|Y)\), so the IO-optimal procedure \(f_\mathrm{IO}\) is
$$\begin{aligned} f_\mathrm{IO}(Y)=Y-H_b\big(Y-\mathop{oP}(X|Y)\big). \end{aligned}$$
Details as to the translation of the contamination neighborhoods and exact formulations of the optimality results are given in Ruckdeschel (2010a).
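A minimal Python sketch of the resulting one-step IO correction under the stated linearity assumptions, combined with a back-transform to the state space for general Z as discussed below; the explicit form \(Z^\Sigma =\Sigma Z^\tau (Z\Sigma Z^\tau )^-\), the function names, and the pseudoinverses are our assumptions for illustration.

```python
import numpy as np

def huber_clip(z, b):
    """H_b(z) = z * min(1, b/|z|)."""
    nz = np.linalg.norm(z)
    return z if nz <= b else z * (b / nz)

def rls_io_step(x_pred, Sigma_pred, y, Z, V, b):
    """One-step rLS.IO correction (sketch): the observation error eps is
    reconstructed robustly by clipping oP(eps|Y) = dy - oP(ZX|Y); the
    remainder dy - H_b(.) is the robust reconstruction of Z X, mapped
    back to the state space by Z^Sigma (assumed form of (3.12))."""
    dy = y - Z @ x_pred                          # centered observation
    C = Z @ Sigma_pred @ Z.T + V
    K = Sigma_pred @ Z.T @ np.linalg.pinv(C)     # classical gain
    eps_hat = dy - Z @ (K @ dy)                  # oP(eps | Y)
    ZSigma = Sigma_pred @ Z.T @ np.linalg.pinv(Z @ Sigma_pred @ Z.T)
    return x_pred + ZSigma @ (dy - huber_clip(eps_hat, b))
```

For \(b\to \infty\) this reduces to the classical Kalman step (by (8.13), \(Z^\Sigma Z K=K\)); for small b, a large innovation is attributed to the state, i.e., the filter follows the observation.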
The general setup with some arbitrary \(Z\in \mathbb{R}^{q\times p}\), where Z in general is not invertible and, moreover, even \(Z\Sigma Z^\tau\) may be singular, is not trivial, though. For instance, our preceding argument so far only covers reconstruction of ZX, and at this stage it is not obvious how to optimally derive a reconstruction of X from this. In particular, in this general case there are directions which our (robustified) reconstruction cannot see—at least all directions in \(\ker Z\). An unbounded criterion like MSE would play havoc once unbounded contamination happens in these directions. So in this context, the best we can do is to optimally reconstruct ZX on the whole neighborhood generated by outliers in X and then, in a second step, for this best reconstruction of ZX, find the best back-transform to X in the ideal model setting. The question is how much we lose by this; as Corollary 8.4 below shows, nothing.
For \(Z^\Sigma\) from (3.12), we introduce the orthogonal projector onto the column space of \(Z\Sigma Z^\tau\) and its orthogonal complement as
$$\begin{aligned} \pi _{Z,\Sigma } = (Z\Sigma Z^\tau )(Z\Sigma Z^\tau )^{-},\qquad \bar{\pi }_{Z,\Sigma } = \mathbb{I}_q-\pi _{Z,\Sigma }. \end{aligned}$$
Then we have the following Lemma:
Lemma 8.3
-
(a)
For any positive definite D, \(Z^\Sigma\) from (3.12) solves
$$\begin{aligned} \text{ E}_\mathrm{id} \Vert X- A \mathop {oP}(Z X|Y)\Vert _D^2=\;\min {}!,\quad A\in \mathbb{R }^{p\times q}. \end{aligned}$$(8.12) -
(b)
\(\Sigma Z^\tau \bar{\pi }_{Z,\Sigma } = 0\); in particular, no matter the rank of Z or \(\pi _{Z,\Sigma }\), with \(K=\Sigma Z^\tau C^-\),
$$\begin{aligned} Z^{\Sigma } Z K = K. \end{aligned}$$(8.13)
Proof
(a) As in Lemma 8.1, we see that
$$\begin{aligned} \hat{A} = \mathop{\text{Cov}}\big(X, \mathop{oP}(ZX|Y)\big)\, \mathop{\text{Cov}}\big(\mathop{oP}(ZX|Y)\big)^{-}. \end{aligned}$$
Abbreviating \(Z \Sigma Z^\tau\) by B and \(\mathop{\text{Cov}}(Y)\) by C, this gives \( \hat{A}=\Sigma Z^\tau C^- B(BC^-B)^- \), and with \(\Sigma _{.5}\) the symmetric root of \(\Sigma\) and \(G=\Sigma _{.5}Z^\tau\), this becomes \(\hat{A}=\Sigma _{.5} G C^- G^\tau G(G^\tau G C^-G^\tau G)^-\). Next we pass to the singular value decomposition \(G=USW^\tau\), with U, W orthogonal matrices in \(\mathbb{R}^{p\times p}\) and \(\mathbb{R}^{q\times q}\), respectively, and \(S\in \mathbb{R}^{p\times q}\) a matrix with the singular values on the “diagonal” entries \(S_{i,i}\), \(i=1,\ldots ,\min (p,q)\), and \(S_{i,j}=0\) for \(i\not =j\); furthermore, \(S_{i,i}>0\) for \(i=1,\ldots ,d\), \(d\le \min (p,q)\), and \(S_{i,i}=0\) else. Using \((aba^\tau )^-=(a^\tau )^{-1}b^- a^{-1}\) for a invertible and setting \(T=S^\tau S\), we obtain
$$\begin{aligned} \hat{A}=\Sigma _{.5}\, U S\, (W^\tau C^- W)\, T\, \big(T\, (W^\tau C^- W)\, T\big)^-\, W^\tau . \end{aligned}$$
As the expressions \(W^\tau C^- W\) (symmetric matrices) are surrounded by S- (resp. T-)terms, we may replace them by a matrix \(R\in \mathbb{R}^{q\times q}\) with nonzero entries only in the upper \(d\times d\) block, i.e., \(\hat{A}=\Sigma _{.5} US R T (T R T)^- W^\tau\), and as R now is compatible with S and T, the expression reduces blockwise to the upper \(d\times d\) blocks. Now, as \(C=WTW^\tau + V\), \(W^\tau C^- W=(T+ W^\tau V W)^-\); in particular, the upper \(d\times d\) block \(R_d\) of \(R=1_d (T+ W^\tau V W)^- 1_d\) is invertible and
$$\begin{aligned} \hat{A}=\Sigma _{.5}\, U S T^- W^\tau = \Sigma Z^\tau (Z\Sigma Z^\tau )^- = Z^\Sigma . \end{aligned}$$
(b) We start by noting that \(\Sigma Z^\tau \bar{\pi }_{Z,\Sigma } = \Sigma _{.5} USW^\tau W (\mathbb{I }_q-1_d) W^\tau =0 \). For (8.13), we write \( K=\Sigma Z^\tau (\pi _{Z,\Sigma }+\bar{\pi }_{Z,\Sigma }) C^- = \Sigma Z^\tau \pi _{Z,\Sigma } C^- = Z^\Sigma Z K \). \(\square \)
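Identity (8.13) can be checked numerically; the following Python sketch (ours, with \(Z^\Sigma\) assumed in the form \(\Sigma Z^\tau (Z\Sigma Z^\tau )^-\) and Moore-Penrose pseudoinverses throughout) deliberately uses a rank-deficient Z.

```python
import numpy as np

rng = np.random.default_rng(1)
p, q = 4, 3

# A deliberately rank-deficient Z in R^{q x p} (rank 2 < min(p, q)).
Z = rng.standard_normal((q, 2)) @ rng.standard_normal((2, p))
Sigma = np.diag([1.0, 2.0, 3.0, 4.0])   # prediction-error covariance
V = 0.3 * np.eye(q)                     # Cov(eps)

C = Z @ Sigma @ Z.T + V                                  # Cov(Y)
K = Sigma @ Z.T @ np.linalg.pinv(C)                      # classical gain
ZSigma = Sigma @ Z.T @ np.linalg.pinv(Z @ Sigma @ Z.T)   # assumed form of (3.12)

# (8.13): Z^Sigma Z K = K, no matter the rank of Z.
print(np.allclose(ZSigma @ Z @ K, K))   # → True
```

The same setup also illustrates Corollary 8.4: applying \(Z^\Sigma\) to \(Y-\mathop{oP}(\varepsilon |Y)=ZKY\) returns exactly \(\mathop{oP}(X|Y)=KY\).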
As a consequence of assertion (b) in the preceding Lemma, we obtain
Corollary 8.4
No matter the rank of Z or \(\pi _{Z,\Sigma }\),
$$\begin{aligned} Z^\Sigma \big(Y-\mathop{oP}(\varepsilon |Y)\big)=\mathop{oP}(X|Y), \end{aligned}$$
that is, we can exactly recover \(\mathop {oP}(X|Y)\) from \(\mathop {oP}(\varepsilon |Y)\), and passing over the reconstruction of ZX first does not cost us anything in efficiency compared to the direct route.
Proof
We only note that \(Y -\mathop{oP}(\varepsilon |Y)=\mathop{oP}(ZX |Y) = Z K Y\), so that by (8.13), \(Z^\Sigma \big(Y-\mathop{oP}(\varepsilon |Y)\big)=Z^\Sigma Z K Y = K Y=\mathop{oP}(X|Y)\). \(\square\)
To keep things well-defined in this setting where we have “invisible directions” in the state, we may have recourse to a semi-norm in X-space which ignores such directions. A possible candidate for D in Lemma 8.1 is
$$\begin{aligned} D=(Z^\Sigma Z)^\tau (Z^\Sigma Z). \end{aligned}$$(8.16)
On the one hand, as we show below, invisible directions get ignored; on the other hand, by (8.13), no direction visible to the classically optimal procedure is lost.
Proposition 8.5
Using D from (8.16) and assuming observation errors from the ideal situation, the maximal MSE of the rLS.IO measured in this semi-norm remains bounded under IO contamination. With this semi-norm, \(\hat{K}\) is the solution to (8.3) with smallest Frobenius norm.
Proof
The error term \(e=X-\hat{X}\) for the rLS.IO can be written as
As \((Z^\Sigma Z)^2 = Z^\Sigma Z\), we see that \((\mathbb{I }_p-Z^\Sigma Z)(Z^\Sigma Z)=0\), so that in D-semi-norm, the \((\mathbb{I }_p-Z^\Sigma Z)X\) terms cancel out and we get
so the MSE is bounded by \(2 \mathop{tr}\big(B^- (V + b^2 \mathbb{I}_q)\big)\). The second assertion is an immediate consequence of Lemmas 8.1 and 8.3(b). \(\square\)
Note that changing the norm in the Y-space is not necessary for boundedness reasons: with only ideally distributed \(\varepsilon\), the reconstruction of ZX can be achieved such that, no matter how large the contamination of \(\Delta X\), the maximal MSE remains bounded.
Ruckdeschel, P., Spangl, B. & Pupashenko, D. Robust Kalman tracking and smoothing with propagating and non-propagating outliers. Stat Papers 55, 93–123 (2014). https://doi.org/10.1007/s00362-012-0496-4