
Robust Kalman tracking and smoothing with propagating and non-propagating outliers


Abstract

A common situation in filtering where classical Kalman filtering does not perform particularly well is tracking in the presence of propagating outliers. This calls for robustness understood in a distributional sense, i.e., we enlarge the distributional assumptions made in the ideal model by suitable neighborhoods. Based on optimality results for distributional-robust Kalman filtering from Ruckdeschel (Ansätze zur Robustifizierung des Kalman-Filters, vol 64, 2001; Optimally (distributional-)robust Kalman filtering, arXiv:1004.3393, 2010a), we propose new robust recursive filters and smoothers designed for this purpose, as well as specialized versions for non-propagating outliers. We apply these procedures in the context of a GPS problem arising in the car industry. To better understand these filters, we study their behavior at stylized outlier patterns (for which they are not designed) and compare them to other approaches to the tracking problem. Finally, in a simulation study, we discuss the efficiency of our procedures in comparison to competitors.


References

  • Anderson BDO, Moore JB (1990) Optimal control: linear quadratic methods. Prentice Hall, New York

  • Anscombe FJ (1960) Rejection of outliers. Technometrics 2:123–147

  • Birmiwal K, Papantoni-Kazakos P (1994) Outlier resistant prediction for stationary processes. Stat Decis 12(4):395–427

  • Birmiwal K, Shen J (1993) Optimal robust filtering. Stat Decis 11(2):101–119

  • Boncelet CG Jr, Dickinson BW (1983) An approach to robust Kalman filtering. In: Proceedings of the 22nd IEEE conference on decision and control, vol 1, pp 304–305

  • Boncelet CG Jr, Dickinson BW (1987) An extension to the SRIF Kalman filter. IEEE Trans Autom Control AC-32:176–179

  • Cipra T, Hanzak T (2011) Exponential smoothing for a time series with outliers. Kybernetika 47:165–178

  • Cipra T, Romera R (1991) Robust Kalman filter and its application in time series analysis. Kybernetika 27(6):481–494

  • Donoho D, Johnstone I (1994) Ideal spatial adaptation via wavelet shrinkage. Biometrika 81:425–455

  • Ershov AA, Liptser RS (1978) Robust Kalman filter in discrete time. Autom Remote Control 39:359–367

  • Fox AJ (1972) Outliers in time series. J R Stat Soc B 34:350–363

  • Franke J (1985) Minimax-robust prediction of discrete time series. Z Wahrscheinlichkeitstheor Verw Geb 68:337–364

  • Franke J, Poor HV (1984) Minimax-robust filtering and finite-length robust predictors. In: Robust and nonlinear time series analysis. Proceedings of a workshop, Heidelberg, Germany, 1983. Lecture notes in statistics, no. 26. Springer, New York

  • Fried R, Schettlinger K (2010) robfilter: robust time series filters. R package version 2.6.1. http://cran.r-project.org/web/packages/robfilter

  • Fried R, Bernholt T, Gather U (2006) Repeated median and hybrid filters. Comput Stat Data Anal 50:2313–2338

  • Gelper S, Fried R, Croux C (2010) Robust forecasting with exponential and Holt-Winters smoothing. J Forecast 29:285–300

  • Genz A, Bretz F, Miwa T, Mi X, Leisch F, Scheipl F, Hothorn T (2011) mvtnorm: multivariate normal and t distributions. R package version 0.9-9991. http://CRAN.R-project.org/package=mvtnorm

  • Hampel FR (1968) Contributions to the theory of robust estimation. Dissertation, University of California, Berkeley

  • Hampel FR, Ronchetti EM, Rousseeuw PJ, Stahel WA (1986) Robust statistics: the approach based on influence functions. Wiley, New York

  • Huber PJ (1964) Robust estimation of a location parameter. Ann Math Stat 35:73–101

  • Kalman RE (1960) A new approach to linear filtering and prediction problems. J Basic Eng Trans ASME 82:35–45

  • Kassam SA, Poor HV (1985) Robust techniques for signal processing: a survey. Proc IEEE 73(3):433–481

  • Künsch H (2001) State space models and hidden Markov models. In: Barndorff-Nielsen OE, Cox DR, Klüppelberg C (eds) Complex stochastic systems. Chapman and Hall, New York, pp 109–173

  • Martin RD, Yohai VJ (1986) Influence functionals for time series (with discussion). Ann Stat 14:781–818

  • Masreliez CJ, Martin R (1977) Robust Bayesian estimation for the linear model and robustifying the Kalman filter. IEEE Trans Autom Control AC-22:361–371

  • R Development Core Team (2012) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna. http://www.R-project.org

  • Rieder H, Kohl M, Ruckdeschel P (2008) The costs of not knowing the radius. Stat Methods Appl 17(1):13–40

  • Ruckdeschel P (2000) Robust Kalman filtering. In: Härdle W, Hlávka Z, Klinke S (eds) XploRe application guide, chap 18. Springer, New York, pp 483–516

  • Ruckdeschel P (2001) Ansätze zur Robustifizierung des Kalman-Filters. Bayreuther Mathematische Schriften, vol 64. Bayreuth

  • Ruckdeschel P (2010a) Optimally (distributional-)robust Kalman filtering. arXiv:1004.3393

  • Ruckdeschel P (2010b) Optimally robust Kalman filtering. Technical report 185, Fraunhofer ITWM, Kaiserslautern. http://www.itwm.fraunhofer.de/fileadmin/ITWM-Media/Zentral/Pdf/Berichte_ITWM/2010/bericht_185.pdf

  • Rudin W (1974) Real and complex analysis, 2nd edn. McGraw-Hill, New York

  • Schettlinger K (2009) Signal and variability extraction for online monitoring in intensive care. Dissertation, TU Dortmund, Dortmund. https://eldorado.tu-dortmund.de/bitstream/2003/26044/1/Dissertation-Schettlinger-Internetpublikation.pdf

  • Schick IC, Mitter SK (1994) Robust recursive estimation in the presence of heavy-tailed observation noise. Ann Stat 22(2):1045–1080

  • Shumway RH, Stoffer DS (1982) An approach to time series smoothing and forecasting using the EM algorithm. J Time Ser Anal 3:253–264

  • Spangl B (2008) On robust spectral density estimation. Dissertation, Vienna University of Technology, Vienna

  • Stockinger N, Dutter R (1987) Robust time series analysis: a survey. Kybernetika 23(Supplement)

  • Venables WN, Ripley BD (2002) Modern applied statistics with S, 4th edn. Springer, New York

  • Wan EA, van der Merwe R (2002) The unscented Kalman filter. In: Haykin S (ed) Kalman filtering and neural networks. Wiley, New York


Acknowledgments

The authors thank two anonymous referees for their valuable and helpful comments. Financial support for D. Pupashenko from the VW Foundation within the project Robust Risk Estimation is gratefully acknowledged.

Author information

Correspondence to Peter Ruckdeschel.

Appendix

1.1 Optimality of the classical Kalman filter

Optimality of the classical Kalman filter among all linear filters in the \(L_2\)-sense and, under normality of the error and innovation distributions, among all measurable filters is a well-known fact; compare, e.g., Anderson and Moore (1990, Sect. 5.2). As we will need some of the arguments later, we complement this fact by a generalization to arbitrary norms generated by a quadratic form and by a thorough treatment of the case of singularities in the covariances arising in the definition of the Kalman gain from (3.5). To do so, we take for granted the orthogonal decomposition of the Hilbert space into the closed linear subspaces \(\mathrm{lin}(Y_{1:(t-1)})\oplus \mathrm{lin}(\Delta Y_{t})\), and for \(X=X_t-X_{t|t-1}\) and \(Y=\Delta Y_t\) as given in (3.5), derive \(\hat{K}_t\). To this end, for any matrix \(A\) we denote by \(A^-\) the generalized inverse of \(A\) with the defining properties

$$\begin{aligned} A^-AA^-=A^-,\quad AA^-A = A,\quad A^-A = (A^-A)^\tau ,\quad AA^- = (AA^-)^\tau \end{aligned}$$
(8.1)

and, for \(D\) a positive semi-definite symmetric matrix in \(\mathbb{R}^{p\times p}\) and \(x\in \mathbb{R}^p\), we define the semi-norm generated by \(D\) as \(\Vert x\Vert ^2_D:=x^\tau D^{-} x\).
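
To make these two definitions concrete, the following R sketch (our own illustration, not code from the paper) checks the four defining properties (8.1) for a rank-deficient matrix and evaluates the semi-norm; it uses MASS::ginv, the Moore-Penrose generalized inverse from the package of Venables and Ripley (2002).

library(MASS)   # ginv(): Moore-Penrose generalized inverse

set.seed(1)
A  <- matrix(rnorm(12), 4, 3) %*% diag(c(1, 1, 0))  # rank-deficient 4x3 matrix
Am <- ginv(A)

## the four defining properties (8.1), up to numerical error
stopifnot(all.equal(Am %*% A %*% Am, Am),
          all.equal(A %*% Am %*% A, A),
          all.equal(Am %*% A, t(Am %*% A)),
          all.equal(A %*% Am, t(A %*% Am)))

## semi-norm ||x||_D^2 = x' D^- x for positive semi-definite D
D <- crossprod(matrix(rnorm(6), 2, 3))              # 3x3 psd, rank <= 2
seminormD <- function(x, D) drop(t(x) %*% ginv(D) %*% x)
seminormD(c(1, -1, 2), D)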

Lemma 8.1

Let \(p,q \in \mathbb{N}\), \(P\) some probability measure, and \(X\in L_2^p(P)\), \(Y\in L^q_2(P)\) with \(\text{E} X=0\), \(\text{E} Y=0\), where \(Y=ZX+\varepsilon\) for some \(Z\in \mathbb{R}^{q\times p}\) and some \(\varepsilon \in L_2^q(P)\) independent of \(X\). Let \(D\) be a positive semi-definite symmetric matrix in \(\mathbb{R}^{p\times p}\). Then

$$\begin{aligned} \hat{K} = \mathop {\text{ Cov}}\nolimits (X,Y) \mathop {\text{ Cov}}\nolimits (Y)^- \end{aligned}$$
(8.2)

solves

$$\begin{aligned} \text{ E} \Vert X-KY\Vert ^2_D = \, \min {}!,\quad K \in \mathbb{R }^{p\times q} \end{aligned}$$
(8.3)

\(\hat{K}\) is unique up to addition of some \(A \in \mathbb{R}^{p\times q}\) such that \(A\mathop{\text{Cov}}\nolimits Y =0\) and some \(B \in \mathbb{R}^{p\times q}\) such that \(DB=0\). If \(\hat{K} = D D^- \hat{K}\), then \(\hat{K}\) has smallest Frobenius norm among all solutions \(K\) to (8.3).

Proof

Denote by \(L_2^p(P,D)\) the Hilbert space generated by all \(\mathbb{R}^p\)-valued random variables \(U\) such that \(\text{E}_P \Vert U\Vert _D^2 <\infty\), after passing to equivalence classes of all random variables \(U, U^{\prime}\) such that \(\text{E}_P \Vert U-U^{\prime }\Vert _D^2 = 0\). Let \(S=\mathop{\text{Cov}}\nolimits(X)\) and \(V=\mathop{\text{Cov}}\nolimits(\varepsilon)\). Then \(\mathop{\text{Cov}}\nolimits(X,Y)=SZ^\tau\) and \(\mathop{\text{Cov}}\nolimits(Y)=ZSZ^\tau +V\). Denote the approximation space \(\{ KY \mid K\in \mathbb{R}^{p\times q}\}\subset L_2^p(P,D)\) by \(\mathcal{K}\). \(\mathcal{K}\) is a closed linear subspace of \(L_2^p(P,D)\); hence, by Rudin (1974, Thm. 4.10), there exists a unique minimizer \(\hat{X} = \hat{K} Y \in \mathcal{K}\) of problem (8.3). It is characterized by

$$\begin{aligned} \text{ E} (X- \hat{K}Y)^\tau D^{-} KY = 0,\quad \forall K \in \mathbb{R }^{p\times q}. \end{aligned}$$
(8.4)

Plugging in \(K=D e_i \tilde{e}_j^\tau\), with \(\{e_i\}\), \(\{\tilde{e}_j\}\) the canonical bases of \(\mathbb{R}^p\) and \(\mathbb{R}^q\), respectively, we see that (8.4) is equivalent to

$$\begin{aligned} \text{E}\, \pi (X- \hat{K}Y)Y^\tau = 0 \iff \pi \hat{K} \mathop{\text{Cov}}\nolimits (Y)= \pi \mathop{\text{Cov}}\nolimits (X,Y) \end{aligned}$$
(8.5)

where \(\pi =D^-D\) is the orthogonal projector onto the column space of \(D\). But \(y\in \mathbb{R}^q\) can only lie in \(\ker \mathop{\text{Cov}}\nolimits(Y)\) if \(y\in \ker \mathop{\text{Cov}}\nolimits(X,Y)\). Hence indeed \(\hat{K} \mathop{\text{Cov}}\nolimits(Y)=\mathop{\text{Cov}}\nolimits(X,Y)\), and the uniqueness assertion is obvious. We write \(\pi _D\) and \(\pi _{C}\) for the orthogonal projectors onto the column spaces of \(D\) and \(\mathop{\text{Cov}}\nolimits(Y)\), respectively, and \(\bar{\pi }_{C}=\mathbb{I}_q-\pi _{C}\), \(\bar{\pi }_D=\mathbb{I}_p-\pi _D\) for the corresponding complementary projectors. Then we see that \(\hat{K} = \hat{K} \pi _{C}\); for any \(A \in \mathbb{R}^{p\times q}\) with \(A\mathop{\text{Cov}}\nolimits Y =0\) we have \(A=A\bar{\pi }_{C}\), and for any \(B \in \mathbb{R}^{p\times q}\) with \(DB =0\) we have \(B=\bar{\pi }_{D} B\); hence

$$\begin{aligned} \Vert \hat{K} + A + B\Vert ^2&= \mathop{\mathrm{tr}}\hat{K}^\tau \hat{K} + 2\mathop{\mathrm{tr}}A^\tau \hat{K} + 2\mathop{\mathrm{tr}}B^\tau \hat{K} + \mathop{\mathrm{tr}}(A+B)^\tau (A+B)\\&= \Vert \hat{K}\Vert ^2 + 2\mathop{\mathrm{tr}}\hat{K} \pi _{C} \bar{\pi }_{C} A^\tau + 2\mathop{\mathrm{tr}}\hat{K} \pi _{D} \bar{\pi }_{D} B^\tau + \Vert A+B\Vert ^2 \\&= \Vert \hat{K}\Vert ^2 + \Vert A+B\Vert ^2 \ge \Vert \hat{K}\Vert ^2\quad \text{with equality iff } A+B=0. \end{aligned}$$

\(\square \)
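
The optimality of (8.2) also lends itself to a numerical check. The following R sketch (ours; all names are illustrative) evaluates the population criterion \(\text{E}\Vert X-KY\Vert ^2_D = \mathop{\mathrm{tr}}(D^-\mathop{\text{Cov}}(X-KY))\) and confirms that perturbing \(\hat K\) can only increase it.

library(MASS)

p <- 3; q <- 2
set.seed(2)
Z <- matrix(rnorm(q * p), q, p)
S <- crossprod(matrix(rnorm(p * p), p, p))   # Cov(X)
V <- crossprod(matrix(rnorm(q * q), q, q))   # Cov(eps)
C <- Z %*% S %*% t(Z) + V                    # Cov(Y)
D <- crossprod(matrix(rnorm(p * p), p, p))

## population criterion E||X - K Y||_D^2 = tr(D^- Cov(X - K Y))
crit <- function(K) {
  CovErr <- S - K %*% Z %*% S - S %*% t(Z) %*% t(K) + K %*% C %*% t(K)
  sum(diag(ginv(D) %*% CovErr))
}

Khat  <- S %*% t(Z) %*% ginv(C)              # the gain (8.2)
Kpert <- Khat + 0.1 * matrix(rnorm(p * q), p, q)
crit(Khat) <= crit(Kpert)                    # TRUE: Khat solves (8.3)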

1.2 Sketch of the optimality of the rLS.AO

The rLS filter is optimally robust in a certain sense. To see this, in a first step we essentially boil down our SSM to (3.9); i.e., we have an unobservable but interesting state \(X\sim P^X(dx)\), where for technical reasons we assume that in the ideal model \(\text{E} |X|^2 <\infty\). Instead of \(X\), for some \(Z\in \mathbb{R}^{q\times p}\), we rather observe the sum \(Y=ZX+\varepsilon\) of \(ZX\) and a stochastically independent error \(\varepsilon\). As (wide-sense) AO model, we consider the SO outliers of (2.5), (2.6). The corresponding neighborhood is defined as

$$\begin{aligned} \mathcal{U}^\mathrm{SO}(r)=\bigcup _{0\le s\le r} \Big \{\mathcal{L}(X,Y^\mathrm{re}) \,\Big |\, Y^\mathrm{re} \text{ acc. to (2.5) and (2.6) with radius } s\Big \}. \end{aligned}$$
(8.6)
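
For intuition, observations from such a neighborhood are easy to simulate: with probability at most \(r\), an ideal observation is replaced by one from an arbitrary contaminating law. The R sketch below is our own reading of (2.5), (2.6); the heavy-tailed contaminating distribution is merely an illustrative choice.

r_SO <- function(n, Z, S, V, r,
                 rcont = function(n, q) 100 * matrix(rcauchy(n * q), n, q)) {
  p   <- ncol(Z); q <- nrow(Z)
  X   <- matrix(rnorm(n * p), n, p) %*% chol(S)    # ideal states, rows ~ N(0, S)
  eps <- matrix(rnorm(n * q), n, q) %*% chol(V)    # ideal observation errors
  Yid <- X %*% t(Z) + eps                          # ideal observations
  U   <- rbinom(n, 1, r)                           # outlier indicator
  list(X = X, Y = (1 - U) * Yid + U * rcont(n, q)) # substitutive outliers
}

sim <- r_SO(n = 10, Z = diag(2), S = diag(2), V = 0.1 * diag(2), r = 0.1)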

In this setting we may formulate two typical robust optimization problems: a minimax formulation and, in the spirit of Hampel (1968, Lemma 5), a formulation where robustness enters as a side condition on the bias, to be fulfilled on the whole neighborhood:

$$\begin{aligned}&\text{[Minmax-SO]}\quad \max \nolimits _{\mathcal{U}}\, \text{E}_\mathrm{re} |X-f(Y^\mathrm{re})|^2 = \min \nolimits _f{}! \end{aligned}$$
(8.7)
$$\begin{aligned}&\text{[Lemma-5]}\quad \text{E}_\mathrm{id} |X-f(Y^\mathrm{id})|^2 = \min \nolimits _f{}! \quad \text{s.t. }\sup \nolimits _\mathcal{U}\big |\text{E}_\mathrm{re} f(Y^\mathrm{re})- \text{E} X \big |\le b. \end{aligned}$$
(8.8)

Then one can show that, setting \(D(y)=\text{E}_\mathrm{id}[X|Y=y]-\text{E} X\), the solution to both problems is \(\hat{f}(y)=\text{E} X +H_\rho (D(y))\) (with \(b=\rho /r\) in Problem (8.8)), and that this is just the (one-step) rLS once \(\text{E}_\mathrm{id}[X|Y]\) is linear in \(Y\). A proof of this assertion is given in Ruckdeschel (2010a, Thm. 3.2).
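
The structure of \(\hat f\) is easy to code up. The R sketch below is ours, written under the linearity assumption (so that, in the centered case, \(\text{E}_\mathrm{id}[X|Y=y]=\hat K y\) with \(\hat K\) from (8.2)) and with the Euclidean-norm clipping \(H_b(z)=z\min (1,b/|z|)\) as our reading of the Huberization used here.

library(MASS)

Hb <- function(z, b) {                    # Huberize: clip |z| at b, keep direction
  nz <- sqrt(sum(z^2))
  if (nz > b) z * b / nz else z
}

rls_onestep <- function(y, Z, S, V, b) {
  C <- Z %*% S %*% t(Z) + V               # Cov(Y) in the ideal model
  K <- S %*% t(Z) %*% ginv(C)             # classical gain, cf. (8.2)
  Hb(drop(K %*% y), b)                    # clipped D(y), i.e. fhat(y) - E X
}

## a grossly outlying observation has bounded influence on the state estimate
rls_onestep(c(100, 0), Z = diag(2), S = diag(2), V = 0.1 * diag(2), b = 2)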

Remark 8.2

 

  1. (a)

As mentioned in Sect. 3.2, Cipra and Hanzak (2011) show an optimality similar to the one for Problem (8.8) and hence, unsurprisingly, come up with a similar procedure.

  2. (b)

The ACM filter by Masreliez and Martin (1977), an early competitor to the rLS, by analogy applies Huber's (1964) minimax variance result to the "random location parameter X" setting of (3.9). They come up with redescenders as filter \(f\). Hence the ACM filter is not so much vulnerable in the extreme tails, but rather where the corresponding \(\psi\) function attains its maximum in absolute value. Care has to be taken, as such "inliers" producing the least favorable situation for the ACM are much harder to detect by naïve data inspection, in particular in higher dimensions.

  3. (c)

For exact SO-optimality of the rLS filter, linearity of the ideal conditional expectation is crucial. However, one can show that \(E_\mathrm{id}[\Delta X|\Delta Y]\) is linear iff \(\Delta X\) is normal; but, having used the rLS filter in the \(\Delta X\)-past, normality cannot hold, see Ruckdeschel (2010a, Props. 3.4, 3.6).

  4. (d)

Although the rLS fails to be SO-optimal for \(t>1\), it performs quite well in both simulations and on real data. To some extent this can be explained by passing to a certain extension of the original SO-neighborhoods. For details see Ruckdeschel (2010a, Thm. 3.10, Prop. 3.11).

 

1.3 Optimality of the rLS.IO

This section discusses (one-step) optimality of the rLS.IO in some detail. We omit time indices and write \(\Sigma\) for \(\Sigma _{t|t-1}\). To start, let us again look at the boiled-down model (3.9), where we interchange the roles of \(\varepsilon\) and \(X\), and note that \(X-f(Y)=-(\varepsilon -g(Y))\) for \(f(Y)=Y-g(Y)\). Hence in this simple model, the optimal reconstruction of a corrupted \(X\), assuming that \(\varepsilon\) still comes from the ideal distribution, is just \(Y-g(Y)\), where \(g(Y)\) is the optimal reconstruction of \(\varepsilon\) in the same situation.

As to notation, let us write \(\mathop{oP}(a|b)\) for the best linear reconstruction of \(a\) by means of \(b\), i.e., the orthogonal projection of \(a\) onto the closed linear space generated by \(b\).

Assuming linear conditional expectations and applying Ruckdeschel (2010a, Thm. 3.2) mutatis mutandis, the optimally robust reconstruction of \(\varepsilon\) given \(Y\) in the sense of Problems (8.7), (8.8) is just \(H_b(\mathop{oP}(\varepsilon |Y))\), with the same caveats as to the optimality for larger time indices as in Remark 8.2. But again, \(\mathop{oP}(\varepsilon |Y)=\mathop{oP}(Y-X|Y)=Y-\mathop{oP}(X|Y)\), so the IO-optimal procedure \(f_\mathrm{IO}\) is

$$\begin{aligned} f_\mathrm{IO}(Y)=Y-H_b(Y-\mathop {oP}(X|Y)). \end{aligned}$$
(8.9)

Details as to the translation of the contamination neighborhoods and exact formulations of the optimality results are given in Ruckdeschel (2010a).
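
A minimal R sketch of (8.9) (ours; for simplicity we take \(Z=\mathbb{I}\), so that states and observations live in the same space and \(\mathop{oP}(X|Y)=KY\)):

library(MASS)
Hb <- function(z, b) { nz <- sqrt(sum(z^2)); if (nz > b) z * b / nz else z }

rls_io_onestep <- function(y, S, V, b) {
  K <- S %*% ginv(S + V)                 # oP(X|Y) = K Y for Z = I
  drop(y - Hb(drop(y - K %*% y), b))     # (8.9): clip the residual, not the state
}

## an IO outlier (a level shift in the state) is followed rather than damped
rls_io_onestep(c(100, 0), S = diag(2), V = 0.1 * diag(2), b = 2)  # approx (98, 0)

In contrast to the rLS.AO sketch above, it is the small, ideal-sized observation residual \(Y-\mathop{oP}(X|Y)\) that gets Huberized here, so large state changes pass through essentially unclipped.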

The general setup with some arbitrary \(Z\in \mathbb{R}^{q\times p}\), where \(Z\) in general is not invertible and, moreover, even \(Z \Sigma Z^\tau\) may be singular, is not trivial, though. For instance, our preceding argument so far only covers reconstruction of \(ZX\), and at this stage it is not obvious how to optimally derive a reconstruction of \(X\) from it. In particular, in this general case there are directions which our (robustified) reconstruction cannot see, at least all directions in \(\ker Z\). An unbounded criterion like MSE would thus play havoc once unbounded contamination happens in these directions. So in this context, the best we can do is to optimally reconstruct \(ZX\) on the whole neighborhood generated by outliers in \(X\) and then, in a second step, for this best reconstruction of \(ZX\), find the best back-transform to \(X\) in the ideal model setting. The question is how much we lose by this. To this end, note that

$$\begin{aligned} \mathop {oP}(\varepsilon |Y)=\mathop {oP}(Y-ZX|Y)=Y-Z\mathop { oP}(X|Y)=(\mathbb{I }_q-ZK)Y. \end{aligned}$$
(8.10)

For \(Z^\Sigma\) from (3.12), we introduce the orthogonal projector onto the column space of \(Z \Sigma Z^\tau\) and its orthogonal complement as

$$\begin{aligned} \pi _{Z,\Sigma } = Z Z^\Sigma ,\quad \bar{\pi }_{Z,\Sigma }:= \mathbb{I }_q-\pi _{Z,\Sigma }. \end{aligned}$$
(8.11)

Then we have the following Lemma:

Lemma 8.3

 

  1. (a)

    For any positive definite \(D, Z^\Sigma \) from (3.12) solves

    $$\begin{aligned} \text{ E}_\mathrm{id} \Vert X- A \mathop {oP}(Z X|Y)\Vert _D^2=\;\min {}!,\quad A\in \mathbb{R }^{p\times q}. \end{aligned}$$
    (8.12)
  2. (b)

\(\Sigma Z^\tau \bar{\pi }_{Z,\Sigma } = 0\); in particular, no matter the rank of \(Z\) or \(\pi _{Z,\Sigma }\), with \(K=\Sigma Z^\tau C^-\),

    $$\begin{aligned} Z^{\Sigma } Z K = K. \end{aligned}$$
    (8.13)

 

Proof

  (a) As in Lemma 8.1, we see that

$$\begin{aligned} \hat{A}=\Sigma Z^\tau K^\tau Z^\tau (ZK\mathop {\text{ Cov}}\nolimits (Y)K^\tau Z^\tau )^-. \end{aligned}$$
(8.14)

Abbreviating \(Z \Sigma Z^\tau\) by \(B\) and \(\mathop{\text{Cov}}\nolimits(Y)\) by \(C\), this gives \(\hat{A}=\Sigma Z^\tau C^- B(BC^-B)^-\), and with \(\Sigma _{.5}\) the symmetric root of \(\Sigma\) and \(G=\Sigma _{.5}Z^\tau\), this becomes \(\hat{A}=\Sigma _{.5} G C^- G^\tau G(G^\tau G C^-G^\tau G)^-\). Next we pass to the singular value decomposition \(G=USW^\tau\), with \(U\) and \(W\) orthogonal matrices in \(\mathbb{R}^{p\times p}\) and \(\mathbb{R}^{q\times q}\), respectively, and \(S\in \mathbb{R}^{p\times q}\) a matrix with the singular values on the "diagonal entries" \(S_{i,i}\), \(i=1,\ldots ,\min (p,q)\), and \(S_{i,j}=0\) for \(i\not =j\); furthermore, \(S_{i,i}>0\) for \(i=1,\ldots ,d\), \(d\le \min (p,q)\), and \(S_{i,i}=0\) else. Using \((aba^\tau )^-=(a^\tau )^{-1}b^- a^{-1}\) for invertible \(a\) and setting \(T=S^\tau S\), we obtain

$$\begin{aligned} \hat{A}&= \Sigma _{.5} USW^\tau C^- W T W^\tau (WTW^\tau C^-WTW^\tau )^-\\&= \Sigma _{.5} USW^\tau C^- W T (TW^\tau C^-WT)^-W^\tau . \end{aligned}$$

As the symmetric matrices \(W^\tau C^- W\) are surrounded by \(S\)- (resp. \(T\)-) terms, we may replace them by a matrix \(R\in \mathbb{R}^{q\times q}\) with entries only in the upper \(d\times d\) block, i.e., \(\hat{A}=\Sigma _{.5} US R T (T R T)^- W^\tau\), and as \(R\) now is compatible with \(S\) and \(T\),

$$\begin{aligned} \hat{A}= \Sigma _{.5} US R 1_d R^- T^- W^\tau = \Sigma _{.5} US R R^- T^- W^\tau ,\quad \text{for } 1_d=TT^-. \end{aligned}$$

Now, as \(C=WTW^\tau + V\), we have \(W^\tau C^- W=(T+ W^\tau V W)^-\); in particular, the upper \(d\times d\) block \(R_d\) of \(R=1_d (T+ W^\tau V W)^- 1_d\) is invertible, and

$$\begin{aligned} \hat{A}=\Sigma _{.5} US T^- W^\tau (=\Sigma _{.5} US^- W^\tau )= \Sigma _{.5} USW^\tau W T^-W^\tau = \Sigma Z^\tau B^- = Z^\Sigma . \end{aligned}$$

(b) We start by noting that \(\Sigma Z^\tau \bar{\pi }_{Z,\Sigma } = \Sigma _{.5} USW^\tau W (\mathbb{I }_q-1_d) W^\tau =0 \). For (8.13), we write \( K=\Sigma Z^\tau (\pi _{Z,\Sigma }+\bar{\pi }_{Z,\Sigma }) C^- = \Sigma Z^\tau \pi _{Z,\Sigma } C^- = Z^\Sigma Z K \). \(\square \)  
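
Assertion (8.13) also lends itself to a numerical check. Reading off from the proof that \(Z^\Sigma =\Sigma Z^\tau B^-\) with \(B=Z\Sigma Z^\tau\), the following R sketch (ours) verifies the identity for a rank-deficient \(Z\).

library(MASS)

p <- 3; q <- 3
set.seed(3)
Z <- matrix(rnorm(q * p), q, p); Z[3, ] <- Z[1, ] + Z[2, ]   # rank 2
Sigma <- crossprod(matrix(rnorm(p * p), p, p))
V     <- crossprod(matrix(rnorm(q * q), q, q))

B  <- Z %*% Sigma %*% t(Z)                 # singular, as Z is rank-deficient
C  <- B + V                                # Cov(Y)
ZS <- Sigma %*% t(Z) %*% ginv(B)           # Z^Sigma
K  <- Sigma %*% t(Z) %*% ginv(C)           # classical gain

all.equal(ZS %*% Z %*% K, K)               # TRUE: (8.13) holds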

As a consequence of assertion (b) in the preceding Lemma, we obtain

Corollary 8.4

No matter the rank of \(Z\) or \(\pi _{Z,\Sigma }\),

$$\begin{aligned} \mathop {oP}(X|Y) = Z^\Sigma (Y -\mathop {oP}(\varepsilon |Y) ) \end{aligned}$$
(8.15)

that is, we can exactly recover \(\mathop{oP}(X|Y)\) from \(\mathop{oP}(\varepsilon |Y)\), and taking the route via the reconstruction of \(ZX\) first costs us nothing in efficiency compared to the direct route.

Proof

We only note that \(Y -\mathop{oP}(\varepsilon |Y)=\mathop{oP}(ZX |Y) = Z K Y\). \(\square\)

To keep things well defined in this setting, where we have "invisible directions" in the state, we may resort to a semi-norm in \(X\)-space which ignores such directions. A possible candidate for \(D\) in Lemma 8.1 is

$$\begin{aligned} D^-=(Z^\Sigma Z)^\tau \Sigma ^- Z^\Sigma Z. \end{aligned}$$
(8.16)

On the one hand, as we show below, invisible directions are ignored; on the other hand, by (8.13), no direction visible to the classically optimal procedure is lost.

Proposition 8.5

Using \(D\) from (8.16) and assuming observation errors from the ideal situation, the maximal MSE of the rLS.IO measured in this semi-norm remains bounded under IO contamination. With this norm, \(\hat{K}\) is the solution to (8.3) of smallest Frobenius norm.

Proof

The error term \(e=X-\hat{X}\) for the rLS.IO can be written as

$$\begin{aligned} e=X-Z^\Sigma (Y-H_b(Y-ZKY))= (\mathbb{I }_p-Z^\Sigma Z)X -Z^\Sigma (\varepsilon - H_b((\mathbb{I }_q-ZK)Y)). \end{aligned}$$

As \((Z^\Sigma Z)^2 = Z^\Sigma Z\), we see that \((\mathbb{I}_p-Z^\Sigma Z)(Z^\Sigma Z)=0\), so that in the \(D\)-semi-norm the \((\mathbb{I}_p-Z^\Sigma Z)X\) terms cancel out and we get

$$\begin{aligned} e^\tau D^- e&= [Z^\Sigma (\varepsilon - H_b(\,\cdot \,))]^\tau (Z^\Sigma Z)^\tau \Sigma ^- Z^\Sigma Z [Z^\Sigma (\varepsilon - H_b(\,\cdot \,))]\\&= (\varepsilon - H_b(\,\cdot \,))^\tau (Z^\Sigma )^\tau (Z^\Sigma Z)^\tau \Sigma ^- Z^\Sigma Z Z^\Sigma (\varepsilon - H_b(\,\cdot \,))\\&= (\varepsilon - H_b(\,\cdot \,))^\tau B^- (\varepsilon - H_b(\,\cdot \,)) \le 2 \varepsilon ^\tau B^- \varepsilon + 2 H_b(\,\cdot \,)^\tau B^- H_b(\,\cdot \,) \end{aligned}$$

so the MSE is bounded by \(2 \mathop{\mathrm{tr}}(B^- (V + b^2 \mathbb{I}_q))\). The second assertion is an immediate consequence of Lemmas 8.1 and 8.3(b). \(\square\)
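
Continuing the numerical sketch given after Lemma 8.3 (again an illustration of ours), this bound can be evaluated directly; it is finite no matter how large the IO contamination of the state is.

b <- 2
2 * sum(diag(ginv(B) %*% (V + b^2 * diag(q))))   # the bound 2 tr(B^-(V + b^2 I_q))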

Note that changing the norm in the \(Y\)-space is not necessary for boundedness reasons: with only ideally distributed \(\varepsilon\), the reconstruction of \(ZX\) can be achieved such that, no matter how strongly \(\Delta X\) is contaminated, the maximal MSE remains bounded.

Cite this article

Ruckdeschel, P., Spangl, B. & Pupashenko, D. Robust Kalman tracking and smoothing with propagating and non-propagating outliers. Stat Papers 55, 93–123 (2014). https://doi.org/10.1007/s00362-012-0496-4