Abstract
A common situation in filtering where classical Kalman filtering does not perform particularly well is tracking in the presence of propagating outliers. This calls for robustness understood in a distributional sense, i.e., we enlarge the distributional assumptions made in the ideal model to suitable neighborhoods. Based on optimality results for distributionally robust Kalman filtering from Ruckdeschel (Ansätze zur Robustifizierung des Kalman-Filters, vol 64, 2001; Optimally (distributional-)robust Kalman filtering, arXiv:1004.3393, 2010a), we propose new robust recursive filters and smoothers designed for this purpose, as well as specialized versions for non-propagating outliers. We apply these procedures in the context of a GPS problem arising in the car industry. To better understand these filters, we study their behavior at stylized outlier patterns (for which they are not designed) and compare them to other approaches for the tracking problem. Finally, in a simulation study we discuss the efficiency of our procedures in comparison to competitors.
References
Anderson BDO, Moore JB (1990) Optimal control. Linear quadratic methods. Prentice Hall, New York
Anscombe FJ (1960) Rejection of outliers. Technometrics 2:123–147
Birmiwal K, Papantoni-Kazakos P (1994) Outlier resistant prediction for stationary processes. Stat Decis 12(4):395–427
Birmiwal K, Shen J (1993) Optimal robust filtering. Stat Decis 11(2):101–119
Boncelet CG Jr, Dickinson BW (1983) An approach to robust Kalman filtering. In: Proceedings of the 22nd IEEE conference on decision and control, vol 1, pp 304–305
Boncelet CG Jr, Dickinson BW (1987) An extension to the SRIF Kalman filter. IEEE Trans Autom Control AC-32:176–179
Cipra T, Hanzak T (2011) Exponential smoothing for a time series with outliers. Kybernetika 47:165–178
Cipra T, Romera R (1991) Robust Kalman filter and its application in time series analysis. Kybernetika 27(6):481–494
Donoho D, Johnstone I (1994) Ideal spatial adaptation via wavelet shrinkage. Biometrika 81:425–455
Ershov AA, Liptser RS (1978) Robust Kalman filter in discrete time. Automat Remote Control 39:359–367
Fox AJ (1972) Outliers in time series. J R Stat Soc B 34:350–363
Franke J (1985) Minimax-robust prediction of discrete time series. Z Wahrscheinlichkeitstheor Verw Geb 68:337–364
Franke J, Poor HV (1984) Minimax-robust filtering and finite-length robust predictors. In: Robust and nonlinear time series analysis. Proceedings of workshop, Heidelberg, Germany, 1983, Nr. 26 in Lecture notes in statistics. Springer, New York
Fried R, Schettlinger K (2010) robfilter: robust time series filters. R package version 2.6.1. Available on CRAN. http://cran.r-project.org/web/packages/robfilter
Fried R, Bernholt T, Gather U (2006) Repeated median and hybrid filters. Comput Stat Data Anal 50:2313–2338
Gelper S, Fried R, Croux C (2010) Robust forecasting with exponential and Holt-Winters smoothing. J Forecast 29:285–300
Genz A, Bretz F, Miwa T, Mi X, Leisch F, Scheipl F, Hothorn T (2011) mvtnorm: multivariate normal and t distributions. R package version 0.9-9991. Available on CRAN. http://CRAN.R-project.org/package=mvtnorm
Hampel FR (1968) Contributions to the theory of robust estimation. Dissertation, University of California, Berkeley
Hampel FR, Ronchetti EM, Rousseeuw PJ, Stahel WA (1986) Robust statistics. The approach based on influence functions. Wiley, New York
Huber PJ (1964) Robust estimation of a location parameter. Ann Math Stat 35:73–101
Kalman RE (1960) A new approach to linear filtering and prediction problems. J Basic Eng Trans ASME 82:35–45
Kassam SA, Poor HV (1985) Robust techniques for signal processing: a survey. Proc IEEE 73(3):433–481
Künsch H (2001) State space models and hidden Markov models. In: Barndorff-Nielsen OE, Cox DR, Klüppelberg C (eds) Complex stochastic systems. Chapman and Hall, New York, pp 109–173
Martin RD, Yohai VJ (1986) Influence functionals for time series (with discussion). Ann Stat 14:781–818
Masreliez CJ, Martin R (1977) Robust Bayesian estimation for the linear model and robustifying the Kalman filter. IEEE Trans Autom Control AC-22:361–371
R Development Core Team (2012) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. http://www.R-project.org
Rieder H, Kohl M, Ruckdeschel P (2008) The costs of not knowing the radius. Stat Methods Appl 17(1):13–40
Ruckdeschel P (2000) Robust Kalman filtering. In: Hárdle W, Hlávka Z, Klinke S (eds) XploRe. Application guide, chap 18. Springer, New York, pp 483–516
Ruckdeschel P (2001) Ansätze zur Robustifizierung des Kalman-Filters, vol 64. Bayreuther Mathematische Schriften, Bayreuth
Ruckdeschel P (2010a) Optimally (distributional-)robust Kalman filtering. Available on arXiv: 1004.3393
Ruckdeschel P (2010b) Optimally robust Kalman filtering. Technical report 185, Fraunhofer ITWM Kaiserslautern, Kaiserslautern. http://www.itwm.fraunhofer.de/fileadmin/ITWM-Media/Zentral/Pdf/Berichte_ITWM/2010/bericht_185.pdf
Rudin W (1974) Real and complex analysis, 2nd edn. McGraw-Hill, New York
Schettlinger K (2009) Signal and variability extraction for online monitoring in intensive care. Dissertation, TU Dortmund, Dortmund. https://eldorado.tu-dortmund.de/bitstream/2003/26044/1/Dissertation-Schettlinger-Internetpublikation.pdf
Schick IC, Mitter SK (1994) Robust recursive estimation in the presence of heavy-tailed observation noise. Ann Stat 22(2):1045–1080
Shumway RH, Stoffer DS (1982) An approach to time series smoothing and forecasting using the EM algorithm. J Time Ser Anal 3:253–264
Spangl B (2008) On robust spectral density estimation. Dissertation, Department of Statistics and Probability Theory, Vienna University of Technology, Vienna
Stockinger N, Dutter R (1987) Robust time series analysis: a survey. Kybernetika 23 Supplement
Venables W, Ripley B (2002) Modern applied statistics with S, 4th edn. Springer, New York
Wan EA, van der Merwe R (2002) The unscented Kalman filter. In: Haykin S (ed) Kalman filtering and neural networks. Wiley, New York
Acknowledgments
The authors thank two anonymous referees for their valuable and helpful comments. Financial support from VW foundation in the framework of project Robust Risk Estimation for D. Pupashenko is gratefully acknowledged.
Appendix
1.1 Optimality of the classical Kalman filter
Optimality of the classical Kalman filter among all linear filters in \(L_2\)-sense and, under normality of the error and innovation distributions, among all measurable filters is a well-known fact; compare, e.g., Anderson and Moore (1990, Sect. 5.2). As we will need some of the arguments later, let us complement this fact by a generalization to arbitrary norms generated by a quadratic form and by a thorough treatment of the case of singularities in the covariances arising in the definition of the Kalman gain from (3.5). To do so, we take the orthogonal decomposition of the Hilbert space into the closed linear subspaces \( \mathrm{lin}(Y_{1:(t-1)})\oplus \mathrm{lin} (\Delta Y_{t})\) as granted and, for \(X=X_t-X_{t|t-1}\) and \(Y=\Delta Y_t\) as given in (3.5), derive \(\hat{K}_t\). To this end, for any matrix A let us denote by \(A^-\) a generalized inverse of A with the defining properties
$$\begin{aligned} A A^- A = A, \qquad A^- A A^- = A^-, \end{aligned}$$
and for D a positive semi-definite symmetric matrix in \(\mathbb{R}^{p\times p}\) and for \(x\in \mathbb{R}^p\) define the semi-norm generated by D as \(\Vert x\Vert ^2_D:=x^\tau D^{-} x\).
Lemma 8.1
Let \(p,q \in \mathbb{N}\), P some probability, and \(X\in L_2^p(P)\), \(Y\in L^q_2(P)\), \(\text{E} X=0\), \(\text{E} Y=0\), where for some \(Z\in \mathbb{R}^{q\times p}\) and some \(\varepsilon \in L_2^q(P)\) independent of X, \(Y=ZX+\varepsilon\). Let D be a positive semi-definite symmetric matrix in \(\mathbb{R}^{p\times p}\). Then
$$\begin{aligned} \hat{K}=\mathop{\text{Cov}}(X,Y)\mathop{\text{Cov}}(Y)^{-} \end{aligned}$$
solves
$$\begin{aligned} \text{E}\, \Vert X- KY\Vert _D^2=\;\min {}!,\quad K\in \mathbb{R}^{p\times q}. \end{aligned}$$(8.3)
\(\hat{K}\) is unique up to addition of some \(A \in \mathbb{R}^{p\times q}\) such that \(A\mathop{\text{Cov}} Y =0\) and some \(B \in \mathbb{R}^{p\times q}\) such that \(DB=0\). If \(\hat{K} = D D^- \hat{K}\), then \(\hat{K}\) has the smallest Frobenius norm among all solutions K to (8.3).
Proof
Denote by \(L_2^p(P,D)\) the Hilbert space generated by all \(\mathbb{R}^p\)-valued random variables U such that \(\text{E}_P \Vert U\Vert _D^2 <\infty\)—after a passage to equivalence classes of all random variables \(U, U^{\prime}\) with \(\text{E}_P \Vert U-U^{\prime}\Vert _D^2 = 0\). Let \(S=\mathop{\text{Cov}}(X)\) and \(V=\mathop{\text{Cov}}(\varepsilon)\). Then \(\mathop{\text{Cov}}(X,Y)=SZ^\tau\) and \(\mathop{\text{Cov}}(Y)=ZSZ^\tau +V\). Denote the approximation space \(\{ KY \mid K\in \mathbb{R}^{p\times q}\}\subset L_2^p(P,D)\) by \(\mathcal{K}\). \(\mathcal{K}\) is a closed linear subspace of \(L_2^p(P,D)\); hence by Rudin (1974, Thm. 4.10) there exists a unique minimizer \(\hat{X} = \hat{K} Y \in \mathcal{K}\) to problem (8.3). It is characterized by the orthogonality condition
$$\begin{aligned} \text{E}\, (X-\hat{K} Y)^\tau D^- K Y = 0\quad \text{for all } K\in \mathbb{R}^{p\times q}. \end{aligned}$$(8.4)
Plugging in \(K=D e_i \tilde{e}_j^\tau\), with \(\{e_i\}\), \(\{\tilde{e}_j\}\) the canonical bases of \(\mathbb{R}^p\) and \(\mathbb{R}^q\), respectively, we see that (8.4) is equivalent to
$$\begin{aligned} \pi \hat{K} \mathop{\text{Cov}}(Y) = \pi \mathop{\text{Cov}}(X,Y), \end{aligned}$$
where \(\pi =D^-D\) is the orthogonal projector onto the column space of D. But \(y\in \mathbb{R}^q\) can only lie in \(\ker \mathop{\text{Cov}}(Y)\) if \(y\in \ker \mathop{\text{Cov}}(X,Y)\). Hence indeed \(\hat{K} \mathop{\text{Cov}}(Y)=\mathop{\text{Cov}}(X,Y)\), and the uniqueness assertion is obvious. We write \(\pi _D\) and \(\pi _{C}\) for the orthogonal projectors onto the column spaces of D and \(\mathop{\text{Cov}}(Y)\), respectively, and \(\bar{\pi }_{C}=\mathbb{I}_q-\pi _{C}\), \(\bar{\pi }_D=\mathbb{I}_p-\pi _D\) for the corresponding complementary projectors. Then we see that \(\hat{K} = \hat{K} \pi _{C}\); for any \(A \in \mathbb{R}^{p\times q}\) with \(A\mathop{\text{Cov}} Y =0\) we have \(A=A\bar{\pi }_{C}\), and for any \(B \in \mathbb{R}^{p\times q}\) with \(DB =0\) we have \(B=\bar{\pi }_{D} B\); hence every solution to (8.3) is of the form \(\hat{K}+A+B\) with A and B as in the assertion.
\(\square \)
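As a numerical plausibility check (not part of the proof), the following Python sketch verifies that \(\hat{K}\) with \(\hat{K}\mathop{\text{Cov}}(Y)=\mathop{\text{Cov}}(X,Y)\) minimizes the empirical risk in the special case \(D=\mathbb{I}_p\). The dimensions and covariances are arbitrary illustrative choices, and the Moore-Penrose pseudoinverse stands in for the generalized inverse \(A^-\).

```python
import numpy as np

rng = np.random.default_rng(0)
p, q, n = 3, 2, 200_000

# Ideal model of Lemma 8.1: Y = Z X + eps, X and eps independent and centered.
Z = rng.standard_normal((q, p))
S = np.eye(p)            # Cov(X)
V = 0.5 * np.eye(q)      # Cov(eps)

X = rng.multivariate_normal(np.zeros(p), S, size=n)
eps = rng.multivariate_normal(np.zeros(q), V, size=n)
Y = X @ Z.T + eps

# Normal equations: K_hat Cov(Y) = Cov(X, Y) = S Z^tau, solved with
# the Moore-Penrose pseudoinverse in the role of the generalized inverse.
C = Z @ S @ Z.T + V                   # Cov(Y)
K_hat = S @ Z.T @ np.linalg.pinv(C)

def risk(K):
    """Empirical E ||X - K Y||^2 (Euclidean norm, i.e. D = I_p)."""
    R = X - Y @ K.T
    return np.mean(np.sum(R ** 2, axis=1))

# Perturbing K_hat in a random direction should never lower the risk.
Delta = 0.1 * rng.standard_normal((p, q))
assert risk(K_hat) < risk(K_hat + Delta)
```

With \(n=200{,}000\) samples the Monte-Carlo noise is far smaller than the risk increase \(\mathop{tr}(\Delta C \Delta^\tau)\) caused by the perturbation, so the comparison is reliable.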
1.2 Sketch of the optimality of the rLS.AO
The rLS filter is optimally-robust in some sense: to see this, in a first step we essentially boil down our SSM to (3.9), i.e., we have an unobservable but interesting state \(X\sim P^X(dx)\), where for technical reasons we assume that in the ideal model \(\text{E} |X|^2 <\infty\). Instead of X, for some \(Z\in \mathbb{R}^{q\times p}\), we rather observe the sum \(Y=ZX+\varepsilon\) of ZX and a stochastically independent error \(\varepsilon\). As (wide-sense) AO model, we consider the SO outlier model of (2.5), (2.6): the corresponding neighborhood \(\mathcal{U}(r)\) consists of the laws of \(Y^{\mathrm{re}}=(1-U)\,Y^{\mathrm{id}}+U\,Y^{\mathrm{di}}\), where \(Y^{\mathrm{id}}=ZX+\varepsilon\) is the ideal observation, \(Y^{\mathrm{di}}\) is an arbitrarily distributed contaminating variable, and \(U\sim \mathrm{Bin}(1,r)\) is independent of \((X,\varepsilon)\).
In this setting we may formulate two typical robust optimization problems: a minimax formulation (8.7), where the maximal MSE over the whole neighborhood is minimized, and, in the spirit of Hampel (1968, Lemma 5), a formulation (8.8) where robustness enters as a side condition, namely a bound b on the bias to be fulfilled on the whole neighborhood.
Then one can show that, setting \(D(y)=\text{E}_{\mathrm{id}}[X|Y=y]-\text{E} X\), the solution to both problems is \(\hat{f}(y)=\text{E} X +H_\rho (D(y))\) (with \(b=\rho /r\) in Problem (8.8)), and that this is just the (one-step) rLS, once \(\text{E}_{\mathrm{id}}[X|Y]\) is linear in Y. A proof of this assertion is given in Ruckdeschel (2010a, Thm. 3.2).
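For illustration, a minimal Python sketch of this one-step correction (not the authors' implementation): the ideal linear correction \(D(y)\) is computed from the classical Kalman gain and then clipped in Euclidean norm by the Huber function \(H_b(z)=z\min(1,b/|z|)\). All function names, and the use of the Moore-Penrose pseudoinverse for the generalized inverse, are our assumptions.

```python
import numpy as np

def huber_clip(z, b):
    """H_b(z) = z * min(1, b/|z|): keep small corrections, clip large ones."""
    nz = np.linalg.norm(z)
    return z if nz <= b else z * (b / nz)

def rls_ao_step(x_pred, Sigma_pred, y, Z, V, b):
    """One-step rLS.AO correction (sketch): the ideal linear correction
    D(y) = K (y - Z x_pred) is clipped in norm at b before being added
    to the predicted state."""
    C = Z @ Sigma_pred @ Z.T + V                # innovation covariance
    K = Sigma_pred @ Z.T @ np.linalg.pinv(C)    # classical Kalman gain (3.5)
    D = K @ (y - Z @ x_pred)                    # ideal conditional correction
    return x_pred + huber_clip(D, b)
```

For \(b=\infty\) the step reduces to the classical Kalman correction; the clip height relates to the outlier radius via \(b=\rho/r\) in Problem (8.8).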
Remark 8.2
-
(a)
As mentioned in Sect. 3.2, Cipra and Hanzak (2011) show an optimality similar to the one for Problem (8.8) and hence, unsurprisingly, come up with a similar procedure.
-
(b)
The ACM filter by Masreliez and Martin (1977), an early competitor to the rLS, applies Huber's (1964) minimax variance result, by analogy, to the “random location parameter X” setting of (3.9). They come up with redescenders as filter f. Hence the ACM filter is vulnerable not so much in the extreme tails but rather where the corresponding \(\psi\) function attains its maximum in absolute value. Care has to be taken, as such “inliers”, which produce the least favorable situation for the ACM, are much harder to detect by naïve data inspection, in particular in higher dimensions.
-
(c)
For exact SO-optimality of the rLS filter, linearity of the ideal conditional expectation is crucial. One can show that \(E_\mathrm{id}[\Delta X|\Delta Y]\) is linear iff \(\Delta X\) is normal; but, having used the rLS filter in the \(\Delta X\)-past, normality cannot hold, see Ruckdeschel (2010a, Props. 3.4 and 3.6).
-
(d)
Although the rLS fails to be SO-optimal for \(t>1\), it performs quite well in both simulations and on real data. To some extent this can be explained by passing to a certain extension of the original SO-neighborhoods. For details see Ruckdeschel (2010a, Thm. 3.10, Prop. 3.11).
1.3 Optimality of the rLS.IO
This section discusses (one-step) optimality of the rLS.IO in some detail. We omit time indices and write \(\Sigma\) for \(\Sigma _{t|t-1}\). To start, let us again look at the boiled-down model (3.9), where we interchange the rôles of \(\varepsilon\) and X, and note that \(X-f(Y)=-(\varepsilon -g(Y))\) for \(f(Y)=Y-g(Y)\). Hence in this simple model, the optimal reconstruction of a corrupted X, assuming that \(\varepsilon\) is still from the ideal distribution, is just \(Y-g(Y)\), with \(g(Y)\) the optimal reconstruction of \(\varepsilon\) in the same situation.
In notation, let us write \(\mathop { oP}(a|b)\) for the best linear reconstruction of a by means of b, i.e., the orthogonal projection of a onto the closed linear space generated by b.
Assuming linear conditional expectations and applying Ruckdeschel (2010a, Thm. 3.2) mutatis mutandis, the optimally-robust reconstruction of \(\varepsilon\) given Y in the sense of Problems (8.7), (8.8) is just \(H_b(\mathop{oP}(\varepsilon |Y))\)—with the same caveats as to the optimality for larger time indices as in Remark 8.2. But again, \(\mathop{oP}(\varepsilon |Y)=\mathop{oP}(Y-X|Y)=Y-\mathop{oP}(X|Y)\), so the IO-optimal procedure \(f_\mathrm{IO}\) is
$$\begin{aligned} f_\mathrm{IO}(Y)=Y-H_b\big(Y-\mathop{oP}(X|Y)\big). \end{aligned}$$
Details as to the translation of the contamination neighborhoods and exact formulations of the optimality results are given in Ruckdeschel (2010a).
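A minimal Python sketch of the resulting one-step IO correction under the stated linearity assumptions, combined with a back-transform to the state space for general Z as discussed below; the explicit form \(Z^\Sigma =\Sigma Z^\tau (Z\Sigma Z^\tau )^-\), the function names, and the pseudoinverses are our assumptions for illustration.

```python
import numpy as np

def huber_clip(z, b):
    """H_b(z) = z * min(1, b/|z|)."""
    nz = np.linalg.norm(z)
    return z if nz <= b else z * (b / nz)

def rls_io_step(x_pred, Sigma_pred, y, Z, V, b):
    """One-step rLS.IO correction (sketch): the observation error eps is
    reconstructed robustly by clipping oP(eps|Y) = dy - oP(ZX|Y); the
    remainder dy - H_b(.) is the robust reconstruction of Z X, mapped
    back to the state space by Z^Sigma (assumed form of (3.12))."""
    dy = y - Z @ x_pred                          # centered observation
    C = Z @ Sigma_pred @ Z.T + V
    K = Sigma_pred @ Z.T @ np.linalg.pinv(C)     # classical gain
    eps_hat = dy - Z @ (K @ dy)                  # oP(eps | Y)
    ZSigma = Sigma_pred @ Z.T @ np.linalg.pinv(Z @ Sigma_pred @ Z.T)
    return x_pred + ZSigma @ (dy - huber_clip(eps_hat, b))
```

For \(b\to \infty\) this reduces to the classical Kalman step (by (8.13), \(Z^\Sigma Z K=K\)); for small b, a large innovation is attributed to the state, i.e., the filter follows the observation.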
The general setup with some arbitrary \(Z\in \mathbb{R}^{q\times p}\), where Z in general is not invertible and, moreover, even \(Z\Sigma Z^\tau\) may be singular, is not trivial, though. For instance, our preceding argument so far only covers reconstruction of ZX, and at this stage it is not obvious how to optimally derive a reconstruction of X from this. In particular, in this general case there are directions which our (robustified) reconstruction cannot see—at least all directions in \(\ker Z\). An unbounded criterion like MSE would play havoc once unbounded contamination happens in these directions. So in this context, the best we can do is to optimally reconstruct ZX on the whole neighborhood generated by outliers in X and then, in a second step, for this best reconstruction of ZX, find the best back-transform to X in the ideal model setting. The question is how much we lose by this; as Corollary 8.4 below shows, nothing.
For \(Z^\Sigma\) from (3.12), we introduce the orthogonal projector onto the column space of \(Z\Sigma Z^\tau\) and its orthogonal complement as
$$\begin{aligned} \pi _{Z,\Sigma } = (Z\Sigma Z^\tau )(Z\Sigma Z^\tau )^{-},\qquad \bar{\pi }_{Z,\Sigma } = \mathbb{I}_q-\pi _{Z,\Sigma }. \end{aligned}$$
Then we have the following Lemma:
Lemma 8.3
-
(a)
For any positive definite D, \(Z^\Sigma\) from (3.12) solves
$$\begin{aligned} \text{ E}_\mathrm{id} \Vert X- A \mathop {oP}(Z X|Y)\Vert _D^2=\;\min {}!,\quad A\in \mathbb{R }^{p\times q}. \end{aligned}$$(8.12) -
(b)
\(\Sigma Z^\tau \bar{\pi }_{Z,\Sigma } = 0\); in particular, no matter the rank of Z or \(\pi _{Z,\Sigma }\), with \(K=\Sigma Z^\tau C^-\),
$$\begin{aligned} Z^{\Sigma } Z K = K. \end{aligned}$$(8.13)
Proof
(a) As in Lemma 8.1, we see that
$$\begin{aligned} \hat{A} = \mathop{\text{Cov}}\big(X, \mathop{oP}(ZX|Y)\big)\, \mathop{\text{Cov}}\big(\mathop{oP}(ZX|Y)\big)^{-}. \end{aligned}$$
Abbreviating \(Z \Sigma Z^\tau\) by B and \(\mathop{\text{Cov}}(Y)\) by C, this gives \( \hat{A}=\Sigma Z^\tau C^- B(BC^-B)^- \), and with \(\Sigma _{.5}\) the symmetric root of \(\Sigma\) and \(G=\Sigma _{.5}Z^\tau\), this becomes \(\hat{A}=\Sigma _{.5} G C^- G^\tau G(G^\tau G C^-G^\tau G)^-\). Next we pass to the singular value decomposition \(G=USW^\tau\), with U, W orthogonal matrices in \(\mathbb{R}^{p\times p}\) and \(\mathbb{R}^{q\times q}\), respectively, and \(S\in \mathbb{R}^{p\times q}\) a matrix with the singular values on the “diagonal” entries \(S_{i,i}\), \(i=1,\ldots ,\min (p,q)\), and \(S_{i,j}=0\) for \(i\not =j\); furthermore, \(S_{i,i}>0\) for \(i=1,\ldots ,d\), \(d\le \min (p,q)\), and \(S_{i,i}=0\) else. Using \((aba^\tau )^-=(a^\tau )^{-1}b^- a^{-1}\) for a invertible and setting \(T=S^\tau S\), we obtain
$$\begin{aligned} \hat{A}=\Sigma _{.5}\, U S\, (W^\tau C^- W)\, T\, \big(T\, (W^\tau C^- W)\, T\big)^-\, W^\tau . \end{aligned}$$
As the expressions \(W^\tau C^- W\) (symmetric matrices) are surrounded by S- (resp. T-)terms, we may replace them by a matrix \(R\in \mathbb{R}^{q\times q}\) with nonzero entries only in the upper \(d\times d\) block, i.e., \(\hat{A}=\Sigma _{.5} US R T (T R T)^- W^\tau\), and as R now is compatible with S and T, the expression reduces blockwise to the upper \(d\times d\) blocks. Now, as \(C=WTW^\tau + V\), \(W^\tau C^- W=(T+ W^\tau V W)^-\); in particular, the upper \(d\times d\) block \(R_d\) of \(R=1_d (T+ W^\tau V W)^- 1_d\) is invertible and
$$\begin{aligned} \hat{A}=\Sigma _{.5}\, U S T^- W^\tau = \Sigma Z^\tau (Z\Sigma Z^\tau )^- = Z^\Sigma . \end{aligned}$$
(b) We start by noting that \(\Sigma Z^\tau \bar{\pi }_{Z,\Sigma } = \Sigma _{.5} USW^\tau W (\mathbb{I }_q-1_d) W^\tau =0 \). For (8.13), we write \( K=\Sigma Z^\tau (\pi _{Z,\Sigma }+\bar{\pi }_{Z,\Sigma }) C^- = \Sigma Z^\tau \pi _{Z,\Sigma } C^- = Z^\Sigma Z K \). \(\square \)
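Identity (8.13) can be checked numerically; the following Python sketch (ours, with \(Z^\Sigma\) assumed in the form \(\Sigma Z^\tau (Z\Sigma Z^\tau )^-\) and Moore-Penrose pseudoinverses throughout) deliberately uses a rank-deficient Z.

```python
import numpy as np

rng = np.random.default_rng(1)
p, q = 4, 3

# A deliberately rank-deficient Z in R^{q x p} (rank 2 < min(p, q)).
Z = rng.standard_normal((q, 2)) @ rng.standard_normal((2, p))
Sigma = np.diag([1.0, 2.0, 3.0, 4.0])   # prediction-error covariance
V = 0.3 * np.eye(q)                     # Cov(eps)

C = Z @ Sigma @ Z.T + V                                  # Cov(Y)
K = Sigma @ Z.T @ np.linalg.pinv(C)                      # classical gain
ZSigma = Sigma @ Z.T @ np.linalg.pinv(Z @ Sigma @ Z.T)   # assumed form of (3.12)

# (8.13): Z^Sigma Z K = K, no matter the rank of Z.
print(np.allclose(ZSigma @ Z @ K, K))   # → True
```

The same setup also illustrates Corollary 8.4: applying \(Z^\Sigma\) to \(Y-\mathop{oP}(\varepsilon |Y)=ZKY\) returns exactly \(\mathop{oP}(X|Y)=KY\).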
As a consequence of assertion (b) in the preceding Lemma, we obtain
Corollary 8.4
No matter the rank of Z or \(\pi _{Z,\Sigma }\),
$$\begin{aligned} Z^\Sigma \big(Y-\mathop{oP}(\varepsilon |Y)\big)=\mathop{oP}(X|Y), \end{aligned}$$
that is, we can exactly recover \(\mathop {oP}(X|Y)\) from \(\mathop {oP}(\varepsilon |Y)\), and passing over the reconstruction of ZX first does not cost us anything in efficiency compared to the direct route.
Proof
We only note that \(Y -\mathop{oP}(\varepsilon |Y)=\mathop{oP}(ZX |Y) = Z K Y\), so that by (8.13), \(Z^\Sigma \big(Y-\mathop{oP}(\varepsilon |Y)\big)=Z^\Sigma Z K Y = K Y=\mathop{oP}(X|Y)\). \(\square\)
To keep things well-defined in this setting where we have “invisible directions” in the state, we may have recourse to a semi-norm in X-space which ignores such directions. A possible candidate for D in Lemma 8.1 is
$$\begin{aligned} D=(Z^\Sigma Z)^\tau (Z^\Sigma Z). \end{aligned}$$(8.16)
On the one hand, as we show below, invisible directions get ignored; on the other hand, by (8.13), no direction visible to the classically optimal procedure is lost.
Proposition 8.5
Using D from (8.16) and assuming observation errors from the ideal situation, the maximal MSE of the rLS.IO measured in this semi-norm remains bounded under IO contamination. With this semi-norm, \(\hat{K}\) is the solution to (8.3) with smallest Frobenius norm.
Proof
The error term \(e=X-\hat{X}\) for the rLS.IO can be written as
As \((Z^\Sigma Z)^2 = Z^\Sigma Z\), we see that \((\mathbb{I }_p-Z^\Sigma Z)(Z^\Sigma Z)=0\), so that in D-semi-norm, the \((\mathbb{I }_p-Z^\Sigma Z)X\) terms cancel out and we get
so the MSE is bounded by \(2 \mathop{tr}\big(B^- (V + b^2 \mathbb{I}_q)\big)\). The second assertion is an immediate consequence of Lemmas 8.1 and 8.3(b). \(\square\)
Note that changing the norm in the Y-space is not necessary for boundedness reasons: with only ideally distributed \(\varepsilon\), the reconstruction of ZX can be achieved such that, no matter how large the contamination of \(\Delta X\), the maximal MSE remains bounded.
Ruckdeschel, P., Spangl, B. & Pupashenko, D. Robust Kalman tracking and smoothing with propagating and non-propagating outliers. Stat Papers 55, 93–123 (2014). https://doi.org/10.1007/s00362-012-0496-4