Machine Learning, Volume 108, Issue 12, pp 2113–2157

The kernel Kalman rule

Efficient nonparametric inference by recursive least-squares and subspace projections
  • Gregor H. W. Gebhardt
  • Andras Kupcsik
  • Gerhard Neumann


Enabling robots to act in unstructured and unknown environments requires versatile state estimation techniques. While traditional state estimation methods require known models and make strong assumptions about the dynamics, such versatile techniques should be able to deal with high-dimensional observations and nonlinear, unknown system dynamics. The recent framework for nonparametric inference makes it possible to perform inference on arbitrary probability distributions. High-dimensional embeddings of distributions into reproducing kernel Hilbert spaces are manipulated by kernelized inference rules, most prominently the kernel Bayes’ rule (KBR). However, the computational demands of the KBR scale poorly with the number of samples. In this paper, we present two techniques to increase the computational efficiency of nonparametric inference. First, the kernel Kalman rule (KKR) is presented as an approximate alternative to the KBR that estimates the embedding of the state based on a recursive least-squares objective. Based on the KKR we present the kernel Kalman filter (KKF), which updates an embedding of the belief state and learns the system and observation models from data. We further derive the kernel forward backward smoother (KFBS) based on a forward and a backward KKF and a smoothing update in Hilbert space. Second, we present the subspace conditional embedding operator as a sparsification technique that still leverages the full data set. We apply this sparsification to the KKR and derive the corresponding sparse KKF and KFBS algorithms. We show on nonlinear state estimation tasks that our approaches provide significantly improved estimation accuracy at considerably reduced computational cost.
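The abstract names two ingredients that can be illustrated compactly: conditional embedding operators estimated by regularized least squares, and a subspace (sparse) variant that is still fitted on the full data set. The following NumPy sketch shows both in their simplest read-out form, estimating E[Y | x] from samples. The function names, the Gaussian kernel, and the λI regularizer are illustrative assumptions, not the paper's exact formulation; the KKR itself additionally maintains a recursive belief-state update on top of such operators.

```python
import numpy as np

def gaussian_kernel(A, B, bandwidth=0.5):
    """Gram matrix with entries exp(-||a - b||^2 / (2 * bandwidth^2))."""
    sq_dists = (np.sum(A**2, axis=1)[:, None]
                + np.sum(B**2, axis=1)[None, :]
                - 2.0 * A @ B.T)
    return np.exp(-sq_dists / (2.0 * bandwidth**2))

def conditional_mean(X, Y, x_query, reg=1e-3, bandwidth=0.5):
    """Empirical conditional mean embedding, read out as E[Y | x]:
    alpha = (K_xx + n*reg*I)^{-1} k_x, prediction = Y^T alpha."""
    n = X.shape[0]
    K = gaussian_kernel(X, X, bandwidth)
    k = gaussian_kernel(X, x_query, bandwidth)           # (n, q)
    alpha = np.linalg.solve(K + n * reg * np.eye(n), k)  # (n, q)
    return Y.T @ alpha                                   # (d_y, q)

def subspace_conditional_mean(X, Y, x_query, m=50, reg=1e-3, bandwidth=0.5):
    """Sparse variant: restrict the operator to the span of m reference
    samples while still fitting on all n data points."""
    n = X.shape[0]
    Z = X[:m]                                        # reference subset
    K_zx = gaussian_kernel(Z, X, bandwidth)          # (m, n)
    K_zq = gaussian_kernel(Z, x_query, bandwidth)    # (m, q)
    A = K_zx @ K_zx.T + n * reg * np.eye(m)
    beta = np.linalg.solve(A, K_zq)                  # (m, q)
    return Y.T @ K_zx.T @ beta                       # (d_y, q)
```

The sparse variant inverts only an m-by-m matrix instead of an n-by-n one, which is the computational motivation behind the subspace operator: the cost of the solve drops from O(n³) to O(m²n) while every data point still contributes to the fit.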


Keywords: Nonparametric inference · Kernel methods · State estimation · Model learning



Copyright information

© The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Technische Universität Darmstadt, Computational Learning for Autonomous Systems, Darmstadt, Germany
  2. Bosch Center for Artificial Intelligence, Renningen, Germany
  3. University of Lincoln, Lincoln Centre for Autonomous Systems, Lincoln, UK
