Informatics in Control, Automation and Robotics: 12th International Conference, ICINCO 2015, Colmar, France, July 21–23, 2015, Revised Selected Papers, pp. 355–370

# Bayesian Quadrature Variance in Sigma-Point Filtering

## Abstract

Sigma-point filters are algorithms for recursive state estimation of the stochastic dynamic systems from noisy measurements, which rely on moment integral approximations by means of various numerical quadrature rules. In practice, however, it is hardly guaranteed that the system dynamics or measurement functions will meet the restrictive requirements of the classical quadratures, which inevitably results in approximation errors that are not accounted for in the current state-of-the-art sigma-point filters. We propose a method for incorporating information about the integral approximation error into the filtering algorithm by exploiting features of a Bayesian quadrature—an alternative to classical numerical integration. This is enabled by the fact that the Bayesian quadrature treats numerical integration as a statistical estimation problem, where the posterior distribution over the values of the integral serves as a model of numerical error. We demonstrate superior performance of the proposed filters on a simple univariate benchmarking example.

### Keywords

Nonlinear filtering · Sigma-point filter · Gaussian filter · Integral variance · Bayesian quadrature · Gaussian process

## 1 Introduction

Dynamic systems are widely used to model behaviour of real processes throughout the sciences. In many cases, it is useful to define a state of the system and consequently work with a state-space representation of the dynamics. When the dynamics exhibits stochasticity or can only be observed indirectly, we are faced with the problem of state estimation. Estimating a state of the dynamic system from noisy measurements is a prevalent problem in many application areas such as aircraft guidance, GPS navigation [12], weather forecast [9], telecommunications [14] and time series analysis [2]. When the state estimator is required to produce an estimate using only the present and past measurements, this is known as the filtering problem.

For discrete-time linear Gaussian systems, the best estimator in the mean-square-error sense is the much-celebrated Kalman filter (KF) [16]. First attempts to deal with the estimation of nonlinear dynamics can be traced to the work of [29], which resulted in the extended Kalman filter (EKF). The EKF algorithm uses the Taylor series expansion to approximate the nonlinearities in the system description. A disadvantage of the Taylor series is that it requires differentiability of the approximated functions. This prompted further developments [20, 28], resulting in derivative-free filters based on Stirling's interpolation formula. Other approaches that approximate nonlinearities include the Fourier-Hermite KF [27], a special case of which is the statistically linearized filter [7, 18].

Instead of explicitly dealing with nonlinearities in the system description, the unscented Kalman filter (UKF) [15] describes the densities by a finite set of deterministically chosen sigma-points, which are then propagated through the nonlinearity. Other filters, such as the Gauss-Hermite Kalman filter (GHKF) [13], the cubature Kalman filter (CKF) [1] and the stochastic integration filter [6], utilize numerical quadrature rules to approximate moments of the relevant densities. These filters can be seen as representatives of a more general sigma-point methodology.
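The sigma-point methodology shared by these filters reduces to approximating the moments of a nonlinearly transformed Gaussian variable by a weighted sum over deterministically placed points. A minimal sketch under our own naming (the UT unit points and weights below use the common scaling \( n + \kappa = 3 \) and are illustrative assumptions, not taken from any specific filter above):

```python
import numpy as np

def sigma_point_moments(f, m, P, points, weights):
    """Approximate mean and covariance of f(x) for x ~ N(m, P)
    using unit sigma-points and weights of a chosen quadrature rule."""
    L = np.linalg.cholesky(P)
    X = m[:, None] + L @ points              # scale/shift unit points
    Y = np.array([f(x) for x in X.T]).T      # propagate through f
    mean = Y @ weights
    d = Y - mean[:, None]
    cov = (d * weights) @ d.T                # weighted outer products
    return mean, cov

# Unscented-transform unit points for n = 1 with kappa = 2 (n + kappa = 3)
n, kappa = 1, 2
pts = np.array([[0.0, np.sqrt(n + kappa), -np.sqrt(n + kappa)]])
wts = np.array([kappa / (n + kappa), 0.5 / (n + kappa), 0.5 / (n + kappa)])

# Moments of sin(x) for x ~ N(0, 1), approximated by three sigma-points
m_y, P_y = sigma_point_moments(np.sin, np.array([0.0]), np.array([[1.0]]), pts, wts)
```

For a linear function the rule is exact; for nonlinear functions the weighted sum carries an approximation error that the classical treatment leaves unquantified.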

A limitation of classical integral approximations, such as the Gauss-Hermite quadrature (GHQ), is that they are specifically designed to perform with zero error on a narrow class of functions (typically polynomials up to a given degree). It is also possible to design rules that have the best average-case performance on a wider range of functions at the cost of permitting a small non-zero error [19]. In recent years, the Bayesian quadrature (BQ) has become a focus of interest in the probabilistic numerics community [22]. The BQ treats numerical integration as a problem of Bayesian inference and is thus able to provide additional information, namely, the uncertainty in the computation of the integral itself. In [26], the authors work with the concept of BQ, but the algorithms derived therein do not make use of the uncertainty in the integral computations. The goal of this paper is to augment the current sigma-point algorithms so that the uncertainty associated with the integral approximations is also reflected in their estimates.

The rest of the paper is organized as follows. Formal definition of the Gaussian filtering problem is outlined in Sect. 2, followed by the exposition of the basic idea of Bayesian quadrature in Sect. 3. The main contribution, which is the design of the Bayes-Hermite Kalman filter (BHKF), is presented in Sect. 4. Finally, comparison of the BHKF with existing filters is made in Sect. 5.

## 2 Problem Formulation

*N* function values \(\mathbf {g}(\mathbf {x}^{(i)})\). Conversely, this means that any quadrature is uncertain about the true function values in between the sigma-points. The importance of quantifying this uncertainty becomes particularly pronounced when the function is not integrated exactly due to the inherent design limitations of the quadrature (such as the choice of weights and sigma-points). All sigma-point filters thus operate with uncertainty that is not accounted for in their estimates. The classical treatment of the quadrature does not lend itself nicely to quantifying the uncertainty associated with a given rule. The Bayesian quadrature, on the other hand, treats the integral approximation as a problem in Bayesian inference and is therefore perfectly suited to this task.

The idea of using Bayesian quadrature in the state estimation algorithms was already treated in [26]. The derived filters and smoothers, however, do not utilize the full potential of the Bayesian quadrature. Namely, the integral variance is not reflected in their estimates. In this article, we aim to remedy this issue by making use of familiar expressions for GP prediction at uncertain inputs [3, 10].

## 3 Gaussian Process Priors and Bayesian Quadrature

In this section, we introduce the key concepts of Gaussian process priors and Bayesian quadrature, which are crucial to the derivation of the filtering algorithm in Sect. 4.

### 3.1 Gaussian Process Priors

The GP prior \(p(g)\) is combined with the data \( \mathcal {D} = \left\{ \left( \mathbf {x}_i, g(\mathbf {x}_i)\right) , i = 1, \ldots , N \right\} \), comprising the evaluation points \(\mathbf {X} = [\mathbf {x}_1, \ldots , \mathbf {x}_N]\) and the function evaluations \( \mathbf {y}_g = \left[ \, g(\mathbf {x}_1), \ldots , g(\mathbf {x}_N) \,\right] ^\top \), to produce a GP posterior \(p(g \mid \mathcal {D})\) with moments given by [23]

$$\begin{aligned} \mathbb {E}[g(\mathbf {x}) \mid \mathcal {D}] &= \mathbf {k}(\mathbf {x})^\top \mathbf {K}^{-1}\mathbf {y}_g , \\ \mathbb {V}[g(\mathbf {x}) \mid \mathcal {D}] &= k(\mathbf {x}, \mathbf {x}) - \mathbf {k}(\mathbf {x})^\top \mathbf {K}^{-1}\mathbf {k}(\mathbf {x}) , \end{aligned}$$

where \( [\mathbf {k}(\mathbf {x})]_i = k(\mathbf {x}, \mathbf {x}_i) \) and \( [\mathbf {K}]_{ij} = k(\mathbf {x}_i, \mathbf {x}_j) \).
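A minimal numeric sketch of GP posterior prediction with an EQ covariance (our own function names; the hyper-parameters and jitter value are illustrative assumptions):

```python
import numpy as np

def eq_kernel(a, b, alpha=1.0, ell=1.0):
    """Exponentiated-quadratic (EQ) covariance between 1-D input arrays."""
    d = a[:, None] - b[None, :]
    return alpha**2 * np.exp(-0.5 * d**2 / ell**2)

def gp_posterior(x_star, X, y, jitter=1e-10):
    """Zero-mean GP posterior mean and variance at test inputs x_star."""
    K = eq_kernel(X, X) + jitter * np.eye(len(X))   # kernel matrix on data
    k_star = eq_kernel(X, x_star)                   # cross-covariances
    Kinv = np.linalg.inv(K)
    mean = k_star.T @ Kinv @ y
    var = np.diag(eq_kernel(x_star, x_star) - k_star.T @ Kinv @ k_star)
    return mean, var
```

At a training input the posterior mean reproduces the observation and the posterior variance collapses; far from the data it reverts to the prior variance.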

### 3.2 Bayesian Quadrature

The Bayesian quadrature treats the integral \( Z = \int g(\mathbf {x})\, p(\mathbf {x})\, \mathrm {d}\mathbf {x} \) as a random variable by placing the GP prior on the integrand *g*(**x**). Since integration is a linear operation, the posterior distribution over \( Z \) is Gaussian with mean \( \mathbb {E}[Z \mid \mathcal {D}] = \mathbf {q}^\top \mathbf {K}^{-1}\mathbf {y}_g \), where \( [\mathbf {q}]_i = \int k(\mathbf {x}, \mathbf {x}_i)\, p(\mathbf {x})\, \mathrm {d}\mathbf {x} \). The a posteriori integral variance is [24]

$$ \mathbb {V}[Z \mid \mathcal {D}] = \iint k(\mathbf {x}, \mathbf {x}')\, p(\mathbf {x})\, p(\mathbf {x}')\, \mathrm {d}\mathbf {x}\, \mathrm {d}\mathbf {x}' - \mathbf {q}^\top \mathbf {K}^{-1}\mathbf {q} . $$
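For the EQ covariance under a unit Gaussian weight the kernel mean has a closed form, so the BQ integral mean and variance can be sketched numerically (our own function names; α, ℓ, the jitter, and the \( \mathcal{N}(0,1) \) weight are illustrative assumptions):

```python
import numpy as np

def bq_moments(X, y, alpha=1.0, ell=1.0, jitter=1e-10):
    """Posterior mean and variance of Z = int g(x) N(x; 0, 1) dx under a
    zero-mean GP prior on g with EQ kernel, given evaluations y = g(X)."""
    d = X[:, None] - X[None, :]
    K = alpha**2 * np.exp(-0.5 * d**2 / ell**2) + jitter * np.eye(len(X))
    # kernel mean q_i = int k(x, x_i) N(x; 0, 1) dx, closed form for EQ
    q = alpha**2 * np.sqrt(ell**2 / (ell**2 + 1.0)) \
        * np.exp(-0.5 * X**2 / (ell**2 + 1.0))
    mean_Z = q @ np.linalg.solve(K, y)
    # double integral of the kernel under the Gaussian weight
    kk = alpha**2 * np.sqrt(ell**2 / (ell**2 + 2.0))
    var_Z = kk - q @ np.linalg.solve(K, q)
    return mean_Z, var_Z
```

Note that the integral variance depends only on the sigma-point locations and the kernel, not on the function values, and it shrinks as more points are added.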

## 4 Bayes-Hermite Kalman Filter

In this section, we show how the integral variance can be incorporated into the moment estimates of the transformed random variable. Parallels are drawn with existing GP-based filters and the Bayes-Hermite Kalman filter algorithm is outlined.

### 4.1 Incorporating Integral Uncertainty

*g*. Note that the equations (17), (18) can only be used to model a single output dimension of the vector function \( \mathbf {g} \). For now, we will assume a scalar function \(g\) unless otherwise stated. To keep the notation uncluttered, conditioning on \( \mathcal {D} \) will be omitted. Treating the function values \(g(\mathbf {x})\) as random leads to the joint density \(p(g,\mathbf {x})\); thus, when computing the moments of \(g(\mathbf {x})\), the expectations need to be taken with respect to both variables. This results in the following approximation of the true moments

*integrand* and variance of the *integral*, respectively. In the case of deterministic *g*, both of these terms are zero. With the EQ covariance (22), the expression (28) for the first moment of a transformed random variable takes on the form (23).

### 4.2 BHKF Algorithm

The filtering algorithm based on the BQ can now be constructed utilizing (31) and (32). The BHKF uses two GPs with the EQ covariance, one for each function in the state-space model (1) and (2), which means that two sets of hyper-parameters are used: \( {{\varvec{\theta }}}_\mathrm {f} \) and \( {{\varvec{\theta }}}_\mathrm {h} \). In the algorithm specification below, the lower index of \( \mathbf {q} \) and \( \mathbf {K} \) specifies the set of hyper-parameters used to compute these quantities.

**Algorithm 1 (Bayes-Hermite Kalman Filter)**

In the following, let the system initial condition be \(\mathbf {x}_{0|0} \sim \mathcal {N}\left( \mathbf {m}_{0|0},\, \mathbf {P}_{0|0}\right) \), the sigma-point index \( i = 1, \ldots , N \), and the time step index \( k = 1, 2, \ldots \, .\)

*Initialization:*

Choose unit sigma-points \( {{\varvec{\xi }}}^{(i)}\). Set hyper-parameters \( {{\varvec{\theta }}}_\mathrm {f}\) and \( {{\varvec{\theta }}}_\mathrm {h}\). For all time steps *k*, proceed from the initial conditions \( \mathbf {x}_{0|0} \), by alternating between the following prediction and filtering steps.

*Prediction:*

- 1.
Form the sigma-points \( \mathbf {x}^{(i)}_{k-1} = \mathbf {m}^\mathrm {x}_{k-1|k-1} + \sqrt{\mathbf {P}^\mathrm {x}_{k-1|k-1}}\, {{\varvec{\xi }}}^{(i)} \).

- 2.
Propagate sigma-points through the dynamics model \( \mathbf {x}^{(i)}_{k} \,=\, \mathbf {f}\big (\mathbf {x}^{(i)}_{k-1}\big )\), and form \(\mathbf {F}\) as in (11)–(13).

- 3.
Using the unit sigma-points \( {{\varvec{\xi }}}^{(i)}\) and hyper-parameters \( {{\varvec{\theta }}}_\mathrm {f}\), compute weights \(\mathbf {w}^\mathrm {x} \) and \( \mathbf {W}^\mathrm {x}\) according to (33) and (34) (with \(m=0\) and \(P=I\)).

- 4.
Compute the predictive mean \(\mathbf {m}^\mathrm {x}_{k|k-1}\) and predictive covariance \(\mathbf {P}^\mathrm {x}_{k|k-1}\)

$$\begin{aligned} \mathbf {m}^\mathrm {x}_{k|k-1} &= \mathbf {F}^\top \mathbf {w}^\mathrm {x} , \\ \mathbf {P}^\mathrm {x}_{k|k-1} &= \mathbf {F}^\top \mathbf {W}^\mathrm {x}\mathbf {F} - \mathbf {m}^\mathrm {x}_{k|k-1}\big (\mathbf {m}^\mathrm {x}_{k|k-1}\big )^\top + \mathrm {diag}\big (\alpha ^2 - \mathrm {tr}\big (\mathbf {K}_\mathrm {f}^{-1}\mathbf {L}\big )\big ) + \mathbf {Q} . \end{aligned}$$

*Filtering:*

- 1.
Form the sigma-points \( \mathbf {x}^{(i)}_k \,=\, \mathbf {m}^\mathrm {x}_{k|k-1} + \sqrt{\mathbf {P}^\mathrm {x}_{k|k-1}}\, {{\varvec{\xi }}}^{(i)} \).

- 2.
Propagate the sigma-points through the measurement model \( \mathbf {z}^{(i)}_{k} \,=\, \mathbf {h}\big ( \mathbf {x}^{(i)}_{k} \big )\), and form \(\mathbf {H}\) as in (11)–(13).

- 3.
Using the unit sigma-points \( {{\varvec{\xi }}}^{(i)}\) and hyper-parameters \( {{\varvec{\theta }}}_\mathrm {h}\), compute weights \(\mathbf {w}^\mathrm {z} \) and \( \mathbf {W}^\mathrm {z}\) according to (33) and (35) (with \(m=0\) and \(P=I\)), and \(\mathbf {W}^\mathrm {xz} = \mathrm {diag}\big (\, \mathbf {l}_\mathrm {h} \,\big )\mathbf {K}_\mathrm {h}^{-1}\).

- 4.
Compute the measurement mean, covariance and state-measurement cross-covariance

$$\begin{aligned} \mathbf {m}^\mathrm {z}_{k|k-1} &= \mathbf {H}^\top \mathbf {w}^\mathrm {z} , \\ \mathbf {P}^\mathrm {z}_{k|k-1} &= \mathbf {H}^\top \mathbf {W}^\mathrm {z}\mathbf {H} - \mathbf {m}^\mathrm {z}_{k|k-1}\big (\mathbf {m}^\mathrm {z}_{k|k-1}\big )^\top + \mathrm {diag}\left( \alpha ^2 - \mathrm {tr}\left( \mathbf {K}_\mathrm {h}^{-1}\mathbf {L}\right) \right) + \mathbf {R} , \\ \mathbf {P}^\mathrm {xz}_{k|k-1} &= \mathbf {P}^\mathrm {x}_{k|k-1}\big (\mathbf {P}^\mathrm {x}_{k|k-1} + {{\varvec{\Lambda }}}\big )^{-1}\tilde{\mathbf {X}}\mathbf {W}^\mathrm {xz}\mathbf {H} , \end{aligned}$$

where the *i*-th row of \(\tilde{\mathbf {X}}\) is \( \mathbf {x}^{(i)}_{k} - \mathbf {m}^\mathrm {x}_{k|k-1} \).

- 5.
Compute the filtered mean \( \mathbf {m}^\mathrm {x}_{k|k} \) and filtered covariance \( \mathbf {P}^\mathrm {x}_{k|k} \)

$$\begin{aligned} \mathbf {m}^\mathrm {x}_{k|k} &= \mathbf {m}^\mathrm {x}_{k|k-1} + \mathbf {K}_k\big (\mathbf {z}_k - \mathbf {m}^\mathrm {z}_{k|k-1}\big ) , \\ \mathbf {P}^\mathrm {x}_{k|k} &= \mathbf {P}^\mathrm {x}_{k|k-1} - \mathbf {K}_k\mathbf {P}^\mathrm {z}_{k|k-1}\mathbf {K}_k^\top , \end{aligned}$$

with the Kalman gain \( \mathbf {K}_k = \mathbf {P}^\mathrm {xz}_{k|k-1}\big (\mathbf {P}^\mathrm {z}_{k|k-1}\big )^{-1} \).
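The filtering step is the standard Kalman measurement update applied to the quadrature-based moments; as a generic sketch (our own function name, with the moments passed in as arrays):

```python
import numpy as np

def measurement_update(m_pred, P_pred, z, m_z, P_z, P_xz):
    """Kalman measurement update from predictive moments."""
    K = P_xz @ np.linalg.inv(P_z)            # Kalman gain
    m_filt = m_pred + K @ (z - m_z)          # filtered mean
    P_filt = P_pred - K @ P_z @ K.T          # filtered covariance
    return m_filt, P_filt
```

The BHKF differs from the classical sigma-point filters only in how the input moments are computed, not in this update itself.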

## 5 Numerical Illustration

The *r*-th order Gauss-Hermite (GH) quadrature rule uses sigma-points which are determined as the roots of the *r*-th degree univariate Hermite polynomial \(H_r(x)\). When a function of a vector argument (\( n>1 \)) is to be integrated, a multidimensional grid of points is formed by the Cartesian product, leading to an exponential growth of their number (\(N = r^n\)). The GH weights are computed according to [25] as
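For reference, the classical GH weights for the probabilists' Hermite polynomials are \( w^{(i)} = r! \,/\, \big( r^2 \left[ H_{r-1}\big(x^{(i)}\big) \right]^2 \big) \) (see [25]). In practice, the points and weights can also be obtained from NumPy's physicists' Gauss-Hermite rule (weight \( e^{-x^2} \)) and rescaled to the \( \mathcal {N}(0,1) \) weight; a sketch with our own function name:

```python
import numpy as np

def gauss_hermite_unit(r):
    """Unit sigma-points and weights of the r-th order GH rule for the
    N(0, 1) weight, derived from NumPy's physicists' Gauss-Hermite rule."""
    x, w = np.polynomial.hermite.hermgauss(r)    # weight function exp(-x^2)
    return np.sqrt(2.0) * x, w / np.sqrt(np.pi)  # rescale to N(0, 1) weight
```

The rule with *r* points integrates polynomials up to degree \( 2r - 1 \) exactly under the Gaussian weight.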

*n* sigma-points given by

The average root-mean-square error (RMSE)

Sigma-pts. | N | Bayesian | Classical
---|---|---|---
SR | 2 | 6.157 ± 0.071 | 13.652 ± 0.253
UT | 3 | 7.124 ± 0.131 | 7.103 ± 0.130
GH-5 | 5 | 8.371 ± 0.128 | 10.466 ± 0.198
GH-7 | 7 | 8.360 ± 0.043 | 9.919 ± 0.215
GH-10 | 10 | 7.082 ± 0.098 | 8.035 ± 0.193
GH-15 | 15 | 6.944 ± 0.048 | 8.224 ± 0.188
GH-20 | 20 | 6.601 ± 0.058 | 7.406 ± 0.193

The average negative log-likelihood (NLL)

Sigma-pts. | N | Bayesian | Classical
---|---|---|---
SR | 2 | 3.328 ± 0.026 | 56.570 ± 2.728
UT | 3 | 4.970 ± 0.343 | 5.306 ± 0.481
GH-5 | 5 | 4.088 ± 0.064 | 14.722 ± 0.829
GH-7 | 7 | 4.045 ± 0.017 | 12.395 ± 0.855
GH-10 | 10 | 3.530 ± 0.012 | 7.565 ± 0.534
GH-15 | 15 | 3.468 ± 0.014 | 7.142 ± 0.557
GH-20 | 20 | 3.378 ± 0.017 | 5.664 ± 0.488

The average non-credibility index (NCI)

Sigma-pts. | N | Bayesian | Classical
---|---|---|---
SR | 2 | 1.265 ± 0.010 | 18.585 ± 0.045
UT | 3 | 0.363 ± 0.108 | 0.897 ± 0.088
GH-5 | 5 | 4.549 ± 0.013 | 9.679 ± 0.068
GH-7 | 7 | 4.368 ± 0.006 | 8.409 ± 0.076
GH-10 | 10 | 2.520 ± 0.006 | 5.315 ± 0.058
GH-15 | 15 | 2.331 ± 0.008 | 5.424 ± 0.059
GH-20 | 20 | 1.654 ± 0.007 | 4.105 ± 0.055

## 6 Conclusions

In this paper, we proposed a way of utilizing uncertainty associated with integral approximations in the nonlinear sigma-point filtering algorithms. This was enabled by the Bayesian treatment of the quadrature as well as by making use of the previously derived results for the GP prediction at uncertain inputs.

The proposed Bayesian quadrature based filtering algorithms were tested on a univariate benchmarking example. The results show that the filters utilizing the additional quadrature uncertainty achieve a significant improvement in terms of estimate credibility and overall model fit.

We also showed that a proper setting of the hyper-parameters is crucially important for achieving competitive results. Further research should be concerned with the development of principled approaches for dealing with the kernel hyper-parameters. The freedom in the choice of sigma-points in the BQ offers a good opportunity for developing adaptive sigma-point placement techniques.

### Acknowledgments

This work was supported by the Czech Science Foundation, project no. GACR P103-13-07058J.

### References

- 1. Arasaratnam, I., Haykin, S.: Cubature Kalman filters. IEEE Trans. Autom. Control **54**(6), 1254–1269 (2009)
- 2. Bhar, R.: Stochastic Filtering with Applications in Finance. World Scientific (2010)
- 3. Deisenroth, M.P., Huber, M.F., Hanebeck, U.D.: Analytic moment-based Gaussian process filtering. In: Proceedings of the 26th Annual International Conference on Machine Learning (ICML '09), pp. 1–8. ACM Press (2009)
- 4. Deisenroth, M.P., Ohlsson, H.: A general perspective on Gaussian filtering and smoothing: explaining current and deriving new algorithms. In: American Control Conference (ACC), pp. 1807–1812 (2011)
- 5. Deisenroth, M.P., Turner, R.D., Huber, M.F., Hanebeck, U.D., Rasmussen, C.E.: Robust filtering and smoothing with Gaussian processes. IEEE Trans. Autom. Control **57**(7), 1865–1871 (2012)
- 6. Duník, J., Straka, O., Šimandl, M.: Stochastic integration filter. IEEE Trans. Autom. Control **58**(6), 1561–1566 (2013)
- 7. Gelb, A.: Applied Optimal Estimation. The MIT Press (1974)
- 8. Gelman, A.: Bayesian Data Analysis, 3rd edn. Chapman and Hall/CRC (2013)
- 9. Gillijns, S., Mendoza, O., Chandrasekar, J., De Moor, B., Bernstein, D., Ridley, A.: What is the ensemble Kalman filter and how well does it work? In: American Control Conference, 2006, p. 6 (2006)
- 10. Girard, A., Rasmussen, C.E., Quiñonero Candela, J., Murray-Smith, R.: Gaussian process priors with uncertain inputs: application to multiple-step ahead time series forecasting. In: Advances in Neural Information Processing Systems 15, pp. 545–552. MIT Press (2003)
- 11. Gordon, N.J., Salmond, D.J., Smith, A.F.M.: Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings F (Radar and Signal Processing) **140**(2), 107–113 (1993)
- 12. Grewal, M.S., Weill, L.R., Andrews, A.P.: Global Positioning Systems, Inertial Navigation, and Integration. Wiley (2007)
- 13. Ito, K., Xiong, K.: Gaussian filters for nonlinear filtering problems. IEEE Trans. Autom. Control **45**(5), 910–927 (2000)
- 14. Jiang, T., Sidiropoulos, N., Giannakis, G.: Kalman filtering for power estimation in mobile communications. IEEE Trans. Wireless Commun. **2**(1), 151–161 (2003)
- 15. Julier, S.J., Uhlmann, J.K., Durrant-Whyte, H.F.: A new method for the nonlinear transformation of means and covariances in filters and estimators. IEEE Trans. Autom. Control **45**(3), 477–482 (2000)
- 16. Kalman, R.E.: A new approach to linear filtering and prediction problems. J. Basic Eng. **82**(1), 35–45 (1960)
- 17. Li, X.R., Zhao, Z.: Measuring estimator's credibility: noncredibility index. In: 2006 9th International Conference on Information Fusion, pp. 1–8 (2006)
- 18. Maybeck, P.S.: Stochastic Models, Estimation and Control: Volume 2. Academic Press (1982)
- 19. Minka, T.P.: Deriving Quadrature Rules from Gaussian Processes. Tech. rep., Statistics Department, Carnegie Mellon University (2000)
- 20. Nørgaard, M., Poulsen, N.K., Ravn, O.: New developments in state estimation for nonlinear systems. Automatica **36**, 1627–1638 (2000)
- 21. O'Hagan, A.: Bayes-Hermite quadrature. J. Stat. Plann. Infer. **29**(3), 245–260 (1991)
- 22. Osborne, M.A., Rasmussen, C.E., Duvenaud, D.K., Garnett, R., Roberts, S.J.: Active learning of model evidence using Bayesian quadrature. In: Advances in Neural Information Processing Systems (NIPS), pp. 46–54 (2012)
- 23. Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning. The MIT Press (2006)
- 24. Rasmussen, C.E., Ghahramani, Z.: Bayesian Monte Carlo. In: Advances in Neural Information Processing Systems 15, pp. 489–496. MIT Press, Cambridge, MA (2003)
- 25. Särkkä, S.: Bayesian Filtering and Smoothing. Cambridge University Press, New York (2013)
- 26. Särkkä, S., Hartikainen, J., Svensson, L., Sandblom, F.: Gaussian process quadratures in nonlinear sigma-point filtering and smoothing. In: 17th International Conference on Information Fusion (FUSION), pp. 1–8 (2014)
- 27. Sarmavuori, J., Särkkä, S.: Fourier-Hermite Kalman filter. IEEE Trans. Autom. Control **57**(6), 1511–1515 (2012)
- 28. Šimandl, M., Duník, J.: Derivative-free estimation methods: new results and performance analysis. Automatica **45**(7), 1749–1757 (2009)
- 29. Smith, G.L., Schmidt, S.F., McGee, L.A.: Application of statistical filter theory to the optimal estimation of position and velocity on board a circumlunar vehicle. Tech. rep., NASA (1962)
- 30. Wasserman, L.: All of Nonparametric Statistics. Springer (2007)