# Bayesian Quadrature Variance in Sigma-Point Filtering

Conference paper
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 383)

## Abstract

Sigma-point filters are algorithms for recursive state estimation of stochastic dynamic systems from noisy measurements, which rely on moment integral approximations by means of various numerical quadrature rules. In practice, however, it is hardly guaranteed that the system dynamics or measurement functions will meet the restrictive requirements of the classical quadratures, which inevitably results in approximation errors that are not accounted for in the current state-of-the-art sigma-point filters. We propose a method for incorporating information about the integral approximation error into the filtering algorithm by exploiting features of a Bayesian quadrature—an alternative to classical numerical integration. This is enabled by the fact that the Bayesian quadrature treats numerical integration as a statistical estimation problem, where the posterior distribution over the values of the integral serves as a model of numerical error. We demonstrate superior performance of the proposed filters on a simple univariate benchmarking example.

### Keywords

Nonlinear filtering · Sigma-point filter · Gaussian filter · Integral variance · Bayesian quadrature · Gaussian process

## 1 Introduction

Dynamic systems are widely used to model behaviour of real processes throughout the sciences. In many cases, it is useful to define a state of the system and consequently work with a state-space representation of the dynamics. When the dynamics exhibits stochasticity or can only be observed indirectly, we are faced with the problem of state estimation. Estimating a state of the dynamic system from noisy measurements is a prevalent problem in many application areas such as aircraft guidance, GPS navigation [12], weather forecast [9], telecommunications [14] and time series analysis [2]. When the state estimator is required to produce an estimate using only the present and past measurements, this is known as the filtering problem.

For discrete-time linear Gaussian systems, the best estimator in the mean-square-error sense is the much-celebrated Kalman filter (KF) [16]. First attempts to deal with the estimation of nonlinear dynamics can be traced to the work of [29], which resulted in the extended Kalman filter (EKF). The EKF algorithm uses the Taylor series expansion to approximate the nonlinearities in the system description. A disadvantage of the Taylor series is that it requires differentiability of the approximated functions. This prompted further development [20, 28] resulting in the derivative-free filters based on Stirling’s interpolation formula. Other approaches that approximate nonlinearities include the Fourier-Hermite KF [27], a special case of which is the statistically linearized filter [7, 18].

Instead of explicitly dealing with nonlinearities in the system description, the unscented Kalman filter (UKF) [15] describes the densities by a finite set of deterministically chosen sigma-points, which are then propagated through the nonlinearity. Other filters, such as the Gauss-Hermite Kalman filter (GHKF) [13], the cubature Kalman filter (CKF) [1] and the stochastic integration filter [6], utilize numerical quadrature rules to approximate moments of the relevant densities. These filters can be seen as representatives of a more general sigma-point methodology.

A limitation of classical integral approximations, such as the Gauss-Hermite quadrature (GHQ), is that they are specifically designed to perform with zero error on a narrow class of functions (typically polynomials up to a given degree). It is also possible to design rules that have the best average-case performance on a wider range of functions at the cost of permitting small non-zero error [19]. In recent years, the Bayesian quadrature (BQ) has become a focus of interest in the probabilistic numerics community [22]. The BQ treats numerical integration as a problem of Bayesian inference and is thus able to provide additional information—namely, uncertainty in the computation of the integral itself. In [26], the authors work with the concept of BQ, but the algorithms derived therein do not make use of the uncertainty in the integral computations. The goal of this paper is to augment the current sigma-point algorithms so that the uncertainty associated with the integral approximations is also reflected in their estimates.

The rest of the paper is organized as follows. Formal definition of the Gaussian filtering problem is outlined in Sect. 2, followed by the exposition of the basic idea of Bayesian quadrature in Sect. 3. The main contribution, which is the design of the Bayes-Hermite Kalman filter (BHKF), is presented in Sect. 4. Finally, comparison of the BHKF with existing filters is made in Sect. 5.

## 2 Problem Formulation

The discrete-time stochastic dynamic system is described by the following state-space model
\begin{aligned} \mathbf {x}_k&\,=\, \mathbf {f}(\mathbf {x}_{k-1}) + \mathbf {q}_{k-1},&\quad&\mathbf {q}_{k-1} \sim \mathcal {N}(\mathbf {0}, \mathbf {Q}), \end{aligned}
(1)
\begin{aligned} \mathbf {z}_k&\,=\, \mathbf {h}(\mathbf {x}_k) + \mathbf {r}_k,&\quad&\mathbf {r}_k \sim \mathcal {N}(\mathbf {0}, \mathbf {R}) , \end{aligned}
(2)
with initial conditions $$\mathbf {x}_0 \sim \mathcal {N}(\mathbf {m}_0, \mathbf {P}_0)$$, where $$\mathbf {x}_k \in \mathbb {R}^n$$ is the system state evolving according to the known nonlinear dynamics $$\mathbf {f}: \mathbb {R}^n \rightarrow \mathbb {R}^n$$ perturbed by the white state noise $$\mathbf {q}_{k-1} \in \mathbb {R}^n$$. Measurement $$\mathbf {z}_k \in \mathbb {R}^p$$ is a result of applying the known nonlinear transformation $$\mathbf {h}: \mathbb {R}^n \rightarrow \mathbb {R}^p$$ to the system state and adding the white measurement noise $$\mathbf {r}_{k} \in \mathbb {R}^p$$. Mutual independence is assumed between the state noise $$\mathbf {q}_k$$, the measurement noise $$\mathbf {r}_k$$ and the system initial condition $$\mathbf {x}_0$$ for all $$k \ge 1$$.
The filtering problem is concerned with determination of the probability density function $$p(\mathbf {x}_k \!\mid \! \mathbf {z}_{1:k})$$. The shorthand $$\mathbf {z}_{1:k}$$ stands for the sequence of measurements $$\mathbf {z}_1,\, \mathbf {z}_2,\, \ldots ,\, \mathbf {z}_k$$. The general solution to the filtering problem is given by the Bayesian recursive relations in the form of density functions
\begin{aligned} p(\mathbf {x}_k \!\mid \! \mathbf {z}_{1:k}) \,=\, \frac{p(\mathbf {z}_k \!\mid \! \mathbf {x}_k)\,p(\mathbf {x}_k \!\mid \! \mathbf {z}_{1:k-1})}{p(\mathbf {z}_k \!\mid \! \mathbf {z}_{1:k-1})} , \end{aligned}
(3)
with predictive density $$p(\mathbf {x}_k \!\mid \! \mathbf {z}_{1:k-1})$$ given by the Chapman-Kolmogorov equation
\begin{aligned} p(\mathbf {x}_k \!\mid \! \mathbf {z}_{1:k-1}) \,=\, \! \int \! p(\mathbf {x}_k \!\mid \! \mathbf {x}_{k-1})p(\mathbf {x}_{k-1} \!\mid \! \mathbf {z}_{1:k-1})\, \mathrm {d}\mathbf {x}_{k-1}. \end{aligned}
(4)
In this paper, the integration domain is assumed to be the support of $$\mathbf {x}_{k-1}$$. The likelihood term $$p(\mathbf {z}_k \!\mid \! \mathbf {x}_k)$$ in (3) is determined by the measurement model (2) and the transition probability $$p(\mathbf {x}_k \!\mid \! \mathbf {x}_{k-1})$$ in (4) by the dynamics model (1).
For tractability reasons, the Gaussian filters make the simplifying assumption that the joint density of the state and measurement $$p(\mathbf {x}_k,\, \mathbf {z}_k\!~\mid ~\!\mathbf {z}_{1:k-1})$$ is of the form
\begin{aligned} \mathcal {N} \left( \begin{bmatrix}\, \mathbf {x}_{k|k-1} \\[0.2cm] \,\mathbf {z}_{k|k-1} \,\end{bmatrix} \,\left| \, \begin{bmatrix}\, \mathbf {m}^\mathrm {x}_{k|k-1} \\[0.2cm] \,\mathbf {m}^\mathrm {z}_{k|k-1} \,\end{bmatrix},\, \begin{bmatrix} \,\mathbf {P}^\mathrm {x}_{k|k-1}&\mathbf {P}^\mathrm {xz}_{k|k-1} \\[0.2cm] \,\mathbf {P}^\mathrm {zx}_{k|k-1}&\mathbf {P}^\mathrm {z}_{k|k-1} \, \end{bmatrix} \right. \right) . \end{aligned}
(5)
Knowledge of the moments in (5) is fully sufficient [4] to express the first two moments, $$\mathbf {m}^\mathrm {x}_{k|k}$$ and $$\mathbf {P}^\mathrm {x}_{k|k}$$, of the conditional density $$p(\mathbf {x}_k \!\mid \!\mathbf {z}_{1:k})$$ using the conditioning formula for Gaussians as
\begin{aligned} \mathbf {m}^\mathrm {x}_{k|k}&\,=\, \mathbf {m}^\mathrm {x}_{k|k-1} \,+\, \mathbf {K}_k\left( \mathbf {z}_k \,-\, \mathbf {m}^\mathrm {z}_{k|k-1}\right) , \end{aligned}
(6)
\begin{aligned} \mathbf {P}^\mathrm {x}_{k|k}&\,=\, \mathbf {P}^\mathrm {x}_{k|k-1} \,-\, \mathbf {K}_k\mathbf {P}^\mathrm {z}_{k|k-1}\mathbf {K}_k^\top , \end{aligned}
(7)
with the Kalman gain defined as $$\mathbf {K}_k~=~\mathbf {P}^\mathrm {xz}_{k|k-1}\left( \mathbf {P}^\mathrm {z}_{k|k-1}\right) ^{-1}$$.
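As a concrete sketch of the update (6)–(7), the filtered moments are obtained by Gaussian conditioning on the observed measurement (a minimal NumPy illustration; the function name is ours):

```python
import numpy as np

def gaussian_condition(m_x, P_x, m_z, P_z, P_xz, z):
    """Condition the joint Gaussian (5) on an observed z, Eqs. (6)-(7)."""
    K = P_xz @ np.linalg.inv(P_z)        # Kalman gain K_k
    m_post = m_x + K @ (z - m_z)         # filtered mean, Eq. (6)
    P_post = P_x - K @ P_z @ K.T         # filtered covariance, Eq. (7)
    return m_post, P_post
```

Note that the conditioning step itself is identical for all Gaussian filters; the filters differ only in how the joint moments in (5) are approximated.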
The problem of computing the moments in (5) can be seen, on a general level, as a computation of moments of a transformed random variable
\begin{aligned} \mathbf {y} \,=\, \mathbf {g}(\mathbf {x}) , \end{aligned}
(8)
where $$\mathbf {g}$$ is a nonlinear vector function. This invariably entails evaluation of the integrals of the following kind
\begin{aligned} \mathbb {E}[\mathbf {y}] \,=\, \int \! \mathbf {g}(\mathbf {x}) p(\mathbf {x}) \, \mathrm {d}\mathbf {x} \end{aligned}
(9)
with Gaussian $$p(\mathbf {x})$$. Since the integral is typically intractable, sigma-point algorithms resort to approximations based on a weighted sum of function evaluations
\begin{aligned} \int \! \mathbf {g}(\mathbf {x}) p(\mathbf {x}) \, \mathrm {d}\mathbf {x} \approx \sum _{i=1}^N w_i \mathbf {g}(\mathbf {x}^{(i)}). \end{aligned}
(10)
The evaluation points $$\mathbf {x}^{(i)}$$ are also known as the sigma-points, hence the name. Thus, for instance, to compute $$\mathbf {m}^\mathrm {x}_{k|k-1}$$, $$\mathbf {P}^\mathrm {x}_{k|k-1}$$ and $$\mathbf {P}^\mathrm {xz}_{k|k-1}$$, we would use the following expressions given in matrix notation
\begin{aligned} \mathbf {m}^\mathrm {x}_{k|k-1}&\,\simeq \, \mathbf {F}^\top \mathbf {w} , \end{aligned}
(11)
\begin{aligned} \mathbf {P}^\mathrm {x}_{k|k-1}&\,\simeq \, \tilde{\mathbf {F}}^\top \mathbf {W}\tilde{\mathbf {F}} , \end{aligned}
(12)
\begin{aligned} \mathbf {P}^\mathrm {xz}_{k|k-1}&\,\simeq \, \tilde{\mathbf {X}}^\top \mathbf {W}\tilde{\mathbf {H}} , \end{aligned}
(13)
where $${{\mathbf {w}}} \,=\, \left[ \, w_1, \ldots , w_N \,\right] ^\top$$, $${{\mathbf {W}}} \,=\, \mathrm {diag}\left( \, [w_1, \ldots , w_N] \,\right)$$ are the quadrature weights. The remaining matrices are defined as
\begin{aligned} {{\mathbf {F}}} = \begin{bmatrix} {{\mathbf {f}}}\left( \mathbf {x}^{(1)}_{k-1}\right) ^\top \\ \vdots \\ {{\mathbf {f}}}\left( \mathbf {x}^{(N)}_{k-1}\right) ^\top \end{bmatrix} , \qquad \tilde{ {{\mathbf {F}}}} = \begin{bmatrix} \left( {{\mathbf {f}}}\left( {{\mathbf {x}}}^{(1)}_{k-1}\right) - {{\mathbf {m}}}^\mathrm {x}_{k|k-1}\right) ^\top \\ \vdots \\ \left( {{\mathbf {f}}}\left( {{\mathbf {x}}}^{(N)}_{k-1}\right) - {{\mathbf {m}}}^\mathrm {x}_{k|k-1}\right) ^\top \end{bmatrix} \end{aligned}
(14)
and
\begin{aligned} \tilde{ {{\mathbf {X}}}} = \begin{bmatrix} \left( {{\mathbf {x}}}^{(1)}_{k-1} - {{\mathbf {m}}}^\mathrm {x}_{k|k-1}\right) ^\top \\ \vdots \\ \left( {{\mathbf {x}}}^{(N)}_{k-1} - {{\mathbf {m}}}^\mathrm {x}_{k|k-1}\right) ^\top \end{bmatrix} , \qquad \tilde{ {{\mathbf {H}}}} = \begin{bmatrix} \left( {{\mathbf {h}}}\left( {{\mathbf {x}}}^{(1)}_{k-1}\right) - {{\mathbf {m}}}^\mathrm {z}_{k|k-1}\right) ^\top \\ \vdots \\ \left( {{\mathbf {h}}}\left( {{\mathbf {x}}}^{(N)}_{k-1}\right) - {{\mathbf {m}}}^\mathrm {z}_{k|k-1}\right) ^\top \end{bmatrix}. \end{aligned}
(15)
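The matrix-form approximations (11)–(13) can be sketched for a generic transformation $$\mathbf {y} = \mathbf {g}(\mathbf {x})$$ as follows (function and variable names are ours; any sigma-point set and weights can be plugged in):

```python
import numpy as np

def quadrature_moments(X, w, g, m_in):
    """Weighted-sum moment approximations (11)-(13) for y = g(x).
    X: (N, n) sigma-points, w: (N,) weights, m_in: mean used to centre X."""
    G = np.array([g(x) for x in X])      # (N, p), i-th row is g(x^(i))^T
    m = G.T @ w                          # mean, Eq. (11)
    Gc = G - m                           # centred function evaluations
    W = np.diag(w)
    P = Gc.T @ W @ Gc                    # covariance, Eq. (12)
    Pxy = (X - m_in).T @ W @ Gc          # cross-covariance, Eq. (13)
    return m, P, Pxy
```

For a linear g and a rule that matches the first two Gaussian moments, these sums recover the exact transformed moments.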
All the information a quadrature rule has about the function behaviour is conveyed by the N function values $$\mathbf {g}(\mathbf {x}^{(i)})$$. Conversely, this means that any quadrature is uncertain about the true function values in between the sigma-points. The importance of quantifying this uncertainty becomes particularly pronounced when the function is not integrated exactly due to the inherent design limitations of the quadrature (such as the choice of weights and sigma-points). All sigma-point filters thus operate with uncertainty that is not accounted for in their estimates. The classical treatment of quadrature does not lend itself nicely to quantifying the uncertainty associated with a given rule. On the other hand, the Bayesian quadrature, which treats the integral approximation as a problem in Bayesian inference, is perfectly suited for this task.

The idea of using Bayesian quadrature in state estimation algorithms was already treated in [26]. The derived filters and smoothers, however, do not utilize the full potential of the Bayesian quadrature; namely, the integral variance is not reflected in their estimates. In this article, we aim to remedy this issue by making use of familiar expressions for GP prediction at uncertain inputs [3, 10].

## 3 Gaussian Process Priors and Bayesian Quadrature

In this section, we introduce the key concepts of Gaussian process priors and Bayesian quadrature, which are crucial to the derivation of the filtering algorithm in Sect. 4.

### 3.1 Gaussian Process Priors

Uncertainty over functions is naturally expressed by a stochastic process. In Bayesian quadrature, Gaussian processes (GP) are used for their favourable analytical properties. A Gaussian process is a collection of random variables indexed by elements of an index set, any finite number of which have a joint Gaussian density [23]. That is, for any finite set of indices $$\mathbf {X}^\prime = \left\{ \mathbf {x}^\prime _1,\, \mathbf {x}^\prime _2,\, \ldots ,\, \mathbf {x}^\prime _m \right\}$$, it holds that
\begin{aligned} \big [ g(\mathbf {x}^\prime _1),\ g(\mathbf {x}^\prime _2),\ \ldots ,\ g(\mathbf {x}^\prime _m) \,\big ]^\top \,\sim \, \mathcal {N}(\mathbf {0}, \mathbf {K}) , \end{aligned}
(16)
where the kernel (covariance) matrix $$\mathbf {K}$$ is made up of pair-wise evaluations of the kernel function, thus $$\left[ \mathbf {K}\right] _{ij} \,=\, k(\mathbf {x}_i, \mathbf {x}_j)$$. Choosing a kernel, which in principle can be any symmetric positive definite function of two arguments, introduces assumptions about the underlying function we are trying to model. Bayesian inference allows us to combine the GP prior p(g) with the data $$\mathcal {D} \,=\, \left\{ \left( \mathbf {x}_i, g(\mathbf {x}_i)\right) , i = 1, \ldots , N \right\}$$, comprising the evaluation points $$\mathbf {X} \,=\, [\mathbf {x}_1, \ldots , \mathbf {x}_N]$$ and the function evaluations $$\mathbf {y}_g \,=\, \left[ \, g(\mathbf {x}_1), \ldots , g(\mathbf {x}_N) \,\right] ^\top$$, to produce a GP posterior $$p(g \!\mid \! \mathcal {D})$$ with moments given by [23]
\begin{aligned} \mathbb {E}_g[g(\mathbf {x}^{\prime })] \,=\, m_\mathrm {g}(\mathbf {x}^{\prime })&\,=\, \mathbf {k}^\top (\mathbf {x}^{\prime })\mathbf {K}^{-1}\mathbf {y}_\mathrm {g} , \end{aligned}
(17)
\begin{aligned} \mathbb {V}_g[g(\mathbf {x}^{\prime })] \,=\, \sigma ^2_\mathrm {g}(\mathbf {x}^{\prime })&\,=\, k(\mathbf {x}^{\prime }, \mathbf {x}^{\prime }) - \mathbf {k}^\top (\mathbf {x}^{\prime })\mathbf {K}^{-1}\mathbf {k}(\mathbf {x}^{\prime }) , \end{aligned}
(18)
where $$\mathbf {k}(\mathbf {x}^{\prime }) = [k(\mathbf {x}^{\prime }, \mathbf {x}_1), \ldots , k(\mathbf {x}^{\prime }, \mathbf {x}_N)]^\top$$. Thus, for any test input $$\mathbf {x}^{\prime }$$, we recover a Gaussian posterior predictive density over the function value $$g(\mathbf {x}^\prime )$$. Figure 1 depicts the predictive moments of the GP posterior density. Notice that, in between the evaluations, where the true function values are not known, the GP model is uncertain.
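A minimal 1-D illustration of the posterior moments (17)–(18) is sketched below (helper names are ours; an exponentiated quadratic kernel is assumed, with a small jitter added for numerical stability):

```python
import numpy as np

def eq_kernel(A, B, alpha=1.0, ell=1.0):
    """Exponentiated quadratic kernel k(x, x') for 1-D inputs."""
    d = A[:, None] - B[None, :]
    return alpha**2 * np.exp(-0.5 * d**2 / ell**2)

def gp_posterior(x_train, y_train, x_test, alpha=1.0, ell=1.0):
    """GP posterior predictive mean and variance, Eqs. (17)-(18)."""
    K = eq_kernel(x_train, x_train, alpha, ell)
    k_star = eq_kernel(x_test, x_train, alpha, ell)          # rows k(x')^T
    K_inv = np.linalg.inv(K + 1e-10 * np.eye(len(x_train)))  # jitter
    mean = k_star @ K_inv @ y_train                          # Eq. (17)
    var = alpha**2 - np.sum((k_star @ K_inv) * k_star, axis=1)  # Eq. (18)
    return mean, var
```

At the evaluation points the posterior mean interpolates the data and the variance collapses to zero; far from them the variance reverts to the prior value $$\alpha^2$$.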

The problem of numerical quadrature pertains to the approximate computation of the integral
\begin{aligned} \mathbb {E}_\mathbf {x}[g(\mathbf {x})] \,=\, \int \! g(\mathbf {x})p(\mathbf {x}) \,\mathrm {d}\mathbf {x}. \end{aligned}
(19)
The key distinguishing feature of the BQ is that it “treats the problem of numerical integration as the one of statistical inference” [21]. This is achieved by placing a prior density over the integrated functions themselves. A consequence of this is that the integral itself is then a random variable as well. Concretely, if a GP prior is used, then the value of the integral will also be Gaussian distributed. This follows from the fact that the integral is a linear operator acting on the GP-distributed random function $$g(\mathbf {x})$$.
Following the line of thought of [24], we take the expectation (with respect to $$p(g\!\mid \!\mathcal {D})$$) of the integral (19) and obtain
\begin{aligned} \mathbb {E}_{g\mid \mathcal {D}}[\mathbb {E}_\mathbf {x}[g(\mathbf {x})]] \,&=\, \int \!\!\!\!\int \! g(\mathbf {x}) p(\mathbf {x}) \,\mathrm {d}\mathbf {x}\, p(g \!\mid \! \mathcal {D})\, \mathrm {d}g \nonumber \\ \,&=\, \int \!\!\!\!\int \! g(\mathbf {x}) p(g \!\mid \! \mathcal {D}) \, \mathrm {d}g \; p(\mathbf {x}) \,\mathrm {d}\mathbf {x} \,=\, \mathbb {E}_\mathbf {x}[\mathbb {E}_{g|\mathcal {D}}[g(\mathbf {x})]]. \end{aligned}
(20)
From (20) we see that taking the expectation of the integral is the same as integrating the GP posterior mean function, which effectively approximates the integrated function g(x). The a posteriori integral variance is [24]
\begin{aligned} \mathbb {V}_{g\mid \mathcal {D}}[\mathbb {E}_\mathbf {x}[g(\mathbf {x})]] \,=\, \int \!\!\!\!\int \! \Big [ k(\mathbf {x}, \mathbf {x}^{\prime }) - \mathbf {k}^\top (\mathbf {x})\mathbf {K}^{-1}\mathbf {k}(\mathbf {x}^{\prime }) \Big ] p(\mathbf {x})p(\mathbf {x}^\prime )\,\mathrm {d}\mathbf {x}\,\mathrm {d}\mathbf {x}^\prime . \end{aligned}
(21)
A popular choice of kernel function that enables the expressions (20) and (21) to be computed analytically is the Exponentiated Quadratic (EQ)
\begin{aligned} k(\mathbf {x}_i, \,\mathbf {x}_j; {{\varvec{\theta }}}) \,=\, \alpha ^2 \exp \Big (-\frac{1}{2} \big (\mathbf {x}_i - \mathbf {x}_j \big )^\top {{\varvec{\Lambda }}}^{-1} \big (\mathbf {x}_i - \mathbf {x}_j\big ) \Big ) , \end{aligned}
(22)
where the vertical length scale $$\alpha$$ and the horizontal length scales on the diagonal of $${{\varvec{\Lambda }}} \,=\, \mathrm {diag}(\, [ \ell ^2_1,\, \ldots ,\, \ell ^2_n] \,)$$ are kernel hyper-parameters, collectively denoted by the symbol $${{\varvec{\theta }}}$$. Using this particular kernel introduces the assumption of smoothness (infinite differentiability) of the integrand [23]. Given the kernel function in the form (22) and $$p(\mathbf {x}) = \mathcal {N}(\mathbf {m},\, \mathbf {P})$$, the expressions for the integral posterior mean and variance reduce to
\begin{aligned} \mathbb {E}_{g\mid \mathcal {D}}[\mathbb {E}_\mathbf {x}[g(\mathbf {x})]]&= \mathbf {l}^\top \mathbf {K}^{-1}\mathbf {y}_\mathrm {g} , \end{aligned}
(23)
\begin{aligned} \mathbb {V}_{g\mid \mathcal {D}}[\mathbb {E}_\mathbf {x}[g(\mathbf {x})]]&= \alpha ^2 \left| 2 {{\varvec{\Lambda }}}^{-1}\mathbf {P} + \mathbf {I} \right| ^{-1/2} - \mathbf {l}^\top \mathbf {K}^{-1}\mathbf {l} , \end{aligned}
(24)
with $${{\mathbf {l}}} \,=\, [l_1,\, \ldots ,\, l_N]^\top$$, where
\begin{aligned} l_i \,&=\, \int \! k( {{\mathbf {x}}}, {{\mathbf {x}}}_i; {{\varvec{\theta }}}_\mathrm {g}) \, \mathcal {N}( {{\mathbf {x}}} \mid {{\mathbf {m}}}, {{\mathbf {P}}}) \,\mathrm {d} {{\mathbf {x}}} \nonumber \\ \,&=\, \alpha ^2 \left| {{\varvec{\Lambda }}}^{-1} {{\mathbf {P}}} + {{\mathbf {I}}} \right| ^{-1/2} \exp \Big (-\frac{1}{2} \big ( {{\mathbf {x}}}_i - {{\mathbf {m}}}\big )^\top \big ( {{\varvec{\Lambda }}} + {{\mathbf {P}}}\big )^{-1} \big ( {{\mathbf {x}}}_i - {{\mathbf {m}}}\big ) \Big ). \end{aligned}
(25)
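In one dimension, the quantities (23)–(25) reduce to simple array expressions; a sketch (function name is ours, with a small jitter on $$\mathbf{K}$$ for numerical stability):

```python
import numpy as np

def bq_weights(x, alpha, ell, m=0.0, P=1.0):
    """BQ weights and integral variance for a 1-D EQ kernel and
    Gaussian weight N(m, P), Eqs. (23)-(25)."""
    K = alpha**2 * np.exp(-0.5 * (x[:, None] - x[None, :])**2 / ell**2)
    # kernel-Gaussian convolution l_i, Eq. (25), scalar specialisation
    l = alpha**2 * (1 + P / ell**2)**-0.5 \
        * np.exp(-0.5 * (x - m)**2 / (ell**2 + P))
    K_inv = np.linalg.inv(K + 1e-10 * np.eye(len(x)))
    w = K_inv @ l                                     # BQ weights
    # integral posterior variance, Eq. (24), scalar specialisation
    var = alpha**2 * (1 + 2 * P / ell**2)**-0.5 - l @ K_inv @ l
    return w, var
```

The integral posterior mean (23) is then just `w @ g(x)`, i.e. a weighted sum of function evaluations.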
Notice that we could define the weights as $$\mathbf {w} = \mathbf {K}^{-1}\mathbf {l}$$. The expression (23) is then just a weighted sum of function evaluations, formally conforming to the general sigma-point method as described by (11). As opposed to classical quadrature rules, which prescribe the precise locations of the sigma-points, BQ makes no such restrictions. In [19], the optimal placement is determined by minimizing the posterior variance of the integral (21). In the next section, we show how the integral variance (21) can be reflected in current quadrature-based nonlinear filtering algorithms.

## 4 Bayes-Hermite Kalman Filter

In this section, we show how the integral variance can be incorporated into the moment estimates of the transformed random variable. Parallels are drawn with existing GP-based filters and the Bayes-Hermite Kalman filter algorithm is outlined.

### 4.1 Incorporating Integral Uncertainty

Uncertainty over the function values is introduced by a GP posterior $$p(g \!\mid \! \mathcal {D})$$, whose mean function (17) effectively acts as an approximation to the deterministic function g. Note that Eqs. (17), (18) can only model a single output dimension of the vector function $$\mathbf {g}$$. For now, we will assume a scalar function g unless otherwise stated. To keep the notation uncluttered, conditioning on $$\mathcal {D}$$ will be omitted. Treating the function values $$g(\mathbf {x})$$ as random leads to the joint density $$p(g,\mathbf {x})$$; thus, when computing the moments of $$g(\mathbf {x})$$, the expectations need to be taken with respect to both variables. This results in the following approximation of the true moments
\begin{aligned} \mu&\,=\, \mathbb {E}_ {{\mathbf {x}}}[g(\mathbf {x})] \approx \mathbb {E}_{g, {{\mathbf {x}}}}[g(\mathbf {x})] , \end{aligned}
(26)
\begin{aligned} \sigma ^2&\,=\, \mathbb {V}_ {{\mathbf {x}}}[g(\mathbf {x})] \approx \mathbb {V}_{g, {{\mathbf {x}}}}[g(\mathbf {x})]. \end{aligned}
(27)
Using the law of iterated expectations, we get
\begin{aligned} \mathbb {E}_{g, {{\mathbf {x}}}}[g(\mathbf {x})] \,=\, \mathbb {E}_g[\mathbb {E}_ {{\mathbf {x}}}[g(\mathbf {x})]] = \mathbb {E}_ {{\mathbf {x}}}[\mathbb {E}_g[g(\mathbf {x})]]. \end{aligned}
(28)
This fact was used to derive the weights for the filtering and smoothing algorithms in [26], where the same weights were used in the computations of both means and covariances. Our proposed approach, however, proceeds differently in the derivation of the weights used in the computation of covariance matrices.
Note that the variance can be written out using the decomposition formula $$\mathbb {V}_{g, {{\mathbf {x}}}}[g(\mathbf {x})] = \mathbb {E}_{g, {{\mathbf {x}}}}[g(\mathbf {x})^2] - \mathbb {E}_{g, {{\mathbf {x}}}}[g(\mathbf {x})]^2$$ and Eq. (28) either as
\begin{aligned} \mathbb {V}_{g, {{\mathbf {x}}}}[g(\mathbf {x})] \,=\, \mathbb {E}_ {{\mathbf {x}}}[\mathbb {V}_g[g(\mathbf {x})]] \,+\, \mathbb {V}_ {{\mathbf {x}}}[\mathbb {E}_g[g(\mathbf {x})]] \, , \end{aligned}
(29)
or as
\begin{aligned} \mathbb {V}_{g, {{\mathbf {x}}}}[g(\mathbf {x})] \,=\, \mathbb {E}_g[\mathbb {V}_ {{\mathbf {x}}}[g(\mathbf {x})]] \,+\, \mathbb {V}_g[\mathbb {E}_ {{\mathbf {x}}}[g( {{\mathbf {x}}})]] , \end{aligned}
(30)
depending on which factorization of the joint density $$p(g, {{\mathbf {x}}})$$ is used. The terms $$\mathbb {V}_g[g( {{\mathbf {x}}})]$$ and $$\mathbb {V}_g[\mathbb {E}_ {{\mathbf {x}}}[g(\mathbf {x})]]$$ can be identified as the variance of the integrand and the variance of the integral, respectively. In the case of a deterministic g, both of these terms are zero. With the EQ covariance (22), the expression (28) for the first moment of a transformed random variable takes on the form (23).
Since the variance decompositions in (29) and (30) are equivalent, both can be used to achieve the same goal. The form (29) was utilized in the derivation of the Gaussian process assumed density filter (GP-ADF) [5], which relies on the solution to the problem of GP prediction at uncertain inputs [10]. So, even though these results were derived to solve a seemingly different problem, we point out that by using the form (29), the uncertainty of the integral (seen as the last term of (30)) is implicitly reflected in the resulting covariance. To conserve space, we only provide a summary of the results in [3] and point the reader to the said reference for detailed derivations. The expressions for the moments of the transformed variable were rewritten into a form which assumes that a single GP is used to model all the output dimensions of the vector function (8)
\begin{aligned} {{\varvec{\mu }}}&\,=\, \mathbf {G}^\top \mathbf {w} , \end{aligned}
(31)
\begin{aligned} {{\varvec{\Sigma }}}&\,=\, \mathbf {G}^\top \mathbf {W}\mathbf {G} - {{\varvec{\mu }}} {{\varvec{\mu }}}^\top + \mathrm {diag}\left( \alpha ^2 - \mathrm {tr}\left( \mathbf {K}^{-1}\mathbf {L}\right) \right) , \end{aligned}
(32)
with the matrix $$\mathbf {G}$$ defined analogously to $$\mathbf {F}$$ in (14). The weights are given as
\begin{aligned} \mathbf {w} \,=\, \mathbf {K}^{-1}\mathbf {l}\ \text { and }\ \mathbf {W} \,=\, \mathbf {K}^{-1}\mathbf {LK}^{-1} , \end{aligned}
(33)
where
\begin{aligned} \mathbf {L} \,=\, \int \! k(\mathbf {X}, \mathbf {x}; {{\varvec{\theta }}}_\mathrm {g}) \, k(\mathbf {x}, \mathbf {X}; {{\varvec{\theta }}}_\mathrm {g}) \, \mathcal {N}(\mathbf {x} \!\mid \! \mathbf {m}, \mathbf {P}) \,\mathrm {d}\mathbf {x} . \end{aligned}
(34)
The elements of the matrix $$\mathbf {L}$$ are given by
\begin{aligned}{}[\mathbf {L}]_{ij}&= \frac{k(\mathbf {x}_i, {{\varvec{m}}}; {{\varvec{\theta }}}_\mathrm {g})\, k(\mathbf {x}_j, {{\varvec{m}}}; {{\varvec{\theta }}}_\mathrm {g})}{ | 2 {{\varvec{P\Lambda }}}^{-1} + \mathbf {I} |^{1/2}} \nonumber \\&\qquad {} \times \exp \left( \left( \mathbf {z}_{ij} - {{\varvec{m}}}\right) ^\top \left( {{\varvec{P}}} + \tfrac{1}{2} {{\varvec{\Lambda }}}\right) ^{-1} {{\varvec{P\Lambda }}}^{-1} \left( \mathbf {z}_{ij} - {{\varvec{m}}}\right) \right) , \end{aligned}
(35)
where $$\mathbf {z}_{ij} = \tfrac{1}{2}(\mathbf {x}_{i} + \mathbf {x}_{j})$$. Equations (31) and (32) bear a certain resemblance to the sigma-point method in (11), (12); in this case, however, the matrix $$\mathbf {W}$$ is not diagonal.
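The moments (31)–(35) can be sketched in the scalar case as follows (a 1-D specialisation with our own function name; jitter is added to $$\mathbf{K}$$ for numerical stability):

```python
import numpy as np

def bq_transformed_moments(x, y, alpha, ell, m=0.0, P=1.0):
    """Mean and variance of y = g(x), x ~ N(m, P), with the integral
    uncertainty added on the diagonal, Eqs. (31)-(35), scalar case."""
    N = len(x)
    K = alpha**2 * np.exp(-0.5 * (x[:, None] - x[None, :])**2 / ell**2)
    K_inv = np.linalg.inv(K + 1e-10 * np.eye(N))
    # kernel-Gaussian convolution l, Eq. (25)
    l = alpha**2 * (1 + P / ell**2)**-0.5 \
        * np.exp(-0.5 * (x - m)**2 / (ell**2 + P))
    # matrix L, Eq. (35), 1-D specialisation
    z = 0.5 * (x[:, None] + x[None, :])
    k_m = alpha**2 * np.exp(-0.5 * (x - m)**2 / ell**2)
    L = (k_m[:, None] * k_m[None, :]) / np.sqrt(2 * P / ell**2 + 1) \
        * np.exp((z - m)**2 * (P / ell**2) / (P + 0.5 * ell**2))
    w = K_inv @ l                           # Eq. (33)
    W = K_inv @ L @ K_inv                   # Eq. (33)
    mu = y @ w                              # Eq. (31)
    var = y @ W @ y - mu**2 + alpha**2 - np.trace(K_inv @ L)  # Eq. (32)
    return mu, var
```

The term `alpha**2 - np.trace(K_inv @ L)` is the expected GP variance of the integrand under $$p(\mathbf{x})$$; it is non-negative and inflates the covariance relative to the classical weighted-sum estimate.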

### 4.2 BHKF Algorithm

The filtering algorithm based on the BQ can now be constructed utilizing (31) and (32). The BHKF uses two GPs with the EQ covariance, one for each function in the state-space model (1) and (2), which means that two sets of hyper-parameters are used: $${{\varvec{\theta }}}_\mathrm {f}$$ and $${{\varvec{\theta }}}_\mathrm {h}$$. In the algorithm specification below, the lower index of $$\mathbf {l}$$ and $$\mathbf {K}$$ specifies the set of hyper-parameters used to compute these quantities.

Algorithm 1 (Bayes-Hermite Kalman Filter)

In the following, let the system initial conditions be $$\mathbf {x}_{0|0} \sim \mathcal {N}\left( \mathbf {m}_{0|0},\, \mathbf {P}_{0|0}\right)$$, the sigma-point index $$i \,=\, 1, \ldots , N$$ and the time step index $$k \,=\, 1,2,\ldots \, .$$

Initialization:

Choose the unit sigma-points $${{\varvec{\xi }}}^{(i)}$$ and set the hyper-parameters $${{\varvec{\theta }}}_\mathrm {f}$$ and $${{\varvec{\theta }}}_\mathrm {h}$$. For all time steps k, proceed from the initial conditions $$\mathbf {x}_{0|0}$$ by alternating between the following prediction and filtering steps.

Prediction:
1. Form the sigma-points $$\mathbf {x}^{(i)}_{k-1} = \mathbf {m}^\mathrm {x}_{k-1|k-1} + \sqrt{\mathbf {P}^\mathrm {x}_{k-1|k-1}}\, {{\varvec{\xi }}}^{(i)}$$.

2. Propagate the sigma-points through the dynamics model $$\mathbf {x}^{(i)}_{k} \,=\, \mathbf {f}\big (\mathbf {x}^{(i)}_{k-1}\big )$$ and form $$\mathbf {F}$$ as in (14).

3. Using the unit sigma-points $${{\varvec{\xi }}}^{(i)}$$ and the hyper-parameters $${{\varvec{\theta }}}_\mathrm {f}$$, compute the weights $$\mathbf {w}^\mathrm {x}$$ and $$\mathbf {W}^\mathrm {x}$$ according to (33) and (34) (with $$\mathbf {m}=\mathbf {0}$$ and $$\mathbf {P}=\mathbf {I}$$).

4. Compute the predictive mean $$\mathbf {m}^\mathrm {x}_{k|k-1}$$ and predictive covariance $$\mathbf {P}^\mathrm {x}_{k|k-1}$$
\begin{aligned} \mathbf {m}^\mathrm {x}_{k|k-1}&\,=\, \mathbf {F}^\top \mathbf {w}^\mathrm {x} , \\ \mathbf {P}^\mathrm {x}_{k|k-1}&\,=\, \mathbf {F}^\top \mathbf {W}^\mathrm {x}\mathbf {F} - \mathbf {m}^\mathrm {x}_{k|k-1}\big (\mathbf {m}^\mathrm {x}_{k|k-1}\big )^\top \\&\quad + \mathrm {diag}\big (\alpha ^2 - \mathrm {tr}\big (\mathbf {K}_\mathrm {f}^{-1}\mathbf {L}\big )\big ) + \mathbf {Q}. \end{aligned}

Filtering:
1. Form the sigma-points $$\mathbf {x}^{(i)}_k \,=\, \mathbf {m}^\mathrm {x}_{k|k-1} + \sqrt{\mathbf {P}^\mathrm {x}_{k|k-1}}\, {{\varvec{\xi }}}^{(i)}$$.

2. Propagate the sigma-points through the measurement model $$\mathbf {z}^{(i)}_{k} \,=\, \mathbf {h}\big ( \mathbf {x}^{(i)}_{k} \big )$$ and form $$\mathbf {H}$$ analogously to $$\mathbf {F}$$ in (14).

3. Using the unit sigma-points $${{\varvec{\xi }}}^{(i)}$$ and the hyper-parameters $${{\varvec{\theta }}}_\mathrm {h}$$, compute the weights $$\mathbf {w}^\mathrm {z}$$ and $$\mathbf {W}^\mathrm {z}$$ according to (33) and (35) (with $$\mathbf {m}=\mathbf {0}$$ and $$\mathbf {P}=\mathbf {I}$$), and $$\mathbf {W}^\mathrm {xz} = \mathrm {diag}\big (\, \mathbf {l}_\mathrm {h} \,\big )\mathbf {K}_\mathrm {h}^{-1}$$.

4. Compute the measurement mean, covariance and state-measurement cross-covariance
\begin{aligned} \mathbf {m}^\mathrm {z}_{k|k-1}&\,=\, \mathbf {H}^\top \mathbf {w}^\mathrm {z} ,\\ \mathbf {P}^\mathrm {z}_{k|k-1}&\,=\, \mathbf {H}^\top \mathbf {W}^\mathrm {z}\mathbf {H} - \mathbf {m}^\mathrm {z}_{k|k-1}\big (\mathbf {m}^\mathrm {z}_{k|k-1}\big )^\top \\&\quad + \mathrm {diag}\left( \alpha ^2 - \mathrm {tr}\left( \mathbf {K}_\mathrm {h}^{-1}\mathbf {L}\right) \right) + \mathbf {R} , \\ \mathbf {P}^\mathrm {xz}_{k|k-1}&\,=\, \mathbf {P}^\mathrm {x}_{k|k-1}\big (\mathbf {P}^\mathrm {x}_{k|k-1}+ {{\varvec{\Lambda }}}\big )^{-1}\tilde{\mathbf {X}}^\top \mathbf {W}^\mathrm {xz}\mathbf {H} , \end{aligned}
where the i-th row of $$\tilde{\mathbf {X}}$$ is $$\mathbf {x}^{(i)}_{k} - \mathbf {m}^\mathrm {x}_{k|k-1}$$.

5. Compute the filtered mean $$\mathbf {m}^\mathrm {x}_{k|k}$$ and filtered covariance $$\mathbf {P}^\mathrm {x}_{k|k}$$
\begin{aligned} \mathbf {m}^\mathrm {x}_{k|k}&\,=\, \mathbf {m}^\mathrm {x}_{k|k-1} \,+\, \mathbf {K}_k\big (\mathbf {z}_k \,-\, \mathbf {m}^\mathrm {z}_{k|k-1}\big ) , \\ \mathbf {P}^\mathrm {x}_{k|k}&\,=\, \mathbf {P}^\mathrm {x}_{k|k-1} \,-\, \mathbf {K}_k\mathbf {P}^\mathrm {z}_{k|k-1}\mathbf {K}_k^\top , \end{aligned}
with Kalman gain $$\mathbf {K}_k = \mathbf {P}^\mathrm {xz}_{k|k-1}\big (\mathbf {P}^\mathrm {z}_{k|k-1}\big )^{-1}$$.

## 5 Numerical Illustration

In the numerical simulations, the performance of the filters was tested on the univariate non-stationary growth model (UNGM) [11]
\begin{aligned} x_k&\,=\, \frac{1}{2}x_{k-1} \,+\, \frac{25x_{k-1}}{1+x^2_{k-1}} \,+\, 8\cos (1.2\,k) \,+\, q_{k-1} , \end{aligned}
(36)
\begin{aligned} z_k&\,=\, \frac{1}{20} x^2_{k} \,+\, r_k , \end{aligned}
(37)
with the state noise $$q_{k-1} \sim \mathcal {N}(0, 10)$$, measurement noise $$r_k \!\sim \! \mathcal {N}(0, 1)$$ and initial conditions $$x_{0|0} \sim \mathcal {N}(0, 5)$$.
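A minimal simulation sketch of the model (36)–(37) (function name and seed handling are ours; per the measurement model (2), the measurement is taken of the current state):

```python
import numpy as np

def simulate_ungm(K=500, seed=0):
    """Simulate one UNGM trajectory, Eqs. (36)-(37)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, np.sqrt(5.0))            # x_0 ~ N(0, 5)
    xs, zs = [], []
    for k in range(1, K + 1):
        x = 0.5 * x + 25.0 * x / (1.0 + x**2) + 8.0 * np.cos(1.2 * k) \
            + rng.normal(0.0, np.sqrt(10.0))     # q_{k-1} ~ N(0, 10)
        z = x**2 / 20.0 + rng.normal(0.0, 1.0)   # r_k ~ N(0, 1)
        xs.append(x)
        zs.append(z)
    return np.array(xs), np.array(zs)
```

The bimodality induced by the squared measurement and the strong nonlinearity of the dynamics make this model a standard stress test for nonlinear filters.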
Since the BHKF does not prescribe the sigma-point locations, they can be chosen at will. The GHKF based on the r-th order Gauss-Hermite (GH) quadrature rule uses sigma-points determined as the roots of the r-th degree univariate Hermite polynomial $$H_r(x)$$. When a function of a vector argument is to be integrated ($$n>1$$), a multidimensional grid of points is formed by the Cartesian product, leading to exponential growth in their number ($$N = r^n$$). The GH weights are computed according to [25] as
\begin{aligned} w_i = \frac{r!}{[rH_{r-1}(x^{(i)})]^2}. \end{aligned}
(38)
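This construction can be sketched using NumPy's Gauss-Hermite routine, which returns points and weights for the weight function $$e^{-x^2}$$; rescaling maps them to the standard Gaussian used by the filters (function name is ours):

```python
import numpy as np

def gauss_hermite_rule(r):
    """Sigma-points and weights of the r-th order GH rule for N(0, 1)."""
    x, w = np.polynomial.hermite.hermgauss(r)    # rule for weight exp(-x^2)
    # change of variables to the standard Gaussian weight
    return np.sqrt(2.0) * x, w / np.sqrt(np.pi)
```

An r-point rule integrates polynomials up to degree 2r − 1 exactly, so for r = 5 the Gaussian moments E[x²] = 1 and E[x⁴] = 3 are recovered to machine precision.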
The Unscented Transform (UT) is also a simple quadrature rule [13] that uses $$N = 2n+1$$ deterministically chosen sigma-points,
\begin{aligned} \mathbf {x}^{(i)} = \mathbf {m} + \sqrt{\mathbf {P}} {{\varvec{\xi }}}^{(i)} \end{aligned}
(39)
with unit sigma-points defined as columns of the matrix
\begin{aligned} \left[ \, {{\varvec{\xi }}}^{(0)},\ {{\varvec{\xi }}}^{(1)},\ \ldots ,\ {{\varvec{\xi }}}^{(2n)} \,\right] = \Big [\, \mathbf {0},\ c\mathbf {I}_n,\ -c\mathbf {I}_n \,\Big ] , \end{aligned}
(40)
where $$\mathbf {I}_n$$ denotes the $$n\times n$$ identity matrix. The corresponding weights are defined by
\begin{aligned} w_0 = \frac{\kappa }{n+\kappa }, \quad w_i = \frac{1}{2(n+\kappa )}, \quad i = 1, \ldots , 2n \end{aligned}
(41)
with scaling factor $$c = \sqrt{n+\kappa }$$. Very similar to the UT is the spherical-radial (SR) integration rule, which is the basis of the cubature Kalman filter (CKF) [1]. The SR rule uses $$2n$$ sigma-points given by
\begin{aligned} \left[ \, {{\varvec{\xi }}}^{(1)},\ \ldots ,\ {{\varvec{\xi }}}^{(2n)} \,\right] = \Big [\, c\mathbf {I}_n,\ -c\mathbf {I}_n \,\Big ] \end{aligned}
(42)
with $$c = \sqrt{n}$$ and weights $$w_i = 1/(2n),\ i = 1, \ldots , 2n$$.

All of the BHKFs used the same vertical lengthscale $$\alpha = 1$$. The horizontal lengthscale was set to $$\ell = 3.0$$ for UT, $$\ell = 0.3$$ for SR and GH-5, and $$\ell = 0.1$$ for all higher-order GH sigma-point sets. BHKFs using the UT and GH sigma-points of order 5, 7, 10, 15 and 20 were compared with their classical quadrature-based counterparts, namely the UKF and the GHKF of order 5, 7, 10, 15 and 20. The UKF operated with $$\kappa = 0$$.
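The UT sigma-point construction of Eqs. (39)-(41) can be sketched as follows (a Cholesky factor is one possible choice of the matrix square root $$\sqrt{\mathbf{P}}$$; with $$\kappa = 0$$ the center point receives zero weight and the rule coincides with the SR rule):

```python
import numpy as np

def ut_points_weights(m, P, kappa=0.0):
    """UT sigma-points (columns of X) and weights, Eqs. (39)-(41)."""
    n = m.size
    c = np.sqrt(n + kappa)
    L = np.linalg.cholesky(P)                       # matrix square root of P
    # unit sigma-points [0, c*I, -c*I], Eq. (40)
    xi = np.hstack([np.zeros((n, 1)), c * np.eye(n), -c * np.eye(n)])
    X = m[:, None] + L @ xi                         # sigma-points, Eq. (39)
    w = np.full(2 * n + 1, 1.0 / (2 * (n + kappa))) # weights, Eq. (41)
    w[0] = kappa / (n + kappa)
    return X, w
```

By construction the weighted sigma-points reproduce the first two moments of $$\mathcal{N}(\mathbf{m}, \mathbf{P})$$ exactly.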
We performed 100 simulations, each for $$K=500$$ time steps. Root-mean-square error (RMSE)
\begin{aligned} \mathrm {RMSE} = \sqrt{\frac{1}{K}\sum _{k=1}^{K} \big ( \mathbf {x}_k - \mathbf {m}^\mathrm {x}_{k|k} \big )^2} \end{aligned}
(43)
was used to measure the overall difference of the state estimate $$\mathbf {m}^\mathrm {x}_{k|k}$$ from the true state $${{\mathbf {x}}}_k$$ across all time steps. The negative log-likelihood of the filtered state estimate $${{\mathbf {m}}}^\mathrm {x}_{k|k}$$ and covariance $${{\mathbf {P}}}^\mathrm {x}_k$$
\begin{aligned} \mathrm {NLL} = -\log p( {{\mathbf {x}}}_k \mid {{\mathbf {z}}}_{1:k}) = \frac{1}{2}\left[ \log \left| 2\pi {{\mathbf {P}}}^\mathrm {x}_k\right| + ( {{\mathbf {x}}}_{k} - {{\mathbf {m}}}^\mathrm {x}_{k|k} )^\top ( {{\mathbf {P}}}^\mathrm {x}_k)^{-1} ( {{\mathbf {x}}}_{k} - {{\mathbf {m}}}^\mathrm {x}_{k|k} ) \right] \end{aligned}
(44)
was used to measure the overall model fit [8]. As a metric that takes into account the estimated state covariance, the non-credibility index (NCI) [17] given by
\begin{aligned} \mathrm {NCI} = \frac{10}{K} \sum _{k=1}^{K} \log _{10}\frac{ \big (\mathbf {x}_k - \mathbf {m}^\mathrm {x}_{k|k}\big )^\top \mathbf {P}^{-1}_{k|k} \big (\mathbf {x}_k - \mathbf {m}^\mathrm {x}_{k|k}\big ) }{ \big (\mathbf {x}_k - \mathbf {m}^\mathrm {x}_{k|k}\big )^\top {{\varvec{\Sigma }}}^{-1}_{k} \big (\mathbf {x}_k - \mathbf {m}^\mathrm {x}_{k|k}\big ) } \end{aligned}
(45)
was used, where $${{\varvec{\Sigma }}}_{k}$$ is the mean-square-error matrix. The filter is said to be optimistic if it underestimates the actual error, which is indicated by $$\mathrm {NCI} > 0$$. A perfectly credible filter would give $$\mathrm {NCI} = 0$$; that is, it would neither underestimate nor overestimate the actual error.
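For the univariate case considered here, the three criteria of Eqs. (43)-(45) can be sketched as (arrays hold the trajectory over all time steps; `P` and `Sigma` are the filter and mean-square-error variances):

```python
import numpy as np

def rmse(x, m):
    """Root-mean-square error, Eq. (43)."""
    return np.sqrt(np.mean((x - m) ** 2))

def avg_nll(x, m, P):
    """Negative log-likelihood, Eq. (44), averaged over time steps."""
    return np.mean(0.5 * (np.log(2 * np.pi * P) + (x - m) ** 2 / P))

def nci(x, m, P, Sigma):
    """Non-credibility index, Eq. (45)."""
    e2 = (x - m) ** 2
    return 10.0 * np.mean(np.log10((e2 / P) / (e2 / Sigma)))
```

When the filter-reported variance `P` equals the actual mean-square error `Sigma`, the NCI is zero, matching the credibility interpretation above.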
**Table 1** The average root-mean-square error

| Sigma-pts. | N  | Bayesian      | Classical      |
|------------|----|---------------|----------------|
| SR         | 2  | 6.157 ± 0.071 | 13.652 ± 0.253 |
| UT         | 3  | 7.124 ± 0.131 | 7.103 ± 0.130  |
| GH-5       | 5  | 8.371 ± 0.128 | 10.466 ± 0.198 |
| GH-7       | 7  | 8.360 ± 0.043 | 9.919 ± 0.215  |
| GH-10      | 10 | 7.082 ± 0.098 | 8.035 ± 0.193  |
| GH-15      | 15 | 6.944 ± 0.048 | 8.224 ± 0.188  |
| GH-20      | 20 | 6.601 ± 0.058 | 7.406 ± 0.193  |

**Table 2** The average negative log-likelihood

| Sigma-pts. | N  | Bayesian      | Classical      |
|------------|----|---------------|----------------|
| SR         | 2  | 3.328 ± 0.026 | 56.570 ± 2.728 |
| UT         | 3  | 4.970 ± 0.343 | 5.306 ± 0.481  |
| GH-5       | 5  | 4.088 ± 0.064 | 14.722 ± 0.829 |
| GH-7       | 7  | 4.045 ± 0.017 | 12.395 ± 0.855 |
| GH-10      | 10 | 3.530 ± 0.012 | 7.565 ± 0.534  |
| GH-15      | 15 | 3.468 ± 0.014 | 7.142 ± 0.557  |
| GH-20      | 20 | 3.378 ± 0.017 | 5.664 ± 0.488  |

The tables show average values of the performance criteria across simulations with estimates of $$\pm 2$$ standard deviations (obtained by bootstrapping [30]). As evidenced by the results in Table 1, the BQ provides superior RMSE performance for all sigma-point sets. In the classical quadrature case the performance improves with increasing number of sigma-points used. Table 2 shows that the performance of the BHKF is clearly superior in terms of NLL, which indicates that the estimates produced by the BQ-based filters are better representations of the unknown true state development. The self-assessment of the filter performance is less optimistic in the case of BQ, as indicated by the lower NCI in Table 3. This indicates that the BQ-based filters are more conservative in their covariance estimates, which is a consequence of including the additional uncertainty (integral variance) that the classical quadrature-based filters do not utilize. Also note that the variance of all the evaluated criteria for the Bayesian quadrature-based filters is mostly an order of magnitude lower.
**Table 3** The average non-credibility index

| Sigma-pts. | N  | Bayesian      | Classical      |
|------------|----|---------------|----------------|
| SR         | 2  | 1.265 ± 0.010 | 18.585 ± 0.045 |
| UT         | 3  | 0.363 ± 0.108 | 0.897 ± 0.088  |
| GH-5       | 5  | 4.549 ± 0.013 | 9.679 ± 0.068  |
| GH-7       | 7  | 4.368 ± 0.006 | 8.409 ± 0.076  |
| GH-10      | 10 | 2.520 ± 0.006 | 5.315 ± 0.058  |
| GH-15      | 15 | 2.331 ± 0.008 | 5.424 ± 0.059  |
| GH-20      | 20 | 1.654 ± 0.007 | 4.105 ± 0.055  |

To achieve competitive results, the kernel lengthscale $$\ell$$ had to be set manually for each filter separately. This was done by running the filters with increasing lengthscale, plotting the performance metrics, and choosing the value that gave the smallest RMSE and an NCI closest to 0. Figure 2 illustrates the effect of the lengthscale on the overall performance of the BHKF with UT sigma-points.
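The lengthscale selection described above amounts to a simple grid search; a sketch, where `run_filter` is a hypothetical callable returning the RMSE and NCI of one filtering run for a given $$\ell$$, and using $$|\mathrm{NCI}|$$ as a tie-breaker is our assumption:

```python
def pick_lengthscale(lengthscales, run_filter):
    """Grid search over candidate lengthscales.

    run_filter(ell) -> (rmse, nci) is a hypothetical evaluation routine.
    Smallest RMSE is the primary criterion, |NCI| closest to 0 the tie-breaker.
    """
    results = [(ell, *run_filter(ell)) for ell in lengthscales]
    best = min(results, key=lambda t: (t[1], abs(t[2])))
    return best[0]
```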

## 6 Conclusions

In this paper, we proposed a way of utilizing uncertainty associated with integral approximations in the nonlinear sigma-point filtering algorithms. This was enabled by the Bayesian treatment of the quadrature as well as by making use of the previously derived results for the GP prediction at uncertain inputs.

The proposed Bayesian quadrature based filtering algorithms were tested on a univariate benchmarking example. The results show that filters utilizing the additional quadrature uncertainty achieve a significant improvement in terms of estimate credibility and overall model fit.

We also showed that proper setting of the hyper-parameters is crucially important for achieving competitive results. Further research should be concerned with the development of principled approaches for dealing with the kernel hyper-parameters. The freedom in the choice of sigma-points in the BQ offers a good opportunity for developing adaptive sigma-point placement techniques.

## Notes

### Acknowledgments

This work was supported by the Czech Science Foundation, project no. GACR P103-13-07058J.

### References

1. Arasaratnam, I., Haykin, S.: Cubature Kalman filters. IEEE Trans. Autom. Control 54(6), 1254–1269 (2009)
2. Bhar, R.: Stochastic Filtering with Applications in Finance. World Scientific (2010)
3. Deisenroth, M.P., Huber, M.F., Hanebeck, U.D.: Analytic moment-based Gaussian process filtering. In: Proceedings of the 26th Annual International Conference on Machine Learning (ICML '09), pp. 1–8. ACM Press (2009)
4. Deisenroth, M.P., Ohlsson, H.: A general perspective on Gaussian filtering and smoothing: explaining current and deriving new algorithms. In: American Control Conference (ACC), pp. 1807–1812. IEEE (2011)
5. Deisenroth, M.P., Turner, R.D., Huber, M.F., Hanebeck, U.D., Rasmussen, C.E.: Robust filtering and smoothing with Gaussian processes. IEEE Trans. Autom. Control 57(7), 1865–1871 (2012)
6. Duník, J., Straka, O., Šimandl, M.: Stochastic integration filter. IEEE Trans. Autom. Control 58(6), 1561–1566 (2013)
7. Gelb, A.: Applied Optimal Estimation. The MIT Press (1974)
8. Gelman, A.: Bayesian Data Analysis, 3rd edn. Chapman and Hall/CRC (2013)
9. Gillijns, S., Mendoza, O., Chandrasekar, J., De Moor, B., Bernstein, D., Ridley, A.: What is the ensemble Kalman filter and how well does it work? In: American Control Conference, 2006, p. 6 (2006)
10. Girard, A., Rasmussen, C.E., Quiñonero Candela, J., Murray-Smith, R.: Gaussian process priors with uncertain inputs: application to multiple-step ahead time series forecasting. In: Becker, S., Thrun, S., Obermayer, K. (eds.) Advances in Neural Information Processing Systems 15, pp. 545–552. MIT Press (2003)
11. Gordon, N.J., Salmond, D.J., Smith, A.F.M.: Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings F (Radar and Signal Processing) 140(2), 107–113 (1993)
12. Grewal, M.S., Weill, L.R., Andrews, A.P.: Global Positioning Systems, Inertial Navigation, and Integration. Wiley (2007)
13. Ito, K., Xiong, K.: Gaussian filters for nonlinear filtering problems. IEEE Trans. Autom. Control 45(5), 910–927 (2000)
14. Jiang, T., Sidiropoulos, N., Giannakis, G.: Kalman filtering for power estimation in mobile communications. IEEE Trans. Wireless Commun. 2(1), 151–161 (2003)
15. Julier, S.J., Uhlmann, J.K., Durrant-Whyte, H.F.: A new method for the nonlinear transformation of means and covariances in filters and estimators. IEEE Trans. Autom. Control 45(3), 477–482 (2000)
16. Kalman, R.E.: A new approach to linear filtering and prediction problems. J. Basic Eng. 82(1), 35–45 (1960)
17. Li, X.R., Zhao, Z.: Measuring estimator's credibility: noncredibility index. In: 2006 9th International Conference on Information Fusion, pp. 1–8 (2006)
18. Maybeck, P.S.: Stochastic Models, Estimation and Control: Volume 2. Academic Press (1982)
19. Minka, T.P.: Deriving quadrature rules from Gaussian processes. Tech. rep., Statistics Department, Carnegie Mellon University (2000)
20. Nørgaard, M., Poulsen, N.K., Ravn, O.: New developments in state estimation for nonlinear systems. Automatica 36, 1627–1638 (2000)
21. O'Hagan, A.: Bayes-Hermite quadrature. J. Stat. Plann. Infer. 29(3), 245–260 (1991)
22. Osborne, M.A., Rasmussen, C.E., Duvenaud, D.K., Garnett, R., Roberts, S.J.: Active learning of model evidence using Bayesian quadrature. In: Advances in Neural Information Processing Systems (NIPS), pp. 46–54 (2012)
23. Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning. The MIT Press (2006)
24. Rasmussen, C.E., Ghahramani, Z.: Bayesian Monte Carlo. In: Becker, S., Thrun, S., Obermayer, K. (eds.) Advances in Neural Information Processing Systems 15, pp. 489–496. MIT Press, Cambridge, MA (2003)
25. Särkkä, S.: Bayesian Filtering and Smoothing. Cambridge University Press, New York (2013)
26. Särkkä, S., Hartikainen, J., Svensson, L., Sandblom, F.: Gaussian process quadratures in nonlinear sigma-point filtering and smoothing. In: 17th International Conference on Information Fusion (FUSION), pp. 1–8 (2014)
27. Sarmavuori, J., Särkkä, S.: Fourier-Hermite Kalman filter. IEEE Trans. Autom. Control 57(6), 1511–1515 (2012)
28. Šimandl, M., Duník, J.: Derivative-free estimation methods: new results and performance analysis. Automatica 45(7), 1749–1757 (2009)
29. Smith, G.L., Schmidt, S.F., McGee, L.A.: Application of statistical filter theory to the optimal estimation of position and velocity on board a circumlunar vehicle. Tech. rep., NASA (1962)
30. Wasserman, L.: All of Nonparametric Statistics. Springer (2007)