Symmetry exploits for Bayesian cubature methods
Abstract
Bayesian cubature provides a flexible framework for numerical integration in which a priori knowledge about the integrand can be encoded and exploited. This additional flexibility, compared to many classical cubature methods, comes at a computational cost which is cubic in the number of evaluations of the integrand. It has recently been observed that fully symmetric point sets can be exploited in order to reduce—in some cases substantially—the computational cost of the standard Bayesian cubature method. This work identifies several additional symmetry exploits within the Bayesian cubature framework. In particular, we go beyond earlier work in considering non-symmetric measures and, in addition to the standard Bayesian cubature method, present exploits for the Bayes–Sard cubature method and the multi-output Bayesian cubature method.
Keywords
Probabilistic numerics · Numerical integration · Gaussian processes · Fully symmetric sets
1 Introduction
In the presence of a limited computational budget, it is natural to exploit any contextual information that may be available on the integrand. Classical cubatures, such as spline-based or Gaussian cubatures, are able to exploit abstract mathematical information, such as the number of continuous derivatives of the integrand (Davis and Rabinowitz 2007). However, in situations where more detailed or specific contextual information is available to the analyst, the use of generic classical cubatures can be suboptimal.
Despite these recent successes, a significant drawback of Bayesian cubature methods is that the cost of computing the distributional output is typically cubic in N, the size of the point set. For integrals whose domain M is high-dimensional, the number N of points required can be exponential in \(m = \text {dim}(M)\). Thus the cubic cost associated with Bayesian cubature methods can render them impractical. In recent work, Karvonen and Särkkä (2018) noted that symmetric structure in the point set can be exploited to reduce the total computational cost. Indeed, in some cases the exponential dependence on m can be reduced to (approximately) linear. This is a similar effect to that achieved in the circulant embedding approach (Dietrich and Newsam 1997), or by the use of \({\mathcal {H}}\)-matrices (Hackbusch 1999) and related approximations (Schäfer et al. 2017), though the approaches differ at a fundamental level. The aim of this paper is to present several related symmetry exploits that are specifically designed to reduce the computational cost of Bayesian cubature methods.
Our principal contributions are the following: First, the techniques developed in Karvonen and Särkkä (2018) are extended to the Bayes–Sard cubature method. This results in a computational method that is, essentially, of complexity \({{\mathcal {O}}(J^3 + JN)}\), where J is the number of symmetric sets that constitute the full point set, instead of being cubic in N. In typical scenarios, there are at most a few hundred symmetric sets even though the total number of points can run to millions. Second, we present an extension to the multi-output (i.e. vector-valued) Bayesian cubature method that is used to simultaneously compute \(D \in {\mathbb {N}}\) related integrals. In this case, the computational complexity is reduced from \({\mathcal {O}}(D^3 N^3)\) to \({\mathcal {O}}(D^3 J^3 + DJN)\). Third, a symmetric change of measure technique is proposed to avoid the (strong) assumption of symmetry on the measure \(\nu \) that was required in Karvonen and Särkkä (2018). Fourth, the performance of our techniques is empirically explored. Throughout, our focus is not on the accuracy of these integration methods, which has been explored in the earlier work already cited; rather, it is on how computation for these methods can be accelerated.
The remainder of the article is structured as follows: Sect. 2 covers the essential background for Bayesian cubature methods and introduces fully symmetric sets that are used in the symmetry exploits throughout the article. Sections 3 and 4 develop fully symmetric Bayes–Sard cubature and fully symmetric multioutput Bayesian cubature. Section 5 explains how the assumption that \(\nu \) is symmetric can be relaxed. In Sect. 6, a detailed selection of empirical results is presented. Finally, some concluding remarks and discussion are contained in Sect. 7.
2 Background
This section reviews the standard Bayesian cubature method, due to Larkin (1972), and explains how fully symmetric sets can be used to alleviate its computational cost, as proposed by Karvonen and Särkkä (2018).
2.1 Standard Bayesian cubature
In this section, we present explicit formulae for the Bayesian cubature method in the case where the prior model (1) is a Gaussian random field. To simplify the notation, Sects. 2 and 3 assume that the integrand has scalar output (i.e. \(D = 1\)); this is then extended to vectorvalued output in Sect. 4.
The principal motivation for this work is the observation that both (6) and (7) involve the solution of an N-dimensional linear system defined by the matrix \(\varvec{K}_{X}\). In general this is a dense matrix and, as such, in the absence of additional structure in the linear system (Karvonen and Särkkä 2018) or further approximations [e.g. Lázaro-Gredilla et al. (2010), Hensman et al. (2018), Schäfer et al. (2017)], the computational complexity associated with the standard Bayesian cubature method is \({\mathcal {O}}(N^3)\). Moreover, it is often the case that \(\varvec{K}_X\) is ill-conditioned (Schaback 1995; Stein 2012). The exploitation of symmetric structure to circumvent the solution of a large and ill-conditioned linear system would render Bayesian cubature more practical, in the sense of computational efficiency and numerical robustness; this is the contribution of the present article.
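To make the cubic cost concrete, the following Python sketch (an illustration, not the authors' MATLAB implementation) computes the Bayesian cubature posterior mean and variance under illustrative assumptions: a Gaussian kernel and \(\nu = N(\varvec{0}, \varvec{I}_m)\), for which the kernel mean and initial error have closed forms.

```python
import numpy as np

def bq_gauss(X, fX, ell=1.0):
    """Standard Bayesian cubature for nu = N(0, I_m) with the Gaussian kernel
    k(x, x') = exp(-||x - x'||^2 / (2 ell^2)).

    Returns the posterior mean z^T K^{-1} f(X) and the posterior variance
    c - z^T K^{-1} z; both require solving an N-dimensional linear system.
    """
    N, m = X.shape
    sq = np.sum((X[:, None, :] - X[None, :, :])**2, axis=-1)
    K = np.exp(-sq / (2 * ell**2))                      # dense N x N kernel matrix
    # closed-form kernel mean z(x) = int k(x, x') dnu(x')
    z = (ell**2 / (ell**2 + 1))**(m / 2) * np.exp(
        -np.sum(X**2, axis=1) / (2 * (ell**2 + 1)))
    c = (ell**2 / (ell**2 + 2))**(m / 2)                # c = int int k dnu dnu
    w = np.linalg.solve(K, z)                           # the O(N^3) bottleneck
    return w @ fX, c - w @ z
```

The single dense `solve` call is the \({\mathcal {O}}(N^3)\) bottleneck that the symmetry exploits of this article target.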
2.2 Symmetry properties
Next we introduce fully symmetric sets and related symmetry concepts, before explaining in Sect. 2.3 how these can be exploited for computational simplification in the standard Bayesian cubature method. Note that, in what follows, no symmetry properties are needed for the integrand \(f^\dagger \) itself.
2.2.1 Fully symmetric point sets
Table 1  Sizes of fully symmetric sets generated by the generator vector \(\varvec{\lambda } = (\lambda _1,\ldots ,\lambda _l,0,\ldots ,0)\) having \(l \le m\) distinct nonzero elements \(\lambda _1,\ldots ,\lambda _l\) [see (9)]

          Dimension (m)
          2       3       4       5        6         7
\(l=1\)   4       6       8       10       12        14
\(l=2\)   8       24      48      80       120       168
\(l=3\)   –       48      192     480      960       1680
\(l=4\)   –       –       384     1920     5760      13,440
\(l=5\)   –       –       –       3840     23,040    80,640
\(l=6\)   –       –       –       –        46,080    322,560
\(l=7\)   –       –       –       –        –         645,120
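For concreteness, a fully symmetric set can be enumerated by applying all coordinate permutations and sign changes to its generator vector. This short Python sketch (an illustration, not the authors' code) reproduces the counts in Table 1.

```python
from itertools import permutations, product

def fully_symmetric_set(gen):
    """All distinct vectors obtained from the generator `gen` by coordinate
    permutations and sign changes; duplicates (e.g. sign flips of zero
    coordinates) are removed by the set comprehension."""
    return sorted({tuple(s * x for s, x in zip(signs, perm))
                   for perm in permutations(gen)
                   for signs in product((-1, 1), repeat=len(gen))})
```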
2.2.2 Fully symmetric domains, kernels, and measures
At this point, we introduce several related definitions; these enable us later to state precisely which symmetry assumptions are being exploited.
Domains It will be assumed in the sequel that \({M \subset {\mathbb {R}}^m}\) is a fully symmetric domain, meaning that every fully symmetric set generated by a vector from M is contained in M: \([\varvec{\lambda }] \subset M\) whenever \(\varvec{\lambda } \in M\). Equivalently, \({M = \varvec{P}M = \{ \varvec{P} \varvec{x} \, : \, \varvec{x} \in M \}}\) for any \(\varvec{P} \in \mathrm {Perm}^{{\mathrm{SC}}}_m\). Most popular domains, such as the whole of \({\mathbb {R}}^m\), hypercubes of the form \([-a, a]^m\) (from which, e.g. the unit hypercube can be obtained by simple translation and scaling), balls and spheres, are fully symmetric.
Kernels A kernel \(k :M \times M \rightarrow {\mathbb {R}}\) defined on a fully symmetric domain M is said to be a fully symmetric kernel if \(k(\varvec{P}\varvec{x},\varvec{P}\varvec{x}') = k(\varvec{x},\varvec{x}')\) for any \(\varvec{P} \in \mathrm {Perm}^{{\mathrm{SC}}}_m\). Basic examples of fully symmetric kernels include isotropic kernels and products and sums of isotropic one-dimensional kernels.
Measures A measure \(\nu \) on a fully symmetric domain M is a fully symmetric measure if it is invariant under fully symmetric pushforwards: \(\varvec{P}_*(\nu ) = \nu \) for any \({\varvec{P} \in \mathrm {Perm}^{{\mathrm{SC}}}_m}\). If \(\nu \) admits a Lebesgue density \(p_\nu \), this condition is equivalent to \(p_\nu (\varvec{x}) = p_\nu (\varvec{P}\varvec{x})\) for any \(\varvec{P} \in \mathrm {Perm}^{{\mathrm{SC}}}_m\). Note that this is a narrow class of measures and a relaxation of this assumption is discussed in Sect. 5.
2.2.3 Fully symmetric cubature rules
2.3 Fully symmetric Bayesian cubature
The central aim of this article is to derive generalisations for the Bayes–Sard and multi-output Bayesian cubatures of the following result from Karvonen and Särkkä (2018), originally developed only for the standard Bayesian cubature method.
Theorem 1
Theorem 1 demonstrates the principal idea: one can exploit symmetry to reduce the number of kernel evaluations needed in the standard Bayesian cubature method from \(N^2\) to NJ and to decrease the size of the linear system that needs to be solved from N to J. Since J is typically considerably smaller than \(N = \sum _{j=1}^J \#[\varvec{\lambda }^j]\), using fully symmetric sets results in a substantial reduction in computational cost. Numerical examples in Karvonen and Särkkä (2018) showed that sets containing up to tens of millions of points become feasible in the standard Bayesian cubature method when symmetry exploits are used. The aim of this paper is to generalise these techniques to the important cases of Bayes–Sard cubature (Sect. 3) and multi-output Bayesian cubature (Sect. 4).
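The mechanism behind Theorem 1 can be sketched as follows: because the kernel mean is constant on each fully symmetric set, the N optimal weights take only J distinct values, and these solve a J-dimensional system with entries \(\bar{K}_{ij} = \sum _{\varvec{x} \in [\varvec{\lambda }^j]} k(\varvec{\lambda }^i, \varvec{x})\). The Python sketch below (illustrative; the Gaussian kernel and \(\nu = N(\varvec{0}, \varvec{I}_m)\) are assumptions made so that the kernel mean is in closed form) can be verified against the full N-dimensional solve.

```python
import numpy as np
from itertools import permutations, product

def sym_set(gen):
    # all coordinate permutations and sign changes of the generator vector
    return np.array(sorted({tuple(s * x for s, x in zip(sg, p))
                            for p in permutations(gen)
                            for sg in product((-1, 1), repeat=len(gen))}))

def gauss_k(A, B, ell):
    return np.exp(-np.sum((A[:, None] - B[None, :])**2, -1) / (2 * ell**2))

def bq_weights_reduced(gens, ell):
    """One weight per fully symmetric set for nu = N(0, I_m): solve the
    J x J system Kbar u = zbar with Kbar[i, j] = sum_{x in [gen_j]} k(gen_i, x)."""
    sets = [sym_set(g) for g in gens]
    L = np.array(gens, dtype=float)          # one representative per set
    m = L.shape[1]
    zbar = (ell**2 / (ell**2 + 1))**(m / 2) \
        * np.exp(-np.sum(L**2, 1) / (2 * (ell**2 + 1)))
    Kbar = np.column_stack([gauss_k(L, S, ell).sum(axis=1) for S in sets])
    return sets, np.linalg.solve(Kbar, zbar)  # J-dimensional solve
```

Only NJ kernel evaluations and a J-dimensional solve are needed, yet the resulting per-set weights coincide with those of the full N-dimensional system.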
Remark 1
3 Fully symmetric Bayes–Sard cubature
In this section, we first review the Bayes–Sard cubature method from Karvonen et al. (2018) and then derive a generalisation of Theorem 1 for this method.
3.1 Bayes–Sard cubature

The posterior mean \(\mu _N(f^\dagger )\) is exactly equal to the integral \(I(f^\dagger )\) if \(f^\dagger \in \pi \). In particular, if \(\pi \) contains a nonzero constant function then \(\sum _{i=1}^N w_{X,i}^k = 1\), so that the cubature rule is normalised (however, non-negativity of the weights is not guaranteed^{2}). This can improve the stability of the method in high-dimensional settings (Karvonen et al. 2018). In general, if \(\pi \) is the set of polynomials up to a certain order q, then the posterior mean is recognised as a cubature rule of algebraic degree q (Cools 1997, Definition 3.1).
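As a hedged illustration of how such weights can be obtained, the following Python sketch computes Bayes–Sard-type weights from a saddle-point system coupling the kernel matrix with the basis-function matrix; here the flat-prior limit on the \(\pi \)-coefficients, a Gaussian kernel, and \(\nu = \mathrm {Unif}([-1,1]^m)\) are illustrative assumptions, not the authors' exact formulation. The constraint block forces the rule to be exact on \(\pi \) by construction.

```python
import numpy as np
from math import erf

def bayes_sard_weights(X, ell, basis, basis_integrals):
    """Bayes-Sard-type weights for nu = Unif([-1, 1]^m) and a Gaussian kernel,
    from the saddle-point system
        [ K   V ] [ w   ]   [ z     ]
        [ V^T 0 ] [ lam ] = [ I(pi) ] ,
    where V holds basis-function values and I(pi) their exact integrals."""
    N, m = X.shape
    K = np.exp(-np.sum((X[:, None] - X[None, :])**2, -1) / (2 * ell**2))
    # kernel mean: z(x) = prod_j (1/2) int_{-1}^{1} exp(-(x_j - t)^2/(2 ell^2)) dt
    E = np.vectorize(erf)
    r2 = np.sqrt(2) * ell
    z = np.prod(0.5 * ell * np.sqrt(np.pi / 2)
                * (E((1 - X) / r2) - E((-1 - X) / r2)), axis=1)
    V = basis(X)
    Q = V.shape[1]
    A = np.block([[K, V], [V.T, np.zeros((Q, Q))]])
    return np.linalg.solve(A, np.concatenate([z, basis_integrals]))[:N]
```

With \(\pi = \mathrm {span}\{1, x_1^2, x_2^2\}\), for example, the resulting weights reproduce the integrals \(I(1) = 1\) and \(I(x_j^2) = 1/3\) exactly.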

Given any cubature rule \(\mu (f^\dagger ) = \sum _{i=1}^N w_i f^\dagger (\varvec{x}_i)\) for specified \(w_i \in {\mathbb {R}}\) and \(\varvec{x}_i \in M\), and given any covariance function k, one can find an Ndimensional function space \(\pi \) such that \(\mu _N = \mu \). Furthermore, the posterior standard deviation \(\sigma _N\) coincides with the worstcase error of the cubature rule \(\mu \) in the Hilbert space induced by k (Karvonen et al. 2018, Section 2.4). This demonstrates that any cubature rule can be interpreted as the posterior mean under an infinitude of prior models, providing a bridge between classical and Bayesian cubature methods.
3.2 A symmetry exploit for Bayes–Sard cubature
In this section, we present a novel result that enables fully symmetric sets to be exploited in the Bayes–Sard cubature method. In what follows we only consider a function space \(\pi \) spanned by even monomials exhibiting symmetries.^{3} In practice, we do not believe this to be a significant restriction, since polynomials typically serve as a good default and, in fact, considerable freedom remains in selecting the polynomials; one is not restricted to, for example, spaces of all polynomials up to a given degree.
Lemma 1
Suppose that M and \(\nu \) are each fully symmetric. If \(\varvec{\alpha } \in {\mathbb {E}}_0^m\) then \(I(\varvec{x}^{\varvec{\alpha }}) = I(\varvec{x}^{\varvec{P} \varvec{\alpha }})\) for any \(\varvec{P} \in \text {Perm}_m\).
Proof
Lemma 2
Suppose that M, \(\nu \), and k are each fully symmetric and let \(\varvec{\lambda } \in M\). Then \(k_\nu (\varvec{x}) = k_\nu (\varvec{\lambda })\) for every \(\varvec{x} \in [\varvec{\lambda }]\).
Proof
The proof is essentially identical to that of Lemma 1. \(\square \)
Lemma 3
Proof
Lemma 4
Proof
We are now ready to prove the main result of this section. Theorem 2 establishes sufficient conditions for the Bayes–Sard cubature rule to be fully symmetric and, in that case, provides an explicit simplification of its output (13).
Theorem 2
Proof
Remark 2
The polynomial space \(\pi \) could be appended with fully symmetric collections of odd polynomials (i.e. by using additional basis functions \(\varvec{x}^{\varvec{\beta }}\), \(\varvec{\beta } \in [\varvec{\alpha }]^+\) for \(\varvec{\alpha } \notin {\mathbb {E}}_0^m\)). However, doing so gains nothing, since the weights in \(\varvec{w}_{{\mathcal {A}}}^\pi \) corresponding to these basis functions turn out to be zero. This follows from the readily verified facts that \(\sum _{\varvec{x} \in [\varvec{\lambda }]} \varvec{x}^{\varvec{\beta }} = 0\) and \(I(\varvec{x}^{\varvec{\beta }}) = 0\) whenever \(\varvec{\beta } \notin {\mathbb {E}}_0^m\).
Just like Theorem 1 for the standard Bayesian cubature, Theorem 2 reduces the number of kernel and basis function evaluations from roughly \(N^2 + Q^2\) to \(NJ + NJ_\alpha \) and the size of the linear system that needs to be solved from \(N+Q\) to \(J+J_\alpha \). Typically, this translates to a significant computational speedup; see Sect. 6.2 for a numerical example involving point sets of up to \(N = 179,400\). Such results could not realistically be obtained by direct solution of the original linear system (14).
4 Fully symmetric multi-output Bayesian cubature
In this section, we review the multi-output Bayesian cubature method recently proposed by Xi et al. (2018) and show how fully symmetric sets can be exploited to reduce the computational complexity of this method.
4.1 Multi-output Bayesian cubature
One often needs to integrate a number of related integrands, \(f_1^\dagger ,\ldots ,f_D^\dagger :M \rightarrow {\mathbb {R}}\). It is of course trivial to treat these as a set of D independent integrals and apply either the standard Bayesian or Bayes–Sard cubature method to approximate each integral. However, in many cases the relationship between the integrands can be explicitly modelled and leveraged.
4.2 Separable kernels
Remark 3
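The multi-output construction with a separable kernel can be sketched as follows; the Python code below is an illustration in the spirit of the model of Xi et al. (2018), not their exact parametrisation, and assumes a separable kernel \(B[d,d'] \, k(\varvec{x}, \varvec{x}')\) with Gaussian k and \(\nu = N(\varvec{0}, \varvec{I}_m)\).

```python
import numpy as np

def mo_bq_means(Xs, fs, B, ell=1.0):
    """Posterior means of D integrals under nu = N(0, I_m) for the separable
    multi-output kernel B[d, e] * k(x, x') with Gaussian k. Each output d
    has its own point set Xs[d] and evaluations fs[d]."""
    D, m = len(Xs), Xs[0].shape[1]
    k = lambda A, C: np.exp(-np.sum((A[:, None] - C[None, :])**2, -1)
                            / (2 * ell**2))
    z = lambda A: ((ell**2 / (ell**2 + 1))**(m / 2)
                   * np.exp(-np.sum(A**2, 1) / (2 * (ell**2 + 1))))
    # full covariance over all (output, point) pairs: a (D N)-dimensional system
    Cov = np.block([[B[d, e] * k(Xs[d], Xs[e]) for e in range(D)]
                    for d in range(D)])
    sol = np.linalg.solve(Cov, np.concatenate(fs))
    # cross-covariances between integral d and all observations
    return np.array([np.concatenate([B[d, e] * z(Xs[e]) for e in range(D)]) @ sol
                     for d in range(D)])
```

When B is diagonal the system decouples into D independent standard Bayesian cubatures; off-diagonal entries of B are what allow evaluations of one integrand to inform the others.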
4.3 A symmetry exploit for multi-output Bayesian cubature
Our main result in this section is a second generalisation of Theorem 1, in this case for the multi-output Bayesian cubature method.
Theorem 3
Proof
The computational complexity of forming the fully symmetric weight matrix \(\varvec{W}_{\varLambda }\) is dominated by the DJN kernel evaluations needed to form \(\varvec{S}\) and the inversion of this \(DJ \times DJ\) matrix. Because J is often orders of magnitude smaller than N, these tasks remain feasible even for a very large total number of points DN. For example, in Sect. 6.3 the result of Theorem 3 is applied to facilitate the simultaneous computation of up to \(D = 50\) integrals arising in a global illumination problem, each integrand being evaluated at up to \(N = 288\) points. Such results would be impractical to obtain by direct solution of the original linear system in (20).
5 Symmetric change of measure
The results presented in this article, and those originally described in Karvonen and Särkkä (2018), rely on the assumption that the measure \(\nu \) is fully symmetric (see Sect. 2.2.2). This is a strong restriction; most measures are not fully symmetric. However, this assumption can be avoided in a relatively straightforward manner, which is now described.
Remark 4
Note that the situation here is unlike standard importance sampling (Robert and Casella 2013, Section 3.3), in that the importance distribution \(\nu _*\) is required to be fully symmetric. As such, it is not obvious how to mathematically characterise an “optimal” choice of \(\nu _*\). Indeed, any notion of optimality ought also to depend on the cubature method that will be used. Nevertheless, obvious constructions (e.g. the choice of \(\nu _*\) as an isotropic centred Gaussian for \(\nu \) sub-Gaussian and \(M = {\mathbb {R}}^m\)) can work rather well.
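A one-dimensional Python illustration of the technique follows; the mixture, proposal and kernel parameters below are invented for illustration and are not those of Sect. 6.4. The integrand is reweighted by \(p_\nu / p_{\nu _*}\) and then integrated by standard Bayesian cubature against the symmetric proposal \(\nu _* = N(0,1)\), for which the kernel mean is available in closed form.

```python
import numpy as np

def gauss_pdf(x, mu, s):
    return np.exp(-(x - mu)**2 / (2 * s**2)) / (s * np.sqrt(2 * np.pi))

def bq_change_of_measure(f, X, ell=0.3):
    """Integrate f against a non-symmetric 1-d Gaussian mixture nu by standard
    Bayesian cubature against the fully symmetric proposal nu_* = N(0, 1):
    the new integrand is g = f * p_nu / p_nu*, and only the (closed-form)
    kernel mean with respect to nu_* is needed."""
    p_nu = lambda x: (0.5 * gauss_pdf(x, -0.3, 0.3)
                      + 0.5 * gauss_pdf(x, 0.7, 0.3))
    g = f(X) * p_nu(X) / gauss_pdf(X, 0.0, 1.0)
    K = np.exp(-(X[:, None] - X[None, :])**2 / (2 * ell**2))
    z = np.sqrt(ell**2 / (ell**2 + 1)) * np.exp(-X**2 / (2 * (ell**2 + 1)))
    return np.linalg.solve(K, z) @ g

# a sign-symmetric design; the true value is
# E_nu[x^2] = 0.5*(0.3^2 + 0.3^2) + 0.5*(0.7^2 + 0.3^2) = 0.38
X = np.arange(-3, 3.01, 0.25)
est = bq_change_of_measure(lambda x: x**2, X)
```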
6 Results
In this section, we assess the performance of the fully symmetric Bayes–Sard and fully symmetric multioutput Bayesian cubature methods based on computational simplifications provided in Theorems 2 and 3. MATLAB code for all examples is provided at https://github.com/tskarvone/bcsymmetryexploits.
6.1 Selection of fully symmetric sets

In low dimensions, say \(m \le 4\), it is feasible to use (quasi) Monte Carlo samples as generators, as each fully symmetric set will contain at most 384 points (see Table 1). However, a large number of fully symmetric sets may be needed to ensure sufficient coverage of the space. This approach can work, as in Sect. 6.3, but is occasionally prone to failure (Karvonen and Särkkä 2018, Section 5.3).

In higher dimensions (or when a more robust design is desired), we recommend selecting a tried-and-tested fully symmetric point set, such as a sparse grid (Holtz 2011, Chapter 4). This can then be further modified if required, since fully symmetric sets can be added or removed at will. In very high dimensions, this can amount to using effectively low-dimensional generator vectors of the forms \((x_1,0,\ldots ,0)\), \((x_2,0,\ldots ,0)\), \((x_1,x_2,0,\ldots ,0)\) and so on, for points \(x_i\) that come from some classical one-dimensional integration rule, such as Gauss–Hermite or Clenshaw–Curtis.
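The construction of such effectively low-dimensional generators, and the resulting point counts, can be sketched as follows (illustrative Python; the nodes `xs` stand in for a classical one-dimensional rule). For a generator with l distinct nonzero entries the set size is \(2^l \, m!/(m-l)!\), in agreement with Table 1.

```python
from itertools import combinations
from math import factorial

def axis_generators(xs, m, max_l):
    """Effectively low-dimensional generators (x_{i_1},...,x_{i_l},0,...,0)
    built from distinct 1-d nodes `xs` (a stand-in for, e.g., Gauss-Hermite
    or Clenshaw-Curtis nodes)."""
    return [tuple(c) + (0.0,) * (m - l)
            for l in range(1, max_l + 1)
            for c in combinations(xs, l)]

def sym_set_size(gen):
    """#[lambda] for a generator with l distinct nonzero entries:
    2^l * m! / (m - l)!."""
    m, l = len(gen), sum(1 for x in gen if x != 0)
    return 2**l * factorial(m) // factorial(m - l)
```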
6.2 Zero coupon bonds
This example involves a model for zero coupon bonds that has been used to assess accuracy and robustness of the Bayes–Sard cubature and fully symmetric Bayesian cubature methods in Karvonen and Särkkä (2018) and Karvonen et al. (2018).
6.2.1 Integration problem
6.2.2 Setting
6.2.3 Results
The results are depicted in Fig. 2. We observe that the Bayes–Sard method is much less sensitive to the lengthscale choice than the standard Bayesian cubature method. For instance, with the selection \(\ell = \sqrt{m}\), Bayes–Sard outperforms the standard Bayesian cubature by roughly three orders of magnitude. It is also clear that, in this particular problem, the addition of more polynomial basis functions can significantly improve the integral estimates.
It was not possible to obtain results at this scale in the earlier work of Karvonen et al. (2018), where the largest value of N considered was 5,000. In contrast, our result in Theorem 2 enabled point sets of size up to \({N = 179,400}\) to be used. The computational time required to produce the results for the Bayes–Sard cubature in the most demanding case, \(T = 300\) and \({m = 2}\), was on the order of 2.5 min on a standard laptop computer. However, this can be mostly attributed to a suboptimal algorithm for generating the sparse grid. Indeed, after the points had been obtained it took roughly one second to compute the Bayes–Sard weights.
6.3 Global illumination integrals
Next we applied the multi-output Bayesian cubature method, together with the symmetry exploit developed in Sect. 4.3, to compute a collection of closely related integrals arising in a global illumination context. This is a popular application of Bayesian cubature methods; see Brouillat et al. (2009), Marques et al. (2013, 2015), Briol et al. (2019) and Xi et al. (2018) for existing work. In particular, multi-output Bayesian cubature was applied to the problem that we consider below in Xi et al. (2018), where \(D = 5\) integrals were simultaneously computed. Through the computational simplifications obtained by using fully symmetric sets, in what follows we simultaneously compute up to \(D = 50\) integrals, a tenfold improvement.
6.3.1 Integration problem
6.3.2 Setting
6.3.3 Results
In accordance with Xi et al. (2018), we observed that the multi-output Bayesian cubature method is superior to the standard one already when \(D = 5\). The performance gain of the multi-output method keeps increasing as more integrands are added, but is ultimately bounded. This is reasonable, since integrands for wildly different \(\varvec{\omega }_o^d\) can convey little information about each other. For the smallest values of D the multi-output method is less accurate than the standard Bayesian cubature method. This can be explained by potentially non-uniform covering of the unit sphere when the total number DJ of fully symmetric sets is low (e.g. when some of the generator vectors happen to cluster, the fully symmetric sets they generate do not differ greatly, so that less information is obtained on the integrand). For instance, the standard deviation over the 100 runs in the relative error of fully symmetric Bayesian cubature for the first integral (i.e. the case \(D = 1\) in Fig. 3) was 0.34 (\(J=3\)) or 0.17 (\(J=6\)) while that of the standard Bayesian cubature with random points was only 0.19 (\(N=144\)) or 0.11 (\(N=288\)). See also Karvonen and Särkkä (2018, Figure 5.1).
Computational times remained reasonable throughout this experiment; see Fig. 4. For example, without symmetry exploits, the case \(D = 50\) and \(J = 6\) would require \((DN)^2 = \) 207,360,000 kernel evaluations and inversion of a 14,400dimensional matrix while Theorem 3 reduces these numbers, respectively, to \({DNJ = 86,400}\) and \(DJ = 300\). From Fig. 4, it is seen that this computation took only 0.8 s. This suggests that with more carefully selected fully symmetric point sets it may be possible to realise the desire expressed in Xi et al. (2018, Section 4) of simultaneous computation of up to thousands of related integrals.
6.4 Symmetric change of measure illustration
The purpose of this final experiment is to briefly illustrate the symmetric change of measure technique, proposed in Sect. 5. To limit scope, we consider applying this technique in conjunction with the fully symmetric standard Bayesian cubature method (i.e. Theorem 1).
6.4.1 Integration problem
6.4.2 Setting
For these experiments \(\nu \) was taken to be a uniform mixture of eight Gaussian distributions \({N}(\varvec{\mu }_i, \varvec{\varSigma }_i)\), \({i = 1,\dots ,8}\), with the mean vectors drawn independently from the standard normal distribution and detrended so that \(\sum _{i=1}^8 \varvec{\mu }_i = \varvec{0}\). The covariance matrices of the Gaussian components were independent and normalised draws from the Wishart distribution \(W_6(\varvec{I}_6, d+2(q-1))\), \(q \in {\mathbb {N}}\). The resulting \(\nu \) is almost surely not fully symmetric, and therefore Theorem 1 cannot be applied directly. Different values of q correspond to different degrees of symmetry of \(\nu \): for small values of q the covariance matrices \(\varvec{\varSigma }_i\) are likely to be nearly singular, while as \(q \rightarrow \infty \) they become nearly diagonal. Accordingly, we experimented with \(q \in \{1, \ldots , 8\}\). For each q, the proposal distribution \(\nu _*\) was a zero-mean Gaussian with diagonal covariance \(\sigma ^2 \varvec{I}_6\), with \(\sigma ^2\) set to the mean of the diagonal elements of the \(\varvec{\varSigma }_i\). For Bayesian cubature, we used the Gaussian kernel (26) with lengthscale \(\ell = 0.8\) and the Gauss–Hermite sparse grid (Karvonen and Särkkä 2018, Section 4.2) with the midpoint removed. Note that the resulting point sets are not nested for different N.
6.4.3 Results
7 Discussion
There is increasing interest in the use of Bayesian methods for numerical integration (Briol et al. 2019). Bayesian cubature methods are attractive due to the analytic and theoretical tractability of the underlying Gaussian model. However, these methods are also associated with a computational cost that is cubic in the number of points, N, and moreover the linear systems that must be inverted are typically ill-conditioned.
The symmetry exploits developed in this work circumvent the need for large linear systems to be solved in Bayesian cubature methods. In particular, we presented novel results for Bayes–Sard cubature (Karvonen et al. 2018) and multi-output Bayesian cubature (Xi et al. 2018) that make it possible to apply these methods even for extremely large point sets or when there are many functions to be integrated. In conjunction with the inherent robustness of the Bayes–Sard cubature method (Karvonen et al. 2018), this results in a highly reliable probabilistic integration method that can be applied even to integrals that are relatively high-dimensional.
Three extensions of this work are highlighted: First, the combination of the multi-output and Bayes–Sard methods appears natural, and we expect that symmetry properties can similarly be exploited for such a method. This could lead to promising procedures for the integration of collections of closely related high-dimensional functions appearing in, for example, financial applications (Holtz 2011). Similarly, our exploits should extend to the Student's t-based Bayesian cubatures proposed in Prüher et al. (2017). Second, the investigation of optimality criteria for the symmetric change of measure technique in Sect. 5 remains to be explored. Third, although we focussed solely on computational aspects, the important statistical question of how to ensure that Bayesian cubature methods produce well-calibrated output remains to some extent unresolved.^{5} As discussed in Karvonen and Särkkä (2018), it appears that symmetry exploits do not easily lend themselves to the selection of kernel parameters, for instance via cross-validation or maximisation of the marginal likelihood.^{6} A potential, though somewhat heuristic, way to proceed might be to exploit the concentration of measure phenomenon (Ledoux 2001) or low effective dimensionality of the integrand (Wang and Sloan 2005) in order to identify a suitable data subset on which kernel parameters can be calibrated more easily or a priori.
Footnotes
 1.
 2.
It is possible to employ a positivity constraint (Ehler et al. 2019), but in that case there is no convenient closedform expression for the weights and the Bayesian interpretation is sacrificed.
 3.
Odd monomials come for “free”; see Remark 2.
 4.
The elements of each random generator vector are almost surely nonzero and distinct.
 5.
Though see the related work of Jagadeeswaran and Hickernell (2019) on this point.
 6.
An exception is for kernel amplitude parameters, which can be analytically marginalised as in Proposition 2 of Briol et al. (2019).
Notes
Acknowledgements
Open access funding provided by Aalto University. TK was supported by the Aalto ELEC Doctoral School. SS was supported by the Academy of Finland. CJO was supported by the Lloyd's Register Foundation Programme on Data-Centric Engineering at the Alan Turing Institute, UK. This material was developed, in part, at the Prob Num 2018 workshop hosted by the Lloyd's Register Foundation programme on Data-Centric Engineering at the Alan Turing Institute, UK, and supported by the National Science Foundation, USA, under Grant DMS-1127914 to the Statistical and Applied Mathematical Sciences Institute. Any opinions, findings, conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the above-named funding bodies and research institutions.
References
 Álvarez, M., Rosasco, L., Lawrence, N.: Kernels for vectorvalued functions: a review. Found. Trends Mach. Learn. 4(3), 195–266 (2012). https://doi.org/10.1561/2200000036 CrossRefzbMATHGoogle Scholar
 Bach, F., LacosteJulien, S., Obozinski, G.: On the equivalence between herding and conditional gradient algorithms. In: Proceedings of the 29th International Conference on Machine Learning, pp. 1355–1362 (2012). https://icml.cc/2012/papers/683.pdf. Accessed Sept 3 2019
 Bach, F.: On the equivalence between kernel quadrature rules and random feature expansions. J. Mach. Learn. Res. 18(21), 1–38 (2017)MathSciNetzbMATHGoogle Scholar
 Berlinet, A., ThomasAgnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, New York (2011)zbMATHGoogle Scholar
 Bezhaev, A.Yu..: Cubature formulae on scattered meshes. Sov. J. Numer. Anal. Math. Model. 6(2), 95–106 (1991). https://doi.org/10.1515/rnam.1991.6.2.95
 Brauchart, J.S., Saff, E.B., Sloan, I.H., Womersley, R.S.: QMC designs: optimal order quasi Monte Carlo integration schemes on the sphere. Math. Comput. 83(290), 2821–2851 (2014). https://doi.org/10.1090/S002557182014028391 MathSciNetCrossRefzbMATHGoogle Scholar
 Briol, F.X., Oates, C.J., Cockayne, J., Chen, W.Y., Girolami, M.: On the sampling problem for kernel quadrature. In: Proceedings of the 34th International Conference on Machine Learning, pp. 586–595 (2017). http://proceedings.mlr.press/v70/briol17a.html. Accessed Sept 3 2019
 Briol, F.X., Oates, C.J., Girolami, M., Osborne, M.A., Sejdinovic, D.: Probabilistic integration: a role in statistical computation? Stat. Sci. 34(1), 1–22 (2019)MathSciNetCrossRefGoogle Scholar
 Briol, F.X., Oates, C.J., Girolami, M., Osborne, M.A.: FrankWolfe Bayesian quadrature: probabilistic integration with theoretical guarantees. In: Advances in Neural Information Processing Systems, vol. 28, pp. 1162–1170 (2015). https://papers.nips.cc/paper/5749frankwolfebayesianquadratureprobabilisticintegrationwiththeoreticalguarantees. Accessed Sept 3 2019
 Brouillat, J., Bouville, C., Loos, B., Hansen, C., Bouatouch, K.: A Bayesian Monte Carlo approach to global illumination. Comput. Graph. Forum 28(8), 2315–2329 (2009). https://doi.org/10.1111/j.14678659.2009.01537.x CrossRefGoogle Scholar
 Chai, H., Garnett, R.: An improved Bayesian framework for quadrature of constrained integrands (2018). arXiv:1802.04782
 Chen, W., Mackey, L., Gorham, J., Briol, F.X., Oates, C.J.: Stein points. In: Proceedings of the 35th International Conference on Machine Learning (2018). http://proceedings.mlr.press/v80/chen18f. Accessed Sept 3 2019
 Cockayne, J., Oates, C.J., Sullivan, T., Girolami, M.: Bayesian probabilistic numerical methods (2017). arXiv:1702.03673
 Cools, R.: Constructing cubature formulae: the science behind the art. Acta Numer. 6, 1–54 (1997). https://doi.org/10.1017/S0962492900002701 MathSciNetCrossRefzbMATHGoogle Scholar
 Davis, P.J., Rabinowitz, P.: Methods of Numerical Integration. Courier Corporation, North Chelmsford (2007)zbMATHGoogle Scholar
 DeVore, R., Foucart, S., Petrova, G., Wojtaszczyk, P.: Computing a quantity of interest from observational data. Constr. Approx. (2018). https://doi.org/10.1007/s0036501894337 CrossRefzbMATHGoogle Scholar
 Diaconis, P.: Bayesian numerical analysis. In: Gupta, S.S., Berger, J.O. (eds.) Statistical Decision Theory and Related Topics IV, vol. 1, pp. 163–175. Springer, New York (1988). https://doi.org/10.1007/9781461387688_20 CrossRefGoogle Scholar
 Dietrich, C.R., Newsam, G.N.: Fast and exact simulation of stationary Gaussian processes through circulant embedding of the covariance matrix. SIAM J. Sci. Comput. 18(4), 1088–1107 (1997). https://doi.org/10.1137/s1064827592240555 MathSciNetCrossRefzbMATHGoogle Scholar
 Dutre, P., Bekaert, P., Bala, K.: Advanced Global Illumination. AK Peters/CRC Press, Boca Raton (2006). https://doi.org/10.1201/9781315365473 CrossRefGoogle Scholar
 Ehler, M., Graef, M., Oates, C.J.: Optimal Monte Carlo integration on closed manifolds. Stat. Comput. (2019). https://doi.org/10.1007/s1122201909894w CrossRefGoogle Scholar
 Genz, A.: Fully symmetric interpolatory rules for multiple integrals. SIAM J. Numer. Anal. 23(6), 1273–1283 (1986). https://doi.org/10.1137/0723086 MathSciNetCrossRefzbMATHGoogle Scholar
 Genz, A., Keister, B.D.: Fully symmetric interpolatory rules for multiple integrals over infinite regions with Gaussian weight. J. Comput. Appl. Math. 71(2), 299–309 (1996). https://doi.org/10.1016/03770427(95)002324 MathSciNetCrossRefzbMATHGoogle Scholar
 Gunter, T., Osborne, M.A., Garnett, R., Hennig, P., Roberts, S.J.: Sampling for inference in probabilistic models with fast Bayesian quadrature. In: Advances in Neural Information Processing Systems, vol. 27, pp. 2789–2797 (2014). https://papers.nips.cc/paper/5483samplingforinferenceinprobabilisticmodelswithfastbayesianquadrature. Accessed Sept 3 2019
 Hackbusch, W.: A sparse matrix arithmetic based on \({\cal{H}}\)-matrices. Part I: introduction to \({\cal{H}}\)-matrices. Computing 62(2), 89–108 (1999). https://doi.org/10.1007/s006070050015
 Hennig, P., Osborne, M.A., Girolami, M.: Probabilistic numerics and uncertainty in computations. Proc. R. Soc. Lond. A Math. Phys. Eng. Sci. (2015). https://doi.org/10.1098/rspa.2015.0142
 Hensman, J., Durrande, N., Solin, A.: Variational Fourier features for Gaussian processes. J. Mach. Learn. Res. 11(151), 1–52 (2018)
 Holtz, M.: Sparse Grid Quadrature in High Dimensions with Applications in Finance and Insurance. Number 77 in Lecture Notes in Computational Science and Engineering. Springer, New York (2011). https://doi.org/10.1007/9783642160042
 Jagadeeswaran, R., Hickernell, F.J.: Fast automatic Bayesian cubature using lattice sampling. Stat. Comput. (2019). https://doi.org/10.1007/s11222019098959 (to appear)
 Kanagawa, M., Sriperumbudur, B.K., Fukumizu, K.: Convergence guarantees for kernel-based quadrature rules in misspecified settings. In: Advances in Neural Information Processing Systems, vol. 29, pp. 3288–3296 (2016). arXiv:1605.07254
 Kanagawa, M., Sriperumbudur, B.K., Fukumizu, K.: Convergence analysis of deterministic kernel-based quadrature rules in misspecified settings. Found. Comput. Math. (2019). https://doi.org/10.1007/s10208018094077
 Karvonen, T., Särkkä, S., Oates, C.J.: A Bayes–Sard cubature method. In: Advances in Neural Information Processing Systems, vol. 31, pp. 5882–5893 (2018). https://papers.nips.cc/paper/7829abayessardcubaturemethod. Accessed Sept 3 2019
 Karvonen, T., Särkkä, S.: Classical quadrature rules via Gaussian processes. In: 27th IEEE International Workshop on Machine Learning for Signal Processing (2017). https://doi.org/10.1109/mlsp.2017.8168195
 Karvonen, T., Särkkä, S.: Fully symmetric kernel quadrature. SIAM J. Sci. Comput. 40(2), A697–A720 (2018). https://doi.org/10.1137/17m1121779
 Kennedy, M.: Bayesian quadrature with non-normal approximating functions. Stat. Comput. 8(4), 365–375 (1998). https://doi.org/10.1023/A:1008832824006
 Larkin, F.M.: Probabilistic error estimates in spline interpolation and quadrature. In: Information Processing 74 (Proceedings of IFIP Congress, Stockholm, 1974), vol. 74, pp. 605–609. North-Holland (1974)
 Larkin, F.M.: Gaussian measure in Hilbert space and applications in numerical analysis. Rocky Mt. J. Math. 2(3), 379–421 (1972). https://doi.org/10.1216/rmj197223379
 Lázaro-Gredilla, M., Quiñonero-Candela, J., Rasmussen, C.E., Figueiras-Vidal, A.R.: Sparse spectrum Gaussian process regression. J. Mach. Learn. Res. 11, 1865–1881 (2010)
 Ledoux, M.: The Concentration of Measure Phenomenon. Number 89 in Mathematical Surveys and Monographs. American Mathematical Society, Providence (2001). https://doi.org/10.1090/surv/089
 Lu, J., Darmofal, D.L.: Higher-dimensional integration with Gaussian weight for applications in probabilistic design. SIAM J. Sci. Comput. 26(2), 613–624 (2004). https://doi.org/10.1137/s1064827503426863
 Marques, R., Bouville, C., Ribardière, M., Santos, L.P., Bouatouch, K.: A spherical Gaussian framework for Bayesian Monte Carlo rendering of glossy surfaces. IEEE Trans. Vis. Comput. Graph. 19(10), 1619–1632 (2013). https://doi.org/10.1109/tvcg.2013.79
 Marques, R., Bouville, C., Santos, L.P., Bouatouch, K.: Efficient Quadrature Rules for Illumination Integrals: From Quasi Monte Carlo to Bayesian Monte Carlo. Synthesis Lectures on Computer Graphics and Animation. Morgan & Claypool Publishers, San Rafael (2015). https://doi.org/10.2200/s00649ed1v01y201505cgr019
 McNamee, J., Stenger, F.: Construction of fully symmetric numerical integration formulas. Numer. Math. 10(4), 327–344 (1967). https://doi.org/10.1007/BF02162032
 Minka, T.: Deriving quadrature rules from Gaussian processes. Technical report, Microsoft Research, Statistics Department, Carnegie Mellon University (2000). https://www.microsoft.com/enus/research/publication/derivingquadraturerulesgaussianprocesses/. Accessed Sept 3 2019
 Najm, H.N., Debusschere, B.J., Marzouk, Y.M., Widmer, S., Le Maître, O.: Uncertainty quantification in chemical systems. Int. J. Numer. Methods Eng. 80(6–7), 789–814 (2009). https://doi.org/10.1002/nme.2551
 Novak, E., Ritter, K.: Simple cubature formulas with high polynomial exactness. Constr. Approx. 15(4), 499–522 (1999). https://doi.org/10.1007/s003659900119
 Novak, E., Ritter, K., Schmitt, R., Steinbauer, A.: On an interpolatory method for high dimensional integration. J. Comput. Appl. Math. 112(1–2), 215–228 (1999). https://doi.org/10.1016/s03770427(99)002228
 Oates, C.J., Niederer, S., Lee, A., Briol, F.X., Girolami, M.: Probabilistic models for integration error in the assessment of functional cardiac models. In: Advances in Neural Information Processing Systems, vol. 30, pp. 109–117 (2017). http://papers.nips.cc/paper/6616probabilisticmodelsforintegrationerrorintheassessmentoffunctionalcardiacmodels. Accessed Sept 3 2019
 Oettershagen, J.: Construction of Optimal Cubature Algorithms with Applications to Econometrics and Uncertainty Quantification. Ph.D. thesis, Institut für Numerische Simulation, Universität Bonn (2017)
 O’Hagan, A.: Curve fitting and optimal design for prediction. J. R. Stat. Soc. Ser. B (Methodol.) 40(1), 1–42 (1978). https://doi.org/10.1111/j.25176161.1978.tb01643.x
 O’Hagan, A.: Bayes–Hermite quadrature. J. Stat. Plan. Inference 29(3), 245–260 (1991). https://doi.org/10.1016/03783758(91)90002v
 Osborne, M., Garnett, R., Ghahramani, Z., Duvenaud, D.K., Roberts, S.J., Rasmussen, C.E.: Active learning of model evidence using Bayesian quadrature. In: Advances in Neural Information Processing Systems, vol. 25, pp. 46–54 (2012a). https://papers.nips.cc/paper/4657activelearningofmodelevidenceusingbayesianquadrature. Accessed Sept 3 2019
 Osborne, M., Garnett, R., Roberts, S., Hart, C., Aigrain, S., Gibson, N.: Bayesian quadrature for ratios. In: Artificial Intelligence and Statistics, pp. 832–840 (2012b). http://proceedings.mlr.press/v22/osborne12/osborne12.pdf. Accessed Sept 3 2019
 Pronzato, L., Zhigljavsky, A.: Bayesian quadrature and energy minimization for space-filling design (2018). arXiv:1808.10722
 Prüher, J., Tronarp, F., Karvonen, T., Särkkä, S., Straka, O.: Student-\(t\) process quadratures for filtering of nonlinear systems with heavy-tailed noise. In: 20th International Conference on Information Fusion (2017). https://doi.org/10.23919/icif.2017.8009742
 Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning. MIT Press, Cambridge (2006)
 Robert, C., Casella, G.: Monte Carlo Statistical Methods. Springer, New York (2013)
 Särkkä, S., Hartikainen, J., Svensson, L., Sandblom, F.: On the relation between Gaussian process quadratures and sigma-point methods. J. Adv. Inf. Fusion 11(1), 31–46 (2016). arXiv:1504.05994
 Schaback, R.: Error estimates and condition numbers for radial basis function interpolation. Adv. Comput. Math. 3(3), 251–264 (1995). https://doi.org/10.1007/bf02432002
 Schäfer, F., Sullivan, T.J., Owhadi, H.: Compression, inversion, and approximate PCA of dense kernel matrices at near-linear computational complexity (2017). arXiv:1706.02205
 Smola, A., Gretton, A., Song, L., Schölkopf, B.: A Hilbert space embedding for distributions. In: International Conference on Algorithmic Learning Theory, pp. 13–31. Springer (2007). https://doi.org/10.1007/9783540752257_5
 Sommariva, A., Vianello, M.: Numerical cubature on scattered data by radial basis functions. Computing 76(3–4), 295–310 (2006). https://doi.org/10.1007/s0060700501422
 Stein, M.L.: Interpolation of Spatial Data: Some Theory for Kriging. Springer, New York (2012)
 Wang, X., Sloan, I.H.: Why are high-dimensional finance problems often of low effective dimension? SIAM J. Sci. Comput. 27(1), 159–183 (2005). https://doi.org/10.1137/s1064827503429429
 Wendland, H.: Scattered Data Approximation. Number 28 in Cambridge Monographs on Applied and Computational Mathematics. Cambridge University Press, Cambridge (2005)
 Xi, X., Briol, F.X., Girolami, M.: Bayesian quadrature for multiple related integrals. In: Proceedings of the 35th International Conference on Machine Learning (2018) (to appear). arXiv:1801.04153
 Xiu, D., Karniadakis, G.E.: Modeling uncertainty in flow simulations via generalized polynomial chaos. J. Comput. Phys. 187(1), 137–167 (2003). https://doi.org/10.1016/s00219991(03)000925
Copyright information
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.