Symmetry exploits for Bayesian cubature methods

Bayesian cubature provides a flexible framework for numerical integration, in which a priori knowledge on the integrand can be encoded and exploited. This additional flexibility, compared to many classical cubature methods, comes at a computational cost which is cubic in the number of evaluations of the integrand. It has been recently observed that fully symmetric point sets can be exploited in order to reduce—in some cases substantially—the computational cost of the standard Bayesian cubature method. This work identifies several additional symmetry exploits within the Bayesian cubature framework. In particular, we go beyond earlier work in considering non-symmetric measures and, in addition to the standard Bayesian cubature method, present exploits for the Bayes–Sard cubature method and the multi-output Bayesian cubature method.


This paper considers the numerical approximation of an integral
where the set M ⊂ R m is equipped with the Borel σalgebra B, ν is a measure on (M, B), and f † : M → R is a B-measurable scalar-valued integrand (vector-valued integrands will be considered in Sec. 4).Our interest is in the situation where the exact values of f † cannot be deduced until the function itself is evaluated, and that the evaluations are associated with a substantial computational cost or a very large number of them is required.Such situations are typical in, for example, uncertainty quantification for chemical systems [42], fluid mechanical simulation [64] and certain financial applications [26].
In lieu of a limited computational budget, it is natural to exploit any contextual information that may be available on the integrand.Classical cubatures, such as spline-based or Gaussian cubatures, are able to exploit abstract mathematical information, such as the number of continuous derivatives of the integrand [14].However, in situations where more detailed or specific contextual information is available to the analyst, the use of generic classical cubatures can be sub-optimal.
The language of probabilities provides one mechanism in which contextual information about the integrand can be captured.Let (Ω, F, P) be a probability space.Then an analyst can elicit their prior information about the integrand f † in the form of a stochastic process model: wherein the function x x x → f (x x x; ω) is B-measurable for each fixed ω ∈ Ω.Through the stochastic process, the analyst can encode both abstract mathematical information, such as the number of continuous derivatives of the integrand, and specific contextual information, such as the possibility of a trend or a periodic component.The process of elicitation is not discussed in this work (see [16,24]); for our purposes the stochastic process in Eqn. 1 is considered to be provided.
In Bayesian cubature methods, due to Larkin [33] and re-discovered in [16,48,41], the analyst first selects a point set X = {x x x i } N i=1 ⊂ M , N ∈ N, on which the true integrand f † is evaluated.Let this data be denoted D = {(x x x i , f † (x x x i ))} N i=1 .Then the analyst conditions their stochastic process according to these data D, to obtain a second stochastic process ω → f N (• ; ω).
The analyst reports the implied distribution over the value of the integral of interest; that is the law of the random variable This distribution can be computed in closed form under certain assumptions on the structure of the prior model.A sufficient condition is that the stochastic process is Gaussian, which (arguably) does not severely restrict the analyst in terms of what contextual information can be included [53].In addition, the probabilistic output of the method enables uncertainty quantification for the unknown true value of the integral [33,12,8].These appealing properties have led to Bayesian cubature methods being used in diverse areas such as from computer graphics [38], non-linear filtering [52] and applied Bayesian statistics [49].
The theoretical aspects of Bayesian cubature methods have now been widely-studied.In particular, convergence of the posterior mean point estimator as N → ∞ has been studied in both the well-specified [4,59,7,19,8] and mis-specified [27,28] regimes, where in the mis-specified case the true integrand f † need not be a realisation f (• ; ω) from the stochastic process model for any ω ∈ Ω.Some relationships between the posterior mean estimator and classical cubature methods have been documented in [16,55,29].In [34,48,31] the Bayes-Sard framework was studied, where it was proposed to incorporate an explicit parametric component [47] into the prior model in order that contextual information, such as trends, can be properly encoded.The choice of point set X for Bayesian cubature has been studied in [7,1,6,46,11,51].In addition, several extensions have been considered to address specific technical challenges posed by non-negative integrands [10], model evidence integrals in a Bayesian context [49,22], ratios [50], non-Gaussian prior models [32,52], measures that can be only be sampled [45], and vector-valued integrands [63].Despite these recent successes, a significant drawback of Bayesian cubature methods is that the cost of computing the distributional output is typically cubic in N , the size of the point set.For integrals whose domain M is high-dimensional, the number N of points required can be exponential in m = dim(M ).Thus the cubic cost associated with Bayesian cubature methods can render them impractical.In recent work, Karvonen and Särkkä [30] noted that symmetric structure in the point set can be exploited to reduce the total computational cost.Indeed, in some cases the exponential dependence on m can be reduced to (approximately) linear.This is a similar effect to that achieved in the circulant embedding approach [17], or by the use of H-matrices [23] and related approximations [57], though the approaches differ at a fundamental level.The aim of this paper is to present several related symmetry exploits, that are specifically designed to reduce computational cost of Bayesian cubature methods.
Our principal contributions are following: First, the techniques developed in [30] are extended to the Bayes-Sard cubature method.This results in a computational method that is, essentially, of the complexity O(J 3 + JN ), where J is the number of symmetric sets that constitute the full point set, instead of being cubic in N .In typical scenarios there are at most a few hundred symmetric sets even though the total number of points can go up to millions.Second, we present an extension to the multi-output (i.e., vector-valued) Bayesian cubature method that is used to simultaneously integrate D ∈ N related integrals.In this case, the computational complexity is reduced from O(D 3 N 3 ) to O(D 3 J 3 + DJN ).Third, a symmetric change of measure technique is proposed to avoid the (strong) assumption of symmetry on the measure ν that was required in [30].Fourth, the performance of our techniques is empirically explored.Throughout, our focus is not on the performance of these integration methods, which has been explored in earlier work, already cited.Rather, our focus is on how computation for these methods can be accelerated.
The remainder of the article is structured as follows: Sec. 2 covers the essential background for Bayesian cubature methods and introduces fully symmetric sets that are used in the symmetry exploits throughout the article.Secs. 3 and 4 develop fully symmetric Bayes-Sard cubature and fully symmetric multi-output Bayesian cubature.Sec.explains how the assumption that ν is symmetric can be relaxed.In Sec. 6 a detailed selection of empirical results are presented.Finally, some concluding remarks and discussion are contained in Sec. 7.

Background
This section reviews the standard Bayesian cubature method, due to Larkin [33], and explains how fully symmetric sets can be used to alleviate its computational cost, as proposed in [30].

Standard Bayesian Cubature
In this section we present explicit formulae for the Bayesian cubature method in the case where the prior model (Eqn. 1) is a Gaussian random field.To simplify the notation, Secs. 2 and 3 assume that the integrand has scalar output (i.e.D = 1); this is then extended to vector-valued output in Sec. 4.
To reduce the notational overhead, in what follows the ω ∈ Ω argument is left implicit.Thus we consider f (x x x) to be a scalar-valued random variable for each x x x ∈ M .In particular, in this paper we focus on stochastic processes that are Gaussian, meaning that there exists a mean function m : M → R and a symmetric positive definite covariance function . . .
for any N ∈ N and all point sets {x x x i } N i=1 ⊂ M .The conditional distribution f N of this field, based on the data D = {(x x x i , f † (x x x i )} N i=1 of function evaluations at the points X = {x x x i } N i=1 , is also Gaussian, with mean and covariance functions where the vector From the fact that linear functionals of Gaussian processes are Gaussian, we obtain that with Here k ν (x x x) := M k(x x x, x x x ) dν(x x x ) is called the kernel mean function [58] and is the variance of the integral itself under the prior model.This method is known as the standard Bayesian cubature, with the implicit understanding that the model for the integrand should be carefully selected to ensure ( 5) is well-calibrated [8], meaning that the uncertainty assessment can be trusted.The need for careful calibration is in line with standard approaches to the Gaussian process regression task [53].
To understand when the Bayesian cubature output is meaningful, it is useful to write the posterior mean and variance ( 6) and (7) in terms of the weight vector That is, we have µ . Let H(k) be the Hilbert space reproduced by the kernel k (see [3] for background).It can then be verified that w w w X solves a quadratic minimisation problem of approximating k ν with a function from the finite-dimensional space spanned by {k(•, x x x)} x x x∈X ⊂ H(k), namely: . and that the minimum the value of this norm is σ X (see e.g.[46,Ch. 3] and [2]).Equivalently, the can be obtained as the minimiser of the worst case error sup among all cubature rules with points X, with σ N corresponding to the minimal worst case error [8,46].Thus, in terms of uncertainty quantification, the posterior standard deviation σ X can indeed be meaningfully related to the integration problem being solved.
The principal motivation for this work is the observation that both Eqn.6 and 7 involve the solution of an N -dimensional linear system defined by the matrix K K K X .In general this is a dense matrix and, as such, in the absence of additional structure in the linear system [30] or further approximations (e.g.[35,25,57]), the computational complexity associated with the standard Bayesian cubature method is O(N 3 ).Moreover, it is often the case that K K K X is ill-conditioned [56,60].The exploitation of symmetric structure to circumvent the solution of a large and ill-conditioned linear system would render Bayesian cubature more practical, in the sense of computational efficiency and numerical robustness; this is the contribution of the present article.

Symmetry Properties
Next we introduce fully symmetric sets and related symmetry concepts, before explaining in Sec.2.3 how Fig. these can be exploited for computational simplification in the standard Bayesian cubature method.Note that, in what follows, no symmetry properties are needed for the integrand f † itself.

Fully Symmetric Point Sets
Given a vector λ λ λ ∈ R m , the fully symmetric set [λ λ λ] ⊂ R m generated by this vector is defined as the point set consisting of all vectors that can be obtained from λ λ λ via coordinate permutations and sign changes.That is, where Π m and S m stand for the collections of all permutations of the first m positive integers and of all vectors of the form s s s = (s 1 , . . ., s m ) with each s i either 1 or −1.
Here λ λ λ is called a generator vector and its individual elements are called generators.Alternatively, we can write the fully symmetric set in terms of permutation and sign change matrices: [λ λ λ] = where Perm SC m is the collection of m × m matrices having exactly one non-zero element on each row and column, this element being either 1 or −1.Some fully symmetric sets are displayed in Fig. 1.The cardinality of a fully symmetric set [λ λ λ], generated by a generator vector λ λ λ containing r 0 zero generators and l distinct non-zero generators with multiplicities r 1 , . . ., r l , is See Table 1 for a number of examples in low dimensions.For λ λ λ ∈ R m having non-negative elements, we occasionally need the concept of a non-negative fully symmetric set [λ λ λ] + := P P P ∈Permm where Perm m ⊂ Perm SC m is the collection of m × m permutation matrices.

Fully Symmetric Domains, Kernels, and Measures
At this point we introduce several related definitions; these enable us later to state precisely which symmetry assumptions are being exploited.
Domains.It will be assumed in the sequel that M ⊂ R m is a fully symmetric domain, meaning that every fully symmetric set generated by a vector from M is contained in M : [λ λ λ] ⊂ M whenever λ λ λ ∈ M .Equivalently, M = P P P M = {P P Px x x : , x x x ∈ M } for any P P P ∈ Perm SC m .Most popular domains, such as the whole of R m , hypercubes of the form [−a, a] m (from which e.g. the unit hypercube can be obtained by simple translation and scaling), balls and spheres, are fully symmetric.Kernels.A kernel k : M × M → R defined on a fully symmetric domain M is said to be a fully symmetric kernel if k(P P Px x x, P P Px x x ) = k(x x x, x x x ) for any P P P ∈ Perm SC m .Basic examples of fully symmetric kernels include isotropic kernels and products and sums of isotropic one-dimensional kernels.
Measures.A measure ν on a fully symmetric domain M is a fully symmetric measure if it admits a density p ν with respect to the Lebesgue on M which satisfies p ν (x x x) = p ν (P P Px x x) for any P P P ∈ Perm SC m .Note that this is a narrow class of measures and a relaxation of this assumption is discussed in Sec. 5.

Fully Symmetric Cubature Rules
The linear functional µ(f † ) = N i=1 w i f † (x x x i ) is said to be fully symmetric cubature rule if its point set can be written as a union of a number J ∈ N of fully symmetric sets [λ λ λ 1 ], . . ., [λ λ λ J ] and all points in each [λ λ λ j ] are assigned an equal weight.That is, a fully symmetric cubature rule is of the form for some weights w w w FS ∈ R J and generator vectors λ λ λ 1 , . . ., λ λ λ J ∈ M .Because this structure typically greatly simplifies design of the weights, many classical polynomial-based cubature rules are fully symmetric [40,20,21,37], including certain sparse grids [43,44] 2.3 Fully Symmetric Bayesian Cubature The central aim of this article is to derive generalisations for the Bayes-Sard and multi-output Bayesian cubatures of the following result from [30], originally developed only for the standard Bayesian cubature method.
Theorem 1 Consider the standard Bayesian cubature method based on a domain M , measure ν, and kernel k that are each fully symmetric and fix the mean function to be m ≡ 0. Suppose that the point set is a union of J fully symmetric sets: X = ∪ J j=1 [λ λ λ j ] for some distinct generator vectors Λ = {λ λ λ 1 , . . ., λ λ λ J } ⊂ M .Then the output of the standard Bayesian cubature method can be expressed in the fully symmetric form The weights w w w Λ ∈ R J are the solution to the linear system S S Sw w w Λ = k k k ν,Λ of J equations, where Theorem 1 demonstrates the principal idea; that one can exploit symmetry to reduce the number of kernel evaluations needed in the standard Bayesian cubature method from N 2 to N J and decrease the number of equations in the linear system that needs to be solved from N to J. Since J is typically considerably smaller than N = J j=1 #[λ λ λ j ], using fully symmetric sets results in a substantial reduction in computational cost.Numerical examples in [30] showed that sets containing up to tens of millions of points become feasible in the standard Bayesian cubature method when symmetry exploits are used.The aim of this paper is to generalise these techniques to the important cases of Bayes-Sard cubature (Sec.3) and multi-output Bayesian cubature (Sec.4).
, the condition number of the matrix S S S in Thm. 1 cannot exceed that of K K K X (similar results are available for the matrices in Thms. 2 and 3).This scenario occurs in, for instance, the numerical example of Sec.6.2.To verify the claim, observe that by Lem. 4 S S Sv v v = αv v v implies that the block vector . . .
the matrix S S S is symmetric; therefore its condition number is the ratio of the largest and smallest eigenvalues.It follows that the condition number of S S S must be smaller or equal to that of K K K X .
3 Fully Symmetric Bayes-Sard Cubature In this section we first review the Bayes-Sard cubature method from [31] and then derive a generalisation of Thm. 1 for this method.

Bayes-Sard Cubature
In the standard Bayesian cubature method the mean function m must be a priori specified.This requirement is relaxed in Bayes-Sard cubature [31], where a hierarchical approach is taken instead.Specifically, in Bayes-Sard cubature the prior mean function is given the parametric form is specified.The conditional distribution f N of this field, based as before on data D, is again Gaussian.In particular, when Σ Σ Σ −1 → 0 0 0 (meaning that the prior on θ θ θ becomes improper, or weakly informative 1 ) and assuming that Q ≤ N , the posterior mean and variance take the forms where ) is called the Vandermonde matrix and the vectors α α α and β β β are defined via the linear system For there to exist a unique solution to Eqn. 12, the Vandermonde matrix has to be of full rank.This technical condition, equivalent to the zero function being the only element of π vanishing on X, is known as π-unisolvency of the point set X.The output of the Bayes-Sard cubature method is the posterior marginal distribution of the integral, namely The mean and variance, obtained by integrating Eqns. 10 and 11, are and the weight vectors w w w k X ∈ R N and w w w π X ∈ R Q are the solution to the linear system The Bayes-Sard weights w w w k X , like the standard Bayesian cubature weights, have a worst case interpretation: 1 See [34,48] for slightly different earlier formulations where an improper prior is placed "directly" on θ θ θ.
The Bayes-Sard method has some important theoretical and practical advantages over the standard Bayesian cubature method, which motivate us to study it in detail: -The posterior mean µ N (f † ) is exactly equal to the integral In particular, if π contains a non-zero constant function then N i=1 w k X,i = 1 so that the cubature rule is normalised.This can improve the stability of the method in high-dimensional settings [31].In general, if π is the set of polynomials up to a certain order q, then the posterior mean is recognised as a cubature rule of algebraic degree q [13, Def.3.1].
-Given any cubature rule µ(f † ) = N i=1 w i f † (x x x i ) for specified w i ∈ R and x x x i ∈ M , and given any covariance function k, one can find an N -dimensional function space π such that µ N = µ.Furthermore, the posterior standard deviation σ N coincides with the worst-case error of the cubature rule µ in the Hilbert space induced by k [31, Sec.2.4].This demonstrates that any cubature rule can be interpreted as the posterior mean under an infinitude of prior models, providing a bridge between classical and Bayesian cubature methods.
The dimension of the linear system in ( 14) is N + Q.Thus the computational cost associated with the Bayes-Sard method is strictly greater than that of standard Bayesian cubature; at least O(N 3 ) in general.It is therefore of considerable practical interest to ask whether symmetry exploits can also be developed for the Bayes-Sard method.

A Symmetry Exploit for Bayes-Sard Cubature
In this section we present a novel result that enables fully symmetric sets to be exploited in the Bayes-Sard cubature method.In what follows we only consider a function space π spanned by even monomials exhibiting symmetries 2 .In practice, we do not believe this to be a significant restriction since polynomials typically serve as a good and functional default and, in fact, one retains considerable freedom in selecting the polynomials, not being restricted to, for example, spaces of all polynomials of at most a given degree.
Here x x x α α α denotes the monomial Our development will require that π α is a union of J α ∈ N non-negative fully symmetric sets in E m 0 .That is, α α α ∈ π α implies P P Pα α α ∈ π α for any permutation matrix P P P ∈ Perm m and there exist distinct α α α 1 , . . ., α α α Jα ∈ E m 0 such that To prove a Bayes-Sard analogue of Thm. 1, we need four simple lemmas: Lemma 1 Suppose that M and ν are each fully symmetric.If α α α ∈ E m 0 then I(x x x α α α ) = I(x x x P P P α α α ) for any Proof First, observe that (P P P −1 x x x) α α α = x x x P P P α α α .By a change of variables and the fact that permutation matrices have unit determinants, = |det P P P −1 | P P P M (P P P −1 x x x) α α α p ν (P P P −1 x x x) dx x x = M x x x P P P α α α p ν (x x x) dx x x = I(x x x P P P α α α ) for any α α α ∈ N m 0 .
Lemma 2 Suppose that M , ν, and k are each fully symmetric and let λ λ λ ∈ M .Then Proof This was established in [30,Lem. 3.5].
Proof For any α α α ∈ E m 0 , x x x ∈ [λ λ λ], and P P P ∈ Perm SC m , x x x P P P β β β for any P P P ∈ Perm m .Consequently, Lemma 4 Let λ λ λ, λ λ λ ∈ R m and suppose that the kernel k is fully symmetric.Then Proof For any x x x ∈ [λ λ λ] there is P P P x x x ∈ Perm SC m such that x x x = P P P x x x λ λ λ.Therefore k(P P P −1 x x x P P P x x x λ λ λ, P P P −1 k(λ λ λ, P P P −1 and the claim follows from the fact that [P P Pλ λ λ ] = [λ λ λ ] for any P P P ∈ Perm SC m . We are now ready to prove the main result of this section.Theorem 2 establishes sufficient conditions for the Bayes-Sard cubature rule to be fully symmetric and, in that case, provides an explicit simplification of the its output (13).
Then the output of the Bayes-Sard cubature method can be expressed in the fully symmetric form where , and w w w σ Λ ∈ R J are the weights w w w Λ in Thm. 1.The weights w w w k Λ ∈ R J and w w w π A ∈ R Jα form the solution to the linear system of J + J α equations, where and Proof The linear system ( 17) is equivalent to These two groups of equations are equivalent, respectively, to the N equations (Lems. 2 and 4 and Eqn.15) and to the Q equations (Lem. 1 and Eqn.16) for α α α ∈ π α .From these two equations we recognise that solve the full Bayes-Sard weight system The expression for the Bayes-Sard variance σ 2 N can be obtained by first recognising that the unique elements of K K K −1 X k k k ν,X are precisely the weights w w w Λ in Thm. 1, here denoted w w w σ Λ .Then we compute that, when expanded, yields the result.

Remark 2
The polynomial space π could be appended with fully symmetric collections of odd polynomials (i.e., by using additional basis functions x x x β β β , β β β ∈ [α α α] + for α α α / ∈ E m 0 ).However, by doing this one gains nothing since the weights in w w w π A corresponding to these basis functions turn out to be zero.This is quite easy to see from the easily proven facts that x x x∈[λ λ λ] x x x β β β = 0 and Just like Thm. 1 for the standard Bayesian cubature, Thm. 2 reduces the number of kernel and basis function evaluations from roughly N 2 + Q 2 to N J + N J α and the size of the linear system that needs to be solved from N +Q to J +J α .Typically, this translates to a significant computational speed-up; see Sec. 6.1 for a numerical example involving point sets of up to N = 179, 400.Such results could not realistically be obtained by direct solution of the original linear system in Eqn.14.

Fully Symmetric Multi-Output Bayesian Cubature
In this section we review the multi-output Bayesian cubature method recently proposed by Xi et al. [63] and show how to exploit fully symmetric sets in reducing computational complexity of this method.

Multi-Output Bayesian Cubature
One often needs to integrate a number of related integrands, f † 1 , . . ., f † D : M → R. It is of course trivial to treat these as a set of D independent integrals and apply either the standard Bayesian or Bayes-Sard cubature method to approximate each integral.However, in many cases the relationship between the integrands can be explicitly modelled and leveraged.
Such a setting can be handled by modelling a single vector-valued function f f f † := (f † 1 , . . ., f † D ) : M → R D as a vector-valued Gaussian field; full details can be found in [65].In this case, the data D consist of evaluations at points X d = {x x x d1 , . . ., x x x dN } ⊂ M for each d = 1, . . ., D. In this section we denote X = {X d } D d=1 .The assumption that each integrand is evaluated at N points is made only for notational simplicity; all results can be easily modified to accommodate different numbers of points for each integrand.Evaluations of each integrand are concatenated into the vector In multi-output Bayesian cubature the integrand is modelled as a vector-valued Gaussian field f f f ∈ R D characterised by vector-valued mean function m m m : M → R D and matrix-valued covariance function k k k : M × M → R D×D .For notational simplicity, the prior mean function is fixed at m m m ≡ 0 0 0. The conditional distribution f f f N of this field, based on the data D = (X, f f f † X ), is also Gaussian with mean and covariance functions Here, in contrast to Eqns. 3 and 4, all objects are of extended dimensions: where k k k X d (x x x) and K K K dq X d ,Xq are the N × D and N × N and matrices The output of the multi-output (or vector-valued ) Bayesian cubature method is a D-dimensional Gaussian random vector: where Equivalently, the posterior mean and variance can be written in terms of the weights where W W W d ∈ R N ×D .For example, mean of the dth integral then takes the form If the dth integrand is modelled as independent of all the other integrands, the posterior mean ( 21) reduces to the standard Bayesian cubature posterior mean (6).

Separable Kernels
The structure of matrices appearing in the multi-output Bayesian cubature equations can be simplified when the multi-output kernel is separable.This means that there is a positive-definite B B B ∈ R D×D such that for some positive-definite kernel c : M × M → R. The matrices K K K X and k k k ν,X now assume the simplified forms However, even with the simplified structure afforded by the use of separable kernels, the implementation of multi-output Bayesian cubature remains computationally challenging, calling for some (DN ) 2 kernel evaluations and solution to a linear system of dimension DN .This is problematic if a large number of integrands is to be handled simultaneously.The next section demonstrates how fully symmetric points sets can be exploited to reduce this cost.
Remark 3 Note that using the same point set X for each integrand yields immediate computational simplification, since in this case the above matrices can be written as Kronecker products: However, this case is of little practical interest because, by the properties of the Kronecker product, where w w w X ∈ R N are the standard Bayesian cubature weights (8) for the covariance function c and points X [63, Supplements B and C.1].That is, the integral estimates µ µ µ N (f f f † ) reduce to those given by the standard Bayesian cubature method applied independently to each integral.

A Symmetry Exploit for Multi-Output Bayesian Cubature
Our main result in this section is a second generalisation of Thm. 1, in this case for the multi-output Bayesian cubature method.
Theorem 3 Consider the multi-output Bayesian cubature method based on a separable matrix-valued kernel k k k.Let the domain M , measure ν, and uni-output kernel c each be fully symmetric and fix the mean function to be m m m ≡ 0 0 0. Suppose that each X d is a union of J fully symmetric sets: X d = ∪ J j=1 [λ λ λ dj ] for some Λ d = {λ λ λ d1 , . . ., λ λ λ dJ } ⊂ M such that n Λ j = #[λ λ λ dj ] does not depend on d and, consequently, #X d = N for each d = 1, . . ., D. Then the output of the multioutput Bayesian cubature method can be expressed in the fully symmetric form where B B B diag j is the diagonal D × D matrix formed out of the jth row of B B B and The weight matrix is the solution to the linear system S S SW W W Λ = k k k ν,Λ , where for (d, d , j) ∈ {1, . . ., d} 2 × {1, . . ., J}.In turn, through Lems. 2 and 4, these are equivalent to for (d, d ) ∈ {1, . . ., D} 2 and j = 1, . . ., n Λ j , j = 1, . . ., J.There are a total of D 2 J j=1 n Λ j = D 2 N of these equations.The weights in Eqn.20 are then seen to solve the full matrix equation The expressions for the posterior mean and variance follow from straightforward manipulation of Eqns.18 and 19.
The computational complexity of forming the fully symmetric weight matrix W W W Λ is dominated by the DJN kernel evaluations needed to form S S S and the inversion of this DJ × DJ matrix.Due to J often being orders of magnitude smaller than N , these tasks remain feasible even for a very large total number of points DN .For example, in Sec.6.2 the result of Thm. 3 is applied to facilitate the simultaneous computation of up to D = 50 integrals arising in a global illumination problem, each integrand being evaluated at up to N = 288 points.Such results can barely be obtained by direct solution of the original linear system in Eqn.20.

Symmetric Change of Measure
The results presented in this article, and those originally described in [30], rely on the assumption that the measure ν is fully symmetric (see Sec. 2.2.2).This is a strong restriction; most measures are not fully symmetric.However, this assumption can be avoided in a relatively straightforward manner, which is now described.
Suppose that M is a fully symmetric domain and that ν is an arbitrary measure, admitting a density p ν , against which the function f † : M → R is to be integrated.Further suppose that there is a fully symmetric measure ν * on M such that ν is absolutely continuous with respect to ν * , and therefore admits a density p ν * such that the Radon-Nikodym derivative dν/dν * = p ν (x x x)/p ν * (x x x) is well-defined.Then the integral of interest can be re-written as an integral with respect to the fully symmetric measure ν * : Note that the existence of the second integral follows from the Radon-Nikodym property and the monotone convergence theorem.Thus, the assumption of a fully symmetric measure ν in the statement of Thms. 1, 2, and 3 is not overly restrictive.This symmetric change of measure technique is demonstrated on a numerical example in Sec.6.3.
Remark 4 Note that the situation here is unlike standard importance sampling (see e.g.Sec.3.3 of [54]), in that the importance distribution ν * is required to be fully symmetric.As such, it seems not obvious how to mathematically characterise an "optimal" choice of ν * .Indeed, any notion of optimality ought also depend on the cubature method that will be used.Nevertheless, obvious constructions (e.g. the choice of ν * as an isotropic centred Gaussian for ν sub-Gaussian and M = R m ) can work rather well.

Results
In this section we assess the performance of the fully symmetric Bayes-Sard and fully symmetric multi-output Bayesian cubature methods based on computational simplifications provided in Thms. 2 and 3. Code for all examples is provided at https://github.com/tskarvone/bc-symmetry-exploits.

Zero Coupon Bonds
This example involves a model for zero coupon bonds that has been used to assess accuracy and robustness of the Bayes-Sard cubature and fully symmetric Bayesian cubature methods in [30,31].

Integration Problem
The integral of interest, arising from Euler-Maruyama discretisation of the Vasicek model, is where r ti are particular Gaussian random variables and ∆t and r t0 are parameters of the integrand.The dimension m = T − 1 of the integrand can be freely selected and the integral admits a convenient closed-form solution; see [26,Sec. 6.1] or [30, Sec.5.5] for a more complete description of this benchmark integral.

Setting
The accuracy of the standard Bayesian cubature and Bayes-Sard cubature methods was compared, for computing the integral (26) in a setting identical to that of [30,Sec. 5.5].In particular, the same parameter values and point set (a sparse grid based on a certain Gauss-Hermite sequence with the origin removed), were used.The kernel was the Gaussian kernel with length-scale > 0: Accuracy of the two cubature methods was assessed for the heuristic length-scale choices = m and = √ m.The linear space π in the Bayes-Sard method, defined by the collection π α of multi-indices, was taken to be π α = {α α α : |α| ≤ r} for either r = 1 (linear) or r = 2 (quadratic) polynomials.The dimension T ranged between 20 and 300.Since the number of points in a sparse grid depends on the dimension, the maximal N used was 179,400.Thms. 1 and 2 facilitated the computation, respectively, of the standard Bayesian cubature and Bayes-Sard cubature method.Note that, in the results that are presented next, even though N increases, no convergence (or necessarily monotonicity of the error) is to be expected because the integration problem becomes more difficult as T is increased.Fig. 2: Numerical computation of the integral (26) using fully symmetric Bayesian cubature (BC) and Bayes-Sard cubature (BSC) for different choices of the length-scale and polynomial degree r of the parameteric function space π used in BSC.

Results
The results are depicted in Fig. 2. We observe that Bayes-Sard method is much less sensitive to the length-scale choice compared to the standard Bayesian cubature method.For instance, the selection = √ m has Bayes-Sard outperform the standard Bayesian cubature by roughly three orders of magnitude.It is also clear that, in this particular problem, the addition of more polynomial basis functions can significantly improve the integral estimates.
Results at this scale were not possible to obtain in the earlier work of Karvonen et al. [31], where the largest value of N considered was 5,000.In contrast, our result in Thm. 2 enabled point sets of size up to N = 179, 400 to be used.The computational time required to produce the the results for the Bayes-Sard cubature in the most demanding case, T = 300 and m = 2, was on the order of 2.5 minutes on a standard laptop computer.However, this can be mostly attributed a sub-optimal algorithm for generating the sparse grid.Indeed, after the points had been obtained it took roughly one second to compute the Bayes-Sard weights.

Global Illumination Integrals
Next we considered the multi-output Bayesian cubature method, together with the symmetry exploit developed in Sec.4.1, to compute a collection of closely related integrals arising in a global illumination context.This is a popular application of Bayesian cubature methods; see [9,38,39,8,63] for existing work.In particular, multioutput Bayesian cubature was applied to the problem that we consider below in [63], where D = 5 integrals were simultaneously computed.Through computational simplifications obtained by using fully symmetric sets, in what follows we simultaneously compute up to D = 50 integrals, a ten-fold improvement.

Integration Problem
Global illumination is concerned with the rendering of glossy objects in a virtual environment [18].The integration problem studied here is to compute the outgoing radiance L 0 (ω ω ω o ) in the direction ω ω ω o , for different values of the observation angle ω ω ω o .In practical terms, this represents the amount of light travelling from the object to an observer at an observation angle ω ω ω o .The need for simultaneous computation for different ω ω ω o can arise when the observation angle is rapidly changing, for example as the player moves in a video game context.The outgoing radiance is given by the integral with respect to the uniform (i.e.Riemannian) measure ν on the unit sphere Here L e (ω ω ω o ) is the amount of light emitted by the object itself, essentially a constant, whilst L i (ω ω ω i ) is the amount of light being reflected from the object, originating from angle ω ω ω i ∈ S 2 .That reflection is impossible from a reflexive angle is captured by the term [ω ω ω T i n n n] + := max{0, ω ω ω T i n n n} with n n n the unit normal to the Fig. 3: The mean (green; Eqn.29) and maximal (red; Eqn.30) relative integration errors obtained when simultaneously approximating D global illumination integrals (see Eqn. 28) using fully symmetric Bayesian cubature (BC) and fully symmetric multi-output Bayesian cubature (MOBC).Here J = 3 and J = 6 random generator vectors were used to produce a fully symmetric point set of size N = 144 (for J = 3) and N = 288 (for J = 6).The displayed results have been averaged over 100 independent realisations of the point sets.
object.That light is reflected less efficiently at larger incidence angles is captured by a bidirectional reflectance distribution function Evaluation of L i (ω ω ω i ) involves a call to an environment map (in this case, a picture of a lake in California; see [8]), which is associated with a computational communication cost.The illumination integral must be computed for each of the red, green, and blue (RGB) colour channels; we treat the integration problems corresponding to different colour channels as statistically independent.

Setting
The performance of the standard Bayesian cubature and multi-output Bayesian cubature methods was assessed on a collection of D related integrals, where D was varied up to a maximum of D max = 50.The integrands were indexed by observation angles ω ω ω d o with a fixed azimuth and elevation ranging uniformly on the interval [ π 4 − π 24 , π 4 + π 24 ]: To formulate the problem in the multi-output framework, we define the associated integrands The aim is then to compute the integrals In our experiments a separable vector-valued covariance function was used, defined as in Eqn.22 with This prior structure is identical to that used in [8,63] and corresponds to assuming that the integrand belongs to a Sobolev space of smoothness3 2 .The kernel c has tractable kernel means: c ν (x x x) = 4 3 for every x x x ∈ S 2 and c ν,ν = 4  3 .In order to exploit Thms. 1 and 3, we need to restrict to fully symmetric point sets on S 2 .To obtain such sets we followed the method proposed in [30,Sec. 5.3].That is, we draw, for each d = 1, . . ., D, either J = 3 or J = 6 independent generator vectors from the uniform distribution ν on S 2 and use these to generate distinct fully symmetric point sets X 1 , . . ., X Dmax ⊂ S 2 .Eqn. 9 implies that 3 N = 3 × 48 = 144 or N = 6 × 48 = 288.This approach to generation of a point set was selected for its simplicity, our main focus being on the multi-output framework and a large number of integrals D. Alternative point sets on S 2 are numerous, such as rotated adaptations of numerically computed approximations to the optimal quasi Monte Carlo designs developed in [5].

Results
The results are depicted in Fig. 3 in terms of the relative integration error for each RGB colour channel.For D = 1, . . ., D max , define the vector-valued functions The figure shows the improvement in integration accuracy when D increases and more integrands are considered simultaneously.Displayed are the mean and maximal max d=1,...,D relative errors for D = 1, . . ., D max .For comparison, the figure also contains results for the standard Bayesian cubature method, applied separately to each of the unioutput integrands f † d .Each of the reference integrals I(f † d ) was computed using brute force Monte Carlo, with 10 million points used.
In accordance with [63], we observed that the multioutput Bayesian cubature method is superior to the standard one already when D = 5.The performance gain of the multi-output method keeps increasing when more integrands are added but is ultimately bounded.This is reasonable since integrands for wildly different ω ω ω d o can convey little information about each other.For the smallest values of D the multi-output method is less accurate that the standard Bayesian cubature method.This can be explained by potential non-uniform covering of the unit sphere when the total number DJ of fully symmetric sets is low (e.g., when some of the generator vectors happen to cluster, the fully symmetric sets they generated do not greatly differ, so that less information is obtained on the integrand).
Computational times remained reasonable throughout this experiment.For example, without symmetry exploits, the case D = 50 and J = 6 would require (DN ) 2 = 207,360,000 kernel evaluations and inversion of a 14,400-dimensional matrix while Thm. 3 reduces these numbers, respectively, to DN J = 86,400 and 300.This suggests that with more carefully selected fully symmetric point sets it may be possible to realise the desire expressed in [63,Sec. 4] of simultaneous computation of up to thousands of related integrals.

Symmetric Change of Measure Illustration
The purpose of this final experiment is to briefly illustrate the symmetric change of measure technique, proposed in Sec. 5. To limit scope we consider applying this technique in conjunction with the fully symmetric standard Bayesian cubature method (i.e., Thm. 1).

Integration Problem
Let µ µ µ f ∈ R 8 and Σ Σ Σ f ∈ R 8×8 be a vector and a positivedefinite matrix.Consider integration over R 8 of the function with respect to a Gaussian measure ν = N(µ µ µ ν , Σ Σ Σ ν ) with a non-zero mean µ µ µ ν ∈ R 8 and a positive-definite covariance matrix Σ Σ Σ ν ∈ R 8×8 .For this illustration we took  so that the true value of the integral was approximately 0.0280.Note that ν is not a fully symmetric measure.Therefore that Thm. 1 cannot be applied as such.

Setting
To perform a symmetric change of measure, we note that ν is sub-Gaussian and therefore consider a centred isotropic Gaussian distribution as the symmetric measure: ν * = N(µ µ µ ν * , Σ Σ Σ ν * ) with µ µ µ ν * = 0 0 0 and Σ Σ Σ ν * = I I I.This yields a new integrand f † * according to Eqn. 25.Note that ν is absolutely continuous with respect to ν * ; moreover the Radon-Nikodym derivative dν/dν * is in fact bounded.The new integral is with respect to the symmetric measure ν * and thus Thm. 1 can be applied.
For computation we used the Gaussian kernel ( 27) with a length-scale = 0.8 and the Gauss-Hermite sparse grid [30,Sec. 4.2] with the mid-point removed.
Note that the resulting point sets are not nested for different N .

Results
The results are depicted in Fig. 4 in terms of the relative integration error.The integral estimates µ N (f † * ) converges to I(f † ).Moreover, the qualitative characteristics of this convergence appear similar to the experiments reported in [30,Sec. 5.4], which were limited to fully symmetric measures ν so that a change of measure was not required.Thus, based on this example at least, the symmetric change of measure technique appears to be a promising strategy to generalise the results in Thms. 1, 2 and 3.The largest point sets considered contained J = 187 fully symmetric sets, which correspond to a point set of size N ≈ 2.5 × 10 6 .

Discussion
There is increasing interest in the use of Bayesian methods for numerical integration [8].Bayesian cubature methods are attractive due to analytic and theoretical tractability of the underlying Gaussian model.However, these method are also associated with a computational cost that is cubic in the number of points, N , and moreover the linear systems that must be inverted are typically ill-conditioned.
The symmetry exploits developed in this work circumvent the need for large linear systems to be solved in Bayesian cubature methods.In particular, we presented novel results for Bayes-Sard cubature [31] and multioutput Bayesian cubature [63] that make it possible to apply these methods even for extremely large datasets or when there are many function to be integrated.In conjunction with the inherent robustness of the Bayes-Sard cubature method [31], this results in a highly reliable probabilistic integration method that can be applied even to integrals that are relatively high-dimensional.
Three extensions of this work are highlighted: First, the combination of multi-output and Bayes-Sard methods appears to be a natural extension and we expect that symmetry properties can similarly be exploited for this method.This could lead to promising procedures for integration of collections of closely related high-dimensional functions appearing in, for example, financial applications [26].Similarly, our exploits should extend to the Student's t based Bayesian cubatures proposed in [52].Second, the investigation of optimality criteria for the symmetric change of measure technique in Sec. 5 remains to be explored.Third, although we focussed solely on computational aspects, the important statistical question of how to ensure Bayesian cubature methods produce output that is well-calibrated remains to some extent unresolved.As discussed in [30], it appears that symmetry exploits do not easily lend themselves to selection of kernel parameters, for instance via cross-validation or maximisation of marginal likelihood 4 .A potential, though somewhat heuristic, way to proceed might be to exploit the concentration of measure phenomenon [36] or low effective dimensionality of the integrand [61] in order to identify a suitable data subset on which kernel parameters can be calibrated more easily or a priori.

Fig. 4 :
Fig. 4: Relative error in numerical integration of the function (31) using the fully symmetric Bayesian cubature method based on a symmetric change of measure.