# Probabilistic linear solvers: a unifying view


## Abstract

Several recent works have developed a new, probabilistic interpretation for numerical algorithms solving linear systems in which the solution is inferred in a Bayesian framework, either directly or by inferring the unknown action of the matrix inverse. These approaches have typically focused on replicating the behaviour of the conjugate gradient method as a prototypical iterative method. In this work, surprisingly general conditions for equivalence of these disparate methods are presented. We also describe connections between probabilistic linear solvers and projection methods for linear systems, providing a probabilistic interpretation of a far more general class of iterative methods. In particular, this provides such an interpretation of the generalised minimum residual method. A probabilistic view of preconditioning is also introduced. These developments unify the literature on probabilistic linear solvers and provide foundational connections to the literature on iterative solvers for linear systems.

## Keywords

Probabilistic linear solvers, Projection methods, Iterative methods, Preconditioning

## 1 Introduction

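Consider the linear system
\[A\varvec{x}^*= \varvec{b}, \qquad (1)\]
where \(A \in \mathbb {R}^{d\times d}\) is nonsingular, \(\varvec{b} \in \mathbb {R}^d\) is given and \(\varvec{x}^* \in \mathbb {R}^d\) is to be determined. Hennig (2015) and Cockayne et al. (2018) proposed solvers for Eq. (1) which output *probability measures*, constructed to quantify uncertainty due to terminating the algorithm before the solution has been identified completely. On the surface the approaches in these two works appear different: in the matrix-based inference (MBI) approach of Hennig (2015), a posterior is constructed on the matrix \(A^{-\!1}\), while in the solution-based inference (SBI) method of Cockayne et al. (2018) a posterior is constructed on the solution vector \(\varvec{x}^*\).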

These algorithms are instances of *probabilistic numerical methods* (PNM) in the sense of Hennig et al. (2015) and Cockayne et al. (2017). PNM are numerical methods which output posterior distributions that quantify uncertainty due to discretisation error. An interesting property of PNM is that they often yield a posterior distribution whose mean coincides with the solution given by a classical numerical method for the problem at hand. The relationship between PNM and classical solvers has been explored in some generality for integration (e.g. Karvonen and Sarkka 2017), ODE solvers (Schober et al. 2014, 2019; Kersting et al. 2018) and PDE solvers (Cockayne et al. 2016). For linear solvers, attention has thus far been restricted to the conjugate gradient (CG) method. Since CG is but a single member of a larger class of iterative solvers, and is applicable only if the matrix *A* is symmetric and positive definite, extending the probabilistic interpretation is an interesting endeavour. Probabilistic interpretations provide an alternative perspective on numerical algorithms and can also provide extensions such as the ability to exploit noisy or corrupted observations. The probabilistic view has also been used to develop new numerical methods (Xi et al. 2018), and *Bayesian* PNM can be incorporated rigorously into pipelines of computation (Cockayne et al. 2017).

*Preconditioning*—mapping
Eq. (1) to a better conditioned system
with the same solution—is key to the fast convergence of iterative linear solvers,
particularly those based upon Krylov methods (Liesen and Strakos 2012). The design of preconditioners has been
referred to as “a combination of art and science” (Saad 2003, p. 283). In this work, we also provide a
new, probabilistic interpretation of preconditioning as a form of prior
information.

### 1.1 Contribution

1. It is shown that, for particular choices of the generative model, matrix-based inference (MBI) and solution-based inference (SBI) can be equivalent (Sect. 2).
2. A general probabilistic interpretation of projection methods (Saad 2003) is described (Sect. 3.2), leading to a probabilistic interpretation of the generalised minimum residual method (GMRES; Saad and Schultz 1986) in Sect. 6. The connection to CG is expanded and made more concise in Sect. 5.
3. A probabilistic interpretation of preconditioning is presented in Sect. 4.

### 1.2 Notation

For a symmetric positive-definite matrix \(M \in \mathbb {R}^{d\times d}\) and two vectors \(\varvec{v}, \varvec{w} \in \mathbb {R}^d\), we write \(\langle \varvec{v}, \varvec{w} \rangle _M = \varvec{v}^{\top }M \varvec{w}\) for the inner product induced by \(M\), and \(\Vert \varvec{v}\Vert _M^2 = \langle \varvec{v}, \varvec{v} \rangle _M\) for the corresponding norm.

A set of vectors \(\varvec{s}_1, \dots , \varvec{s}_m\) is called \(M\)-*orthogonal* or *M*-*conjugate* if
\(\langle \varvec{s}_i, \varvec{s}_j \rangle _M = 0\) for \(i\ne j\), and \(M\)-*orthonormal* if, in
addition, \(\Vert \varvec{s}_i\Vert _M = 1\) for \(1\le i\le m\).

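The *vectorisation operator* \(\text {vec} : \mathbb {R}^{d\times d} \rightarrow \mathbb {R}^{d^2}\) stacks the rows^{1} of *A* into one long vector:
\[\text {vec}(A) = \big [[A]_{11}, \dots , [A]_{1d}, [A]_{21}, \dots , [A]_{2d}, \dots , [A]_{d1}, \dots , [A]_{dd}\big ]^{\top }.\]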

The *Kronecker product* of two matrices \(A, B \in \mathbb {R}^{d\times d}\) is \(A \otimes B\) with \([A\otimes B]_{(ij),(k\ell )} = [A]_{ik}[B]_{j\ell }\). A list of its properties is provided in “Appendix A”.
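The index convention for the Kronecker product pairs with row-stacking vectorisation through the identity \((A\otimes B)\,\text{vec}(X) = \text{vec}(AXB^{\top })\). The NumPy sketch below checks both; it relies on `numpy.kron`, whose indexing matches the definition above, and on the fact that NumPy's default C-order `reshape` stacks rows. The helper `vec` is a name chosen here for illustration.

```python
import numpy as np

def vec(M):
    # Row stacking: vec(M)[i * d + j] == M[i, j]; this is NumPy's default (C) order.
    return M.reshape(-1)

rng = np.random.default_rng(0)
d = 4
A, B, X = (rng.standard_normal((d, d)) for _ in range(3))
K = np.kron(A, B)  # K[i * d + j, k * d + ell] == A[i, k] * B[j, ell]

# Spot-check the index convention [A ⊗ B]_{(ij),(k ell)} = [A]_{ik} [B]_{j ell}.
i, j, k, ell = 1, 2, 3, 0
entry_ok = bool(np.isclose(K[i * d + j, k * d + ell], A[i, k] * B[j, ell]))

# With row stacking, (A ⊗ B) vec(X) = vec(A X B^T).
identity_ok = np.allclose(K @ vec(X), vec(A @ X @ B.T))
print(entry_ok, identity_ok)  # True True
```

Switching to column stacking (`reshape(-1, order="F")`) turns the identity into \((B\otimes A)\,\text{vec}(X) = \text{vec}(AXB^{\top })\), which is the permutation mentioned in footnote 1.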

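The Krylov space of order *m* generated by the matrix \(A\in \mathbb {R}^{d\times d}\) and the vector \(\varvec{b}\in \mathbb {R}^d\) is
\[K_m(A, \varvec{b}) = \operatorname {span}\{\varvec{b}, A\varvec{b}, \dots , A^{m-1}\varvec{b}\}.\]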

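Let \(\mathbb {S}\) be an *m*-dimensional linear subspace of \(\mathbb {R}^d\) with basis \(\{\varvec{s}_1, \dots , \varvec{s}_m\}\). Then, for a vector \(\varvec{v} \in \mathbb {R}^d\) and a matrix \(M \in \mathbb {R}^{d\times d}\), let
\[\varvec{v} + \mathbb {S} = \{\varvec{v} + \varvec{s} : \varvec{s} \in \mathbb {S}\}, \qquad M\mathbb {S} = \operatorname {span}\{M\varvec{s}_1, \dots , M\varvec{s}_m\}.\]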

## 2 Probabilistic linear solvers

Several probabilistic frameworks describing the solution of Eq. (1) have been constructed in recent years. They primarily differ in the subject of inference: SBI approaches such as Cockayne et al. (2018), of which *BayesCG* is an example, place a prior distribution on the solution \(\varvec{x}^*\) of Eq. (1). Conversely, the MBI approach of Hennig (2015) and Bartels and Hennig (2016) places a prior on \(A^{-\!1}\), treating the action of the inverse operator as an unknown to be inferred.^{2} This section reviews each approach and adds some new insights. In particular, SBI can be viewed as a strict special case of MBI (Sect. 2.4).

Throughout this section, we will assume that the search directions \(S_m\) in \(S_m^\top A\varvec{x}^*= S_m^\top \varvec{b}\) are independent of \(\varvec{x}^*\). Generally speaking, this is not the case for projection methods, in which the solution space often depends strongly on \(\varvec{b}\), as described in Sects. 5 and 6. This disconnect is the source of the poor uncertainty quantification reported in Cockayne et al. (2018) and shown also to hold for the methods in this work in Sect. 6.4. This will not be examined in further detail in this work, though it remains an important area of development for probabilistic linear solvers.

### 2.1 Background on Gaussian conditioning

The propositions in this section rely on the following two classic properties of Gaussian distributions.

### Lemma 1

### Lemma 2

### 2.2 Solution-based inference

To phrase the solution of Eq. (1) as a form of probabilistic inference, Cockayne et al. (2018) consider a Gaussian prior over the solution \(\varvec{x}^*\), and condition on observations provided by a set of *search directions* \(\varvec{s}_1, \dots , \varvec{s}_m\), \(m < d\). Let \(S_m \in \mathbb {R}^{d \times m}\) be given by \(S_m = [\varvec{s}_1, \ldots , \varvec{s}_m]\), and let information be given by \(\varvec{y}_m:=S_m^{\top }A\varvec{x}^*=S_m^{\top }\varvec{b}\). Since the information is a linear projection of \(\varvec{x}^*\), the posterior distribution is a Gaussian distribution on \(\varvec{x}^*\):

### Lemma 3
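Assuming the prior \(\varvec{x}^*\sim \mathcal {N}(\varvec{x}_0, \varSigma _0)\) and writing \(\varvec{r}_0 = \varvec{b} - A\varvec{x}_0\) and \(\varLambda _m = S_m^{\top }A\varSigma _0A^{\top }S_m\), the standard formulas for conditioning on a noise-free linear observation give the posterior mean \(\varvec{x}_m = \varvec{x}_0 + \varSigma _0A^{\top }S_m\varLambda _m^{-1}S_m^{\top }\varvec{r}_0\) and covariance \(\varSigma _m = \varSigma _0 - \varSigma _0A^{\top }S_m\varLambda _m^{-1}S_m^{\top }A\varSigma _0\). A minimal NumPy sketch of these formulas with random data (the function name `sbi_posterior` is ours, not from Cockayne et al. 2018):

```python
import numpy as np

def sbi_posterior(A, b, x0, Sigma0, S):
    """Condition the prior N(x0, Sigma0) on the noise-free data S^T A x = S^T b."""
    r0 = b - A @ x0                    # initial residual
    G = Sigma0 @ A.T @ S               # Sigma0 A^T S_m, shape (d, m)
    Lam = S.T @ A @ G                  # S_m^T A Sigma0 A^T S_m, shape (m, m)
    x_m = x0 + G @ np.linalg.solve(Lam, S.T @ r0)
    Sigma_m = Sigma0 - G @ np.linalg.solve(Lam, G.T)
    return x_m, Sigma_m

rng = np.random.default_rng(1)
d, m = 6, 3
A = rng.standard_normal((d, d)) + d * np.eye(d)   # a generic nonsingular matrix
b = rng.standard_normal(d)
x0, Sigma0 = np.zeros(d), np.eye(d)
S = rng.standard_normal((d, m))                   # arbitrary search directions

x_m, Sigma_m = sbi_posterior(A, b, x0, Sigma0, S)
print(np.allclose(S.T @ A @ x_m, S.T @ b))        # True: the mean reproduces the data
print(np.allclose(Sigma_m @ A.T @ S, 0))          # True: no variance left in observed directions
```

The \(m\times m\) solve with \(\varLambda _m\) is the cost that Sect. 5 returns to.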

The following proposition establishes an optimality property of the posterior mean \(\varvec{x}_m\). This is a relatively well-known property of Gaussian inference, which will prove useful in subsequent sections.

### Proposition 4

### Proof

### 2.3 Matrix-based inference

*right*-multiplying^{3} \(A\) with \(S_m\), i.e. \(Y_m = AS_m\). Note that

### Lemma 5

^{4}Consider the prior

The Kronecker structure of the prior covariance matrix in Eq. (4) is by no means the only option that facilitates tractable inference.^{5} However, in the absence of literature exploring other approaches within MBI, we will assume throughout that MBI refers to the use of the Kronecker product prior covariance.

### 2.4 Equivalence of MBI and SBI

In practice, Hennig (2015) notes that inference on \(A^{-\!1}\) should be performed only implicitly, avoiding the \(d^2\) storage cost and the mathematical complexity of the operations involved in Lemma 5. This raises the question of when MBI is equivalent to SBI. Although, based on Lemma 1, one might suspect SBI and MBI to be equivalent, in fact the posterior from Lemma 5 is structurally different to the posterior in Lemma 3: after projecting into solution space, the posterior covariance in Lemma 5 is a scalar multiple of the matrix \(\varSigma _0\), which is not the case in general in Lemma 3.

*left*-multiplications of *A*. We will refer to the observation model of Eq. (3) as *right-multiplied information*, and to Eq. (6) as *left-multiplied information*.

### Proposition 6

### Proof

See “Appendix B”. \(\square \)

The first of the two conditions requires that the prior mean on the matrix inverse be consistent with the prior mean on the solution, which is natural. The second condition demands that, after projection into solution space, the relationship between the rows of \(A^{-\!1}\) modelled by \(W_0\) does not inflate the covariance \(\varSigma _0\). Note that this condition is trivial to enforce for an arbitrary covariance \(\bar{W_0}\) by setting \(W_0 = (\varvec{b}^{\top }\bar{W_0} \varvec{b})^{-1} \bar{W_0}\).
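The rescaling in the last sentence can be checked numerically. The sketch assumes that the prior covariance on \(\text{vec}(A^{-\!1})\) has the Kronecker form \(\varSigma _0\otimes W_0\) under the row-stacking convention of Sect. 1.2, so that \(A^{-\!1}\varvec{b}\) has covariance \((\varvec{b}^{\top }W_0\varvec{b})\,\varSigma _0\); both the ordering of the Kronecker factors and the reading of the second condition as \(\varvec{b}^{\top }W_0\varvec{b} = 1\) are assumptions, and all matrices are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 5
b = rng.standard_normal(d)
L = rng.standard_normal((d, d))
W_bar = L @ L.T + np.eye(d)                 # an arbitrary SPD covariance \bar{W}_0
Sigma0 = np.diag(rng.uniform(0.5, 2.0, d))  # an arbitrary SPD solution-space covariance

W0 = W_bar / (b @ W_bar @ b)                # the rescaling from the text: b^T W0 b = 1

# Assumed Kronecker prior covariance on vec(A^{-1}) with row stacking.
cov_vec = np.kron(Sigma0, W0)
# Row stacking gives A^{-1} b = (I ⊗ b^T) vec(A^{-1}).
T = np.kron(np.eye(d), b[None, :])
cov_x = T @ cov_vec @ T.T                   # equals (b^T W0 b) * Sigma0

print(np.isclose(b @ W0 @ b, 1.0), np.allclose(cov_x, Sigma0))  # True True
```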

### 2.5 Remarks

The result in Proposition 6 shows that any result proven for SBI applies immediately to MBI with left-multiplied observations. Though MBI has more model parameters than SBI, there are situations in which this point of view is more appropriate. Unlike in SBI, the information obtained in MBI need not be specific to a particular solution vector \(\varvec{x}^*\) and thus can be propagated and recycled over several linear problems, similar to the notion of subspace recycling (Soodhalter et al. 2014). Secondly, MBI is able to utilise both left- and right-multiplied information, while SBI is restricted to left-multiplied information. This additional generality may prove useful in some applications.

## 3 Projection methods as inference

This section discusses a connection between probabilistic numerical methods for linear systems and the classic framework of projection methods for the iterative solution of linear problems. Section 3.1 reviews this established class of solvers, while Sect. 3.2 presents the novel results.

### 3.1 Background

Many iterative methods for linear systems, including CG and GMRES, belong to the class of projection methods (Saad 2003, p. 130f.). Saad describes a projection method as an iterative scheme in which, at each iteration, a solution vector \(\varvec{x}_m\) is constructed by projecting \(\varvec{x}^*\) into a *solution space* \({\mathbb {X}}_m\subset \mathbb {R}^d\), subject to the restriction that the residual \(\varvec{r}_m = \varvec{b} - A\varvec{x}_m\) is orthogonal to a *constraint space* \({\mathbb {U}}_m\subset \mathbb {R}^d\).

*m* fixed and determined in advance. For CG, the spaces are \({\mathbb {U}}_m = {\mathbb {X}}_m = K_m(A, \varvec{b})\), while for GMRES they are \({\mathbb {X}}_m=K_m(A, \varvec{b})\) and \({\mathbb {U}}_m=AK_m(A, \varvec{b})\) (Saad 2003, Proposition 5.1).
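In matrix form, if the columns of \(X_m\) and \(U_m\) are bases of \({\mathbb {X}}_m\) and \({\mathbb {U}}_m\), a projection step with initial guess \(\varvec{x}_0\) and \(\varvec{r}_0 = \varvec{b} - A\varvec{x}_0\) reads \(\varvec{x}_m = \varvec{x}_0 + X_m(U_m^{\top }AX_m)^{-1}U_m^{\top }\varvec{r}_0\), assuming \(U_m^{\top }AX_m\) is nonsingular (Saad 2003, Chap. 5). A minimal sketch with random bases; `projection_step` is a name chosen here:

```python
import numpy as np

def projection_step(A, b, x0, X, U):
    """x_m in x0 + range(X) with residual b - A x_m orthogonal to range(U)."""
    r0 = b - A @ x0
    coeffs = np.linalg.solve(U.T @ A @ X, U.T @ r0)  # needs U^T A X nonsingular
    return x0 + X @ coeffs

rng = np.random.default_rng(3)
d, m = 8, 3
A = rng.standard_normal((d, d)) + d * np.eye(d)
b = rng.standard_normal(d)
x0 = rng.standard_normal(d)
X = rng.standard_normal((d, m))   # basis of the solution space
U = rng.standard_normal((d, m))   # basis of the constraint space

x_m = projection_step(A, b, x0, X, U)
r_m = b - A @ x_m
c = np.linalg.lstsq(X, x_m - x0, rcond=None)[0]
print(np.allclose(U.T @ r_m, 0))        # True: residual orthogonal to the constraint space
print(np.allclose(X @ c, x_m - x0))     # True: correction lies in the solution space
# With full spaces (m = d) the projection returns the exact solution.
print(np.allclose(projection_step(A, b, x0, np.eye(d), np.eye(d)), np.linalg.solve(A, b)))
```

CG and GMRES are recovered by choosing \(X_m\) and \(U_m\) as the Krylov bases listed above.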

### 3.2 Probabilistic perspectives

In this section, we first show, in Proposition 7, that the conditional mean from SBI after *m* steps corresponds to some projection method. Then, in Proposition 8 we prove the converse: that each projection method is also the posterior mean of a probabilistic method, for some prior covariance and choice of information.

### Proposition 7

### Proof

Substituting \(U_m = S_m\) and \(X_m = \varSigma _0 A^{\top }S_m\) into Lemma 3 gives Eq. (8), as required. \(\square \)
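A quick numerical sanity check of Proposition 7, as a sketch with random data: the SBI posterior mean should satisfy both defining properties of a projection method whose solution space is spanned by \(\varSigma _0A^{\top }S_m\) and whose constraint space is spanned by \(S_m\).

```python
import numpy as np

rng = np.random.default_rng(4)
d, m = 8, 3
A = rng.standard_normal((d, d)) + d * np.eye(d)
b = rng.standard_normal(d)
x0 = rng.standard_normal(d)
L = rng.standard_normal((d, d))
Sigma0 = L @ L.T + np.eye(d)        # an arbitrary SPD prior covariance
S = rng.standard_normal((d, m))     # search directions
r0 = b - A @ x0

# SBI posterior mean after conditioning N(x0, Sigma0) on S^T A x = S^T b.
G = Sigma0 @ A.T @ S
x_sbi = x0 + G @ np.linalg.solve(S.T @ A @ G, S.T @ r0)

# Defining properties of the projection method with X_m = Sigma0 A^T S_m, U_m = S_m.
c = np.linalg.lstsq(G, x_sbi - x0, rcond=None)[0]
in_solution_space = np.allclose(G @ c, x_sbi - x0)
petrov_galerkin = np.allclose(S.T @ (b - A @ x_sbi), 0)
print(in_solution_space, petrov_galerkin)  # True True
```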

The converse to this also holds:

### Proposition 8

### Proof

A direct way to enforce the posterior occupying the solution space is by placing a prior on the coefficients \(\varvec{\alpha }\) in \(\varvec{x} = \varvec{x}_0 + X_m \varvec{\alpha }\). Under a unit Gaussian prior \(\varvec{\alpha }\sim \mathcal {N}(\varvec{0}, I)\), the implied prior on \(\varvec{x}\) naturally has the form of Eq. (9).
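The construction in this proof can be checked numerically. With \(\varSigma _0 = X_mX_m^{\top }\), the covariance implied by \(\varvec{\alpha }\sim \mathcal {N}(\varvec{0}, I)\), and search directions \(S_m = U_m\) (our assumption about the choice of information in Proposition 8), the SBI posterior mean reproduces the projection iterate. The sketch also shows that the posterior covariance vanishes: the prior has rank \(m\) and all \(m\) of its directions are observed.

```python
import numpy as np

rng = np.random.default_rng(5)
d, m = 8, 3
A = rng.standard_normal((d, d)) + d * np.eye(d)
b = rng.standard_normal(d)
x0 = np.zeros(d)
X = rng.standard_normal((d, m))     # solution space basis X_m
U = rng.standard_normal((d, m))     # constraint space basis U_m
r0 = b - A @ x0

# Projection method iterate.
x_proj = x0 + X @ np.linalg.solve(U.T @ A @ X, U.T @ r0)

# Prior x = x0 + X alpha, alpha ~ N(0, I), i.e. Sigma0 = X X^T (rank m); search directions S = U.
Sigma0 = X @ X.T
G = Sigma0 @ A.T @ U
Lam = U.T @ A @ G                   # (U^T A X)(U^T A X)^T, invertible when U^T A X is
x_sbi = x0 + G @ np.linalg.solve(Lam, U.T @ r0)
Sigma_m = Sigma0 - G @ np.linalg.solve(Lam, G.T)

print(np.allclose(x_sbi, x_proj), np.allclose(Sigma_m, 0))  # True True
```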

Including the solution space \(X_m\) in the prior covariance matrix requires it to be specified a priori. For solvers like CG and GMRES which construct \(X_m\) adaptively, this assumption may appear problematic: a probabilistic interpretation should use for inference only quantities that have already been computed. The computation of \(X_m\) could be seen as part of the initialisation, but this requires the number of iterations *m* to be fixed a priori, whereas typically such methods choose *m* adaptively by examining the norm of the residual.^{6} Nevertheless, the proposition provides a probabilistic view for *arbitrary* projection methods and does not involve \(A^{-\!1}\), unlike the results presented in Hennig (2015) and Cockayne et al. (2017).

The above prior is not unique. The next proposition establishes probabilistic interpretations of projection methods under priors that are independent of the solution and constraint spaces, albeit under more restrictive conditions. The benefit of this is that *m* need not be fixed a priori.

### Proposition 9

### Proof

A corollary which provides further insight arises when one considers the *polar decomposition* of *A*. Recall that an invertible matrix *A* has a unique polar decomposition \(A = PH\), where \(P \in \mathbb {R}^{d \times d}\) is orthogonal and \(H \in \mathbb {R}^{d\times d}\) is symmetric positive definite.

### Corollary 10

*P* arises from the polar decomposition \(A = PH\). Then, under the prior

### Proof

This follows from Proposition 9. Setting \(R = P\) aligns the search directions in Corollary 10 with those in Proposition 9. Since *P* is orthogonal, \(P^{-1} = P^\top \), so \(P^\top A = P^\top P H = H\); since *H* is symmetric, also \(A^\top P = H^\top P^\top P = H\). Hence \(A^\top P = P^\top A = H\) is symmetric positive definite, which gives the prior covariance required for Proposition 9. \(\square \)

This is an intuitive analogue of similar results in Hennig
(2015) and Cockayne et al.
(2017) which show that CG is
recovered under certain conditions involving a prior \(\varSigma _0 = A^{-\!1}\). When \(A\) is not symmetric and positive definite, it cannot be used as a
prior covariance. This corollary suggests a natural way to select a prior
covariance still linked to the linear system, though this choice is still not
computationally convenient. Furthermore, in the case that \(A\) is symmetric positive definite, this recovers the prior which
replicates CG described in Cockayne et al. (2018). Note that each of *H*
and *P* can be stated explicitly as
\(H = (A^{\top }A)^\frac{1}{2}\) and \(P = A(A^{\top }A)^{-\frac{1}{2}}\). Thus, in the case of symmetric positive-definite *A* we have that \(H = A\) and \(P = I\), so that the prior covariance \(\varSigma _0 = A^{-1}\) arises naturally from this interpretation.
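The polar factors can be computed from a singular value decomposition \(A = W\,\text{diag}(\varvec{\sigma })\,V^{\top }\) as \(P = WV^{\top }\) and \(H = V\,\text{diag}(\varvec{\sigma })\,V^{\top }\). The sketch below uses this to check \(A^{\top }P = H\) and, assuming Corollary 10 pairs the prior covariance \(H^{-1}\) with search directions \(S_m = PX_m\) for a solution-space basis \(X_m\) (our reading of its proof), that the SBI posterior mean equals the corresponding projection iterate. It ends with the SPD case, where \(P = I\) and \(H = A\). SciPy also provides `scipy.linalg.polar`; plain NumPy keeps the block self-contained.

```python
import numpy as np

def polar(A):
    """Polar decomposition A = P H computed from the SVD A = W diag(s) V^T."""
    W, s, Vt = np.linalg.svd(A)
    P = W @ Vt                       # orthogonal factor
    H = Vt.T @ np.diag(s) @ Vt       # SPD factor, equal to (A^T A)^{1/2}
    return P, H

rng = np.random.default_rng(5)
d, m = 7, 3
A = rng.standard_normal((d, d)) + d * np.eye(d)   # generic nonsymmetric, nonsingular
b = rng.standard_normal(d)
x0 = np.zeros(d)
P, H = polar(A)
print(np.allclose(P @ H, A), np.allclose(P.T @ P, np.eye(d)), np.allclose(A.T @ P, H))

# Assumed setting: solution-space basis X, search directions S = P X, prior covariance H^{-1}.
X = rng.standard_normal((d, m))
S = P @ X
Sigma0 = np.linalg.inv(H)
r0 = b - A @ x0
G = Sigma0 @ A.T @ S                              # reduces to X because A^T P = H
x_sbi = x0 + G @ np.linalg.solve(S.T @ A @ G, S.T @ r0)
x_proj = x0 + X @ np.linalg.solve(S.T @ A @ X, S.T @ r0)
print(np.allclose(x_sbi, x_proj))

# SPD case: the polar factors reduce to P = I and H = A, i.e. Sigma0 = A^{-1}.
C = rng.standard_normal((d, d))
A_spd = C @ C.T + np.eye(d)
P_spd, H_spd = polar(A_spd)
print(np.allclose(P_spd, np.eye(d)), np.allclose(H_spd, A_spd))
```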

## 4 Preconditioning

A preconditioner *P* is a nonsingular matrix satisfying two requirements:

1. Linear systems \(Pz=c\) can be solved at low computational cost.
2. *P* is “close” to *A* in some sense.

*A*, and indeed, many preconditioners are constructed based upon this intuition. One distinguishes between *right preconditioners* \(P_r\) and *left preconditioners* \(P_l\), depending on whether they act on *A* from the right or the left. Two-sided preconditioning with nonsingular matrices \(P_l\) and \(P_r\) implicitly transforms Eq. (1) into a new linear problem
\[P_l^{-1}AP_r^{-1}\tilde{\varvec{x}} = P_l^{-1}\varvec{b}, \qquad \varvec{x}^*= P_r^{-1}\tilde{\varvec{x}}.\]

### Proposition 11

### Proof

### Proposition 12

### Proof

If a probabilistic linear solver has a posterior mean which coincides with a projection method (as discussed in Sect. 3.1), Propositions 11 and 12 show how to obtain a probabilistic interpretation of the *preconditioned* version of that algorithm. Furthermore, the equivalence demonstrated in Sect. 2.4 shows that the reasoning from Propositions 11 and 12 carries over to MBI based on left-multiplied observations: right preconditioning corresponds to a change in prior belief, while left preconditioning corresponds to a change in observations.
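A numerical sketch of these two correspondences for SBI, under two assumptions: that the two-sided system takes the standard form \(P_l^{-1}AP_r^{-1}\tilde{\varvec{x}} = P_l^{-1}\varvec{b}\), \(\varvec{x}^*= P_r^{-1}\tilde{\varvec{x}}\), and that the search directions \(S_m\) are fixed. Right preconditioning with prior \(\mathcal {N}(\tilde{\varvec{x}}_0, \tilde{\varSigma })\) then matches SBI on Eq. (1) with the pushed-forward prior \(\mathcal {N}(P_r^{-1}\tilde{\varvec{x}}_0, P_r^{-1}\tilde{\varSigma }P_r^{-\top })\), and left preconditioning matches SBI on Eq. (1) with search directions \(P_l^{-\top }S_m\). The preconditioners are random matrices chosen only to exercise the algebra.

```python
import numpy as np

def sbi_posterior(A, b, x0, Sigma0, S):
    """Gaussian conditioning of N(x0, Sigma0) on S^T A x = S^T b."""
    r0 = b - A @ x0
    G = Sigma0 @ A.T @ S
    Lam = S.T @ A @ G
    return x0 + G @ np.linalg.solve(Lam, S.T @ r0), Sigma0 - G @ np.linalg.solve(Lam, G.T)

rng = np.random.default_rng(6)
d, m = 6, 3
A = rng.standard_normal((d, d)) + d * np.eye(d)
b = rng.standard_normal(d)
S = rng.standard_normal((d, m))
# Hypothetical preconditioners: random nonsingular matrices, only to exercise the algebra.
Pr_inv = np.linalg.inv(rng.standard_normal((d, d)) + d * np.eye(d))
Pl_inv = np.linalg.inv(rng.standard_normal((d, d)) + d * np.eye(d))

# Right preconditioning: infer xt in (A P_r^{-1}) xt = b under N(xt0, Sigma_t) ...
xt0, Sigma_t = np.zeros(d), np.eye(d)
xt_m, Sigma_tm = sbi_posterior(A @ Pr_inv, b, xt0, Sigma_t, S)
# ... versus Eq. (1) itself under the pushed-forward prior.
x_m, Sigma_m = sbi_posterior(A, b, Pr_inv @ xt0, Pr_inv @ Sigma_t @ Pr_inv.T, S)
right_is_prior_change = (np.allclose(Pr_inv @ xt_m, x_m)
                         and np.allclose(Pr_inv @ Sigma_tm @ Pr_inv.T, Sigma_m))

# Left preconditioning: directions S on (P_l^{-1} A) x = P_l^{-1} b ...
x0, Sigma0 = np.zeros(d), np.eye(d)
xl, Sigma_l = sbi_posterior(Pl_inv @ A, Pl_inv @ b, x0, Sigma0, S)
# ... versus Eq. (1) itself with directions P_l^{-T} S.
xo, Sigma_o = sbi_posterior(A, b, x0, Sigma0, Pl_inv.T @ S)
left_is_observation_change = np.allclose(xl, xo) and np.allclose(Sigma_l, Sigma_o)

print(right_is_prior_change, left_is_observation_change)  # True True
```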

We do not claim that this probabilistic interpretation of preconditioning is unique. For example, when using MBI with right-multiplied observations, the same line of reasoning can be used to show the converse: right preconditioning corresponds to a change in the observations and left preconditioning to a change in the prior.

## 5 Conjugate gradients

Conjugate gradients have been studied from a probabilistic point of view before by Hennig (2015) and Cockayne et al. (2018). This section generalises the results of Hennig (2015) and leverages Proposition 6 for new insights into BayesCG. For this section (but not thereafter), assume that \(A\) is a symmetric and positive definite matrix.

### 5.1 Left-multiplied view

The BayesCG algorithm proposed by Cockayne et al. (2018) encompasses conjugate gradients as a special case. BayesCG uses left-multiplied observations and was derived in the solution-based perspective.

The posterior in Lemma 3 does not immediately result in a practical algorithm, as it involves the solution of a linear system based on the matrix \(S_m^{\top }A\varSigma _0 A^{\top }S_m\in \mathbb {R}^{m\times m}\), which requires \({\mathcal {O}}(m^3)\) arithmetic operations. BayesCG avoids this cost by constructing search directions that are \(A\varSigma _0A^\top \)-orthonormal, as shown below (see Cockayne et al. 2018, Proposition 7).
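To make the cost argument concrete, here is a hedged sketch rather than the BayesCG recurrence itself. With the (impractical) prior \(\varSigma _0 = A^{-\!1}\), which reproduces CG, we orthonormalise a Krylov basis in the \(A\varSigma _0A^{\top }\) inner product by Gram-Schmidt, so that \(\varLambda _m = S_m^{\top }A\varSigma _0A^{\top }S_m = I\) and the posterior mean needs no \(m\times m\) solve; the result is compared with textbook CG. The function names are ours, and the agreement holds in exact arithmetic up to rounding.

```python
import numpy as np

def cg(A, b, x0, m):
    """Textbook conjugate gradients, run for m iterations."""
    x, r = x0.copy(), b - A @ x0
    p = r.copy()
    for _ in range(m):
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)
        x = x + alpha * p
        r_new = r - alpha * Ap
        p = r_new + ((r_new @ r_new) / (r @ r)) * p
        r = r_new
    return x

def m_orthonormal_krylov(A, M, r0, m):
    """Basis of K_m(A, r0), orthonormal in <u, v>_M = u^T M v (modified Gram-Schmidt)."""
    S = []
    v = r0.copy()
    for _ in range(m):
        for s in S:
            v = v - (s @ M @ v) * s
        v = v / np.sqrt(v @ M @ v)
        S.append(v)
        v = A @ v                      # next Krylov direction
    return np.column_stack(S)

rng = np.random.default_rng(7)
d, m = 10, 4
C = rng.standard_normal((d, d))
A = C @ C.T + d * np.eye(d)            # symmetric positive definite
b = rng.standard_normal(d)
x0 = np.zeros(d)
r0 = b - A @ x0
Sigma0 = np.linalg.inv(A)              # CG-reproducing prior; only for illustration
M = A @ Sigma0 @ A.T                   # the A Sigma0 A^T inner product

S = m_orthonormal_krylov(A, M, r0, m)
Lam = S.T @ M @ S                      # identity, so no m x m solve is needed
x_m = x0 + Sigma0 @ A.T @ S @ (S.T @ r0)
print(np.allclose(Lam, np.eye(m)), np.allclose(x_m, cg(A, b, x0, m)))  # True True
```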

### Proposition 13

With these search directions constructed, BayesCG becomes an iterative method:

### Proposition 14

The following proposition leverages these results along with Proposition 6 to show that there exists an MBI method which, under a particular choice of prior and with a particular methodology for the generation of search directions, is consistent with CG.

### Proposition 15

*m* iterations, with starting point \(\varvec{x}_0 = A_0^{-1} \varvec{b}\).

### Proof

### 5.2 Right-multiplied view

^{7} that Algorithm 1 reproduces both the search directions and solution estimates from CG under the prior, where \(\alpha \in \mathbb {R}\setminus \{0\}\), \(\beta \in \mathbb {R}^+\) and \(\circledast \) denotes the symmetric Kronecker product (see Section A.1). The posterior under such a prior is described in Lemma 2.2 of Hennig (2015) (see Lemma 21). Note, however, that the solution estimate \(\varvec{x}_m\) output by this algorithm relates to the posterior over \(A^{-1}\) differently from the previous section: here \(A^{-\!1}_m \varvec{b}\ne \varvec{x}_m\). (More precisely, \(\varvec{x}_m=A^{-\!1}_m (\varvec{b}-A\varvec{x}_0)-\varvec{x}_0 - (1-\alpha _m) \varvec{d}_m\), as the CG estimate is corrected by the step size computed in line 6. Fixing this rank-1 discrepancy would complicate the exposition of Algorithm 1 and yield a more cumbersome algorithm.) The following proposition generalises this result.

### Proposition 16

### Proof

The proof is extensive and has been moved to “Appendix B”. \(\square \)

Note that, unlike previous propositions, Proposition 16 proposes a prior that does not involve \(A^{-\!1}\) for the case when \(\gamma = 0\).

## 6 GMRES

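The *generalised minimal residual method* (Saad 2003, Section 6.5) applies to general nonsingular matrices *A*. At iteration *m*, GMRES minimises the residual over the affine space \(\varvec{x}_0 + K_m(A,\varvec{ r}_0)\). That is, the residual \(\varvec{r}_m = \varvec{b} - A\varvec{x}_m\) satisfies
\[\Vert \varvec{r}_m\Vert _2 = \min _{\varvec{x} \in \varvec{x}_0 + K_m(A, \varvec{r}_0)} \Vert \varvec{b} - A\varvec{x}\Vert _2.\]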

We present a brief development of GMRES, starting with Arnoldi’s method (Sect. 6.1) and the GMRES algorithm (Sect. 6.2), before presenting our Bayesian interpretation (Sect. 6.3).
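As a concrete reference for Sects. 6.1 and 6.2, the following sketch implements Arnoldi's method (modified Gram-Schmidt variant) and unrestarted GMRES on top of it, and checks the result against a direct minimisation of the residual over the Krylov space. It is a minimal illustration that assumes no breakdown and no restarts, not a production solver; the function names are ours.

```python
import numpy as np

def arnoldi(A, r0, m):
    """Orthonormal basis Q (d x (m+1)) of K_{m+1}(A, r0) and the (m+1) x m upper
    Hessenberg matrix Hbar satisfying A @ Q[:, :m] == Q @ Hbar."""
    d = r0.shape[0]
    Q = np.zeros((d, m + 1))
    Hbar = np.zeros((m + 1, m))
    Q[:, 0] = r0 / np.linalg.norm(r0)
    for j in range(m):
        w = A @ Q[:, j]
        for i in range(j + 1):                 # modified Gram-Schmidt
            Hbar[i, j] = Q[:, i] @ w
            w = w - Hbar[i, j] * Q[:, i]
        Hbar[j + 1, j] = np.linalg.norm(w)
        Q[:, j + 1] = w / Hbar[j + 1, j]       # assumes no breakdown
    return Q, Hbar

def gmres(A, b, x0, m):
    """m steps of unrestarted GMRES via the small (m+1) x m least-squares problem."""
    r0 = b - A @ x0
    Q, Hbar = arnoldi(A, r0, m)
    rhs = np.zeros(m + 1)
    rhs[0] = np.linalg.norm(r0)                # ||r0||_2 e_1
    y = np.linalg.lstsq(Hbar, rhs, rcond=None)[0]
    return x0 + Q[:, :m] @ y

rng = np.random.default_rng(8)
d, m = 12, 5
A = rng.standard_normal((d, d)) + d * np.eye(d)   # nonsymmetric
b = rng.standard_normal(d)
x0 = np.zeros(d)
x_m = gmres(A, b, x0, m)

# Reference: minimise ||b - A x||_2 over x0 + K_m(A, r0) directly (a d-row problem).
r0 = b - A @ x0
cols = [r0 / np.linalg.norm(r0)]
for _ in range(m - 1):
    v = A @ cols[-1]
    cols.append(v / np.linalg.norm(v))
K = np.column_stack(cols)
x_ref = x0 + K @ np.linalg.lstsq(A @ K, r0, rcond=None)[0]
print(np.allclose(x_m, x_ref))  # True
```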

### 6.1 Arnoldi’s method

*A*. Starting with \(\varvec{q}_1 = \varvec{r}_0/\Vert \varvec{r}_0\Vert _2\), Arnoldi’s method recursively computes the orthonormal basis

*upper Hessenberg* matrix \(H_m\) is defined as

### 6.2 GMRES

*d* rows, GMRES instead solves a problem with only \(m+1\) rows,

### 6.3 Bayesian interpretation of GMRES

We now present probabilistic linear solvers with posterior means that coincide with the solution estimate from GMRES.

#### 6.3.1 Left-multiplied view

### Proposition 17

### Proof

Substitute \(R=A\) and \(U_m = AQ_m\) into Proposition 9.\(\square \)

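Proposition 17 is intuitive in the context of Proposition 4: setting \(\varSigma _0 = (A^{\top }A)^{-1}\) ensures that the norm being minimised coincides with that of GMRES, since \(\Vert \varvec{x} - \varvec{x}^*\Vert _{A^{\top }A} = \Vert \varvec{b} - A\varvec{x}\Vert _2\). Likewise, the solution space is spanned by \(\varSigma _0 A^{\top }S_m = (A^{\top }A)^{-1}A^{\top }AQ_m = Q_m\), a basis of \(K_m(A, \varvec{r}_0)\), as in GMRES. This interpretation exhibits an interesting duality with CG, for which \(\varSigma _0=A^{-\!1}\).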
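A numerical check of Proposition 17 under our reading of it: prior covariance \(\varSigma _0 = (A^{\top }A)^{-1}\) and search directions \(S_m = AQ_m\), with \(Q_m\) an orthonormal basis of \(K_m(A, \varvec{r}_0)\). The SBI posterior mean should coincide with the GMRES iterate. Here \(Q_m\) comes from a QR factorisation of a normalised Krylov matrix rather than from Arnoldi, and forming \((A^{\top }A)^{-1}\) explicitly is done only for illustration.

```python
import numpy as np

rng = np.random.default_rng(9)
d, m = 10, 4
A = rng.standard_normal((d, d)) + d * np.eye(d)
b = rng.standard_normal(d)
x0 = np.zeros(d)
r0 = b - A @ x0

# Orthonormal basis Q_m of K_m(A, r0) via QR of a normalised Krylov matrix.
cols = [r0 / np.linalg.norm(r0)]
for _ in range(m - 1):
    v = A @ cols[-1]
    cols.append(v / np.linalg.norm(v))
Q, _ = np.linalg.qr(np.column_stack(cols))

# GMRES iterate: minimise ||b - A x||_2 over x0 + range(Q_m).
y = np.linalg.lstsq(A @ Q, r0, rcond=None)[0]
x_gmres = x0 + Q @ y

# SBI with prior covariance (A^T A)^{-1} and search directions S_m = A Q_m.
Sigma0 = np.linalg.inv(A.T @ A)
S = A @ Q
G = Sigma0 @ A.T @ S                       # equals Q_m: the solution space
x_sbi = x0 + G @ np.linalg.solve(S.T @ A @ G, S.T @ r0)

print(np.allclose(G, Q), np.allclose(x_sbi, x_gmres))  # True True
```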

Another probabilistic interpretation follows from Proposition 8.

### Corollary 18

Note that Proposition 17 has a posterior covariance which is not practical, as it involves \(A^{-\!1}\). Cockayne et al. (2017) proposed replacing \(A^{-\!1}\) in the prior covariance with a preconditioner to address this, which does yield a practically computable posterior, but this extension is not explored here. Furthermore, that approach yields unsatisfactorily calibrated posterior uncertainty, as described in that work. Corollary 18 does not have this drawback, but its posterior covariance is a matrix of zeroes.

#### 6.3.2 Right-multiplied view

As for CG in Sect. 5.2, finding interpretations of GMRES that use right-multiplied observations appears to be more difficult.

### Proposition 19

### Proof

### 6.4 Simulation study

## 7 Discussion

We have established many new connections between probabilistic linear solvers and a broad class of iterative methods. Matrix-based and solution-based inference were shown to be equivalent in a particular regime, showing that results from SBI transfer to MBI with left-multiplied observations. Since SBI is a special case of MBI, future research will establish what additional benefits the increased generality of MBI can provide.

We also established a connection between the wide class of projection methods and probabilistic linear solvers. The common practice of preconditioning has an intuitive probabilistic interpretation, and all probabilistic linear solvers can be interpreted as projection methods. While the converse was shown to hold, the conditions under which generic projection methods can be reproduced are somewhat restrictive; however, GMRES and CG, which are among the most commonly used projection methods, have a well-defined probabilistic interpretation. Probabilistic interpretations of other widely used iterative methods can, we anticipate, be established from the results presented in this work.

Posterior uncertainty remains a challenge for probabilistic linear solvers. Direct probabilistic interpretations of CG and GMRES yield posterior covariance matrices which are not always computable, and even when the posterior can be computed, the uncertainty remains poorly calibrated. This is owed to the dependence of the search directions in Krylov methods on \(A\varvec{x}^*= \varvec{b}\), resulting in an algorithm which is not strictly Bayesian. Mitigating this issue without sacrificing the fast rate of convergence provided by Krylov methods remains an important focus for future work.

## Footnotes

1. Stacking the columns is equivalently possible and common. It is associated with a permutation in the definition of the Kronecker product, but the resulting inferences are equivalent.
2. Hennig (2015) also discusses inference over \(A\). This model class will not be discussed further in the present work. It has the disadvantage that the associated marginal on \(\varvec{x}^*\) is nonanalytic, but more easily lends itself to situations with noisy or otherwise perturbed matrix-vector products as observations.
3.
4. This corrects a printing error in Hennig (2015). The notation has been adapted to fit the context.
5. A diagonal covariance matrix would also allow efficient inference.
6. Sometimes *m* is fixed a priori, due to memory or computation time limits.
7. Algorithm 1 is not included in this form in op. cit.

## Notes

### Acknowledgements

Open access funding provided by Max Planck Society. Ilse Ipsen wact DMS-1760374. Mark Girolami was supported by EPSRC grants [EP/R034710/1, EP/R018413/1, EP/R004889/1, EP/P020720/1], an EPSRC Established Career Fellowship EP/J016934/3, a Royal Academy of Engineering Research Chair, and The Lloyds Register Foundation Programme on Data Centric Engineering. Philipp Hennig was supported by an ERC grant [757275/PANAMA].

## References

- Bartels, S., Hennig, P.: Probabilistic approximate least-squares. In: Proceedings of Artificial Intelligence and Statistics (AISTATS) (2016)
- Cockayne, J., Oates, C., Sullivan, T.J., Girolami, M.: Probabilistic numerical methods for partial differential equations and Bayesian inverse problems. arXiv:1605.07811 (2016)
- Cockayne, J., Oates, C., Sullivan, T.J., Girolami, M.: Bayesian probabilistic numerical methods. arXiv:1702.03673 (2017)
- Cockayne, J., Oates, C., Ipsen, I.C.F., Girolami, M.: A Bayesian conjugate gradient method. arXiv:1801.05242 (2018)
- Diaconis, P., Shahshahani, M.: The subgroup algorithm for generating uniform random variables. Probab. Eng. Inf. Sci. **1**(01), 15 (1987). https://doi.org/10.1017/s0269964800000255
- Golub, G.H., Van Loan, C.F.: Matrix Computations. Johns Hopkins Studies in the Mathematical Sciences, 4th edn. Johns Hopkins University Press, Baltimore (2013)
- Hennig, P.: Probabilistic interpretation of linear solvers. SIAM J. Optim. **25**(1), 234–260 (2015). https://doi.org/10.1137/140955501
- Hennig, P., Osborne, M.A., Girolami, M.: Probabilistic numerics and uncertainty in computations. Proc. R. Soc. Lond. A Math. Phys. Eng. Sci. **471**, 20150142 (2015)
- Karvonen, T., Sarkka, S.: Classical quadrature rules via Gaussian processes. In: IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP). IEEE (2017). https://doi.org/10.1109/mlsp.2017.8168195
- Kersting, H., Sullivan, T.J., Hennig, P.: Convergence rates of Gaussian ODE filters. arXiv:1807.09737, 7 (2018)
- Liesen, J., Strakos, Z.: Krylov Subspace Methods. Principles and Analysis. Oxford University Press, Oxford (2012). https://doi.org/10.1093/acprof:oso/9780199655410.001.0001
- Nocedal, J., Wright, S.J.: Numerical Optimization. Springer, Berlin (1999)
- Saad, Y.: Iterative Methods for Sparse Linear Systems, 2nd edn. Society for Industrial and Applied Mathematics (SIAM), Philadelphia (2003)
- Saad, Y., Schultz, M.H.: GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM J. Sci. Stat. Comput. **7**(3), 856–869 (1986). https://doi.org/10.1137/0907058
- Schober, M., Duvenaud, D., Hennig, P.: Probabilistic ODE solvers with Runge–Kutta means. In: Advances in Neural Information Processing Systems, vol. 27, pp. 739–747. Curran Associates, Inc. (2014). http://papers.nips.cc/paper/5451-probabilistic-ode-solvers-with-runge-kutta-means.pdf
- Schober, M., Särkkä, S., Hennig, P.: A probabilistic model for the numerical solution of initial value problems. Stat. Comput. **29**(1), 99–122 (2019)
- Soodhalter, K.M., Szyld, D.B., Xue, F.: Krylov subspace recycling for sequences of shifted linear systems. Appl. Numer. Math. **81**, 105–118 (2014). https://doi.org/10.1016/j.apnum.2014.02.006
- Xi, X., Briol, F.-X., Girolami, M.: Bayesian quadrature for multiple related integrals. In: Proceedings of the 35th International Conference on Machine Learning (ICML) (2018). arXiv:8010.4153

## Copyright information

**Open Access** This article is distributed
under the terms of the Creative Commons Attribution 4.0 International License
(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use,
distribution, and reproduction in any medium, provided you give appropriate
credit to the original author(s) and the source, provide a link to the Creative
Commons license, and indicate if changes were made.