Abstract
In this paper we discuss a deterministic form of ensemble Kalman inversion as a regularization method for linear inverse problems. By interpreting ensemble Kalman inversion as a low-rank approximation of Tikhonov regularization, we are able to introduce a new sampling scheme based on the Nyström method that improves practical performance. Furthermore, we formulate an adaptive version of ensemble Kalman inversion where the sample size is coupled with the regularization parameter. We prove that the proposed scheme yields an order optimal regularization method under standard assumptions if the discrepancy principle is used as a stopping criterion. The paper concludes with a numerical comparison of the discussed methods for an inverse problem of the Radon transform.
1 Introduction
In recent years, ensemble Kalman inversion (EKI) has become a popular tool for solving inverse problems [28]. EKI has advantages over other iterative methods in situations where the evaluation of the forward operator is costly, and information about its adjoint or its derivative is unavailable.
While there are some recent results on the convergence of EKI as an optimization method [10, 51, 52, 62], the regularization theory of EKI is still incomplete. In this paper, we provide an analysis of a deterministic form of EKI as a regularization method for solving linear inverse problems. That is, we consider the problem of determining a solution \(x_*\) of the linear operator equation
$$\begin{aligned} L x = y , \end{aligned}$$(1.1)
where \(L:\mathbb {X}\rightarrow \mathbb {Y}\) is a bounded linear operator between Hilbert spaces. We do not assume that we have access to y, but only to a noisy measurement
$$\begin{aligned} \hat{y}= y + \xi , \end{aligned}$$(1.2)
where \(\xi \) is noise.
Such an analysis is important for three reasons: First, it allows a theoretical comparison of EKI with established iterative regularization methods for inverse problems, such as the iteratively regularized Gauss-Newton method [3] or the iteratively regularized Landweber iteration [49]. Secondly, it allows the transfer of knowledge between functional-analytic regularization theory, in particular the study of finite-dimensional approximation of Tikhonov regularization [20, 43], and the emerging literature on ensemble methods for the solution of inverse problems (see for example [27] or [46]). Finally, this analysis can potentially serve as the basis for a generalized analysis of EKI for nonlinear inverse problems, making use of the deterministic convergence analysis of iterative regularization methods in Hilbert space (see [3, 22, 30]).
It was already noted in [28] that in the case of a linear operator equation the first iteration of EKI converges to the Tikhonov regularized solution as the sample size approaches infinity. It can be shown that—at least for the deterministic version considered in this paper—this also holds true for all subsequent iterates, where each iterate is associated with a different choice of regularization parameter. Thus, in the linear case, EKI can be completely characterized as a stochastic low-rank approximation of Tikhonov regularization. As a consequence we can prove that under appropriate source conditions and by adapting the sample size to the regularization parameter (this method is then called adaptive EKI), we get optimal convergence rates for EKI in the sense formulated for instance in [14]. Moreover, we show that the efficiency of EKI can be increased by the use of more sophisticated low-rank approximation schemes, such as the Nyström method (see e.g. [17]).
The paper is organized as follows:
-
We continue this section by recalling some required notation and functional-analytic prerequisites (Sect. 1.1) and providing an appropriate definition of the deterministic form of EKI that is considered for the rest of this paper (see Sect. 1.2).
-
In Sect. 2.1, we discuss deterministic EKI as an approximation to Tikhonov regularization. In particular, we derive error estimates in terms of the regularization parameter, which form the foundation for the subsequent formulation of an adaptive version. In Sect. 2.2 we review some results and methods for the low-rank approximation of operators, in particular the Nyström method. We show how these methods naturally lead to new versions of EKI.
-
In Sect. 3 we propose an adaptive variant of EKI. The algorithm is described in Sect. 3.1 and analyzed as an iterative regularization method in Sect. 3.2, where we describe conditions under which we can prove optimal convergence rates in the zero-noise limit. This constitutes our main result. Further remarks comparing the proposed scheme with similar methods from the existing literature are given in Sect. 3.3.
-
We conclude our paper in Sect. 4 with numerical experiments in the context of computerized tomography. These experiments demonstrate some advantages and shortcomings of EKI for linear inverse problems. In particular, they show that the Nyström EKI method leads to considerable improvements in terms of numerical performance in comparison to existing sampling methods.
-
The appendix reviews some prerequisites from probability theory, and discusses how our exposition relates to alternative formulations of EKI that have been studied elsewhere.
1.1 Notation and terminology
We summarize basic notation first:
-
(i)
\(\mathbb {X}\) and \(\mathbb {Y}\) denote real separable Hilbert spaces.
-
(ii)
\(\mathcal {L}(\mathbb {X};\mathbb {Y})\) denotes the space of bounded linear operators from \(\mathbb {X}\) to \(\mathbb {Y}\).
-
(iii)
If \(L:\mathbb {X}\rightarrow \mathbb {Y}\) is a linear operator, we let \( \mathcal {D}(L) \subset \mathbb {X}\) denote its domain and \( \mathcal {R}(L) \subset \mathbb {Y}\) denote its range.
-
(iv)
We call \(P \in \mathcal {L}(\mathbb {X};\mathbb {X})\) positive if \(\left<Px,x\right>_\mathbb {X} \ge 0\) for all \(x \in \mathbb {X}\).
-
(v)
For a positive and self-adjoint operator \(P \in \mathcal {L}(\mathbb {X};\mathbb {X})\), we define the P-weighted norm
$$\begin{aligned} \left\| x\right\| _P = {\left\{ \begin{array}{ll} \left\| P^{-1/2} x\right\| _\mathbb {X}, &{} \text {if } x \in \mathcal {R}(P^{1/2}) , \\ \infty , &{} \text {else}, \end{array}\right. } \end{aligned}$$where the operator \(P^{-1/2}\) is defined as the pseudoinverse of \(P^{1/2}\), which in turn can be defined via spectral theory, see for example [14, chapter 2.3].
-
(vi)
Trace class: We say that an operator \(P \in \mathcal {L}(\mathbb {X};\mathbb {X})\) is in the trace class if for any orthonormal basis \((e_n)\) of \(\mathbb {X}\) we have
$$\begin{aligned} \sum _n \left| \left<P e_n,e_n\right>_\mathbb {X}\right| < \infty . \end{aligned}$$
-
(vii)
\((\Omega , \mathcal F, \mathbb {P})\) denotes a probability space.
1.2 Ensemble Kalman inversion for linear inverse problems
Next, we present a particular form of the EKI iteration associated to problem (1.1). The original form of EKI [28], which we refer to as stochastic EKI, evolves a random ensemble through an iteration where additional noise is added in each step. In the last few years, multiple variants of EKI have been developed that incorporate adaptable stepsizes [10, 32] or additional regularization [9]. In particular, one can also formulate a deterministic version that circumvents the addition of noise by directly transforming the ensemble mean and covariance. Such a version of EKI has for example been considered in [10]. In accordance with the literature on ensemble Kalman filtering, we will refer to this as deterministic EKI [26, 57]. A more detailed discussion of its relation to the stochastic form of EKI can be found in “Appendix B”.
The EKI iteration involves two linear operators \(C_{0}:\mathbb {X}\rightarrow \mathbb {X}\) and \(R: \mathbb {Y}\rightarrow \mathbb {Y}\) that characterize regularity assumptions on the solution \(x_*\) and the noise \(\xi \). They have to be provided by the practitioner to represent prior information on the problem. In the rest of this article, we will assume that they satisfy the following conditions:
Assumption 1.1
Let \(C_{0}\in \mathcal {L}(\mathbb {X};\mathbb {X})\) and \(R\in \mathcal {L}(\mathbb {Y};\mathbb {Y})\) be injective, positive and self-adjoint linear operators such that
-
(i)
\(C_{0}\) is compact,
-
(ii)
\( \mathcal {R}(L) \subset \mathcal {D}( R^{-1/2} ) \), and there exists a constant \(c_{RL}\in \mathbb {R}\) such that
$$\begin{aligned} \left\| R^{-1/2} L\right\| _{\mathcal {L}(\mathbb {X};\mathbb {Y})} \le c_{RL}. \end{aligned}$$(1.3)
Moreover, we assume that the noisy data \(\hat{y}\) defined in Eq. 1.2 satisfies \(\hat{y}\in \mathcal {R}(R^{1/2}) \).
As the next proposition shows, the subspace \( \mathcal {D}( C_{0}^{-1/2} ) \subset \mathbb {X}\) together with the norm \( \left\| \cdot \right\| _{C_{0}} \) yields a Hilbert space. This space will play an important role for our analysis in Sect. 3.
Proposition 1.2
Let \(C_{0}\in \mathcal {L}(\mathbb {X};\mathbb {X})\) be an injective, positive and self-adjoint bounded linear operator. Let
$$\begin{aligned} \left<x,y\right>_{C_{0}} := \left<C_{0}^{-1/2} x, C_{0}^{-1/2} y\right>_\mathbb {X} \qquad \text {for } x, y \in \mathcal {D}(C_{0}^{-1/2}) . \end{aligned}$$
Then \( \mathcal {D}(C_{0}^{-1/2}) \) equipped with the inner product \(\left<\cdot ,\cdot \right>_{C_{0}}\) defines a Hilbert space, denoted by \(\mathbb {X}_{C_{0}}\). Moreover
$$\begin{aligned} \left\| x\right\| _\mathbb {X} \le \left\| C_{0}\right\| _{\mathcal {L}(\mathbb {X};\mathbb {X})}^{1/2} \left\| x\right\| _{C_{0}} \qquad \text {for all } x \in \mathcal {D}(C_{0}^{-1/2}) . \end{aligned}$$(1.4)
Proof
The bilinear form \(\left<\cdot ,\cdot \right>_{C_{0}}\) is well-defined on \( \mathcal {D}(C_{0}^{-1/2}) = \mathcal {R}(C_{0}^{1/2}) \) because \(C_{0}\) is injective. Furthermore, this bilinear form is symmetric and positive semidefinite because \(C_{0}^{-1/2}\) is self-adjoint and positive. The definiteness follows from the injectivity of \(C_{0}^{-1/2}\). Eq. 1.4 follows from the boundedness of \(C_{0}\), since we have
$$\begin{aligned} \left\| x\right\| _\mathbb {X} = \left\| C_{0}^{1/2} C_{0}^{-1/2} x\right\| _\mathbb {X} \le \left\| C_{0}^{1/2}\right\| _{\mathcal {L}(\mathbb {X};\mathbb {X})} \left\| C_{0}^{-1/2} x\right\| _\mathbb {X} = \left\| C_{0}\right\| _{\mathcal {L}(\mathbb {X};\mathbb {X})}^{1/2} \left\| x\right\| _{C_{0}} \end{aligned}$$
for all \(x \in \mathcal {D}(C_{0}^{-1/2})\).
Finally, the completeness of \( \mathcal {D}(C_{0}^{-1/2}) \) with respect to \(\left\| \cdot \right\| _{C_{0}}\) is a direct consequence of the completeness of \(\mathbb {X}\). \(\square \)
Remark 1.3
At this point, we want to stress that the operator \(R\) does not correspond to the assumption that \(\xi = \hat{y}- y\) is a Gaussian random element of \(\mathbb {Y}\) with covariance \(R\). In fact, in the case where \(\mathbb {Y}\) is infinite-dimensional, one can show that \( \left\| \xi \right\| _{R} = \infty \) with probability 1 (see [7, theorem 2.4.7]). The proper interpretation of \(R\) is that it determines a subspace \(\mathbb {Y}_R \subset \mathbb {Y}\) in which \(\xi \) is assumed to lie (see Proposition 1.2).
Before we continue with the description of the deterministic EKI iteration, we present an illustrative example for a choice of the operators \(C_0\) and R that is often used in practice.
Example 1.4
Let \(\mathbb {X}= L^2(D)\) and \(\mathbb {Y}= L^2(E)\), where \(D \subset \mathbb {R}^{d_1}\) and \(E \subset \mathbb {R}^{d_2}\) are bounded domains with piecewise smooth boundaries, and consider the choice \(C_{0}= (\text {I}_\mathbb {X}- \Delta )^{-1}\) and \(R = \text {I}_\mathbb {Y}- \Delta \). Then the operator \(C_{0}\) is compact. Here, \((\text {I}_\mathbb {X}- \Delta )^{-1}\) is the operator which maps a given function \(\rho \in \mathbb {X}\) onto the weak solution of the equation
The range of \(C_{0}\) is \(H^1(D)\), i.e. the Sobolev space of first order. It is easy to see that \(C_{0}^{-1}\) is positive and self-adjoint, and thus so is \(C_{0}\). We also have
Similarly
where \(H^{-1}(E)\) denotes the dual space of \(H^1(E)\).
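To make the choice of \(C_{0}\) in this example concrete, the following sketch builds a finite-difference analogue of \((\text {I}- \Delta )^{-1}\); the one-dimensional domain, homogeneous Dirichlet boundary conditions, and grid size are illustrative choices of this sketch, not prescribed by the example above.

```python
import numpy as np

# Finite-difference analogue of C0 = (I - Laplacian)^{-1} on a 1D grid
# with homogeneous Dirichlet boundary conditions (illustrative choices).
n = 100
h = 1.0 / (n + 1)
main = -2.0 * np.ones(n)
off = np.ones(n - 1)
Delta = (np.diag(main) + np.diag(off, 1) + np.diag(off, -1)) / h**2
C0 = np.linalg.inv(np.eye(n) - Delta)

lam = np.linalg.eigvalsh(C0)[::-1]   # eigenvalues in decreasing order
# C0 is symmetric positive definite with a rapidly decaying spectrum,
# mimicking the compactness of the continuous operator.
assert np.allclose(C0, C0.T)
assert lam[-1] > 0
assert lam[-1] < 1e-3 * lam[0]
```

The strong decay of the discrete spectrum is the finite-dimensional counterpart of the compactness required in Assumption 1.1.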
The fundamental difference of ensemble methods to existing regularization methods is the use of a stochastic low-rank approximation of \(C_{0}\), which reduces the effective dimension of the parameter space \(\mathbb {X}\). The next definition gives this notion a precise meaning.
Definition 1.5
(Low-rank approximation) Let \(C_{0}\in \mathcal {L}(\mathbb {X};\mathbb {X})\) be a self-adjoint, positive and compact linear operator and let \(\gamma > 0\).
-
(i)
Let \(( \varvec{A}^{\scriptscriptstyle (J)} )_{J=1}^\infty \) be a family of bounded linear operators with \( \varvec{A}^{\scriptscriptstyle (J)} \in \mathcal {L}(\mathbb {R}^J;\mathbb {X})\) for all \(J \in \mathbb {N}\). We say that it generates a deterministic low-rank approximation of \(C_0\), of order \(\gamma \), if there exists a constant \(\nu \) such that
$$\begin{aligned} \left\| \varvec{A}^{\scriptscriptstyle (J)} { \varvec{A}^{\scriptscriptstyle (J)} }^* - C_{0}\right\| _{\mathcal {L}(\mathbb {X};\mathbb {X})} \le \nu J^{-\gamma } \qquad \text { for all } J \in \mathbb {N}. \end{aligned}$$ -
(ii)
Let \(p \in [1,\infty )\) and \(( \varvec{A}^{\scriptscriptstyle (J)} )_{J=1}^\infty \) be a family of random bounded linear operators (see “Appendix A”) with \( \varvec{A}^{\scriptscriptstyle (J)} (\omega ) \in \mathcal {L}(\mathbb {R}^J;\mathbb {X})\) for all \(\omega \in \Omega \) and \(J \in \mathbb {N}\). We say that it generates a stochastic low-rank approximation of \(C_{0}\), of p-order \(\gamma \), if there exists a constant \(\nu _p\) such that
$$\begin{aligned} \mathbb {E}\left[ \left\| \varvec{A}^{\scriptscriptstyle (J)} { \varvec{A}^{\scriptscriptstyle (J)} }^* - C_{0}\right\| _{\mathcal {L}(\mathbb {X};\mathbb {X})}^{p} \right] ^{1 / {p} } \le \nu _p J^{-\gamma } \qquad \text { for all } J \in \mathbb {N}. \end{aligned}$$
Under Assumption 1.1, the following algorithm is well-defined, for all \(k \in \mathbb {N}\).
Definition 1.6
(Deterministic EKI) Let \(( \varvec{A}^{\scriptscriptstyle (J)} )_{J=1}^\infty \) generate a low-rank approximation of \(C_{0}\), and let \(\hat{y}\in \mathbb {Y}\), \(J \in \mathbb {N}\), and an initial guess \(x_0\in \mathbb {X}\) be given.
-
Initialization: Set \({\hat{X}}^{\scriptscriptstyle (J)}_0 := x_0\) and \( \varvec{A}^{\scriptscriptstyle (J)} _0 := \varvec{A}^{\scriptscriptstyle (J)} \).
-
Iteration (\(k \rightarrow k+1\)): Let \( {\varvec{B}_k^{\scriptscriptstyle (J)}} = R^{-1/2} L {\varvec{A}_k^{\scriptscriptstyle (J)}} : \mathbb {R}^J \rightarrow \mathbb {Y}\), and set
$$\begin{aligned}&{\hat{X}}^{\scriptscriptstyle (J)}_{k+1} = \hat{X}_k^{\scriptscriptstyle (J)} + {\varvec{A}_k^{\scriptscriptstyle (J)}} \left( {\varvec{B}_k^{\scriptscriptstyle (J)}} ^* {\varvec{B}_k^{\scriptscriptstyle (J)}} + \mathbb {I}_J\right) ^{-1} {\varvec{B}_k^{\scriptscriptstyle (J)}} ^* R^{-1/2} (\hat{y}- L \hat{X}_k^{\scriptscriptstyle (J)} ), \end{aligned}$$(1.5)$$\begin{aligned} \text {and} \quad&\varvec{A}^{\scriptscriptstyle (J)} _{k+1} = {\varvec{A}_k^{\scriptscriptstyle (J)}} \left( {\varvec{B}_k^{\scriptscriptstyle (J)}} ^* {\varvec{B}_k^{\scriptscriptstyle (J)}} + \mathbb {I}_J\right) ^{-1/2}, \end{aligned}$$(1.6)where \(\mathbb {I}_J\in \mathbb {R}^{J\times J}\) denotes the identity matrix and \( {\varvec{B}_k^{\scriptscriptstyle (J)}} ^*:\mathbb {Y}\rightarrow \mathbb {R}^J\) denotes the adjoint of \( {\varvec{B}_k^{\scriptscriptstyle (J)}} \).
Note that the adjective “deterministic” in Definition 1.6 refers only to the update formula, which—in contrast to the original, stochastic EKI iteration (see Definition B.1)—does not introduce additional noise. Even if a stochastic low-rank approximation is used in Definition 1.6, we will refer to the resulting method as deterministic EKI. In this case, the algorithm is defined pointwise, for every \(\omega \in \Omega \). That is, the quantities \(\hat{X}_k^{\scriptscriptstyle (J)} \), \( {\varvec{A}_k^{\scriptscriptstyle (J)}} \) and \( {\varvec{B}_k^{\scriptscriptstyle (J)}} \) all depend on \(\omega \). For the rest of this paper, we will suppress this dependence. This allows us to treat both deterministic and stochastic low-rank approximations at once.
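For concreteness, here is a minimal matrix-based sketch of one update of Definition 1.6; all dimensions, the choice \(R = \text {I}\), and the random test data are illustrative. The final comparison uses the fact, discussed in Sect. 2, that a single step with the full-rank factor \(A_0 = C_0^{1/2}\) reproduces the Tikhonov regularized solution with \(\alpha = 1\).

```python
import numpy as np

def eki_step(x, A, L_op, R_inv_half, y_hat):
    # One deterministic EKI step in square-root form (Eqs. 1.5 and 1.6).
    # x: current mean, A: factor with A @ A.T ~ current covariance,
    # L_op: forward matrix, R_inv_half: matrix representing R^{-1/2}.
    B = R_inv_half @ L_op @ A                      # B_k = R^{-1/2} L A_k
    M = B.T @ B + np.eye(A.shape[1])               # small J x J system
    x_new = x + A @ np.linalg.solve(M, B.T @ (R_inv_half @ (y_hat - L_op @ x)))
    w, V = np.linalg.eigh(M)                       # M is SPD; form M^{-1/2}
    A_new = A @ (V @ np.diag(w ** -0.5) @ V.T)
    return x_new, A_new

# Consistency check: with A0 = C0^{1/2} (full rank) and R = I, one step
# from x0 = 0 equals the Tikhonov solution with alpha = 1.
rng = np.random.default_rng(0)
n, m = 5, 3
L_op = rng.standard_normal((m, n))
G = rng.standard_normal((n, n))
C0 = G @ G.T + np.eye(n)
w, V = np.linalg.eigh(C0)
A0 = V @ np.diag(np.sqrt(w)) @ V.T                 # C0^{1/2}
y_hat = rng.standard_normal(m)

x1, A1 = eki_step(np.zeros(n), A0, L_op, np.eye(m), y_hat)
x_tik = np.linalg.solve(L_op.T @ L_op + np.linalg.inv(C0), L_op.T @ y_hat)
```

Note that, as stated in Remark 1.7 below, only a \(J \times J\) system is solved per step, which is the computational appeal of the square-root form.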
Remark 1.7
We have introduced the EKI update equations (Eqs. 1.5–1.6) in the so-called square-root form. It is equivalent (see e.g. [57]) to the so-called covariance form which is more widespread in the literature on the Kalman filter and given by
The operator \(\varvec{C}_k^{\scriptscriptstyle (J)}\) is related to \( {\varvec{A}_k^{\scriptscriptstyle (J)}} \) from Definition 1.6 via the identity \(\varvec{C}_k^{\scriptscriptstyle (J)}= {\varvec{A}_k^{\scriptscriptstyle (J)}} {\varvec{A}_k^{\scriptscriptstyle (J)}} ^*\), which holds for all \(k \in \mathbb {N}\). The computational difference between these two formulations is that the square-root form requires the inversion of an operator on \(\mathbb {R}^J\), while the covariance form requires inversion of an operator on \(\mathbb {Y}\).
The existing literature on EKI focuses mostly on the case where the low-rank approximation \(( \varvec{A}^{\scriptscriptstyle (J)} )_{J=1}^\infty \) is generated by the so-called anomaly operator of an ensemble \(\varvec{U}^{\scriptscriptstyle (J)}\) of random elements—thus the name “ensemble Kalman inversion”. That is, one uses \( \varvec{A}^{\scriptscriptstyle (J)} = \mathcal {A} (\varvec{U}^{\scriptscriptstyle (J)})\), where \( \mathcal {A} (\varvec{U}^{\scriptscriptstyle (J)})\) is defined as follows:
Definition 1.8
(Ensemble anomaly) A J-tuple \(\varvec{U}^{\scriptscriptstyle (J)}= (U_1,\ldots ,U_J)\) of random elements of \(\mathbb {X}\) is called a random ensemble. We call the random element
$$\begin{aligned} \bar{U}^{\scriptscriptstyle (J)} := \frac{1}{J} \sum _{j=1}^J U_j \end{aligned}$$
the ensemble mean. Furthermore, we call the random continuous linear operator from \(\mathbb {R}^J\) to \(\mathbb {X}\) (see “Appendix A”) defined by
the ensemble anomaly.
We will see in Sect. 2.2 that \(( \mathcal {A} (\varvec{U}^{\scriptscriptstyle (J)}))_{J=1}^\infty \) generates a stochastic low-rank approximation of \(C_{0}\) if \(U_1,\ldots ,U_J\) are independent Gaussian random elements with \( \text {Cov}\left( U_j^{(J)} \right) = C_{0}\), for all \(j=1,\ldots ,J\). However, the more general Definition 1.6 allows us to consider other forms of low-rank approximations, in particular also deterministic ones (see Sect. 2.2).
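A small Monte Carlo experiment illustrates this approximation property. The normalization \(1/\sqrt{J-1}\) used below is an assumption of this sketch (it makes the product an unbiased sample covariance); the paper's exact scaling convention is not reproduced here.

```python
import numpy as np

def anomaly(U):
    # Ensemble anomaly as a matrix: columns are the centered ensemble
    # members. The 1/sqrt(J - 1) scaling is an assumption of this
    # sketch, chosen so anomaly(U) @ anomaly(U).T is the unbiased
    # sample covariance.
    J = U.shape[1]
    return (U - U.mean(axis=1, keepdims=True)) / np.sqrt(J - 1)

# Monte Carlo illustration of the order 1/2 of Theorem 2.14: the mean
# spectral error decays roughly like J^{-1/2}.
rng = np.random.default_rng(1)
n = 20
C0 = np.diag(1.0 / np.arange(1, n + 1) ** 2)   # trace-class-like decay
errs = []
for J in (50, 200, 800):
    e = []
    for _ in range(20):
        U = rng.multivariate_normal(np.zeros(n), C0, size=J).T  # n x J
        A = anomaly(U)
        e.append(np.linalg.norm(A @ A.T - C0, 2))
    errs.append(np.mean(e))
# The error roughly halves each time J is quadrupled.
```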
Remark 1.9
The update Eq. 1.5 can also be expressed as the solution to a minimization problem, since for all \(k \in \mathbb {N}\), \({\hat{X}}^{\scriptscriptstyle (J)}_{k+1}\) is the minimizer of the functional
which is well-defined due to Assumption 1.1.
2 EKI as approximate Tikhonov regularization
2.1 Direct EKI
Ensemble Kalman methods originated in data assimilation [15] and are traditionally applied to state estimation in dynamical systems [40, 48]. Following this logic, EKI, which has been developed for the treatment of inverse problems, is often analyzed as a nonstationary regularization method with multiple steps, where the iteration number k controls the amount of regularization. For the deterministic version of EKI given by Eqs. 1.5 and 1.6, one can actually show that multiple iterations with initial covariance operator \(C_{0}\) are equivalent to a single iteration with covariance operator \(\tilde{C}_{0}= \frac{1}{k} C_{0}\). This result can be seen as a direct consequence of the classical equivalence of the Kalman filter to four-dimensional variational data assimilation (4D-VAR) [47].
Theorem 2.1
Let \( {\varvec{B}^{\scriptscriptstyle (J)}} := R^{-1/2} L \varvec{A}^{\scriptscriptstyle (J)} \), and let \((X_k)_{k=1}^\infty \) denote the EKI iteration as defined in Definition 1.6. Then, the following representation holds
Proof
Follows from [40, theorem 5.4.7] by setting \(M=\text {I}_\mathbb {X}\), \(H_\xi = L\) and \(f^{(\xi )} = \hat{y}\) for \(\xi =1,\ldots , k\). \(\square \)
A first consequence of Theorem 2.1 is that it allows us to embed EKI into a parameter-dependent family of operators, which we will call direct EKI:
Definition 2.2
(Direct EKI) Suppose that Assumption 1.1 holds, and let \(\alpha > 0\). Then, we define the direct EKI in the following way
According to Eq. 2.1, we have
That is, the k-th iterate of deterministic EKI is equivalent to direct EKI with the choice \(\alpha = 1/k\).
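This equivalence is easy to check numerically. The following sketch (with \(R = \text {I}\), a random forward matrix, and a random initial factor, all illustrative choices) runs k square-root EKI steps and compares the result with a single direct step at \(\alpha = 1/k\).

```python
import numpy as np

def eki_step(x, A, L_op, y_hat):
    # One square-root EKI step (Eqs. 1.5 and 1.6), written for R = I.
    B = L_op @ A
    M = B.T @ B + np.eye(A.shape[1])
    x_new = x + A @ np.linalg.solve(M, B.T @ (y_hat - L_op @ x))
    w, V = np.linalg.eigh(M)
    return x_new, A @ (V @ np.diag(w ** -0.5) @ V.T)

def direct_eki(x0, A, L_op, y_hat, alpha):
    # Direct EKI: a single Tikhonov-type step with parameter alpha,
    # restricted to x0 + range(A) (again written for R = I).
    B = L_op @ A
    M = B.T @ B + alpha * np.eye(A.shape[1])
    return x0 + A @ np.linalg.solve(M, B.T @ (y_hat - L_op @ x0))

rng = np.random.default_rng(2)
n, m, J, k = 8, 6, 4, 5
L_op = rng.standard_normal((m, n))
A0 = rng.standard_normal((n, J))
y_hat = rng.standard_normal(m)

x, A = np.zeros(n), A0
for _ in range(k):
    x, A = eki_step(x, A, L_op, y_hat)
x_direct = direct_eki(np.zeros(n), A0, L_op, y_hat, 1.0 / k)
# x and x_direct agree up to floating-point error.
```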
Next, we derive error estimates between direct EKI and Tikhonov regularization in terms of the sample size J and the regularization parameter \(\alpha \). To this end, let us recall the notion of the Tikhonov-regularized solution of Eq. 1.1.
Definition 2.3
(Tikhonov regularization) Let Assumption 1.1 hold. Then the unique minimizer of
is called the Tikhonov regularized solution of Eq. 1.1 according to the data \(\hat{y}\) and the regularization parameter \(\alpha \). It is denoted by \(\hat{x}_{\alpha }\) and explicitly represented by
and where \(\text {I}_\mathbb {X}:\mathbb {X}\rightarrow \mathbb {X}\) is the identity operator.
Remark 2.4
We emphasize the notational difference between Eqs. 2.2 and 2.5 that \(\text {I}_\mathbb {X}\) denotes the identity operator on \(\mathbb {X}\) while \(\mathbb {I}_J\in \mathbb {R}^{J \times J}\) denotes the identity matrix for \(\mathbb {R}^J\).
Example 2.5
Consider again Example 1.4. In that case Eq. 2.4 becomes
If we compare Eqs. 2.2 and 2.5, we observe that the main difference between Tikhonov regularization and direct EKI is the replacement of the operator \(\mathcal K_\alpha \) (Tikhonov) by a low-rank approximation \(K_\alpha ( \varvec{A}^{\scriptscriptstyle (J)} )\) (direct EKI). In the following, we estimate the difference between the random element \( \hat{X}^{d, \scriptscriptstyle (J)}_\alpha \) and the Tikhonov regularized solution \(\hat{x}_{\alpha }\).
Lemma 2.6
(Tikhonov versus direct EKI) Let \(\alpha > 0\), \(p \in [1,\infty )\), and suppose that Assumption 1.1 holds. Then there exists a constant c, independent of J, such that
Proof
Using spectral theory, one can show
for all positive and self-adjoint bounded linear operators P and Q (see [14, section 2.3]). Furthermore, recall that every linear operator A satisfies the identity \((A^*A + \alpha \text {I}_\mathbb {X})^{-1}A^* = A^* (AA^* + \alpha \text {I}_\mathbb {X})^{-1}\) if one of these expressions is well-defined. With this, one can show that
where we used the notation \( {\varvec{B}^{\scriptscriptstyle (J)}} := R^{-1/2} L \varvec{A}^{\scriptscriptstyle (J)} \) and \(B := R^{-1/2} L C_0^{1/2}\) for brevity. These identities imply
Taking norms and using Eqs. 2.8, 2.9, and 1.3, we then obtain
Taking norms in Eq. 2.7 and inserting this estimate proves the assertion. \(\square \)
This lemma shows that the difference between Tikhonov regularization and direct EKI can be bounded in terms of the difference between the operators \( \varvec{A}^{\scriptscriptstyle (J)} { \varvec{A}^{\scriptscriptstyle (J)} }^*\) and \(C_{0}\). If this difference decreases with a certain rate with respect to J, then direct EKI converges to Tikhonov regularization with the same rate.
Proposition 2.7
(Convergence of EKI to Tikhonov) Let Assumption 1.1 hold.
-
(i)
If \(( \varvec{A}^{\scriptscriptstyle (J)} )_{J=1}^\infty \) generates a deterministic low-rank approximation of \(C_{0}\) of order \(\gamma \), then there exists a constant \(\kappa \) such that
$$\begin{aligned} \left\| \hat{X}^{d, \scriptscriptstyle (J)}_\alpha - \hat{x}_{\alpha }\right\| _\mathbb {X} \le \kappa \alpha ^{-1} J^{-\gamma } \qquad \text {for all } \alpha > 0 \text { and all } J \in \mathbb {N}. \end{aligned}$$ -
(ii)
Let \(p \in [1,\infty )\). If \(( \varvec{A}^{\scriptscriptstyle (J)} )_{J=1}^\infty \) generates a stochastic low-rank approximation of \(C_{0}\) of p-order \(\gamma \), then there exists a constant \(\kappa _p\) such that
$$\begin{aligned} \mathbb {E}\left[ \left\| \hat{X}^{d, \scriptscriptstyle (J)}_\alpha - \hat{x}_{\alpha }\right\| _\mathbb {X}^{p} \right] ^{1 / {p} } \le \kappa _p \alpha ^{-1} J^{-\gamma } \qquad \text {for all } \alpha > 0 \text { and all } J \in \mathbb {N}. \end{aligned}$$
Proof
Follows directly from Lemma 2.6 and Definition 1.5 with \(\kappa =\nu \cdot c\) and \(\kappa _p = \nu _p \cdot c\). \(\square \)
Remark 2.8
As an alternative to the derivation above, the convergence of deterministic EKI to Tikhonov regularization can be seen as a special case of the convergence of the ensemble square-root filter to the Kalman filter, see for example [35] or [40, section 5.4]. However, the results presented here are better suited to investigate convergence rates of EKI as a regularization method (see Sect. 3), since they explicitly describe the dependence of the approximation error on the regularization parameter \(\alpha \). We also note that different types of finite-dimensional approximations of Tikhonov regularization have been studied elsewhere, for example in [20, 43].
2.2 Optimal low-rank approximations for EKI
Proposition 2.7 shows that direct EKI, and thus also EKI, converges to Tikhonov regularization with rate equal to the order of the employed low-rank approximation. In general, convergent low-rank approximations only exist if the eigenvalues of \(C_{0}\) satisfy a decay condition.
Assumption 2.9
(Decreasing eigenvalues of \(C_{0}\)) Let \(C_{0}\) satisfy Assumption 1.1, and let \((\lambda _n)\) denote its eigenvalues in decreasing order. We assume that there exists a constant \(\eta > 0\) such that
Remark 2.10
In this paper, we always assume that all eigenvalues are repeated according to their multiplicities.
Example 2.11
Consider Example 1.4. In this case, Assumption 2.9 is satisfied with \(\eta = 2/d\) [33].
Under Assumption 2.9, the Schmidt–Eckart–Young–Mirsky theorem [13, 37, 53] states that the best possible order of any low-rank approximation of \(C_{0}\) is \(\eta \), and it is achieved by the truncated singular value decomposition.
Theorem 2.12
(Schmidt–Eckart–Young–Mirsky) Let \(C_{0}\in \mathcal {L}(\mathbb {X};\mathbb {X})\) be positive, self-adjoint and compact, and let \((\lambda _n)\) denote its eigenvalues in decreasing order. Let \( {\varvec{A}^{\scriptscriptstyle (J)}_\text {svd}} {\varvec{A}^{\scriptscriptstyle (J)}_\text {svd}} ^*\) denote the J-truncated singular value decomposition of \(C_{0}\). Then
Remark 2.13
Note that the optimal possible order for a low-rank approximation does not directly depend on the dimension of the underlying spaces \(\mathbb {X}\) and \(\mathbb {Y}\), only on the decay of the eigenvalues of \(C_{0}\). This means that we obtain dimension-independent convergence rates as long as the eigenvalues of \(C_{0}\) decay sufficiently fast.
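The optimality statement can be checked numerically for the truncated-SVD factor of a matrix with a prescribed spectrum; the test matrix below is an illustrative construction.

```python
import numpy as np

rng = np.random.default_rng(3)
n, J = 30, 5
lam = 1.0 / np.arange(1, n + 1) ** 2          # prescribed eigenvalues
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
C0 = Q @ np.diag(lam) @ Q.T                   # SPD with known spectrum

w, V = np.linalg.eigh(C0)
w, V = w[::-1], V[:, ::-1]                    # decreasing order
A_svd = V[:, :J] @ np.diag(np.sqrt(w[:J]))    # J-truncated SVD factor

err = np.linalg.norm(A_svd @ A_svd.T - C0, 2)
# Schmidt-Eckart-Young-Mirsky: the optimal rank-J error in the spectral
# norm is exactly the (J+1)-th eigenvalue.
assert np.isclose(err, lam[J])
```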
Existing formulations of EKI generate a stochastic low-rank approximation of \(C_{0}\) from the ensemble anomaly \( \mathcal {A} (\varvec{U}^{\scriptscriptstyle (J)})\) (see Definition 1.8) of a randomly generated ensemble \(\varvec{U}^{\scriptscriptstyle (J)}\). For this type of approximation, we have the following result.
Theorem 2.14
(Low rank approximation of \(C_{0}\)) Assume that \(C_{0}\) is in the trace class and let \(p \in [1,\infty )\) be fixed. Moreover, for every \(J \in \mathbb {N}\), let \(\varvec{U}^{\scriptscriptstyle (J)}= [U_1,\ldots ,U_J] \in \mathbb {X}^J\) be an ensemble of independent Gaussian random elements with \( \text {Cov}\left( U_j \right) = C_{0}\), for all \(j \in \{ 1,\ldots , J \}\), and let \( \mathcal {A} (\varvec{U}^{\scriptscriptstyle (J)})\) be as in Definition 1.8. Then \(( \mathcal {A} (\varvec{U}^{\scriptscriptstyle (J)}))_{J=1}^\infty \) generates a stochastic low-rank approximation of \(C_{0}\), of p-order 1/2, meaning that there exists a constant \(\nu _p\) such that
In particular, for \(p=1\), there exists a constant \(c > 0\) such that
Proof
See [31]. \(\square \)
Example 2.15
We want to give some examples of trace-class operators on \(L^2(\mu )\), where \(\mu \) is a Radon measure on a domain \(U \subset \mathbb {R}^d\) with \({{\,\mathrm{supp}\,}}\mu = U\). Mercer's theorem (see e.g. [11, Theorem 5.6.9]) characterizes a large class of trace-class operators: An operator \(P: L^2(\mu ) \rightarrow L^2(\mu )\) is in the trace class if it can be represented by an integrable continuous positive-definite kernel k, i.e.
$$\begin{aligned} (Pf)(s) = \int _U k(s,t) f(t) \, \mathrm {d}\mu (t) \qquad \text {with} \quad \int _U k(s,s) \, \mathrm {d}\mu (s) < \infty . \end{aligned}$$
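As a numerical illustration of Mercer's characterization, the following sketch discretizes the Gaussian kernel \(k(s,t) = e^{-(s-t)^2}\) on \([0,1]\) with Lebesgue measure (an illustrative choice) and checks that the trace equals \(\int _0^1 k(s,s)\,\mathrm {d}s = 1\) while the eigenvalues decay rapidly.

```python
import numpy as np

# Midpoint discretization of the integral operator with Gaussian kernel
# k(s, t) = exp(-(s - t)^2) on [0, 1]; the quadrature weight is h = 1/n.
n = 200
s = (np.arange(n) + 0.5) / n
K = np.exp(-(s[:, None] - s[None, :]) ** 2)
eigs = np.linalg.eigvalsh(K / n)       # eigenvalues, ascending order

assert np.isclose(eigs.sum(), 1.0)     # trace = integral of k(s, s) ds
assert eigs.min() > -1e-10             # positive-definite kernel
assert eigs[-11] < 1e-3                # very fast spectral decay
```

The extremely smooth kernel makes the discretized operator numerically low rank, which is the regime in which the low-rank approximations of this section are most effective.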
The following result on trace-class operators allows us to directly compare the order of \(( \mathcal {A} (\varvec{U}^{\scriptscriptstyle (J)}))_{J=1}^\infty \) to the theoretical optimum defined in Theorem 2.12.
Proposition 2.16
Let \(C_{0}\) be a positive and self-adjoint trace-class operator with eigenvalues \((\lambda _n)\). Then
Proof
It follows from the assumptions on \(C_{0}\) that
(see e.g. [11, lemma 5.6.2]) which implies \(\lambda _n = O(n^{-1})\). \(\square \)
Therefore, if \(C_{0}\) is in the trace-class, then according to Theorem 2.12 the optimal low-rank approximation of \(C_{0}\) is given by \( {\varvec{A}^{\scriptscriptstyle (J)}_\text {svd}} {\varvec{A}^{\scriptscriptstyle (J)}_\text {svd}} ^*\) and is at least of order 1. However, since the low-rank approximation generated by \(( \mathcal {A} (\varvec{U}^{\scriptscriptstyle (J)}))_{J=1}^\infty \) satisfies the lower bound Eq. 2.10, the ensemble-based low-rank approximation, while cheaper, is not of optimal order.
This leads to the question whether there exist low-rank approximations of \(C_{0}\) that are of optimal order but do not require knowledge of the singular value decomposition of \(C_{0}\). The answer to this question is yes. There exist stochastic low-rank approximations that are of optimal order and only require O(J) evaluations of \(C_{0}\) [21]. An example of such a scheme is the Nyström method [17, 41, 44]. We will consider a special case given by algorithm 1.
Algorithm 1 Nyström low-rank approximation (figure not reproduced).
It has been shown that this method leads to a stochastic low-rank approximation of optimal order.
Theorem 2.17
(Nyström low rank approximation) Let \( {\varvec{A}^{\scriptscriptstyle (J)}_\text {nys}} \) be obtained from Algorithm 1 and let \((\lambda _n)\) denote the decreasing eigenvalues of \(C_{0}\). Then
for all \(N \in \mathbb {N}\) with \(N \le J-2\), where e denotes Euler’s number. In particular, if Assumption 2.9 is satisfied with \(\eta >1/2\), we have
Proof
It follows from lemma 4 in [12] that
where \( {\varvec{Q}^{\scriptscriptstyle (J)}} \) is as in algorithm 1. The right-hand side can be estimated using [21, theorem 10.6] (the adaptation to our infinite-dimensional setting is straightforward), yielding Eq. 2.11. If we then choose \(N = J/2\) in Eq. 2.11 (assuming without loss of generality that J is even), the right-hand side becomes
\(\square \)
Remark 2.18
By adapting the proof of [21, theorem 10.6], one could also show that the Nyström-method is of p-order \(\eta \), for all \(p \in [1,\infty )\).
We will see in Sect. 4 that the accuracy of the Nyström method is very close to the theoretical optimum given by the truncated singular value decomposition.
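For experimentation, the following sketch implements a standard single-pass Nyström variant (Gaussian test matrix \(\Omega \), products \(Y = C_0 \Omega \), core matrix \(W = \Omega ^T C_0 \Omega \)). It is a generic instance of the method class; the precise steps of Algorithm 1 may differ in details, for example in the use of the QR factor \( {\varvec{Q}^{\scriptscriptstyle (J)}} \).

```python
import numpy as np

def nystrom_factor(C0, J, rng):
    # Single-pass Nystrom sketch of a psd matrix C0, using only J
    # applications of C0 to random vectors. Returns A with
    # A @ A.T = Y W^+ Y^T, the Nystrom approximation of C0.
    n = C0.shape[0]
    Omega = rng.standard_normal((n, J))      # Gaussian test matrix
    Y = C0 @ Omega
    W = (Omega.T @ Y + Y.T @ Omega) / 2      # symmetrized core matrix
    w, V = np.linalg.eigh(W)
    w = np.clip(w, 1e-12 * w.max(), None)    # guard tiny eigenvalues
    return Y @ V @ np.diag(w ** -0.5) @ V.T  # A = Y W^{-1/2}

rng = np.random.default_rng(4)
n = 50
lam = 1.0 / np.arange(1, n + 1) ** 2         # decay with eta = 2
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
C0 = Q @ np.diag(lam) @ Q.T

errs = []
for J in (5, 10, 20):
    A = nystrom_factor(C0, J, rng)
    errs.append(np.linalg.norm(A @ A.T - C0, 2))
# The errors track the optimal rank-J error lambda_{J+1} up to a
# modest factor, at the cost of only J products with C0.
```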
2.3 Convergence of direct EKI
The ensemble anomaly, the truncated singular value decomposition, the Nyström method, or in fact any other method for the low-rank approximation of positive operators can be used inside EKI. The corresponding error estimates with respect to Tikhonov regularization then follow directly from Proposition 2.7.
Corollary 2.19
Suppose that Assumption 1.1 is satisfied. Then:
-
(i)
Let \( \varvec{A}^{\scriptscriptstyle (J)} = \mathcal {A} (\varvec{U}^{\scriptscriptstyle (J)})\). If \(C_{0}\) is in the trace-class, then for all \(p \in [1,\infty )\) there exists a constant \(\kappa _p^\text {en}\) such that
$$\begin{aligned} \mathbb {E}\left[ \left\| \hat{X}^{d, \scriptscriptstyle (J)}_\alpha - \hat{x}_{\alpha }\right\| _\mathbb {X}^{p} \right] ^{1 / {p} } \le \kappa _p^\text {en} \alpha ^{-1} J^{-1/2} \qquad \text {for all } \alpha > 0. \end{aligned}$$(2.13) -
(ii)
Let \( \varvec{A}^{\scriptscriptstyle (J)} = {\varvec{A}^{\scriptscriptstyle (J)}_\text {svd}} \). Then \( \hat{X}^{d, \scriptscriptstyle (J)}_\alpha \) is deterministic, and if Assumption 2.9 holds, then there exists a constant \(\kappa ^\text {svd}\) such that
$$\begin{aligned} \left\| \hat{X}^{d, \scriptscriptstyle (J)}_\alpha - \hat{x}_{\alpha }\right\| _\mathbb {X} \le \kappa ^\text {svd} \alpha ^{-1} J^{-\eta } \qquad \text {for all } \alpha > 0. \end{aligned}$$(2.14) -
(iii)
Let \( \varvec{A}^{\scriptscriptstyle (J)} = {\varvec{A}^{\scriptscriptstyle (J)}_\text {nys}} \). If Assumption 2.9 holds with \(\eta > 1/2\), then there exists a constant \(\kappa ^\text {nys}\) such that
$$\begin{aligned} \mathbb {E}\left[ \left\| \hat{X}^{d, \scriptscriptstyle (J)}_\alpha - \hat{x}_{\alpha }\right\| _\mathbb {X} \right] \le \kappa ^\text {nys} \alpha ^{-1} J^{-\eta } \qquad \text {for all } \alpha > 0. \end{aligned}$$(2.15)
Proof
Let \(p \in [1,\infty )\). By Theorem 2.14, \(( \mathcal {A} (\varvec{U}^{\scriptscriptstyle (J)}))_{J=1}^\infty \) generates a stochastic low-rank approximation of p-order 1/2. Thus, Eq. 2.13 follows from Proposition 2.7. The estimates Eqs. 2.14 and 2.15 then follow analogously through Theorems 2.12 and , respectively. \(\square \)
3 Adaptive ensemble Kalman inversion
We have seen in Proposition 2.7 that direct ensemble Kalman inversion can be understood as a low-rank approximation of Tikhonov regularization. It is well-known that, under a standard source-condition (see Assumption 3.4 below), the Tikhonov-regularized solution of a linear equation converges to the infinite-dimensional minimum-norm solution (see Definition 3.3) in the zero-noise limit with a certain rate (see e. g. [14]). Thus, if we ensure that the error between direct EKI and Tikhonov regularization vanishes with the same rate as Tikhonov regularization converges, then direct EKI will also converge with that rate. However, since its iterates are restricted to the finite-dimensional range of \( \varvec{A}^{\scriptscriptstyle (J)} \), direct EKI can only lead to a convergent regularization method if the sample size J is adapted to the noise level.
In this section, we describe how this can be achieved in conjunction with the discrepancy principle. The resulting method, which we call adaptive ensemble Kalman inversion, is a convergent regularization method of optimal order in a sense that will be given below. For this result, we require knowledge of a number \(\delta > 0\) such that
$$\begin{aligned} \left\| \hat{y}- y\right\| _{R} \le \delta . \end{aligned}$$(3.1)
This assumption is often referred to as a deterministic noise model, and the number \(\delta \) is called the deterministic noise level. For some results on regularization with random noise, see for example [6].
We start with a precise description of the adaptive EKI method in Sect. 3.1, followed by a convergence analysis of the zero-noise limit in Sect. 3.2. General remarks explaining the connection to other forms of EKI and multiscale methods are given in Sect. 3.3.
3.1 Description of the method
We start by presenting a version of direct EKI with an a-posteriori parameter choice rule in the form of the discrepancy principle. We will refer to this method as adaptive EKI.
In our definition, we distinguish between the cases where the underlying low-rank approximation is deterministic and stochastic. In the stochastic case, we will use a projection onto a suitably large ball around the initial guess \(x_0\). This projection serves to guarantee stability of the resulting iteration even in the presence of non-deterministic sampling error. In Sect. 3.2, we will see that if the radius of the ball is chosen sufficiently large, it does not negatively affect the convergence behavior.
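In a finite-dimensional setting, this projection has the simple closed form \(P_r(x) = x_0 + \min \{1, r/\Vert x - x_0\Vert \}\,(x - x_0)\); the following minimal sketch (our own illustration, not the paper's code) implements it:

```python
import numpy as np

def project_ball(x, x0, r):
    # orthogonal projection onto the closed ball of radius r around x0:
    # points inside are left unchanged, points outside are radially rescaled
    d = x - x0
    nd = np.linalg.norm(d)
    return x if nd <= r else x0 + (r / nd) * d

x0 = np.zeros(3)
x = np.array([3.0, 4.0, 0.0])       # distance 5 from x0
print(project_ball(x, x0, 2.5))     # rescaled onto the sphere of radius 2.5
print(project_ball(x, x0, 6.0))     # already inside, returned unchanged
```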
Definition 3.1
(Adaptive EKI) Let \(\gamma > 0\), \(b \in (0,1)\), \(\alpha _0 > 0\) and \(J_0 \in \mathbb {N}\), and define
-
If \(( \varvec{A}^{\scriptscriptstyle (J)} )_{J=1}^\infty \) generates a deterministic low-rank approximation of \(C_{0}\), of order \(\gamma \), we define the adaptive EKI iteration associated to \(( \varvec{A}^{\scriptscriptstyle (J)} )_{J=1}^\infty \) as
$$\begin{aligned} \hat{x}^\text {a}_k := \hat{X}^{d, \scriptscriptstyle (J_k)} _{\alpha _k}, \qquad \text {for } k \in \mathbb {N}, \end{aligned}$$(3.4)where \( \hat{X}^{d, \scriptscriptstyle (J_k)} _{\alpha _k}\) is defined in Definition 2.2. If \( \varvec{A}^{\scriptscriptstyle (J)} = {\varvec{A}^{\scriptscriptstyle (J)}_\text {svd}} \) (see Theorem 2.12), we refer to the method as adaptive SVD-EKI and denote its iterates with \(\hat{x}^\text {asvd}_k\).
-
Let \(r > 0\) and let \({\overline{B}_r(x_0)}\) denote the closed ball around \(x_0\) with radius r. Let \(P_r\) denote the orthogonal projection on \({\overline{B}_r(x_0)}\). If \(( \varvec{A}^{\scriptscriptstyle (J)} )_{J=1}^\infty \) generates a stochastic low-rank approximation of \(C_{0}\), of p-order \(\gamma \), we define the adaptive EKI iteration associated to \(( \varvec{A}^{\scriptscriptstyle (J)} )_{J=1}^\infty \) as
$$\begin{aligned} \hat{X}^\text {a}_k(\omega ) := P_r\left( \hat{X}^{d, \scriptscriptstyle (J_k)} _{\alpha _k}(\omega ) \right) , \qquad \text {for } k \in \mathbb {N}\text { and } \omega \in \Omega , \end{aligned}$$(3.5)where \( \hat{X}^{d, \scriptscriptstyle (J_k)} _{\alpha _k}\) is defined in Definition 2.2. If \( \varvec{A}^{\scriptscriptstyle (J)} = \mathcal {A} (\varvec{U}^{\scriptscriptstyle (J)})\) (see Definition 1.8), we refer to this method as adaptive Standard-EKI and denote its iterates with \(\hat{X}^\text {aeki}_k\). Similarly, if \( \varvec{A}^{\scriptscriptstyle (J)} = {\varvec{A}^{\scriptscriptstyle (J)}_\text {nys}} \) (see Theorem ), we refer to the method as adaptive Nyström-EKI and denote its iterates with \(\hat{X}^\text {anys}_k\).
The exponential reduction of the regularization parameter, given by Eq. 3.2, is a typical choice for regularization methods of similar form, and can already be found in [3]. The choice of \((J_k)_{k=1}^\infty \) is motivated by Proposition 2.7: By ensuring that \(J_k^\gamma \) grows at least as fast as \(\alpha _k^{-1}\), we make sure that the approximation error between adaptive EKI and Tikhonov regularization does not explode as k increases.
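A minimal numerical sketch of this coupling, assuming the geometric schedule \(\alpha _k = b^k \alpha _0\) discussed around Eq. 3.2 and a sample size \(J_k \approx b^{-k/\gamma } J_0\) chosen so that \(J_k^\gamma \) keeps pace with \(\alpha _k^{-1}\) (the precise definition is Eq. 3.3; the numbers below are our own illustrative choices):

```python
import numpy as np

# Illustrative schedules: geometric decay of the regularization parameter
# and a sample size whose gamma-th power grows like alpha_k^{-1}.
alpha0, J0, b, gamma = 1.0, 10, 0.5, 0.5

ks = np.arange(6)
alphas = alpha0 * b ** ks
Js = np.ceil(J0 * b ** (-ks / gamma)).astype(int)

for k, a, J in zip(ks, alphas, Js):
    # the product alpha_k * J_k^gamma stays bounded below, so the sampling
    # error of order alpha_k^{-1} J_k^{-gamma} from Proposition 2.7
    # does not blow up as k increases
    print(k, a, J, a * J ** gamma)
```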
In order to ensure convergence, we choose a stopping criterion for the adaptive EKI iteration. We consider the discrepancy principle, which has the advantage that it is easy to implement and it requires only little prior information on the forward operator L. In the case where the employed low-rank approximation is stochastic, the resulting stopping index is a random variable.
Definition 3.2
(Discrepancy principle) Let \(\delta \) be as in Eq. 3.1 and \(\tau > 1\). Then, adaptive EKI (Definition 3.1) is terminated after \(K_\delta \) iterations, where the integer random variable \(K_\delta : \Omega \rightarrow \mathbb {N}\cup \{\infty \}\) satisfies
$$\begin{aligned} K_\delta (\omega ) = \min \left\{ k \in \mathbb {N}: \left\| \hat{y}- L \hat{X}^\text {a}_k(\omega )\right\| _{R} \le \tau \delta \right\} , \end{aligned}$$
where we set \(K_\delta (\omega ) = \infty \) if such a number does not exist.
For the case of Tikhonov regularization, it is known that the discrepancy principle yields a converging regularization method under standard assumptions. The main difficulty of the analysis of adaptive EKI is to show that this result also holds for the random, approximate iteration given by Definition 3.1.
Pseudo-code for the adaptive EKI method in conjunction with the discrepancy principle is given in Algorithm 2.
![figure b](http://media.springernature.com/lw685/springer-static/image/art%3A10.1007%2Fs00211-022-01314-y/MediaObjects/211_2022_1314_Figb_HTML.png)
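The structure of this algorithm can be sketched in finite dimensions as follows. The Kalman-form update \(x_0 + A L^\top (L A L^\top + \alpha I)^{-1}(\hat{y} - L x_0)\), the specific schedules, the helper `truncated_svd`, and the toy problem are all our own illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def truncated_svd(C, J):
    # rank-J truncated eigendecomposition of the symmetric PSD matrix C
    w, V = np.linalg.eigh(C)
    idx = np.argsort(w)[::-1][:J]
    return (V[:, idx] * w[idx]) @ V[:, idx].T

def adaptive_eki(L, y, x0, C0, delta, tau=1.1, b=0.5,
                 alpha0=1.0, J0=2, gamma=1.0, max_iter=30):
    """Toy sketch of adaptive (SVD-)EKI with the discrepancy principle."""
    n, m = x0.size, L.shape[0]
    for k in range(max_iter):
        alpha = alpha0 * b ** k                           # shrink alpha_k
        J = min(n, int(np.ceil(J0 * b ** (-k / gamma))))  # grow sample size
        A = truncated_svd(C0, J)                          # low-rank surrogate
        # Kalman/Tikhonov-type update with the rank-J covariance surrogate
        K = A @ L.T @ np.linalg.inv(L @ A @ L.T + alpha * np.eye(m))
        x = x0 + K @ (y - L @ x0)
        if np.linalg.norm(y - L @ x) <= tau * delta:      # discrepancy check
            break
    return x, k

rng = np.random.default_rng(1)
n, m = 40, 30
L = rng.standard_normal((m, n)) / np.sqrt(n)
x_true = rng.standard_normal(n)
xi = rng.standard_normal(m)
delta = 0.05
y = L @ x_true + delta * xi / np.linalg.norm(xi)  # noise level exactly delta

x_rec, k_stop = adaptive_eki(L, y, np.zeros(n), np.eye(n), delta)
print(k_stop, np.linalg.norm(y - L @ x_rec))
```

The loop stops at the first iterate whose residual falls below \(\tau \delta \), mirroring Definition 3.2.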
3.2 Convergence analysis
Next, we show that adaptive EKI as defined above is a convergent regularization method, where convergence is considered relative to the minimum-norm solution of Eq. 1.1, defined as follows.
Definition 3.3
We call \(x^\dagger \in \mathbb {X}\) an \((x_0, C_{0})\)-minimum-norm solution of \(L x = y\) if
The existence and uniqueness of \(x^\dagger \) follow from [14, theorem 2.5] taking into account Proposition 1.2.
Before we continue, it is convenient to summarize the different inversion techniques and the corresponding terminology.
Random variable | Description | Reference |
---|---|---|
\(\hat{X}_k^{\scriptscriptstyle (J)} \) | k-th iterate of EKI with sample size J | Equation 2.1 |
\( \hat{X}^{d, \scriptscriptstyle (J)}_\alpha \) | Direct EKI with regularization parameter \(\alpha \) | Equation 2.2 |
\(\hat{x}^\text {a}_k\) | k-th iterate of adaptive EKI with a deterministic low-rank approximation | Equation 3.4 |
\(\hat{X}^\text {a}_k\) | k-th iterate of adaptive EKI with a stochastic low-rank approximation | Equation 3.5 |
\(\hat{x}_\alpha \) | Tikhonov-regularized solution for the noisy data \(\hat{y}\) | Equation 2.5 |
\(x_\alpha \) | Tikhonov-regularized solution for the exact data y | Equation 3.15 |
Our convergence proof is based on the assumption that \(x^\dagger \) satisfies a source condition, which is defined as follows.
Assumption 3.4
(Source condition) Let \(\mathbb {X}_{C_{0}}\) be defined as in Proposition 1.2. There exist an \((x_0, C_{0})\)-minimum-norm solution \(x^\dagger \in \mathbb {X}_{C_{0}}\) of \(Lx = y\), constants \(\mu \in (0,1/2]\) and \(\rho > 0\), and some \(v \in \mathbb {X}\) with \(\left\| v\right\| _\mathbb {X} \le \rho \) such that
where \(B = R^{-1/2} L C_0^{1/2}\).
Remark 3.5
Equation 3.7 can be interpreted as a smoothness assumption on the minimum-norm solution \(x^\dagger \). Source conditions are ubiquitous in the mathematical literature on inverse problems: typically, convergence rates for regularization methods cannot be proven without assuming some type of source condition. Beyond the condition in Equation 3.7, logarithmic, variational, and spectral tail conditions can also be considered; see [19, 24, 42, 50] and the more recent references [1, 2].
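For orientation, Hölder-type source conditions of this kind are commonly written in the following generic form (a hedged sketch consistent with the transformation used later in the proof of Lemma 3.8, not a verbatim copy of Eq. 3.7):

```latex
x^\dagger - x_0 \;=\; C_0^{1/2} \, (B^* B)^{\mu} \, v,
\qquad \| v \|_{\mathbb{X}} \le \rho,
\qquad B = R^{-1/2} L \, C_0^{1/2},
```

so that a larger \(\mu \) expresses more smoothness of \(x^\dagger - x_0\) relative to the operator \(B\).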
For the subsequent convergence analysis, we focus first on the more challenging case where adaptive EKI is based on a stochastic low-rank approximation. In that case, the following additional assumptions are sufficient to obtain convergence rates.
Assumption 3.6
Let \(p, q \in [1, \infty )\), \(\epsilon \in (0, \tau -1)\), and let \(( \varvec{A}^{\scriptscriptstyle (J)} )_{J=1}^\infty \) generate a stochastic low-rank approximation of \(C_{0}\), of p-order \(\gamma \).
-
(i)
The projection radius r from Definition 3.1 satisfies
$$\begin{aligned} r \ge 2 \left\| C_{0}^{1/2}\right\| _{\mathcal {L}(\mathbb {X};\mathbb {X})} \left\| x_0 - x^\dagger \right\| _{C_{0}} . \end{aligned}$$(3.8) -
(ii)
There holds
$$\begin{aligned} \alpha _0 J_0^\gamma \ge \frac{c_{RL} \kappa _p}{\epsilon \delta ^{1+\frac{q}{p}}}, \end{aligned}$$(3.9)
where \(c_{RL}\) is as in Eq. 1.3 and \(\kappa _p\) is as in Proposition 2.7.
Remark 3.7
Note that Eq. 3.9 together with Eqs. 3.2 and 3.3 implies that a corresponding estimate holds for all subsequent iterates, i.e.
Furthermore, the condition given by Eq. 3.8 simply means that the projection radius r has to be chosen large enough in relation to the initial error \( \left\| x_0 - x^\dagger \right\| _{C_{0}} \). We show in Proposition 3.10 that this condition ensures that the projection in Eq. 3.5 does not increase the approximation error between adaptive EKI and Tikhonov regularization.
Our strategy to obtain convergence rates for adaptive EKI is to use the error estimate between direct EKI and Tikhonov regularization, provided by Proposition 2.7, to transfer the well-established convergence results on Tikhonov regularization to adaptive EKI. The main complication is that the discrepancy principle introduces a coupling between the regularization parameter and the sampling error, which makes it challenging to estimate \(\left\| \hat{X}^\text {a}_{K_\delta } - \hat{x}_{\alpha _{K_\delta }}\right\| _\mathbb {X}\) directly. Instead, we employ a good-set strategy, similar to the one used in [4] for the analysis of the iteratively regularized Gauss-Newton method for random noise. The idea behind the good-set strategy is to define a suitable subset of \(\Omega \) on which we can perform a deterministic analysis, and then to show that the probability of the complement vanishes sufficiently fast. For our purpose, we define the good set \(E_\text {good}(\delta )\subset \Omega \) by
Then, the law of total expectation yields, for \(q \in [1, \infty )\),
where \(E_\text {good}(\delta )^\complement \) denotes the complement of \(E_\text {good}(\delta )\). Since \(\hat{X}^\text {a}_{K_\delta } \in {\overline{B}_r(x_0)}\) holds by Eq. 3.5, we have
Hence, it suffices to estimate \(\mathbb {E}\left[ \left\| \hat{X}^\text {a}_{K_\delta } - x^\dagger \right\| _\mathbb {X}^q | E_\text {good}(\delta ) \right] \) and \(\mathbb {P}(E_\text {good}(\delta )^\complement )\) separately.
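Written out, the split just described takes the following form (a hedged reconstruction of the displays referenced as Eqs. 3.13 and 3.14; the constant bounding the second conditional expectation may differ slightly in the original):

```latex
\mathbb{E}\left[ \bigl\| \hat{X}^{\mathrm{a}}_{K_\delta} - x^\dagger \bigr\|_{\mathbb{X}}^{q} \right]
  = \mathbb{E}\left[ \bigl\| \hat{X}^{\mathrm{a}}_{K_\delta} - x^\dagger \bigr\|_{\mathbb{X}}^{q}
      \,\middle|\, E_{\mathrm{good}}(\delta) \right] \mathbb{P}\bigl(E_{\mathrm{good}}(\delta)\bigr)
  + \mathbb{E}\left[ \bigl\| \hat{X}^{\mathrm{a}}_{K_\delta} - x^\dagger \bigr\|_{\mathbb{X}}^{q}
      \,\middle|\, E_{\mathrm{good}}(\delta)^{\complement} \right] \mathbb{P}\bigl(E_{\mathrm{good}}(\delta)^{\complement}\bigr),
```

and, since \(\hat{X}^{\mathrm{a}}_{K_\delta} \in \overline{B}_r(x_0)\),

```latex
\mathbb{E}\left[ \bigl\| \hat{X}^{\mathrm{a}}_{K_\delta} - x^\dagger \bigr\|_{\mathbb{X}}^{q}
      \,\middle|\, E_{\mathrm{good}}(\delta)^{\complement} \right]
  \le \bigl( r + \| x_0 - x^\dagger \|_{\mathbb{X}} \bigr)^{q}.
```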
Estimates for the first term hinge on understanding the behavior of the Tikhonov-regularized solution \(\hat{x}_{\alpha _{k_\delta }}\). The following lemma summarizes existing results on Tikhonov regularization that we will make use of in our theoretical analysis of adaptive EKI. To this end, we consider as an auxiliary variable the Tikhonov-regularized solution of Eq. 1.1 according to the exact data y and regularization parameter \(\alpha \), defined as
(Compare Definition 2.3.)
Lemma 3.8
(Convergence and stability of Tikhonov regularization) Suppose that Assumptions 1.1 and 3.4 hold, and let \(x_0 \in \mathbb {X}\), \(\alpha > 0\) and \(\delta > 0\). Moreover, assume that
Then there holds
Furthermore, there exist constants \(c_1\) and \(c_2\), independent of \(\alpha \), \(\delta \) and \(\rho \), such that
Proof
Note that \(x^\dagger \) is a \((x_0, C_{0})\)-minimum-norm solution of Eq. 1.1 if and only if \(x^\dagger = x_0+ C_{0}^{1/2} w^\dagger \), where \(w^\dagger \) is a \((0,\text {I}_\mathbb {X})\)-minimum-norm solution of
where \(B := R^{-1/2} L C_{0}^{1/2}\). Similary, if \(w_\alpha \) is the corresponding Tikhonov-regularized solution of Eq. 3.21, i.e.
then \(x_\alpha = x_0+ C_{0}^{1/2} w_\alpha \). Thus, the results follow from the classical case where \(C_{0}= \text {I}_\mathbb {X}\) and \(R= \text {I}_\mathbb {Y}\): The inequalities Eqs. 3.17, 3.18 and 3.19 can be found in [14, (4.66)], [14, (4.68)] and [14, (4.70)], respectively. Equation 3.20 can be obtained from the source condition Equation 3.7 and the interpolation inequality [14, (4.64)], as in the proof of [14, theorem 4.17]. \(\square \)
Moreover, for the deterministic stopping time \(k_\delta \) the following auxiliary result holds.
Lemma 3.9
Suppose that Assumptions 1.1 and 3.4 hold.
-
(i)
There exists a constant \(c_3\), independent of \(\alpha \), \(\delta \), and \(\rho \), such that
$$\begin{aligned} \alpha _{k_\delta } \ge c_3 \left( \frac{\delta }{\rho } \right) ^\frac{2}{2 \mu + 1} \end{aligned}$$(3.22)for all sufficiently small \(\delta \).
-
(ii)
There holds
$$\begin{aligned} k_\delta = O \left( \log (\delta ^{-1}) \right) . \end{aligned}$$(3.23) -
(iii)
If Eq. 3.8 holds, then there exists a sufficiently small \(\bar{\delta } > 0\) such that
$$\begin{aligned} \hat{x}_{\alpha _k} \in {\overline{B}_r(x_0)}\qquad \text {for all } k \le k_\delta \text { and all } \delta \le \bar{\delta }. \end{aligned}$$(3.24)
Proof
-
(i)
Using the same transformations as in the proof of Lemma 3.8, the statement follows from the proof of theorem 4.17 in [14]. Note that this proof uses a discrepancy principle where \(\alpha \) can vary continuously. However, the same argument applies also to the discretized sequence satisfying Eq. 3.2, see [14, remark 4.18].
-
(ii)
Inserting Eq. 3.2 in Eq. 3.22 yields
$$\begin{aligned} b^{k_\delta } \alpha _0 \ge c_3 \left( \frac{\delta }{\rho } \right) ^\frac{2}{2 \mu + 1}, \end{aligned}$$or equivalently
$$\begin{aligned} \left( \frac{1}{b}\right) ^{k_\delta } \le \frac{\alpha _0}{c_3} \left( \frac{\rho }{\delta } \right) ^\frac{2}{2 \mu + 1}. \end{aligned}$$Taking the logarithm and using the fact that \(b \in (0,1)\), we arrive at
$$\begin{aligned} k_\delta \le \log (b^{-1})^{-1} \cdot \left[ \log \left( \frac{\alpha _0}{c_3} \right) + \frac{2}{2 \mu + 1} \log \left( \frac{\rho }{\delta } \right) \right] . \end{aligned}$$This proves Eq. 3.23.
-
(iii)
As in the proof of Lemma 3.8, let \(B = R^{-1/2} L C_{0}^{1/2}\), \(x^\dagger = x_0 + C_{0}^{1/2} w^\dagger \) and \(\hat{x}_\alpha = x_0+ C_{0}^{1/2} \hat{w} _\alpha \), such that
$$\begin{aligned} \hat{w} _\alpha = \left( B^* B + \alpha \text {I}_\mathbb {X}\right) ^{-1} B^* R^{-1/2}(\hat{y}- L x_0). \end{aligned}$$Then
$$\begin{aligned} \hat{w} _\alpha= & {} \left( B^* B + \alpha \text {I}_\mathbb {X}\right) ^{-1} B^* R^{-1/2}(\hat{y}- L x_0)\nonumber \\= & {} \left( B^* B + \alpha \text {I}_\mathbb {X}\right) ^{-1} B^* R^{-1/2}(\hat{y}- y) + \left( B^* B + \alpha \text {I}_\mathbb {X}\right) ^{-1} B^* R^{-1/2}(y - L x_0). \qquad \end{aligned}$$(3.25)If we insert \(y = L x^\dagger = L (x_0 + C_{0}^{1/2} w^\dagger )\) in the second term on the right-hand side of Eq. 3.25, we obtain after cancellation and using the definition of B,
$$\begin{aligned} \hat{w} _\alpha&= \left( B^* B + \alpha \text {I}_\mathbb {X}\right) ^{-1} B^* R^{-1/2}(\hat{y}- y) + \left( B^* B + \alpha \text {I}_\mathbb {X}\right) ^{-1} B^* B w^\dagger . \end{aligned}$$Using Eq. 3.1 and the spectral estimates (see e.g. [30, lemma 4.5])
$$\begin{aligned} \left\| \left( B^* B + \alpha \text {I}_\mathbb {X}\right) ^{-1} B^*\right\| _{\mathcal {L}(\mathbb {X};\mathbb {X})}&\le \frac{1}{2}\alpha ^{-1/2}, \\ \text {and} \quad \left\| \left( B^* B + \alpha \text {I}_\mathbb {X}\right) ^{-1} B^* B\right\| _{\mathcal {L}(\mathbb {X};\mathbb {X})}&\le 1, \end{aligned}$$in Eq. 3.25, we obtain
$$\begin{aligned} \left\| \hat{w} _\alpha \right\| _\mathbb {X} \le \frac{1}{2} \alpha ^{-1/2} \delta + \left\| w^\dagger \right\| _\mathbb {X}. \end{aligned}$$(3.26)Finally, it follows from Eq. 3.22 that
$$\begin{aligned} \alpha ^{-1/2}_k \delta \le \alpha ^{-1/2}_{k_\delta } \delta \le c_3^{-1/2} \left( \frac{\rho }{\delta } \right) ^\frac{1}{2 \mu + 1} \delta = c_3^{-1/2} \rho ^\frac{1}{2 \mu + 1} \delta ^\frac{2 \mu }{2 \mu + 1} \qquad \text {for all } k \le k_\delta , \end{aligned}$$(3.27)which vanishes as \(\delta \rightarrow 0\). Therefore, if we set
$$\begin{aligned} \bar{\delta }= 2^\frac{2 \mu + 1}{2 \mu } c_3^\frac{2 \mu + 1}{4 \mu } \rho ^{-\frac{1}{2 \mu }} \left\| w^\dagger \right\| _\mathbb {X}^\frac{2 \mu + 1}{2 \mu }, \end{aligned}$$then it follows from Eqs. 3.26 and 3.27 that
$$\begin{aligned} \left\| \hat{w} _{\alpha _k}\right\| _\mathbb {X} \le 2 \left\| w^\dagger \right\| _\mathbb {X} \qquad \text {for all } k \le k_\delta \end{aligned}$$(3.28)holds for all \(\delta \le \bar{\delta }\). By definition of \( \hat{w} _{\alpha _k}\), Eq. 3.28 implies
$$\begin{aligned} \left\| \hat{x}_{\alpha _k} - x_0\right\| _{C_{0}} = \left\| \hat{w} _{\alpha _k}\right\| _\mathbb {X} \le 2 \left\| w^\dagger \right\| _\mathbb {X} = 2 \left\| x^\dagger - x_0\right\| _{C_{0}} , \end{aligned}$$and hence, by Eqs. 1.4 and 3.8,
$$\begin{aligned} \left\| \hat{x}_{\alpha _k} - x_0\right\| _\mathbb {X} \le \left\| C_{0}^{1/2}\right\| _{\mathcal {L}(\mathbb {X};\mathbb {X})} \left\| \hat{x}_{\alpha _k} - x_0\right\| _{C_{0}} \le 2 \left\| C_{0}^{1/2}\right\| _{\mathcal {L}(\mathbb {X};\mathbb {X})} \left\| x^\dagger - x_0\right\| _{C_{0}} \le r, \end{aligned}$$for all \(k \le k_\delta \) and \(\delta \le \bar{\delta }\).
\(\square \)
With this lemma, we are able to show that the projection in Eq. 3.5 cannot increase the approximation error between adaptive EKI and the corresponding Tikhonov iteration, at least for \(k \le k_\delta \). More precisely, we have the following proposition.
Proposition 3.10
Let Assumptions 1.1, 3.4 and Eq. 3.8 hold. Let \(\delta \le \bar{\delta }\), where \(\bar{\delta }\) is as in Lemma 3.9. Then
In particular,
where \(\kappa _p\) is as in Proposition 2.7.
Proof
Let \(k \le k_\delta \) and \(\omega \in \Omega \). By Eq. 3.5, we have
By Lemma 3.9, we have \(\hat{x}_{\alpha _k} \in {\overline{B}_r(x_0)}\). Consequently, by the property of the orthogonal projection, we have
This yields
Together with Eq. 3.31, this yields Eq. 3.29. Eq. 3.30 then follows from Eq. 3.29 and Proposition 2.7. \(\square \)
The next proposition provides the desired asymptotic decay rate of the probability \(\mathbb {P}(E_\text {good}(\delta )^\complement )\).
Proposition 3.11
Given Assumptions 1.1, 3.4 and 3.6, there holds
Proof
By Eq. 3.11 and the subadditivity of \(\mathbb {P}\), we have
By Eq. 3.12 and Markov’s inequality (see Lemma A.2), we have
Without loss of generality, let \(\delta \le \bar{\delta }\), where \(\bar{\delta }\) is as in Lemma 3.9. Using Proposition 3.10 and then Eq. 3.10 in Eq. 3.35 yields
Inserting this inequality in Eq. 3.34, we arrive at
From Lemma 3.9 we know that \(k_\delta = O(\log (\delta ^{-1}))\). Since we have \(1 - \frac{2\mu }{2 \mu + 1} > 0\), we obtain
Hence, we have from Eq. 3.36 that
\(\square \)
Finally, we show convergence of the random element \(\hat{X}^\text {a}_{K_\delta }\) on the “good set” \(E_\text {good}(\delta )\). The construction of \(E_\text {good}(\delta )\) allows us to apply the proof of [14, theorem 4.17], with straightforward modifications, to each individual realization \(\hat{X}^\text {a}_{K_\delta (\omega )}(\omega )\), given \(\omega \in E_\text {good}(\delta )\).
Proposition 3.12
Given Assumptions 1.1, 3.4 and 3.6, there exists \(C > 0\), independent of \(\omega \) and \(\delta \), such that
Proof
Let \(\omega \in E_\text {good}(\delta )\).
-
First, we show that \(K_\delta (\omega ) \le k_\delta \): To see this, note that
$$\begin{aligned} \left\| \hat{y}- L \hat{X}^\text {a}_{k_\delta }(\omega )\right\| _{R}&\le \left\| \hat{y}- L \hat{x}_{\alpha _{k_\delta }}\right\| _{R} + \left\| L \left( \hat{x}_{\alpha _{k_\delta }} - \hat{X}^\text {a}_{k_\delta }(\omega ) \right) \right\| _{R} \\&\le \left\| \hat{y}- L \hat{x}_{\alpha _{k_\delta }}\right\| _{R} + c_{RL} \left\| \hat{x}_{\alpha _{k_\delta }} - \hat{X}^\text {a}_{k_\delta }(\omega )\right\| _{C_{0}} . \end{aligned}$$By definition of \(k_\delta \) and \(E_\text {good}(\delta )\), this implies
$$\begin{aligned} \left\| \hat{y}- L \hat{X}^\text {a}_{k_\delta }(\omega )\right\| _{R} \le (\tau - \epsilon ) \delta + \epsilon \delta = \tau \delta . \end{aligned}$$Hence, by definition of \(K_\delta \), there must hold \(K_\delta (\omega ) \le k_\delta \).
-
Since \(K_\delta (\omega ) \le k_\delta \), we have by definition of \(E_\text {good}(\delta )\),
$$\begin{aligned} \left\| \hat{X}^\text {a}_k(\omega ) - \hat{x}_k\right\| _\mathbb {X} \le c_{RL}^{-1} \epsilon \delta \qquad \text {for all } k \le K_\delta (\omega ) , \end{aligned}$$(3.39)and consequently also
$$\begin{aligned} \left\| L(\hat{X}^\text {a}_k(\omega ) - \hat{x}_{\alpha _k})\right\| _{R} \le \epsilon \delta \qquad \text {for all } k \le K_\delta (\omega ) . \end{aligned}$$(3.40) -
Next, we show that there exists a constant \(c_4\), independent of \(\rho \), \(\delta \) and \(\omega \), such that
$$\begin{aligned} \alpha _{ K_\delta (\omega ) }^{-1/2} \le c_4 \left( \frac{\rho }{\delta }\right) ^\frac{1}{2 \mu + 1}. \end{aligned}$$(3.41)From Eq. 3.20, we obtain
$$\begin{aligned} \left\| y - L x_{\alpha _{ K_\delta (\omega ) - 1}}\right\| _{R}&\le c_2 \rho \alpha _{ K_\delta (\omega ) - 1}^{\mu + 1/2} \nonumber \\&= c_2 \rho (b^{-1} \alpha _{K_\delta (\omega )} )^{\mu + 1/2}. \end{aligned}$$(3.42)On the other hand,
$$\begin{aligned} \left\| y - L x_{\alpha _{ K_\delta (\omega ) - 1}}\right\| _{R} \ge \left\| \hat{y}- L \hat{x}_{\alpha _{ K_\delta (\omega ) - 1}}\right\| _{R} - \left\| (\hat{y}- y) - L(\hat{x}_{\alpha _{ K_\delta (\omega ) - 1}} - x_{\alpha _{ K_\delta (\omega ) - 1}})\right\| _{R} . \end{aligned}$$Inserting Eq. 3.18 yields
$$\begin{aligned} \left\| y - L x_{\alpha _{ K_\delta (\omega ) - 1}}\right\| _{R}&\ge \left\| \hat{y}- L \hat{x}_{\alpha _{ K_\delta (\omega ) - 1}}\right\| _{R} - \delta \\&\ge \left\| \hat{y}- L \hat{X}^\text {a}_{ K_\delta (\omega ) - 1}(\omega )\right\| _{R} - \left\| L (\hat{X}^\text {a}_{ K_\delta (\omega ) - 1}(\omega ) - \hat{x}_{\alpha _{ K_\delta (\omega ) - 1}})\right\| _{R} - \delta . \end{aligned}$$By the definition of \(K_\delta \) and Eq. 3.40, this reduces to
$$\begin{aligned} \left\| y - L x_{\alpha _{ K_\delta (\omega ) - 1}}\right\| _{R} \ge \tau \delta - \epsilon \delta - \delta = (\tau - \epsilon - 1) \delta . \end{aligned}$$(3.43)Combining Eqs. 3.42 and 3.43 yields
$$\begin{aligned} (\tau - \epsilon - 1) \delta \le c_2 \rho (b^{-1} \alpha _{K_\delta (\omega )} )^{\mu + 1/2}. \end{aligned}$$Since \(\tau - \epsilon - 1 > 0\), we can rearrange this inequality to
$$\begin{aligned} \alpha _{K_\delta (\omega )} ^{-1/2} \le b^{-1/2} \left( \frac{c_2}{\tau - \epsilon - 1} \right) ^\frac{1}{2 \mu + 1}\left( \frac{\rho }{\delta }\right) ^\frac{1}{2 \mu + 1}, \end{aligned}$$which shows Eq. 3.41 for a suitable choice of \(c_4\).
-
Next, we show that there exists a constant \(c_5\), independent of \(\omega \) and \(\delta \), such that
$$\begin{aligned} \left\| \hat{x}_{ \alpha _{K_\delta (\omega )} } - x^\dagger \right\| _{C_{0}} \le c_5 \delta ^\frac{2 \mu }{2 \mu + 1}. \end{aligned}$$(3.44)We start with the triangle inequality
$$\begin{aligned} \left\| \hat{x}_{ \alpha _{K_\delta (\omega )} } - x^\dagger \right\| _{C_{0}} \le \left\| \hat{x}_{ \alpha _{K_\delta (\omega )} } - x_{ \alpha _{K_\delta (\omega )} }\right\| _{C_{0}} + \left\| x_{ \alpha _{K_\delta (\omega )} } - x^\dagger \right\| _{C_{0}} . \end{aligned}$$(3.45)By Eq. 3.19, the first term on the right-hand side satisfies
$$\begin{aligned} \left\| \hat{x}_{ \alpha _{K_\delta (\omega )} } - x_{ \alpha _{K_\delta (\omega )} }\right\| _{C_{0}} \le c_1 \delta \alpha _{K_\delta (\omega )} ^{-1/2}. \end{aligned}$$Inserting Eq. 3.41 yields
$$\begin{aligned} \left\| \hat{x}_{ \alpha _{K_\delta (\omega )} } - x_{ \alpha _{K_\delta (\omega )} }\right\| _{C_{0}} \le c_1 c_4 \rho ^\frac{1}{2 \mu + 1} \delta ^\frac{2 \mu }{2 \mu + 1}. \end{aligned}$$(3.46)For the second term on the right-hand side of Eq. 3.45, we have by Eq. 3.17:
$$\begin{aligned} \left\| x_{ \alpha _{K_\delta (\omega )} } - x^\dagger \right\| _{C_{0}} \le \rho ^\frac{1}{2 \mu + 1} \left\| L x_{ \alpha _{K_\delta (\omega )} } - y\right\| _{R} ^\frac{2 \mu }{2 \mu + 1}. \end{aligned}$$(3.47)We then estimate, using Eq. 3.18,
$$\begin{aligned} \left\| L x_{ \alpha _{K_\delta (\omega )} } - y\right\| _{R}&\le \left\| \hat{y}- L \hat{x}_{ \alpha _{K_\delta (\omega )} }\right\| _{R} + \left\| (y - \hat{y}) - L \left( x_{ \alpha _{K_\delta (\omega )} } - \hat{x}_{ \alpha _{K_\delta (\omega )} } \right) \right\| _{R} \\&\le \left\| \hat{y}- L \hat{x}_{ \alpha _{K_\delta (\omega )} }\right\| _{R} + \delta . \end{aligned}$$From this, another use of the triangle inequality yields
$$\begin{aligned} \left\| L x_{ \alpha _{K_\delta (\omega )} } - y\right\| _{R} \le \left\| \hat{y}- L \hat{X}^\text {a}_{ K_\delta (\omega ) }(\omega )\right\| _{R} + \left\| L \left( \hat{X}^\text {a}_{ K_\delta (\omega ) } - \hat{x}_{ \alpha _{K_\delta (\omega )} } \right) \right\| _{R} + \delta . \end{aligned}$$Finally, using the definition of \(K_\delta \) and Eq. 3.40 yields
$$\begin{aligned} \left\| L x_{ \alpha _{K_\delta (\omega )} } - y\right\| _{R} \le (\tau + \epsilon + 1) \delta . \end{aligned}$$(3.48)Inserting Eq. 3.48 in Eq. 3.47 yields
$$\begin{aligned} \left\| x_{ \alpha _{K_\delta (\omega )} } - x^\dagger \right\| _{C_{0}} \le (\tau + \epsilon + 1) \rho ^\frac{1}{2 \mu + 1} \delta ^\frac{2 \mu }{2 \mu + 1}. \end{aligned}$$(3.49)Finally, inserting both Eqs. 3.46 and 3.49 in Eq. 3.45 yields Eq. 3.44 for a suitable choice of \(c_5\).
-
From the triangle inequality and Eq. 1.4, we have
$$\begin{aligned} \left\| \hat{X}^\text {a}_{ K_\delta (\omega ) }(\omega ) - x^\dagger \right\| _\mathbb {X}&\le \left\| \hat{X}^\text {a}_{ K_\delta (\omega ) } - \hat{x}_{ \alpha _{K_\delta (\omega )} }\right\| _\mathbb {X} + \left\| \hat{x}_{ \alpha _{K_\delta (\omega )} } - x^\dagger \right\| _\mathbb {X} \\&\le \left\| \hat{X}^\text {a}_{ K_\delta (\omega ) } - \hat{x}_{ \alpha _{K_\delta (\omega )} }\right\| _\mathbb {X} + \left\| C_{0}^{1/2}\right\| _{\mathcal {L}(\mathbb {X};\mathbb {X})} \left\| \hat{x}_{ \alpha _{K_\delta (\omega )} } - x^\dagger \right\| _{C_{0}} . \end{aligned}$$We can use Eq. 3.39 to estimate the first and Eq. 3.44 to estimate the second term of the right-hand side, which yields
$$\begin{aligned} \left\| \hat{X}^\text {a}_{ K_\delta (\omega ) }(\omega ) - x^\dagger \right\| _\mathbb {X} \le c_{RL}^{-1} \epsilon \delta + \left\| C_{0}^{1/2}\right\| _{\mathcal {L}(\mathbb {X};\mathbb {X})} \cdot c_5 \delta ^\frac{2 \mu }{2 \mu + 1}. \end{aligned}$$Hence, we can choose \(C > 0\), independently of \(\delta \) and \(\omega \), such that Eq. 3.38 holds.
\(\square \)
With this, we arrive at convergence rates for adaptive EKI under a stochastic low-rank approximation.
Theorem 3.13
Given Assumptions 1.1, 3.4 and 3.6, there holds
Proof
By Proposition 3.12, there holds
This implies in particular
Using this inequality and Proposition 3.11 in Eq. 3.14 yields
from which Eq. 3.50 follows. \(\square \)
For completeness, we also formulate the convergence rate results under a deterministic low-rank approximation. In this case, the proof of Proposition 3.12 applies without change, and we obtain the following result.
Theorem 3.14
Let Assumptions 1.1 and 3.4 hold, and let \(( \varvec{A}^{\scriptscriptstyle (J)} )_{J=1}^\infty \) generate a deterministic low-rank approximation of \(C_{0}\), of order \(\gamma \). Assume there is \(\epsilon \in (0, \tau - 1)\) such that
where \(\kappa \) is as in Proposition 2.7. Then
Remark 3.15
Comparing the condition Eq. 3.51 for the deterministic case to the condition Eq. 3.10 for the stochastic case, we see that the major difference is that the stochastic case requires an additional multiplicative factor \(\delta ^{-\frac{q}{p}}\). This additional factor is used in the proof of Proposition 3.11 to ensure that \(\mathbb {P}(E_\text {good}(\delta )^\complement ) = O(\delta ^\frac{2 \mu q}{2 \mu + 1})\). Formally, we recover the deterministic case from the stochastic case in the limit \(p \rightarrow \infty \) (where \(p=\infty \) corresponds to almost sure convergence).
Remark 3.16
The proven convergence rate is optimal for \(\mu \in (0,\frac{1}{2})\), in the sense that if only Assumption 3.4 is known, there exists no regularization method that satisfies a better general bound with respect to \(\delta \) and \(\mu \) [14, proposition 3.15].
Continuing our discussion from Sect. 2.2, we see from Theorem 3.13 and Theorem 3.14 that the three special cases of adaptive EKI defined in Definition 3.1, namely adaptive Standard-, SVD- and Nyström-EKI, are all of (stochastic) optimal order. However, the faster convergence of the SVD- and Nyström-based low-rank approximation means that the sample size \(J_k\) does not have to grow as fast as for Standard-EKI, which makes those two methods computationally cheaper.
Corollary 3.17
Let Assumptions 1.1 and 3.4 hold.
-
(i)
Let \(p \in [1,\infty )\), \(C_{0}\) be in the trace class, and suppose that Assumption 3.6 is satisfied for \(\gamma =1/2\). Then there holds
$$\begin{aligned} \mathbb {E}\left[ \left\| \hat{X}^\text {aeki}_{K_\delta } - x^\dagger \right\| _\mathbb {X}^{p} \right] ^{1 / {p} } = O( \delta ^\frac{2 \mu }{2 \mu + 1}). \end{aligned}$$ -
(ii)
Assume that \(C_{0}\) satisfies Assumption 2.9 with constant \(\eta >0\), and suppose that Eq. 3.51 is satisfied for \(\gamma = \eta \). Then there holds
$$\begin{aligned} \left\| \hat{x}^\text {asvd}_{K_\delta } - x^\dagger \right\| _\mathbb {X} = O( \delta ^\frac{2 \mu }{2 \mu + 1}). \end{aligned}$$ -
(iii)
Let \(p \in [1,\infty )\), assume that \(C_{0}\) satisfies Assumption 2.9 with constant \(\eta >1/2\), and suppose that Assumption 3.6 is satisfied for \(\gamma =\eta \). Then there holds
$$\begin{aligned} \mathbb {E}\left[ \left\| \hat{X}^\text {anys}_{K_\delta } - x^\dagger \right\| _\mathbb {X} \right] = O(\delta ^\frac{2 \mu }{2 \mu + 1}). \end{aligned}$$
Proof
Recall that \((\hat{X}^\text {aeki}_k)_{k=1}^\infty \) is a special case of adaptive EKI where the low-rank approximation is generated by \(( \mathcal {A} (\varvec{U}^{\scriptscriptstyle (J)}))_{J=1}^\infty \) (see Theorem 2.14). Thus, if \(C_{0}\) is in the trace-class, Theorem 3.13 applies with \(\gamma =1/2\) and yields the desired convergence rate. The corresponding result for \((\hat{x}^\text {asvd}_k)_{k=1}^\infty \) follows analogously from Theorems 3.14 and 2.12, while the result for \((\hat{X}^\text {anys}_k)_{k=1}^\infty \) follows from Theorems 3.13 and . \(\square \)
As an example, suppose we know that \(C_{0}\) is in the trace class, i.e. \(\eta \ge 1\). Then Corollary 3.17 implies that Standard-EKI is of optimal order if \(J_k \ge b^{-2 k} J_0\), whereas Nyström-EKI is of optimal order if \(J_k \ge b^{- k} J_0\) (see Eq. 3.3). This means that Nyström-EKI performs comparably with only a square-root of the sample size. Furthermore, if the eigenvalues of \(C_{0}\) decay faster than \(O(n^{-1})\), Nyström-EKI can take advantage of this, whereas Standard-EKI is limited by the lower bound Eq. 2.10.
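The gap in required ensemble sizes can be made concrete with hypothetical numbers (our own illustration, using \(b = 1/2\) and \(J_0 = 10\)):

```python
# Growth of the sample size needed for optimal order in the trace-class
# example above: Standard-EKI needs J_k >= b^{-2k} J_0, while Nystroem-EKI
# only needs J_k >= b^{-k} J_0 (hypothetical numbers for illustration).
b, J0 = 0.5, 10
for k in range(6):
    J_std = J0 * int(b ** (-2 * k))   # Standard-EKI requirement
    J_nys = J0 * int(b ** (-k))       # Nystroem-EKI requirement
    print(k, J_std, J_nys)
# by k = 5 this is 10240 ensemble members for Standard-EKI versus 320
```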
3.3 General remarks
3.3.1 Relation to other versions of EKI
Note that our focus differs from the strictly Bayesian setting in which ensemble Kalman inversion is often introduced. In the Bayesian setting, it is assumed that the regularization parameter represents the available prior information, and the regularized solution is identified with the MAP estimate. In regularization theory, we are instead interested in showing convergence rates in the zero-noise limit, which requires parameter choice rules that select the regularization parameter \(\alpha \) in terms of the noise level and properties of the forward operator L. Above, we have focused on the discrepancy principle. In contrast to a-priori choice rules, the discrepancy principle has the advantage that it requires little prior information on the operator L. However, its use is contingent on performing multiple steps of EKI with decreasing values of the regularization parameter. This strategy has many similarities to the empirical Bayesian approach, where one assumes a Gaussian prior but treats the regularization parameter as unknown and estimates it from the data (see e.g. [59]). Our analysis shows that, by coupling the sample size to the regularization parameter, it becomes possible to obtain the optimal convergence rates in the zero-noise limit. This is also the major difference between the presented scheme and other versions of EKI.
3.3.2 Relation to multiscale methods
The ideas behind adaptive EKI are similar to sequential multiscale methods, where one iteratively moves from a low-dimensional coarse-scale subspace to finer scales. A related work along these lines is [39], which also applies to ensemble methods, but considers the setting where in each step an approximate solution on a different subspace is computed. Under certain conditions on the multiscale decomposition, this approach can be shown to be equivalent to Tikhonov regularization in the full space. In contrast, the idea behind adaptive EKI is only to approximate Tikhonov regularization, in a way that achieves the same convergence order in the zero-noise limit.
3.3.3 Localization
In some practical applications (e.g. numerical weather prediction [26]) it is only feasible to work with ensemble sizes that are orders of magnitude smaller than the parameter dimension. In these situations, localization [18] is often used to increase the effective ensemble size through incorporation of domain knowledge on the correlation structure of the parameter or observation of interest. Since adaptive EKI can be formulated both in square-root and covariance form (see Remark 1.7), it can be combined with most of the existing localization methods, such as covariance localization [25] or local analysis [45]. Note that localization for stochastic EKI has been studied in [58].
4 Numerical experiments
We performed numerical experiments to evaluate the performance of adaptive EKI.
4.1 Test problem
We have chosen inversion of the Radon transform L (see for instance [34]) as our test example. Our analytical results show that the large-ensemble limit approximates the Tikhonov-regularized solution, which we aim to verify numerically; we also compare the different variants of EKI in terms of efficiency. As a test object, we use the classic Shepp-Logan phantom [54] with size \(d \times d\), \(d=100\) (see Fig. 1). This corresponds to a parameter dimension of \(n := \dim \mathbb {X}= d^2 = 10{,}000\) and a measurement dimension of \(m := \dim \mathbb {Y}= 14{,}200\).
4.2 Data simulation
We generated noise \(\xi _s \sim \mathcal {N}(0, \text {I}_m)\) from a standard normal distribution and then rescaled the noise by setting
$$\begin{aligned} \xi = \frac{\left\| y\right\| }{10 \left\| \xi _s\right\| } \, \xi _s, \end{aligned}$$
thereby ensuring a signal-to-noise ratio \(\left\| y\right\| / \left\| \xi \right\| \) of exactly 10. We then used \({\hat{y}} = y + \xi \) as noisy measurement for the tested methods. We also rescaled the measurement and the observation operator by \(\left\| \xi \right\| \) so that \(\delta = \left\| \hat{y}- y\right\| = 1\).
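The data simulation can be sketched in a few lines of Python. The surrogate data `y` below is a random stand-in (the actual experiment uses the Radon transform of the phantom), and the rescaling formula is our reading of "signal-to-noise ratio of exactly 10" as \(\Vert y\Vert /\Vert \xi \Vert = 10\):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 14_200
y = rng.standard_normal(m)          # stand-in for the exact data L x_*
xi_s = rng.standard_normal(m)       # raw noise xi_s ~ N(0, I_m)

# Rescale the noise so that the signal-to-noise ratio ||y|| / ||xi|| is 10.
xi = (np.linalg.norm(y) / (10 * np.linalg.norm(xi_s))) * xi_s
y_hat = y + xi

# Rescale measurement (and, in the full method, the observation operator)
# by ||xi|| so that the noise level becomes delta = ||y_hat - y|| = 1.
scale = np.linalg.norm(xi)
y_hat, y = y_hat / scale, y / scale
delta = np.linalg.norm(y_hat - y)   # equals 1 up to floating-point error
```
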
4.3 Considered methods
In our experiment, we set \(R= \mathbb {I}_m\), and chose \(C_{0}\in \mathbb {R}^{n \times n}\) equal to a discretized covariance operator of an Ornstein-Uhlenbeck process,
$$\begin{aligned} (C_{0})_{ij} = \exp \left( - \frac{\left\| q_i - q_j\right\| }{h} \right) , \qquad i,j = 1,\ldots ,n, \end{aligned}$$
with correlation length \(h > 0\) (we used the value \(h = 0.01\)). Such operators are often used as prior covariance for Bayesian MAP estimation in tomography, for example in [56]. They correspond to the assumption that the correlation between individual pixels decreases exponentially with distance, where \(q_i\) denotes the normalized position of the i-th pixel if the image is scaled to \([0,1]\times [0,1]\). We compared the three different instances of adaptive EKI discussed in Sect. 3:
-
Standard EKI with \(\alpha _k = b^k\), \(b = \sqrt{0.8}\), \(J_k = \lceil b^{-2(k-1)} J_1 \rceil \), and \(J_1 = 50\).
-
Nyström-EKI with \(\alpha _k = b^k\), \(b = 0.8\), \(J_k = \lceil b^{-(k-1)} J_1 \rceil \), and \(J_1 = 50\).
-
SVD-EKI with \(\alpha _k = b^k\), \(b = 0.8\), \(J_k = \lceil b^{-(k-1)} J_1 \rceil \), and \(J_1 = 50\).
The different values of b are used in order to ensure that the sequence of sample sizes \((J_k)_{k=1}^\infty \) is equal for all three methods. Moreover, all methods used the discrepancy principle (see Definition 3.2) with \(\tau = 1.2\). In all cases, the iterations were aborted once \(J_k\) exceeded n, since at this point the computational complexity of EKI is higher than that of Tikhonov regularization.
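A prior covariance of this kind can be assembled as follows. The exponential kernel \(\exp (-\Vert q_i - q_j\Vert /h)\) is our assumption for the discretized Ornstein-Uhlenbeck covariance with exponentially decaying pixel correlations (the paper's exact discretization may differ), and a small grid is used here in place of \(d = 100\):

```python
import numpy as np

def ou_covariance(d, h):
    """Discretized Ornstein-Uhlenbeck covariance on a d x d pixel grid,
    (C0)_{ij} = exp(-||q_i - q_j|| / h), with pixel positions q_i
    normalized to the unit square [0, 1] x [0, 1]."""
    grid = np.linspace(0.0, 1.0, d)
    qx, qy = np.meshgrid(grid, grid, indexing="ij")
    q = np.column_stack([qx.ravel(), qy.ravel()])   # (d^2, 2) positions
    dist = np.linalg.norm(q[:, None, :] - q[None, :, :], axis=-1)
    return np.exp(-dist / h)

C0 = ou_covariance(20, 0.01)   # small grid; the experiments use d = 100
```

Since the exponential kernel is positive definite, the resulting matrix is symmetric positive definite and can serve directly as a prior covariance.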
4.4 Implementation
The algorithms were implemented in Python and use efficient Numpy [23] and SciPy [60] routines. We used the existing implementation of the Radon transform in the scikit-image library [61], and took advantage of the Ray framework [38] to parallelize the operator evaluations. The computations were performed on a Dell XPS-15-7590 Laptop with 12 2.60 GHz CPUs and 15.3 GiB RAM.
4.5 Convergence of adaptive EKI
For each iteration k, we evaluated the relative reconstruction error
$$\begin{aligned} e_\text {rel} = \frac{\left\| \hat{x}_k - x_*\right\| }{\left\| x_*\right\| }. \end{aligned}$$
The results are visualized in Fig. 2. Note that every iteration is computationally more expensive than the previous one, since the sample size \(J_k\) increases steadily. While the Nyström-EKI and SVD-EKI methods were able to satisfy the discrepancy principle after 18 iterations (with sample size \(J_{18} = 2221\)), the Standard-EKI iteration was not able to satisfy the discrepancy principle for a sample size less than n. Apart from that, one clearly sees that Nyström-EKI and SVD-EKI both significantly outperform Standard-EKI. Consistent with Theorem 2.12, one may observe that SVD-EKI yields the most accurate reconstruction for a given sample size.
4.6 Comparison of Standard-EKI with Nyström-EKI
In Fig. 3, we visually compare the reconstruction with Standard-EKI to the reconstruction with Nyström-EKI. Both reconstructions use the same value of \(\alpha \) and sample size \(J=2000\). One can see that the Standard-EKI reconstruction is much noisier than the Nyström-EKI reconstruction. This noise does not stem from the noisy measurement; it is introduced by the sampling process.
4.7 Convergence to Tikhonov regularization for large sample sizes
In Fig. 4, we have plotted the reconstruction with Nyström-EKI for increasing values of J. For \(J = 500\), the reconstruction is hardly useful. However, for \(J=2000\) the reconstruction is already almost comparable to the Tikhonov reconstruction, although slightly blurred. For higher values of J, the improvement is only marginal. This shows that the presence of noise allows considerable a-priori (that is, not using knowledge of L or \(\hat{y}\)) dimensionality reduction.
We also repeated the experiment for fixed regularization parameter \(\alpha =0.03\) and different values of J in order to examine the convergence estimate from Sect. 2.3 numerically. In Fig. 5, we plotted the approximation error with respect to Tikhonov regularization, normalized with \(\left\| x_*\right\| \), i.e.
$$\begin{aligned} e_\text {app} = \frac{\left\| \hat{x} - \hat{x}_\alpha \right\| }{\left\| x_*\right\| }, \end{aligned}$$
where \(\hat{x}_\alpha \) denotes the Tikhonov-regularized solution.
In accordance with Proposition 2.7, the approximation error of Standard-EKI decreases like \(J^{-1/2}\). However, it is still significant even if the sample size is close to n. With Nyström-EKI or SVD-EKI, the approximation error becomes negligible even for relatively small sample sizes.
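The qualitative behavior observed in Fig. 5 can be reproduced on a small synthetic example. The sketch below compares a textbook Nyström approximation with a Gaussian sketch (a generic variant, not necessarily the exact sampling scheme used in the paper) against the optimal rank-J approximation given by the truncated eigendecomposition:

```python
import numpy as np

rng = np.random.default_rng(1)

# A symmetric positive semi-definite test matrix with fast spectral decay.
n = 300
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
eigvals = 2.0 ** (-np.arange(n))
C = (U * eigvals) @ U.T
C = (C + C.T) / 2               # enforce exact symmetry numerically

def nystroem(C, J, rng):
    """Rank-J Nyström approximation C W (W^T C W)^+ (C W)^T
    with a Gaussian sketch W."""
    W = rng.standard_normal((C.shape[0], J))
    Y = C @ W
    return Y @ np.linalg.pinv(W.T @ Y) @ Y.T

def truncated_eig(C, J):
    """Best rank-J approximation via the eigendecomposition (C is PSD)."""
    w, V = np.linalg.eigh(C)
    idx = np.argsort(w)[::-1][:J]
    return (V[:, idx] * w[idx]) @ V[:, idx].T

J = 20
err_nys = np.linalg.norm(C - nystroem(C, J, rng), 2)
err_eig = np.linalg.norm(C - truncated_eig(C, J), 2)
# The truncated eigendecomposition attains the optimal rank-J error
# (Eckart-Young); Nyström comes close for rapidly decaying spectra,
# at a fraction of the cost and without a full spectral decomposition.
```
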
4.8 Divergence for small values of \(\alpha \)
Keeping the sample size fixed at \(J=2000\), we then repeated the experiment for different values of \(\alpha \) (see Fig. 6). One sees that the approximation error of all three methods explodes as \(\alpha \rightarrow 0\), which demonstrates the necessity of adapting the sample size. Again, Nyström-EKI and SVD-EKI are superior to Standard-EKI.
Fig. 6: The scaled approximation error \( e_\text {app}\) of the Standard-EKI, Nyström-EKI and SVD-EKI iterations for fixed sample size \(J=2000\) and varying regularization parameter \(\alpha \). As \(\alpha \) approaches 0, the approximation error explodes.
5 Conclusions
We have shown that ensemble Kalman inversion is a convergent regularization method if the sample size is adapted to the regularization parameter. The interpretation of EKI as a low-rank approximation of Tikhonov regularization shows that it provides a trade-off between exactness and computational cost by shrinking the search space in which we try to reconstruct the unknown parameter x. This approach is suited for problems where the adjoint is not available and the noise is significant, since then the optimal regularization parameter \(\alpha \) will typically be larger and a good approximation to the Tikhonov-regularized solution can be achieved for relatively small sample sizes (see Fig. 6).
It is important to note that the dimensionality reduction in EKI is completely a-priori: it uses no knowledge about the forward operator L or the measurement \(\hat{y}\). This has the advantage that it also works in the case where the adjoint of L is not available. On the other hand, if one has access to the adjoint of L, one can compute a low-rank approximation of the whole operator \(C_{0}^{1/2} L^* R^{-1} L C_{0}^{1/2}\) instead [16]. This can yield superior results, as it also allows one to exploit the spectral decay of the forward operator L [55].
While EKI was originally developed for nonlinear inverse problems, our insights from the linear case—in particular the need for adapting the sample size to the noise level—can serve as an Ansatz for an analysis of EKI as a regularization method for nonlinear inverse problems.
Finally, the basic ideas of ensemble methods are simple and constitute a very general way to obtain linear dimensionality reduction and algorithms for black-box inverse problems. Therefore, another natural direction of research is to study the resulting stochastic approximations of classical iterative regularization methods, such as the iteratively regularized Gauss-Newton iteration [3], and compare their performance to EKI for the case of nonlinear inverse problems.
References
Albani, V., Elbau, P., de Hoop, M.V., Scherzer, O.: Optimal convergence rates results for linear inverse problems in Hilbert spaces. Numer. Funct. Anal. Optim. 37(5), 521–540 (2016). ISSN: 0163-0563. https://doi.org/10.1080/01630563.2016.1144070
Andreev, R., Elbau, P., de Hoop, M.V., Qiu, L., Scherzer, O.: Generalized convergence rates results for linear inverse problems in Hilbert spaces. Numer. Funct. Anal. Optim. 36(5), 549–566 (2015). ISSN: 0163-0563. https://doi.org/10.1080/01630563.2015.1021422
Bakushinskii, A.B.: The problem of the convergence of the iteratively regularized Gauß-Newton method. Comput. Math. Math. Phys. 32(9), 1353–1359 (1992)
Bauer, F., Hohage, T., Munk, A.: Iteratively regularized Gauss-Newton method for nonlinear inverse problems with random noise. SIAM J. Numer. Anal. 47(3), 1827–1846 (2009). ISSN: 0036-1429. https://doi.org/10.1137/080721789
Bishop, C.H., Etherton, B.J., Majumdar, S.J.: Adaptive sampling with the ensemble transform Kalman filter. Part I: theoretical aspects. Mon. Weather Rev. 129, 17 (2001)
Bissantz, N., Hohage, T., Munk, A., Ruymgaart, F.: Convergence rates of general regularization methods for statistical inverse problems and applications. SIAM J. Numer. Anal. 45(6), 2610–2636 (2007). ISSN: 0036-1429. https://doi.org/10.1137/060651884
Bogachev, V.: Gaussian Measures. Vol. 62. Mathematical Surveys and Monographs. American Mathematical Society (1998)
Burgers, G., van Leeuwen, P.J., Evensen, G.: Analysis scheme in the ensemble Kalman filter. Mon. Weather Rev. 126, 1719–1724 (1998)
Chada, N.K., Stuart, A.M., Tong, X.T.: Tikhonov regularization within ensemble Kalman inversion. SIAM J. Numer. Anal. 58(2), 1263–1294 (2020). ISSN: 0036-1429. https://doi.org/10.1137/19m1242331
Chada, N., Tong, X.: Convergence acceleration of ensemble Kalman inversion in nonlinear settings. Math. Comput. (2021). ISSN: 0025-5718. https://doi.org/10.1090/mcom/3709
Davies, E.B.: Linear Operators and Their Spectra. Cambridge University Press (2007). ISBN: 9780511618864. https://doi.org/10.1017/cbo9780511618864
Drineas, P., Mahoney, M.W.: On the Nyström method for approximating a Gram matrix for improved kernel-based learning. J. Mach. Learn. Res. (JMLR) 6 (2005). ISSN: 1532-4435
Eckart, C., Young, G.: The approximation of one matrix by another of lower rank. Psychometrika 1(3), 211–218 (1936). https://doi.org/10.1007/bf02288367
Engl, H.W., Hanke, M., Neubauer, A.: Regularization of inverse problems. In: Mathematics and Its Applications, vol. 375, p. viii-321. Kluwer Academic Publishers Group, Dordrecht. ISBN: 0-7923-4157-0 (1996)
Evensen, G.: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. J. Geophys. Res. 99(C5), 10143 (1994). ISSN: 0148-0227. https://doi.org/10.1029/94jc00572
Flath, H.P., Wilcox, L.C., Akçelik, V., Hill, J., van Bloemen Waanders, B., Ghattas, O.: Fast algorithms for Bayesian uncertainty quantification in large-scale linear inverse problems based on low-rank partial Hessian approximations. SIAM J. Sci. Comput. 33(1), 407–432 (2011). ISSN: 1064-8275. https://doi.org/10.1137/090780717
Gittens, A., Mahoney, M.W.: Revisiting the Nyström method for improved large-scale machine learning. J. Mach. Learn. Res. (JMLR) 1(17), 3977–4041 (2016)
Greybush, S.J., Kalnay, E., Miyoshi, T., Ide, K., Hunt, B.R.: Balance and ensemble Kalman filter localization techniques. Mon. Weather Rev. 139(2), 511–522 (2011). https://doi.org/10.1175/2010mwr3328.1
Groetsch, C.W.: Comments on Morozov's discrepancy principle. In: Hämmerlin, G., Hoffmann, K.H. (eds.) Improperly Posed Problems and Their Numerical Treatment, pp. 97–104. Birkhäuser, Basel (1983)
Groetsch, C.W.: The Theory of Tikhonov Regularization for Fredholm Equations of the First Kind. Pitman, Boston (1984)
Halko, N., Martinsson, P.G., Tropp, J.A.: Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev. 53(2), 217–288 (2011). ISSN: 0036-1445. https://doi.org/10.1137/090771806
Hanke, M., Neubauer, A., Scherzer, O.: A convergence analysis of the Landweber iteration for nonlinear ill-posed problems. Numer. Math. 72(1), 21–37 (1995). ISSN: 0029-599X. https://doi.org/10.1007/s002110050158
Harris, C.R., et al.: Array programming with NumPy. Nature 585(7825), 357–362 (2020). https://doi.org/10.1038/s41586-020-2649-2
Hohage, T.: Regularization of exponentially ill-posed problems. Numer. Funct. Anal. Optim. 21(3–4), 439–464 (2000). ISSN: 0163-0563. https://doi.org/10.1080/01630560008816965
Houtekamer, P.L., Mitchell, H.L.: A sequential ensemble Kalman filter for atmospheric data assimilation. Mon. Weather Rev. 129(1), 123–137 (2001). https://doi.org/10.1175/1520-0493(2001)129<0123:asekff>2.0.co;2
Houtekamer, P.L., Zhang, F.: Review of the ensemble Kalman filter for atmospheric data assimilation. Mon. Weather Rev. 144(12), 4489–4532 (2016). https://doi.org/10.1175/mwr-d-15-0440.1
Iglesias, M.A.: Iterative regularization for ensemble data assimilation in reservoir models. Comput. Geosci. 19(1), 177–212 (2014). https://doi.org/10.1007/s10596-014-9456-5
Iglesias, M.A., Law, K.J.H., Stuart, A.M.: Ensemble Kalman methods for inverse problems. Inverse Probl. 29(4), 045001 (2013). ISSN: 0266-5611. https://doi.org/10.1088/0266-5611/29/4/045001
Kallenberg, O.: Foundations of Modern Probability. Springer, New York (2002). https://doi.org/10.1007/978-1-4757-4015-8
Kaltenbacher, B., Neubauer, A., Scherzer, O.: Iterative Regularization Methods for Nonlinear Ill-Posed Problems. Vol. 6. Radon Series on Computational and Applied Mathematics. Walter de Gruyter, Berlin (2008). ISBN: 978-3-11-020420-9. https://doi.org/10.1515/9783110208276
Koltchinskii, V., Lounici, K.: Concentration inequalities and moment bounds for sample covariance operators. Bernoulli 23(1) (2017). ISSN: 1350-7265. https://doi.org/10.3150/15-bej730
Kovachki, N.B., Stuart, A.M.: Ensemble Kalman inversion: a derivative-free technique for machine learning tasks. Inverse Probl. 35(9), 095005 (2019). ISSN: 0266-5611. https://doi.org/10.1088/1361-6420/ab1c3a
Kröger, P.: Upper bounds for the Neumann eigenvalues on a bounded domain in Euclidean space. J. Funct. Anal. 106(2), 353–357 (1992). https://doi.org/10.1016/0022-1236(92)90052-k
Kuchment, P.: The Radon transform and medical imaging. In: CBMS-NSF Regional Conference Series in Applied Mathematics. SIAM, Philadelphia (2013)
Kwiatkowski, E., Mandel, J.: Convergence of the square root ensemble Kalman filter in the large ensemble limit. SIAM/ASA J. Uncertain. Quantif. 3(1), 1–17 (2015). ISSN: 2166-2525. https://doi.org/10.1137/140965363
LeGland, F., Monbet, V., Tran, V.-D.: Large sample asymptotics for the ensemble Kalman filter. Research Report RR-7014. INRIA (2009)
Mirsky, L.: Symmetric gauge functions and unitarily invariant norms. Q. J. Math. 11(1), 50–59 (1960). https://doi.org/10.1093/qmath/11.1.50
Moritz, P., et al.: Ray: a distributed framework for emerging AI applications. In: Proceedings of the 13th USENIX Conference on Operating Systems Design and Implementation. OSDI’18, pp. 561–577. USENIX Association, Carlsbad (2018). ISBN: 9781931971478
Nadeem, A., Potthast, R., Rhodin, A.: On sequential multiscale inversion and data assimilation. J. Comput. Appl. Math. 336, 338–352 (2018). https://doi.org/10.1016/j.cam.2017.08.013
Nakamura, G., Potthast, R.: Inverse Modeling. IOP Publishing, Bristol (2015)
Nakatsukasa, Y.: Fast and stable randomized low-rank matrix approximation. Preprint (2020)
Neubauer, A.: On converse and saturation results for Tikhonov regularization of linear ill-posed problems. SIAM J. Numer. Anal. 34(2), 517–527 (1997). ISSN: 0036-1429. https://doi.org/10.1137/s0036142993253928
Neubauer, A., Scherzer, O.: Finite-dimensional approximation of Tikhonov regularized solutions of nonlinear ill-posed problems. Numer. Funct. Anal. Optim. 11(1–2), 85–99 (1990). ISSN: 0163-0563. https://doi.org/10.1080/01630569008816362
Nyström, E.J.: Über die praktische Auflösung von Integralgleichungen mit Anwendungen auf Randwertaufgaben. Acta Math. 54, 185–204 (1930). ISSN: 0001-5962. https://doi.org/10.1007/bf02547521
Ott, E., Hunt, B.R., Szunyogh, I., Zimin, A.V., Kostelich, E.J., Corazza, M., Kalnay, E., Patil, D.J., Yorke, J.A.: A local ensemble Kalman filter for atmospheric data assimilation. Tellus A Dyn. Meteorol. Oceanogr. 56(5), 415–428 (2004). https://doi.org/10.3402/tellusa.v56i5.14462
Raanes, P.N., Stordal, A.S., Evensen, G.: Revising the stochastic iterative ensemble smoother. Nonlinear Process. Geophys. 26(3), 325–338 (2019). https://doi.org/10.5194/npg-26-325-2019
Rauch, H.E., Tung, F., Striebel, C.T.: Maximum likelihood estimates of linear dynamic systems. AIAA J. 3(8), 1445–1450 (1965). https://doi.org/10.2514/3.3166
Reich, S., Cotter, C.: Probabilistic Forecasting and Bayesian Data Assimilation (2015). https://doi.org/10.1017/cbo9781107706804
Scherzer, O.: A modified Landweber iteration for solving parameter estimation problems. Appl. Math. Optim. 38(1), 45–68 (1998). ISSN: 0095-4616. https://doi.org/10.1007/s002459900081
Scherzer, O.: A posteriori error estimates for the solution of nonlinear ill-posed operator equations. Nonlinear Anal. Theory Methods Appl. 45(4), 459–481 (2001). ISSN: 0362-546X https://doi.org/10.1016/S0362-546X(99)00413-7
Schillings, C., Stuart, A.M.: Analysis of the ensemble Kalman filter for inverse problems. SIAM J. Numer. Anal. 55(3), 1264–1290 (2017). ISSN: 0036-1429. https://doi.org/10.1137/16m105959x
Schillings, C., Stuart, A.M.: Convergence analysis of ensemble Kalman inversion: the linear, noisy case. Appl. Anal. 97(1), 107–123 (2017). ISSN: 0003-6811. https://doi.org/10.1080/00036811.2017.1386784
Schmidt, E.: Zur Theorie der linearen und nichtlinearen Integralgleichungen. Math. Ann. 63(4), 433–476 (1907). ISSN: 0025-5831. https://doi.org/10.1007/bf01449770
Shepp, L.A., Logan, B.F.: The Fourier reconstruction of a head section. IEEE Trans. Nucl. Sci. 21(3), 21–43 (1974). https://doi.org/10.1109/tns.1974.6499235
Spantini, A., Solonen, A., Cui, T., Martin, J., Tenorio, L., Marzouk, Y.: Optimal low-rank approximations of Bayesian linear inverse problems. SIAM J. Sci. Comput. 37(6), A2451–A2487 (2015). https://doi.org/10.1137/140977308
Tarvainen, T.: Quantitative photoacoustic tomography in Bayesian framework. In: Ramlau, R., Scherzer, O. (eds.) The Radon Transform: The First 100 Years and Beyond. Radon Series on Computational and Applied Mathematics, vol. 22, pp. 239–272. De Gruyter (2019). ISBN: 978-3-11-056085-5
Tippett, M.K., Anderson, J.L., Bishop, C.H., Hamill, T.M., Whitaker, J.S.: Ensemble square root filters. Mon. Weather Rev. 131(7), 1485–1490 (2003). https://doi.org/10.1175/1520-0493(2003)131<1485:esrf>2.0.co;2
Tong, X.T., Morzfeld, M.: Localization in ensemble Kalman inversion (2022). Preprint on ArXiv arXiv:2201.10821
Vidal, A.F., Pereyra, M.: Maximum likelihood estimation of regularisation parameters. In: 2018 25th IEEE International Conference on Image Processing (ICIP) (2018). https://doi.org/10.1109/icip.2018.8451795
Virtanen, P., et al.: SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17(3), 261–272 (2020). https://doi.org/10.1038/s41592-019-0686-2
van der Walt, S., Schönberger, J.L., Nuñez-Iglesias, J., Boulogne, F., Warner, J.D., Yager, N., Gouillart, E., Yu, T., the scikit-image contributors: scikit-image: image processing in Python. PeerJ 2, e453 (2014)
Weissmann, S.: Gradient flow structure and convergence analysis of the ensemble Kalman inversion for nonlinear forward models (2022). Preprint on ArXiv arXiv:2203.17117
Acknowledgements
FP and OS are supported by the Austrian Science Fund (FWF) with Project I3661-N27 (Novel Error Measures and Source Conditions of Regularization Methods for Inverse Problems). Moreover, FP and OS are supported by the Austrian Science Fund (FWF), with SFB F68, Project F6807-N36 (Tomography with Uncertainties). The financial support by the Austrian Federal Ministry for Digital and Economic Affairs, the National Foundation for Research, Technology and Development and the Christian Doppler Research Association is gratefully acknowledged.
Funding
Open access funding provided by Austrian Science Fund (FWF).
Appendices
A. Appendix: Random elements of Hilbert spaces
We recapitulate basic notions from probability theory on Hilbert spaces.
Definition A.1
(Random element, expectation, covariance) Let \((\Omega ,\mathcal {F},\mathbb {P})\) denote a probability space.
-
(i)
A random element of a real Hilbert space \(\mathbb {X}\) is a measurable function \(X: \Omega \rightarrow \mathbb {X}\). The set of all random elements of \(\mathbb {X}\) is denoted by
$$\begin{aligned} \mathcal {M}(\mathbb {X}) := \left\{ X:\Omega \rightarrow \mathbb {X}: X \text { is measurable}\right\} . \end{aligned}$$(A.1)
(ii)
A random continuous linear operator from \(\mathbb {X}\) to \(\mathbb {Y}\) is a measurable map \(A: \Omega \rightarrow \mathcal {L}(\mathbb {X};\mathbb {Y})\).
-
(iii)
The expectation of a random element \(X \in \mathcal {M}(\mathbb {X})\) is defined as
$$\begin{aligned} \mathbb {E}\left[ X \right] = \int _\Omega X(\omega ) \,\text {d}\mathbb {P}(\omega ) \in \mathbb {X}. \end{aligned}$$
(iv)
Furthermore, its covariance operator \( \text {Cov}\left( X \right) : \mathbb {X}\rightarrow \mathbb {X}\) is defined by
$$\begin{aligned} \text {Cov}\left( X \right) u = \int _\Omega \left<X(\omega )-\mathbb {E}\left[ X \right] ,u\right>_\mathbb {X}(X(\omega ) - \mathbb {E}\left[ X \right] ) \,\text {d}\mathbb {P}(\omega ), \qquad u \in \mathbb {X}. \end{aligned}$$
(v)
We call a random element \(X:\Omega \rightarrow \mathbb {X}\) of a Hilbert space \(\mathbb {X}\) Gaussian if for every continuous linear functional \(L \in \mathbb {X}^*\), \(LX: \Omega \rightarrow \mathbb {R}\) is a Gaussian random element of \(\mathbb {R}\). That is, there exist \(\sigma _L > 0\) and \(m_L \in \mathbb {R}\) such that for all \(z \in \mathbb {R}\)
$$\begin{aligned} \mathbb {P}\left( \left\{ \omega :LX(\omega ) \le z\right\} \right) = \frac{1}{\sqrt{2\pi \sigma _L^2}} \int _{-\infty }^z \text {e}^{-\frac{(\xi -m_L)^2}{2 \sigma _L^2}} d\xi . \end{aligned}$$(A.2)
(vi)
It can be shown that for every \(m \in \mathbb {X}\) and every positive and self-adjoint trace class operator C there exists a unique Gaussian random element X with \(\mathbb {E}\left[ X \right] = m\) and \( \text {Cov}\left( X \right) = C\). In that case, we will use the notation \(X \sim \mathcal {N}(m, C)\).
-
(vii)
Let \(\varvec{X} = (X_1,\ldots ,X_J) \in \mathcal {M}(\mathbb {X})^J\) be a random ensemble. We call the mapping
$$\begin{aligned} \begin{aligned} \mathcal {C}(\varvec{X}): \mathbb {X}&\rightarrow \mathbb {X}, \\ v&\mapsto \frac{1}{J} \sum _{j=1}^J (X_j - {\bar{\varvec{X}}} ) \left<X_j - {\bar{\varvec{X}}},v\right> \end{aligned} \end{aligned}$$(A.3)the sample covariance, where \({\bar{\varvec{X}}} = \frac{1}{J} \sum _{j=1}^J X_j\) denotes the ensemble mean.
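In the finite-dimensional case \(\mathbb {X}= \mathbb {R}^n\), the sample covariance of Eq. (A.3) reduces to a matrix product of the centered ensemble anomalies, for example:

```python
import numpy as np

def sample_covariance(X):
    """Sample covariance C(X) of an ensemble X = (X_1, ..., X_J),
    stored as the columns of an (n, J) array, following Eq. (A.3):
    C(X) v = (1/J) * sum_j <X_j - Xbar, v> (X_j - Xbar).
    Note the normalization 1/J (not 1/(J-1))."""
    J = X.shape[1]
    A = X - X.mean(axis=1, keepdims=True)   # ensemble anomalies
    return (A @ A.T) / J

# For a large i.i.d. standard normal ensemble, the sample covariance
# approximates the identity.
rng = np.random.default_rng(2)
X = rng.standard_normal((5, 1000))
C = sample_covariance(X)
```
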
Furthermore, we recall Markov’s inequality as it is used in the proof of Proposition 3.11 (see e.g. [29, lemma 4.1]).
Lemma A.2
(Markov) Let \(X: \Omega \rightarrow [0,\infty )\) be a nonnegative real-valued random variable, \(p \in [1, \infty )\) and \(a > 0\). Then
$$\begin{aligned} \mathbb {P}\left( X \ge a \right) \le \frac{\mathbb {E}\left[ X^{p} \right] }{a^{p}}. \end{aligned}$$
B. EKI with stochastic perturbations
The deterministic formulation of EKI that we considered in this paper (see Definition 1.6) is based on the ensemble-transform Kalman filter (ETKF) by Bishop, Etherton and Majumdar [5]; it was, for example, also studied in [10]. In contrast, the original formulation of EKI [28] was based on the EnKF with perturbation of measurements [8]. While the ETKF updates the current state estimate \(\hat{X}_k^{\scriptscriptstyle (J)} \) and the ensemble anomaly \( {\varvec{A}_k^{\scriptscriptstyle (J)}} \) directly, the EnKF iterates a complete ensemble \(\varvec{X}^{\scriptscriptstyle (J)}_k\) and updates each ensemble member individually. We call this variant the stochastic form of EKI:
Definition B.1
(Stochastic EKI) Given is \(R \in \mathcal {L}(\mathbb {Y};\mathbb {Y})\) and an ensemble
\(\varvec{X}^{\scriptscriptstyle (J)}_{0} = (X_{0,1},\ldots , X_{0,J})\) of independent and identically distributed random elements \(X_{0,1},\ldots ,X_{0,J}\).
-
Initialization: Set \(\varvec{C}_0 =\mathcal {C}(\varvec{X}^{\scriptscriptstyle (J)}_0)\) (see Eq. A.3).
-
Iteration (\(k \rightarrow k+1\)): Let \(\xi _{k,1},\ldots ,\xi _{k,J}\) be independent and identically distributed Gaussian random elements of \(\mathbb {Y}\) with \(\xi _{k,1},\ldots ,\xi _{k,J} \sim \mathcal {N}(0,R)\). For each \(j \in \lbrace 1,\ldots ,J \rbrace \), set
$$\begin{aligned} {\hat{X}}^{\scriptscriptstyle (J)}_{k+1,j} = {\hat{X}}^{\scriptscriptstyle (J)}_{k,j} + \varvec{C}_k^{\scriptscriptstyle (J)}L^* \left( L \varvec{C}_k^{\scriptscriptstyle (J)}L^* + R \right) ^{-1} (\hat{y}+ \xi _{k,j} - L {\hat{X}}^{\scriptscriptstyle (J)}_{k,j}), \end{aligned}$$(B.1)and then set \(\varvec{C}^{\scriptscriptstyle (J)}_{k+1} = \mathcal {C}(\varvec{X}^{\scriptscriptstyle (J)}_{k+1})\).
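For a finite-dimensional toy problem, one step of the stochastic iteration (B.1) can be sketched as follows; the dimensions and operators are illustrative, not the Radon setup of Sect. 4:

```python
import numpy as np

def stochastic_eki_step(X, L, R, y_hat, rng):
    """One step of stochastic EKI (Definition B.1): every ensemble member
    X_j is updated with its own perturbed measurement y_hat + xi_j."""
    n, J = X.shape
    m = L.shape[0]
    A = X - X.mean(axis=1, keepdims=True)          # ensemble anomalies
    C = (A @ A.T) / J                              # sample covariance
    K = C @ L.T @ np.linalg.inv(L @ C @ L.T + R)   # Kalman gain
    Xi = rng.multivariate_normal(np.zeros(m), R, size=J).T  # xi_j ~ N(0, R)
    return X + K @ (y_hat[:, None] + Xi - L @ X)

# Tiny illustration: n = 4, m = 3, J = 50.
rng = np.random.default_rng(3)
L = rng.standard_normal((3, 4))
R = np.eye(3)
y_hat = rng.standard_normal(3)
X0 = rng.standard_normal((4, 50))
X1 = stochastic_eki_step(X0, L, R, y_hat, rng)
```

The sample covariance of the updated ensemble is then recomputed from `X1`, exactly as in the definition above.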
In the present paper, we have focused on the deterministic version of EKI given by Definition 1.6, since it has been observed to perform more reliably in practice, as it does not introduce additional noise at every step of the iteration [10]. Nevertheless, both variants seem to be equivalent in the large-ensemble limit. In the linear case, this has been proven:
Proposition B.2
Suppose that Assumption 1.1 holds and \(C_{0}\) is in the trace class. Let \((\hat{X}_k^{\scriptscriptstyle (J)} )_{k=1}^\infty \) be the deterministic EKI iteration (see Definition 1.6), and \((\varvec{X}^{\scriptscriptstyle (J)}_k)_{k=1}^\infty \) be the stochastic EKI iteration (see Definition B.1). Let \(p \in [1,\infty )\) and \(k \in \mathbb {N}\). Then, we have
$$\begin{aligned} \mathbb {E}\left[ \left\| \bar{\varvec{X}}^{\scriptscriptstyle (J)}_k - \hat{X}_k^{\scriptscriptstyle (J)} \right\| _\mathbb {X}^{p} \right] ^{1/p} \rightarrow 0 \end{aligned}$$
as \(J \rightarrow \infty \), where \(\bar{\varvec{X}}^{\scriptscriptstyle (J)}_k\) denotes the mean of the stochastic EKI ensemble.
Proof
Compare [36, theorem 5.2] and [35, theorem 6.1]. \(\square \)
While we do not know of a corresponding proof in the nonlinear case, it has been observed in numerical experiments that, also in that case, both the deterministic and the stochastic form of the ensemble Kalman filter converge to the same limit [46].
Cite this article
Parzer, F., Scherzer, O.: On convergence rates of adaptive ensemble Kalman inversion for linear ill-posed problems. Numer. Math. 152, 371–409 (2022). https://doi.org/10.1007/s00211-022-01314-y