Convergence Rates of First and Higher Order Dynamics for Solving Linear Ill-posed Problems

Recently, there has been great interest in analysing dynamical flows whose stationary limit is the minimiser of a convex energy. Flows of particular interest have been the continuous limits of Nesterov's algorithm and of the Fast Iterative Shrinkage-Thresholding Algorithm (FISTA), respectively. In this paper we approximate the solutions of linear ill-posed problems by dynamical flows. Because the squared norm of the residuum of a linear operator equation is a convex functional, the theoretical results from convex analysis for energy minimising flows are applicable. We prove that the proposed flows for minimising the residuum of a linear operator equation are optimal regularisation methods and that they provide optimal convergence rates for the regularised solutions. In particular, we show that the rates can be significantly higher than the general convex analysis results, which is possible because we constrain the investigation to the particular convex energy functional given by the squared norm of the residuum.


Introduction
We consider the problem of solving a linear operator equation

Lx = y, (1.1)

where L : X → Y is a bounded linear operator between (infinite dimensional) real Hilbert spaces X and Y. If the range of L is not closed, Equation 1.1 is ill-posed, see [13], in the sense that small perturbations of the data y can render Equation 1.1 unsolvable or cause large perturbations of the corresponding solution. These undesirable effects are prevented by regularisation.
In this paper we consider dynamical regularisation methods for solving Equation 1.1. That is, we approximate the minimum norm solution x† of Equation 1.1 by the solution ξ of a dynamical system of the form

∂_t^N ξ(t) + Σ_{k=0}^{N−1} a_k(t) ∂_t^k ξ(t) = L*(ỹ − Lξ(t)), (1.2)

see [13, Chapter 3.3]. We are interested under which conditions the regularised solution ξ(t) can be guaranteed to converge to the solution x† as t → ∞ and how fast this convergence happens.
Studying first the case of exact data ỹ = y, it turns out that the convergence rate, that is, the decay of ‖ξ(t) − x†‖² in the limit t → ∞, can be uniquely characterised by the spectral decomposition of the minimum norm solution x† with respect to the operator L*L, which allows us to obtain optimal convergence rates as a function of the "regularity" of the source x†. This regularity is usually described by so-called source conditions, the most common ones being of the form x† ∈ R((L*L)^{µ/2}) for some µ > 0; we refer to [13, Chapter 2.2] and [9, Chapter 3.2] for an introduction to the use of these source conditions for obtaining convergence rates. Moreover, these convergence rates for exact data are seen to be in a one-to-one correspondence with certain convergence rates for perturbed data as the perturbation ‖ỹ − y‖² goes to zero.
Outside the regularisation community, source conditions might appear technical because they involve the operator L. However, it has been demonstrated that for differential and integral operators L, these conditions coincide very well with smoothness conditions in Sobolev spaces. See for instance [14], where the analogy between smoothness and source conditions has been explained for the problem of numerical differentiation. Because of this analogy, these conditions are also often termed smoothness conditions.
Some convergence rates from the literature are listed in Table 1.
The vanishing viscosity method in particular has recently been heavily investigated, see [29,8,5,6], for example, as it shows faster convergence than the other two methods, and it was demonstrated to be a time continuous formulation of Nesterov's algorithm, see [20], providing an explanation for the rapid convergence of this algorithm. Consequently, it has not only been studied in the form of Equation 1.5, but more generally with the right hand side (which in Equation 1.5 is the negative gradient of J_0(x) = ½‖Lx − y‖²) replaced by the negative gradient of an arbitrary convex and differentiable functional J. Since our theory relies on spectral analysis, however, we limit our discussion to the quadratic functional J_0.
In terms of convergence rates, however, the discussions for general functionals J are often limited to estimating the convergence of J(ξ(t)) − min_{x∈X} J(x), which for J = J_0 is given by ½‖Lξ(t) − y‖². In the well-posed case, where the operator L has a bounded pseudoinverse L†, this convergence of the squared norm of the residual is equivalent to the convergence of the error ‖ξ(t) − x†‖², but this is no longer true in the ill-posed case, where the pseudoinverse is unbounded. In contrast to this, our approach directly gives convergence rates for ‖ξ(t) − x†‖², which then imply a convergence (typically of higher order) of the squared norm of the residual.
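A quick numerical illustration of this distinction (a minimal sketch with an assumed diagonal operator, not taken from the paper): for an ill-posed problem, truncated reconstructions from noisy data can drive the residual down to the noise level while the error eventually grows again.

```python
import numpy as np

# Minimal sketch (assumed diagonal operator, not from the paper): L has
# singular values s_n = 1/n, so the problem is ill-posed and the
# pseudoinverse is unbounded. We reconstruct by truncated SVD from noisy
# data and watch the squared residual and the squared error separately.
rng = np.random.default_rng(0)
n = np.arange(1, 2001)
s = 1.0 / n                          # singular values of L
x_dag = 1.0 / n**1.5                 # assumed minimum norm solution
y = s * x_dag                        # exact data y = L x_dag
y_noisy = y + 1e-4 * rng.standard_normal(n.size) / np.sqrt(n.size)

for k in (10, 100, 1000):            # truncation level acts as regularisation
    x_k = np.where(n <= k, y_noisy / s, 0.0)
    residual = np.sum((s * x_k - y_noisy)**2)  # squared norm of the residual
    error = np.sum((x_k - x_dag)**2)           # squared norm of the error
    print(f"k={k:5d}  residual^2={residual:.2e}  error^2={error:.2e}")
# The residual decreases monotonically in k, while the error first decreases
# and then grows again: small residuals do not imply small errors here.
```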
We will proceed as follows. (Concerning [28], we remark that the condition y ∈ R(LL*) given therein is equivalent to x† ∈ R((L*L)^{1/2}), which can be directly seen, for example, from the characterisation of the range of a dual operator given in [26, Lemma 8.31].) We also remark that in the well-posed case ‖L†‖ < ∞, the rates for ‖ξ(t) − x†‖² and for ‖Lξ(t) − y‖² are always the same, since the spectral values of L*L on N(L)^⊥ are then bounded from below by ‖L†‖^{−2}.

[Table 2. Convergence rates for the heavy ball method under the source condition x† ∈ R((L*L)^{µ/2}). Here, ε > 0 denotes an arbitrarily small parameter and β is as defined in Section 5.]

• In Section 2 we revisit convergence rates results of regularisation methods from [3], which, in particular, allow us to analyse first and higher order dynamics.
• In the following sections we apply the general results of Section 2 to regularising flow equations. In Section 4 we derive well-known convergence rates results for Showalter's method and prove optimality of this method. In Section 5 we prove regularising properties, optimality, and convergence rates of the heavy ball dynamical flow. In the context of inverse problems this method has already been analysed in [33], however not in terms of optimality, as is done here.
• In Section 6 we consider the vanishing viscosity flow. We apply the general theory of Section 2 and prove optimality of this method. In particular, we prove under source conditions (see for instance [13,9]) optimal convergence rates (in the sense of regularisation theory) for ‖ξ(t) − x†‖². These rates (and the resulting ones for the squared norm of the residual) are seen to interpolate nicely between the known rates in the well-posed (finite-dimensional) setting and those in the ill-posed setting when varying the regularity of the solution x† (via changing the parameter µ in Table 3).
We want to emphasise that the notions of optimality in [7] (a representative reference for this field) and in [3] differ by the class of problems and the amount of a priori information taken into account. In [7], best worst case error rates in the class of convex energies are derived, while we focus on squared functionals J. Moreover, we take into account prior knowledge on the solution. In view of this, it is not surprising that we obtain different "optimal" rates.

Generalisations of Convergence Rates Results
In the following we slightly generalise convergence rates and saturation results from [3] so that they can be applied to prove convergence of the second order regularising flows in Section 5 and Section 6. Here one needs to be aware that in classical regularisation theory the regularisation parameter α > 0 is considered a small parameter, meaning that we consider small perturbations of Equation 1.1, whereas for dynamic regularisation methods of the form of Equation 1.2 we take large times to approximate the stationary state. To link these two theories, we will apply an inverse polynomial identification between the optimal regularisation time and the regularisation parameter.

[Table 3. Convergence rates for the vanishing viscosity flow, listing source conditions and parameters. As before, ε > 0 denotes an arbitrarily small parameter.]
Let L : X → Y be a bounded linear operator between two real Hilbert spaces X and Y with operator norm ‖L‖, let y ∈ R(L), and let x† ∈ X be the minimum norm solution of Lx = y, defined by ‖x†‖ = inf{‖x‖ | Lx = y}.

Definition 2.1 We call a family (r_α)_{α>0} of functions r_α : (0, ∞) → R the generator of a regularisation method if

(i) there exists a constant σ ∈ (0, 1) such that 0 ≤ λ r_α(λ) ≤ σ √(λ/α) for all α, λ > 0; (2.1)

(ii) the error function r̃_α, defined by r̃_α(λ) := 1 − λ r_α(λ), (2.2) is non-negative and monotonically decreasing on the interval (0, α);

(iii) there exists for every α > 0 a monotonically decreasing, continuous function R̃_α : (0, ∞) → [0, 1] such that R̃_α ≥ |r̃_α| and α ↦ R̃_α(λ) is continuous and monotonically increasing for every fixed λ > 0;

(iv) there exists for every ᾱ > 0 a constant σ̄ ∈ (0, 1) such that R̃_α(λ) ≤ σ̄ whenever α ≤ ᾱ ≤ λ.

Remark: The definition of the generator of a regularisation method differs from the one in [3] by allowing the regularisation method to overshoot, meaning that r_α(λ) > 1/λ is possible at some points λ > 0 (the choice r_α(λ) = 1/λ, which is not a regularisation method in the sense of Definition 2.1, would correspond to taking the inverse without regularisation, see Equation 2.3). Consequently, we also relaxed the assumption that the error function r̃_α is monotonically decreasing to the existence of a monotonically decreasing upper bound R̃_α for r̃_α. We also want to remark that in the definition of the error function in [3], denoted here by r̃^[3]_α, there is an additional square included, that is, r̃^[3]_α = r̃²_α.

Definition 2.2 Let (r_α)_{α>0} be the generator of a regularisation method.
(i) The regularised solutions according to a generator (r_α)_{α>0} and data ỹ are defined by

x_α(ỹ) := r_α(L*L) L*ỹ. (2.3)

(ii) Analogously, we define with

R_α(λ) := (1 − R̃_α(λ))/λ (2.4)

the corresponding regularised solutions X_α(ỹ) := R_α(L*L) L*ỹ.

Remark: The family (R_α)_{α>0} is also the generator of a regularisation method. The idea of these regularised solutions is to replace the unbounded inverse of L : N(L)^⊥ → R(L) by the bounded approximation x_α, where the parameter α > 0 quantifies the regularisation. It should disappear in the limit α → 0, where we typically expect r_α(λ) → 1/λ, corresponding to x_α(y) → (L*L)†L*y = x† (this is, however, not enforced by Definition 2.1, but we will add in Definition 2.9 a compatibility condition to ensure this).

Example 2.3
The most prominent regularisation method is probably Tikhonov regularisation, where the regularised solution x_α(ỹ) is defined as the minimiser of the Tikhonov functional

T_α(x) := ‖Lx − ỹ‖² + α‖x‖².

Solving the optimality condition L*(Lx − ỹ) + αx = 0 gives for x_α(ỹ) the expression

x_α(ỹ) = (L*L + αI)^{−1} L*ỹ,

where I : X → X denotes the identity map on X. This has, with r_α(λ) := 1/(λ + α), the form of Equation 2.3, and r_α satisfies all the conditions of Definition 2.1, see [3, Example 2.4].
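As a small numerical illustration (a sketch with an assumed diagonal operator; none of the concrete numbers come from the paper), the Tikhonov filter r_α(λ) = 1/(λ + α) can be applied spectrally:

```python
import numpy as np

# Sketch of Example 2.3 for a diagonal operator: Tikhonov regularisation
# applies the spectral filter r_alpha(lambda) = 1/(lambda + alpha) to
# L* y_tilde, i.e. x_alpha = (L*L + alpha I)^{-1} L* y_tilde.
rng = np.random.default_rng(1)
s = 1.0 / np.arange(1, 501)          # singular values of L (assumed)
x_dag = 1.0 / np.arange(1, 501)**2   # assumed minimum norm solution
y_tilde = s * x_dag + 1e-5 * rng.standard_normal(500)

lam = s**2                           # spectral values of L*L
for alpha in (1e-2, 1e-4, 1e-6):
    r = 1.0 / (lam + alpha)          # the Tikhonov filter r_alpha(lambda)
    x_alpha = r * (s * y_tilde)      # r_alpha(L*L) L* y_tilde, coordinatewise
    err = np.sum((x_alpha - x_dag)**2)
    print(f"alpha={alpha:.0e}  ||x_alpha - x_dag||^2 = {err:.3e}")
```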
We will show later in Section 4, Section 5, and Section 6 that also some common dynamical regularisation methods fall into this regularisation scheme so that all the convergence rates results from this section can be applied to these methods.

Definition 2.4
We denote by A ↦ E_A and A ↦ F_A the spectral measures of the operators L*L and LL*, respectively, on all Borel sets A ⊆ [0, ∞); and we define the right-continuous and monotonically increasing function

e(λ) := ‖E_{[0,λ]} x†‖². (2.7)

We remark that the minimum norm solution x† lies in the orthogonal complement of the null space N(L) of L, and we therefore have E_{{0}} x† = 0, so that e(λ) = ‖E_{(0,λ]} x†‖². Moreover, if f : (0, ∞) → R is a right-continuous, monotonically increasing, and bounded function, we write ∫ g(λ) df(λ) for the Lebesgue–Stieltjes integral of g with respect to f, where µ_f denotes the unique non-negative Borel measure defined by µ_f((λ_1, λ_2]) := f(λ_2) − f(λ_1).

We introduce the following quantities, whose behaviour we want to relate to each other:

• the spectral tail of the minimum norm solution x† with respect to the operator L*L, that is, the asymptotic behaviour of e(λ) as λ tends to zero, see [21];

• the noise free regularisation error between the minimum norm solution x† and the regularised solution x_α(y) or X_α(y) for the exact data y, that is,

d(α) := ‖x_α(y) − x†‖² and D(α) := ‖X_α(y) − x†‖², (2.8)

respectively, as α tends to zero;

• the best worst case error between the minimum norm solution x† and the regularised solution x_α(ỹ) or X_α(ỹ) for some data ỹ with distance less than or equal to δ > 0 to the exact data y, under optimal choice of the regularisation parameter α, that is,

d̃(δ) := inf_{α>0} sup_{ỹ∈B̄_δ(y)} ‖x_α(ỹ) − x†‖² and D̃(δ) := inf_{α>0} sup_{ỹ∈B̄_δ(y)} ‖X_α(ỹ) − x†‖², (2.9)

respectively, as δ tends to zero;

• the noise free residual error, which is the error between the image of the regularised solution x_α(y) or X_α(y) and the exact data y, that is,

q(α) := ‖Lx_α(y) − y‖² and Q(α) := ‖LX_α(y) − y‖², (2.10)

respectively, as α tends to zero.
To describe the behaviour of these quantities, we consider, for example, convergence rates of the form

d(α) ≤ C_d ϕ(α) for all α > 0

with some constant C_d > 0 for the noise free regularisation error d, characterised by the decay of a monotonically increasing function ϕ : (0, ∞) → (0, ∞) for α → 0, and look for a corresponding (equivalent) characterisation of the convergence rates of the other quantities.

Example 2.5 Common families of functions ϕ used to describe the convergence rates are the Hölder functions

ϕ^H_µ(α) := α^µ for all µ > 0,

see [13], for example; the logarithmic functions

ϕ^L_µ(α) := |log(α)|^{−µ} for all µ > 0, (2.12)

or even double logarithmic functions, see for instance [17,25]. See Figure 1 for a sketch of the graphs of these functions.
[Table 4. Overview of the notation: the noise free regularisation errors d and D for r_α and R_α (Equation 2.8), the best worst case errors d̃ and D̃ (Equation 2.9), and the residual errors q and Q (Equation 2.10).]

• The statements for the residual errors q and Q are then concluded from Theorem 2.21 by using the identification of q and Q for the minimum norm solution x† with the noise free errors d and D for the minimum norm solution x̄† = (L*L)^{1/2} x† of the problem Lx = ȳ with ȳ = Lx†; they are summarised in Corollary 2.24, Corollary 2.25, and Corollary 2.27.
In the remainder of this section, we will always consider (r_α)_{α>0} to be the generator of a regularisation method with an envelope (R̃_α)_{α>0} and corresponding regularised solutions (x_α)_{α>0} and (X_α)_{α>0}, respectively. Moreover, we use the functions e, d, D, d̃, D̃, q, and Q as defined in Definition 2.4; see Table 4 for a summary of the notation.

Spectral Representations of the Regularisation Errors.
To do the analysis, we will expand the quantities of interest with respect to the measure A ↦ ‖E_A x†‖², which describes the spectral decomposition of x† with respect to the operator L*L. With the function e defined in Equation 2.7, we can write the resulting integrals in the form of Lebesgue–Stieltjes integrals: Equation 2.13 below for the regularisation errors d and D, and Equation 2.14 for the residuals q and Q, respectively.
Proof: We can write the differences between one of the regularised solutions x_α(y) or X_α(y) and the minimum norm solution x† in the form

x_α(y) − x† = (r_α(L*L) L*L − I) x† and X_α(y) − x† = (R_α(L*L) L*L − I) x†,

respectively, where I : X → X denotes the identity map on X. According to spectral theory, we can formulate this with the definition of the error functions r̃_α and R̃_α, see Equation 2.2 and Equation 2.4, as

d(α) = ∫_0^∞ r̃²_α(λ) de(λ) and D(α) = ∫_0^∞ R̃²_α(λ) de(λ). (2.13)

For the differences between the image of the regularised solution x_α(y) or X_α(y) and the exact data, we find similarly

q(α) = ∫_0^∞ λ r̃²_α(λ) de(λ) and Q(α) = ∫_0^∞ λ R̃²_α(λ) de(λ). (2.14)

From this representation, we immediately get that the regularised solutions (x_α)_{α>0} and (X_α)_{α>0} converge to the minimum norm solution x† if the error functions (r̃_α)_{α>0} and (R̃_α)_{α>0} tend to zero as α → 0: if lim_{α→0} r̃_α(λ) = 0 (or lim_{α→0} R̃_α(λ) = 0, respectively) for every λ > 0, then the regularised solutions x_α(y) (or X_α(y), respectively) converge for α → 0 in the norm topology to the minimum norm solution x†.
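As a sanity check (an illustrative computation with an assumed diagonal operator; not from the paper), one can verify the representations in Equations 2.13 and 2.14 numerically for the Tikhonov filter of Example 2.3:

```python
import numpy as np

# For a diagonal L the spectral measure of L*L is atomic, so the
# Lebesgue-Stieltjes integrals in Equations 2.13 and 2.14 become sums over
# the eigenvalues lam_n with weights (x_dag_n)^2 (the jumps of e).
n = np.arange(1, 301)
lam = 1.0 / n**2                       # eigenvalues of L*L (assumed)
x_dag = 1.0 / n**1.5                   # assumed minimum norm solution
s = np.sqrt(lam)                       # singular values of L
y = s * x_dag                          # exact data

alpha = 1e-3
r = 1.0 / (lam + alpha)                # Tikhonov filter r_alpha(lambda)
r_err = 1.0 - lam * r                  # error function, Equation 2.2

x_alpha = r * (s * y)                  # regularised solution for exact data
d_direct = np.sum((x_alpha - x_dag)**2)
d_spectral = np.sum(r_err**2 * x_dag**2)          # Equation 2.13
q_direct = np.sum((s * x_alpha - y)**2)
q_spectral = np.sum(lam * r_err**2 * x_dag**2)    # Equation 2.14
print(d_direct, d_spectral)            # the two numbers agree
print(q_direct, q_spectral)            # likewise for the residual
```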
Proof: By assumption, see Definition 2.1 (iii), α ↦ R̃_α(λ) is monotonically increasing, and by the representations in Equation 2.13 and Equation 2.14 so are the functions D and Q. The monotonicity of d̃ and D̃ follows directly from their definition in Equation 2.9 as suprema over the increasing sets B̄_δ(y), δ > 0.

Bounds for the Noise Free Regularisation Errors.
The representations of the noise free regularisation errors as integrals over the spectral tail e allow us to characterise the convergence of the regularisation errors d(α) and D(α) in the limit α → 0 in terms of the behaviour of the spectral tail e(λ) for λ → 0.

Lemma 2.8
With the constant σ ∈ (0, 1) from Definition 2.1 (i), we have for every α > 0 the relation

(1 − σ)² e(α) ≤ d(α) ≤ D(α). (2.15)

That is, (1 − σ)² times the spectral tail is a lower bound for the noise free regularisation error of the regularisation method, which in turn is a lower bound for the error of the regularisation method of the envelope generator.
Proof: Let α > 0 be fixed. With Equation 2.13 and R̃_α ≥ |r̃_α|, according to Definition 2.1 (iii), we find for the errors d and D that d(α) ≤ D(α). Furthermore, since r̃²_α is monotonically decreasing on [0, α], according to Definition 2.1 (ii), and e(λ) = e(‖L‖²) for all λ ≥ ‖L‖², we can estimate

d(α) = ∫_0^∞ r̃²_α(λ) de(λ) ≥ ∫_0^α r̃²_α(λ) de(λ) ≥ r̃²_α(α) e(α).

Inserting the expression of Equation 2.2 for r̃_α and using the upper bound from Definition 2.1 (i), we thus have d(α) ≥ (1 − σ)² e(α).

Since we did not require so far that the error functions r̃_α and R̃_α vanish as α → 0, we cannot ensure that the regularised solutions x_α(y) and X_α(y) converge as α → 0 to the minimum norm solution, or even get an upper bound on the regularisation errors d and D. We therefore impose the following additional constraint on a function ϕ for it to serve as an upper bound for the regularisation error.
Definition 2.9 We call a monotonically increasing function ϕ : (0, ∞) → (0, ∞) compatible with the regularisation method generated by (r_α)_{α>0} if there exists a monotonically decreasing and integrable function F : [1, ∞) → [0, ∞) such that

R̃²_α(λ) ≤ F(ϕ(λ)/ϕ(α)) for all λ, α > 0 with ϕ(λ) ≥ ϕ(α). (2.16)

Remark: Equation 2.16 is exactly the condition from [3, Equation 7] for the error function R̃_α (there we assume that r̃_α satisfies Definition 2.1 (iii) and (iv) so that we can take R̃_α = r̃_α).
Conditions of this sort for ensuring convergence rates of the method have a long history. For the special choice F(z) = Az^{−2}, the condition was introduced as the qualification of the regularisation method in [19, Definition 1 and 2], which is now commonly used for characterising convergence rates, see [16,12], for example. Even before that, the condition was used for the convergence rates ϕ^H_µ, see, for example, the textbooks [30,

Lemma 2.10 Let ϕ be compatible with the regularisation method generated by (r_α)_{α>0} in the sense of Definition 2.9, with a monotonically decreasing and integrable function F as in Equation 2.16, and assume that there exists a constant C_e > 0 with e(λ) ≤ C_e ϕ(λ) for all λ > 0. Then there exists a constant C_D > 0 such that D(α) ≤ C_D ϕ(α) for all α > 0. That is, the order of the noise free regularisation error D of the envelope generator (R_α)_{α>0} is given by the function ϕ.
Proof: We first extend the function F to a monotonically decreasing function F̃ : (0, ∞) → [0, ∞). Taking for D the representation from Equation 2.13 and using that F̃ is monotonically decreasing, we can bound D(α) by an integral of F̃(e(λ)/ϕ(α)) with respect to e. Then, the substitution z = e(λ)/ϕ(α) gives us the bound D(α) ≤ C_D ϕ(α).

Remark: The result of Lemma 2.10 is analogous to [3, Proposition 2.3], where the noise free regularisation error produced by the generator (r_α)_{α>0} is estimated.
The compatibility condition in Equation 2.16 is essentially a way to measure if the regularisation method converges at each spectral value faster than a given convergence rate ϕ, see Equation 2.17. It is therefore not surprising that if some convergence rate is compatible with (r α ) α>0 , then all slower convergence rates are also compatible with it.
Proof: Let Λ > 0 be arbitrary. Since ψ is continuous and everywhere positive, we have the positive bounds m := inf_{λ∈(0,Λ]} ψ(λ) and M := sup_{λ∈(0,Λ]} ψ(λ). By definition of ψ, this means that the ratio of ϕ_2 and ϕ_1 lies between m and M on (0, Λ]. Since the function F̃ : [1, ∞) → R given by F̃(z) := F(mz/M) is also monotonically decreasing and integrable, this proves that ϕ_2 is compatible with (r_α)_{α>0}, too.
In particular, if one of the Hölder rates from Example 2.5 is compatible with (r_α)_{α>0}, then all the logarithmic rates are compatible with it.

Corollary 2.12
Let ϕ^H_µ and ϕ^L_µ, µ > 0, be the rates defined in Example 2.5. Then ϕ^L_µ is for every µ > 0 compatible with the regularisation method (r_α)_{α>0} in the sense of Definition 2.9 if there exists a parameter ν > 0 such that ϕ^H_ν is compatible with (r_α)_{α>0}.

Relation between Convergence Rates for Noise Free and for Noisy Data.
We will see that, when applying the regularisation to noisy data, a convergence rate for D gives rise to a convergence rate of the form D̃(δ) ≤ C_D̃ Φ[D](δ) for some constant C_D̃ > 0, where the noise-free to noisy transform Φ[D] of the function D satisfies the equation system

Φ[ϕ](δ) = ϕ(ϕ̂^{−1}(δ²)) with ϕ̂(α) := α ϕ(α),

and we write ϕ̂^{−1} for the generalised inverse

ϕ̂^{−1}(δ) := inf{α > 0 | ϕ̂(α) ≥ δ}.

Remark: We emphasise that the functions under consideration need to be neither continuous nor surjective for such a generalised inverse to be defined. In particular, the function ê : (0, ∞) → [0, ∞), λ ↦ λ e(λ), with e defined in Equation 2.7, is only right-continuous and not surjective in general. Nevertheless, a generalised inverse exists.
We also note that if ϕ : (0, ∞) → [0, ∞) is a monotonically increasing function which is not everywhere zero, then ϕ̂ is not everywhere zero either, so that the transform is well defined. Later on, we will apply this transform to the functions describing the convergence rates. We therefore calculate (at least in leading order) the noise-free to noisy transforms for the families of convergence rates introduced in Example 2.5.
Lemma 2.14 Let ϕ^H_µ and ϕ^L_µ be the functions introduced in Example 2.5. Then we have for every µ > 0 that Φ[ϕ^H_µ](δ) = δ^{2µ/(µ+1)}, and that Φ[ϕ^L_µ](δ) coincides in leading order with ϕ^L_µ(δ) up to a constant factor.

Proof: (i) We find directly from Definition 2.13 that ϕ̂(α) = α^{µ+1}, from which the claim for ϕ^H_µ follows; see the worked computation after this proof.

Let us collect some elementary properties of the transform Φ before estimating the quantities d̃ and D̃.

(ii) Since ϕ is right-continuous and monotonically increasing, it is upper semi-continuous and so is ϕ̂. Thus, the set {α > 0 | ϕ̂(α) ≥ δ} is closed and therefore ϕ̂^{−1}(δ) = min{α > 0 | ϕ̂(α) ≥ δ}. In particular, the inequality ϕ̂(ϕ̂^{−1}(δ)) ≥ δ holds.
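To make the Hölder case concrete, here is the computation written out (using the explicit reading Φ[ϕ](δ) = ϕ(ϕ̂^{−1}(δ²)) of Definition 2.13 from above; a reconstruction whose final rate matches the rates in the tables):

```latex
% Worked Hoelder computation; \hat\varphi(\alpha) = \alpha\varphi(\alpha)
% is the balancing function from Definition 2.13 / Lemma 2.19.
\[
  \varphi^H_\mu(\alpha) = \alpha^\mu
  \;\Longrightarrow\;
  \hat\varphi(\alpha) = \alpha^{\mu+1},
  \qquad
  \hat\varphi^{-1}(\delta^2) = \delta^{\frac{2}{\mu+1}},
\]
\[
  \Phi[\varphi^H_\mu](\delta)
  = \varphi^H_\mu\bigl(\hat\varphi^{-1}(\delta^2)\bigr)
  = \delta^{\frac{2\mu}{\mu+1}},
\]
% which is the classical optimal worst case rate for the squared error
% under the source condition x^\dagger \in \mathcal{R}((L^*L)^{\mu/2}).
```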

Bounds for the Best Worst Case Errors.
Let us finally come back to the functions d̃ and D̃, the best worst case errors of the regularisation methods defined by the generators (r_α)_{α>0} and (R_α)_{α>0}, respectively. Here we derive an estimate relating the best worst case errors to the noise free regularisation errors.

Lemma 2.19
Let x† ≠ 0. Then, with the constant σ ∈ (0, 1) from Definition 2.1 (i), the best worst case errors d̃ and D̃ are bounded in terms of the noise-free to noisy transform Φ[D].

Proof: To estimate the distance between the regularised solutions for exact data y and inexact data ỹ ∈ B̄_δ(y), we define the Borel measure µ(A) := ‖F_A(ỹ − y)‖², where F denotes the spectral measure of the operator LL*. Then, we get with Equation 2.6 the relation

‖x_α(ỹ) − x_α(y)‖² = ∫_0^∞ λ r²_α(λ) dµ(λ).

Thus, we have with Equation 2.1 the upper bound ‖x_α(ỹ) − x_α(y)‖² ≤ σ²δ²/α. The triangle inequality then gives us

d̃(δ) ≤ inf_{α>0} (σδ/√α + √(d(α)))². (2.20)

We estimate the infimum therein from above by the value at α := D̂^{−1}(δ²), where we set D̂(α) := α D(α).
Since the function D is, according to Corollary 2.7, monotonically increasing and continuous, we get from Lemma 2.15 and Definition 2.13 the identity D(D̂^{−1}(δ²)) = Φ[D](δ), so that both terms in the infimum are for this choice of α of the same order. This gives us d̃(δ) ≤ C Φ[D](δ) for some constant C > 0 depending only on σ. Because of Equation 2.15, we get in the same way D̃(δ) ≤ C Φ[D](δ), where we used Equation 2.21 in the last inequality.

The following lemma provides relations between the best worst case errors d̃ and D̃ of the regularisation methods generated by (r_α)_{α>0} and (R_α)_{α>0}, respectively, and the spectral tail e.
Proof: We insert the choice α = α_δ in Equation 2.23 and obtain a lower bound in which we may drop the last term, as it is non-negative. For α ≥ α_δ, the corresponding term in Equation 2.23 vanishes and we find again the same lower bound. Therefore, we end up with a lower bound for the best worst case errors in terms of the spectral tail. Using Equation 2.6 and that R̃_α is by Definition 2.1 (iii) monotonically decreasing, we get the analogous inequality for D̃; and since we have already proved in Lemma 2.8 that d ≥ (1 − σ)² e, we can estimate further. Now, the first term is monotonically increasing in α and, since α ↦ R̃_α(λ) is for every λ > 0 monotonically increasing, see Definition 2.1 (iii), the second term is monotonically decreasing in α. Thus, we can estimate the expression for α < α_δ from below by the second term at α = α_δ, and for α ≥ α_δ by the first term at α = α_δ. Recalling that α_δ = ê^{−1}(δ²) and that the function e is right-continuous, we obtain the claimed lower bound. Finally, since e is right-continuous and monotonically increasing, the infimum defining α_0 is attained and we have e(α_0) = e(α_δ). Moreover, α_0 ∈ σ(L*L): since e is constant on every interval in (0, ∞) \ σ(L*L), α_0 ∉ σ(L*L) would imply that e(λ) = e(α_δ) for all λ ∈ (α_0 − ε, α_0 + ε) for some ε > 0, which would contradict the minimality of α_0.

Optimal Convergence Rates.
Putting all these results together, we can characterise the convergence of the regularisation errors for noise free data and of the best worst case errors equivalently in terms of the regularity of the minimum norm solution, concretely, in terms of the behaviour of the spectral tail. And we have shown in [3] that this can also be written in the form of variational source conditions.

Theorem 2.21 Let ϕ be a monotonically increasing convergence rate function which is compatible with (r_α)_{α>0} in the sense of Definition 2.9. Then the following statements are equivalent:

(i) There exists a constant C_e > 0 such that e(λ) ≤ C_e ϕ(λ) for every λ > 0, meaning that the ratio of the spectral tail and the expected convergence rate is bounded.
(ii) There exists a constant C_d > 0 such that d(α) ≤ C_d ϕ(α) for every α > 0, meaning that the ratio of the noise free rate of the regularisation method and the expected convergence rate is bounded.
(iii) There exists a constant C_D > 0 such that D(α) ≤ C_D ϕ(α) for every α > 0, meaning that the ratio of the noise free rate of the envelope generated regularisation method and the expected convergence rate is bounded.
(iv) The minimum norm solution x† satisfies the variational source condition with rate ϕ, that is, there exists a constant C_v > 0 such that the variational inequality of Equation 2.28 holds.

If the function ϕ is additionally right-continuous and G-subhomogeneous in the sense that there exists a continuous and monotonically increasing function G : (0, ∞) → (0, ∞) such that

ϕ(γα) ≤ G(γ) ϕ(α) for all γ, α > 0, (2.29)

then the statements (i) to (iv) are furthermore equivalent to:

(v) There exists a constant C_d̃ > 0 such that d̃(δ) ≤ C_d̃ Φ[ϕ](δ) for every δ > 0.

(vi) There exists a constant C_D̃ > 0 such that D̃(δ) ≤ C_D̃ Φ[ϕ](δ) for every δ > 0.

We also remark that if ϕ is compatible with a regularisation method in the sense of Definition 2.9 and C > 0, then Cϕ is compatible with the regularisation method.

(iii) =⇒ (v):
Since D ≤ C_D ϕ, we get from Lemma 2.16 and Lemma 2.17 that Φ[D] ≤ Φ[C_D ϕ]. Now, using the assumption from Equation 2.29, we find with Lemma 2.18 that Φ[C_D ϕ](δ) ≤ C Φ[ϕ](δ) for some constant C > 0. We therefore get from Lemma 2.19 that d̃(δ) ≤ C_d̃ Φ[ϕ](δ) for some constant C_d̃ > 0, where σ ∈ (0, 1) is the constant from Definition 2.1 (i) entering via Lemma 2.19.
(iii) =⇒ (vi): As before, Lemma 2.19 implies D̃(δ) ≤ C_D̃ Φ[ϕ](δ) for some constant C_D̃ > 0.

Remark: We note that the conditions in Theorem 2.21 (ii), (iii), (v), and (vi) are convergence rates for the regularised solutions, which are equivalent to the spectral tail condition in Theorem 2.21 (i) and to the variational source condition in Theorem 2.21 (iv). We also want to stress, and this is a new result in comparison to [3], that this holds for regularisation methods (r_α)_{α>0} whose error functions r̃_α are not necessarily non-negative and monotonically decreasing, and that it also enforces optimal convergence rates for the regularisation methods generated by the envelopes (R̃_α)_{α>0}.
The first work on the equivalence of optimality of regularisation methods is [21], which has served as a basis for the results in [3]. The equivalence of the optimal rate in Theorem 2.21 (i) and the variational source condition in Theorem 2.21 (iv) has been analysed in a more general setting in [15,11,12,10]. In particular, all the equivalent statements of Theorem 2.21 with the rate ϕ^H_µ are fulfilled if the standard source condition x† ∈ R((L*L)^{µ/2}) is fulfilled, see Proposition 2.22 below.
Let us finally take a look at the additional condition of G-subhomogeneity, introduced in Equation 2.29 in Theorem 2.21 to prove optimal convergence rates for the best worst case errors, and check that the convergence rates from Example 2.5 satisfy this condition.

Proof: (i) We clearly have ϕ^H_µ(γα) = γ^µ ϕ^H_µ(α) for all γ > 0 and α > 0, so that Equation 2.29 holds with G(γ) = γ^µ.
Proof: We first remark that the first inequality, q ≤ Q, follows with Definition 2.1 (iii) directly from the representations in Equation 2.14 for q and Q. Since ē tends by definition faster to zero than the identity ϕ̄ : (0, ∞) → (0, ∞), ϕ̄(α) = α, the noise free residual errors q and Q also converge (without imposing an additional source condition) faster than the identity, provided that ϕ̄ is compatible with (r_α)_{α>0}.

Corollary 2.27
If the convergence rate ϕ̄ : (0, ∞) → (0, ∞), ϕ̄(α) = α, is compatible with (r_α)_{α>0} in the sense of Definition 2.9, then we have that q(α)/α → 0 and Q(α)/α → 0 as α → 0.

Proof: Since q ≤ Q, see Equation 2.34, it is enough to prove the statement for the function Q. We define ē as in Equation 2.30 and distinguish two cases.
• If ē(λ) = 0 for all λ ∈ [0, λ_0] for some λ_0 > 0, then we estimate, using the integral representation for Q from Equation 2.31, the error Q(α) by the part of the integral over [λ_0, ∞). Since ϕ̄ is compatible with (r_α)_{α>0}, we know from Equation 2.17 that this contribution converges faster than α to zero.

• If ē(λ) > 0 for all λ > 0, then we first construct, using the compatibility of ϕ̄ as in the proof of Lemma 2.10, a monotonically decreasing and integrable function F̃ : (0, ∞) → [0, ∞). Next, we pick a monotonically increasing function f with the properties in Equation 2.38 and split the integral accordingly into the two terms of Equation 2.39. Since the resulting estimate holds for arbitrary ε > 0, we see that the first term in Equation 2.39 converges faster than α to zero. For the second term in Equation 2.39, we remark that Equation 2.40 also implies that there exists a constant C > 0 with ē(λ) ≤ Cλ for all λ > 0.
Thus, we find with the substitution z = ē(λ)/(Cα) that the second term in Equation 2.39 is bounded by Cα times an integral of F̃ which, according to our choice of f, see Equation 2.38, converges to zero for α → 0. We therefore obtain that Q(α) converges faster than α to zero.

The results of this section explain the interplay between the convergence rates of the spectral tail of the minimum norm solution, the noise free regularisation error, and the best worst case error. For these different concepts, equivalent rates can be derived. Moreover, these rates also imply rates for the noise free residual error. In addition to standard regularisation theory, we proved rates for the associated regularisation method defined via the envelope in Equation 2.4.

Spectral Decomposition Analysis of Regularising Flows
We now turn to the application of these results to the method in Equation 1.2 with continuous coefficient functions a_k ∈ C((0, ∞); R), k = 0, . . . , N − 1. We hereby consider the solution as a function of the possibly inexact data ỹ ∈ Y. Thus, we look for a solution ξ : [0, ∞) × Y → X such that ξ(·; ỹ) is N times continuously differentiable for every ỹ.
The following proposition provides existence and uniqueness of the solution of flows of higher order. In case the coefficients a_k are in C^∞([0, ∞); R), the result can also be derived more simply from an abstract Picard–Lindelöf theorem, see, for example, [18, Section II.2.1]. In our case, however, a_k might also have a singularity at the origin, as in Equation 1.5, and the proof becomes more involved.

Proposition 3.1
Let N ∈ N and ỹ ∈ Y be arbitrary, and let A ↦ E_A denote the spectral measure of the operator L*L.

Assume that for every λ ∈ [0, ∞) the initial value problem

∂_t^N ρ̃(t; λ) + Σ_{k=0}^{N−1} a_k(t) ∂_t^k ρ̃(t; λ) = −λ ρ̃(t; λ), ρ̃(0; λ) = 1, ∂_t^k ρ̃(0; λ) = 0 for k ∈ {1, . . . , N − 1}, (3.2)

has a solution ρ̃ whose partial derivatives ∂_t^k ρ̃, k ∈ {0, . . . , N}, are continuous on [0, ∞) × [0, ∞). Then the function ξ(·; ỹ), given by

ρ(t; λ) := (1 − ρ̃(t; λ))/λ (3.3) and ξ(t; ỹ) := ∫_0^∞ ρ(t; λ) dE_λ L*ỹ, (3.4)

is the unique N times continuously differentiable solution of Equation 3.1.

Proof: One first checks that ∂_t^k ρ is for every k ∈ {0, . . . , N} continuous on [0, ∞) × [0, ∞).

• Next, we show that the function ξ is N times continuously differentiable with respect to t and that its partial derivatives are for every k ∈ {0, . . . , N} given by

∂_t^k ξ(t; ỹ) = ∫_0^∞ ∂_t^k ρ(t; λ) dE_λ L*ỹ. (3.6)

To see this, we assume by induction that Equation 3.6 holds for k = ℓ for some ℓ ∈ {0, . . . , N − 1} and pass to the limit in the corresponding difference quotients with respect to the Borel measure associated with E. The continuity of the Nth derivative ∂_t^N ξ follows in the same way directly from Lebesgue's dominated convergence theorem.

• To prove that ξ solves Equation 3.1, we plug the definition of ρ from Equation 3.3 into Equation 3.6 and, making use of Equation 3.2, see that ξ fulfils Equation 3.1a. For the initial conditions, we get from Equation 3.6, in agreement with Equation 3.1b, that ∂_t^k ξ(0; ỹ) = 0 for every k ∈ {0, . . . , N − 1}.

• For uniqueness, assume that we have two different solutions of Equation 3.1 and call ξ_0 the difference between the two solutions. We choose an arbitrary t_0 > 0 and write ξ^(k) := ∂_t^k ξ_0(t_0; ỹ) for every k ∈ {0, . . . , N − 1}. Then ξ_0 is a solution of an initial value problem of the same form, with the functions ρ solving for every λ ∈ [0, ∞) the corresponding initial value problems. (Since a_k is continuous on (0, ∞), Lebesgue's dominated convergence theorem is applicable on every compact interval [t_1, t_2] ⊂ (0, ∞).) Now, we have for every measurable subset A ⊂ [0, ∞) and every k ∈ {0, . . . , N − 1} that the corresponding spectral coefficients vanish, where the signed measures µ_{η_1,η_2}, η_1, η_2 ∈ X, are defined by µ_{η_1,η_2}(A) := ⟨η_1, E_A η_2⟩; hence ξ_0 = 0.
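To make the construction of Proposition 3.1 concrete, here is a small numerical sketch (all concrete choices, including the diagonal operator and the constant-coefficient second order flow, are illustrative assumptions): the scalar initial value problems are solved per spectral value and assembled into ξ.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Sketch of the construction in Proposition 3.1 for a diagonal operator:
# for each spectral value lambda of L*L we solve the scalar ODE for
# rho_tilde(.; lambda) and assemble xi(t; y) coefficient-wise, here for
# the second order case N = 2 with constant coefficient a_1(t) = b.
b = 3.0
s = 1.0 / np.arange(1, 101)        # singular values of L (assumed)
lam = s**2                          # spectral values of L*L
x_dag = 1.0 / np.arange(1, 101)**2  # assumed minimum norm solution
y = s * x_dag                       # exact data

def rho_tilde(t, l):
    # solves rho'' + b rho' + l rho = 0, rho(0) = 1, rho'(0) = 0
    sol = solve_ivp(lambda tt, u: [u[1], -b * u[1] - l * u[0]],
                    (0.0, t), [1.0, 0.0], rtol=1e-10, atol=1e-12)
    return sol.y[0, -1]

t = 50.0
rho = np.array([(1.0 - rho_tilde(t, l)) / l for l in lam])  # Equation 3.3
xi = rho * (s * y)                  # spectral integral of rho against E L*y
print("||xi(t) - x_dag||^2 =", np.sum((xi - x_dag)**2))
```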

Showalter's method
Showalter's method, given by Equation 1.3, is the gradient flow for the functional J_0. According to Proposition 3.1, we rewrite it as a system of first order ordinary differential equations for the error function ρ̃ at the spectral values λ of L*L, which in this particular case reads

∂_t ρ̃(t; λ) = −λ ρ̃(t; λ), ρ̃(0; λ) = 1. (4.1)

Lemma 4.1 The solution ρ̃ of Equation 4.1 is given by ρ̃(t; λ) = e^{−λt}. (4.2)
In particular, the solution of Showalter's method, that is, the solution of Equation 3.1 with N = 1, is given by

ξ(t; ỹ) = ∫_0^∞ (1 − e^{−λt})/λ dE_λ L*ỹ.

Next, we want to show that, identifying α = 1/t as the regularisation parameter, the solution ξ(1/α; ỹ) is a regularised solution of the equation Lx = y in the sense of Definition 2.2. For the verification of the property in Definition 2.1 (i) of the regularisation method, it is convenient to be able to estimate the function 1 − e^{−z} by √z.

Lemma 4.2
There exists a constant σ_0 ∈ (0, 1) such that 1 − e^{−z} ≤ σ_0 √z for all z ≥ 0.

Proof: We consider the function f : (0, ∞) → R, f(z) := (1 − e^{−z})/√z. Since lim_{z→0} f(z) = 0 and lim_{z→∞} f(z) = 0, f attains its maximum at the only critical point z_0 > 0, given as the unique positive solution of the equation e^z = 2z + 1, where the uniqueness follows from the convexity of the exponential function. Since 2z + 1 > e^z at z = 1, we know additionally that z_0 > 1. Therefore, we have in particular σ_0 := f(z_0) ≤ 1/√z_0 < 1.

In order to show that Showalter's method is a regularisation method, we now verify all the assumptions in Definition 2.1.
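The constant σ_0 can be evaluated numerically (a quick check of Lemma 4.2; the numerical values are computed here, not quoted from the paper):

```python
import numpy as np
from scipy.optimize import brentq

# Numerical check of Lemma 4.2: the maximiser z0 of
# f(z) = (1 - exp(-z))/sqrt(z) solves exp(z) = 2 z + 1,
# and sigma_0 = f(z0) lies strictly below 1.
z0 = brentq(lambda z: np.exp(z) - 2 * z - 1, 0.5, 3.0)
sigma0 = (1 - np.exp(-z0)) / np.sqrt(z0)
print(z0, sigma0)   # roughly z0 = 1.256, sigma_0 = 0.638
```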

Proposition 4.3 Let ρ̃ be the solution of Equation 4.1 given in Equation 4.2. Then the functions (r_α)_{α>0} defined by

r_α(λ) := (1 − ρ̃(1/α; λ))/λ = (1 − e^{−λ/α})/λ (4.5)
generate a regularisation method in the sense of Definition 2.1.
Proof: We verify that (r_α)_{α>0} satisfies the four conditions from Definition 2.1.
To prove the second part of the inequality in Definition 2.1 (i), we use Lemma 4.2 and find λ r_α(λ) = 1 − e^{−λ/α} ≤ σ_0 √(λ/α), where σ_0 ∈ (0, 1) denotes the constant found in Lemma 4.2.
Finally, we check that the common convergence rate functions are compatible with this regularisation method.

Proof: According to Corollary 2.12, it is enough to prove that ϕ^H_µ is for arbitrary µ > 0 compatible with (r_α)_{α>0}. To see this, we remark that the error function r̃_α(λ) = e^{−λ/α} satisfies r̃²_α(λ) = e^{−2λ/α} = F(ϕ^H_µ(λ)/ϕ^H_µ(α)) with the monotonically decreasing and integrable function F(z) := e^{−2 z^{1/µ}}.

We have thus shown that we can apply Theorem 2.21 to the regularisation method induced by Equation 1.3, that is, the regularisation method generated by the functions (r_α)_{α>0} defined in Equation 4.5, and the convergence rate functions ϕ^H_µ or ϕ^L_µ for arbitrary µ > 0. This gives us optimal convergence rates under variational source conditions as defined in Equation 2.28, for example.
However, to compare with the literature, see [9,Example 4.7], we formulate the result under the slightly stronger standard source condition, see Proposition 2.22.

Corollary 4.5
Let y ∈ R(L) be given such that the corresponding minimum norm solution x† ∈ X, fulfilling Lx† = y and ‖x†‖ = inf{‖x‖ | Lx = y}, satisfies for some µ > 0 the source condition

x† ∈ R((L*L)^{µ/2}). (4.6)

Then, if ξ is the solution of the initial value problem in Equation 1.3,

(i) there exists a constant C_1 > 0 such that ‖ξ(t; y) − x†‖² ≤ C_1 t^{−µ} for all t > 0;

(ii) there exists a constant C_2 > 0 such that inf_{t>0} sup_{ỹ∈B̄_δ(y)} ‖ξ(t; ỹ) − x†‖² ≤ C_2 δ^{2µ/(µ+1)} for all δ > 0;

and (iii) there exists a constant C_3 > 0 such that ‖Lξ(t; y) − y‖² ≤ C_3 t^{−(µ+1)} for all t > 0.

Proof: We consider the regularisation method defined by the functions (r_α)_{α>0} from Equation 4.5. We have already seen in Lemma 2.23 and Lemma 4.4 that the function ϕ^H_µ(α) = α^µ is G-subhomogeneous in the sense of Equation 2.29 with G(γ) = γ^µ and compatible with the regularisation method given by (r_α)_{α>0}.
(i) According to Proposition 2.22 and Theorem 2.21 with the convergence rate function ϕ = ϕ^H_µ, the source condition in Equation 4.6 implies the existence of a constant C_d such that d(α) ≤ C_d α^µ. Thus, by definition of d and since x_α(y) = ξ(1/α; y), we have ‖ξ(t; y) − x†‖² ≤ C_d t^{−µ}.

(ii) According to Theorem 2.21, we also find a constant C_d̃ such that d̃(δ) ≤ C_d̃ Φ[ϕ^H_µ](δ), where Φ denotes the noise-free to noisy transform defined in Definition 2.13 and d̃ is given by Equation 2.9 with the regularised solution x_α given by Equation 4.7. Therefore, we have with Lemma 2.14 that d̃(δ) ≤ C_d̃ δ^{2µ/(µ+1)}.

(iii) Furthermore, Theorem 2.21 implies that there is a constant C_e > 0 such that e(λ) ≤ C_e ϕ^H_µ(λ). In particular, we then have λe(λ) ≤ C_e ϕ^H_{µ+1}(λ). And since ϕ^H_{µ+1} is by Lemma 4.4 compatible with (r_α)_{α>0}, we can apply Corollary 2.25 and find a constant C > 0 such that the function q, defined in Equation 2.10 with the regularised solution x_α as in Equation 4.7, fulfils q(α) ≤ C α^{µ+1}. Thus, by definition of q, we have ‖Lξ(t; y) − y‖² ≤ C t^{−(µ+1)}.

We emphasise that for Showalter's method we did not make use of the extended theory involving envelopes of regularisation methods (cf. Definition 2.2); this part of the theory could have been developed with the regularisation results from [3] alone.
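As an empirical illustration of these rates (a sketch with an assumed diagonal operator; the solution is chosen so that its spectral tail satisfies e(λ) ≍ λ^µ, the condition of Theorem 2.21 (i)):

```python
import numpy as np

# Empirical sketch of the Showalter rate: we pick x_dag whose spectral tail
# behaves like e(lambda) ~ lambda^mu, which by Theorem 2.21 (i) corresponds
# to the rate phi(alpha) = alpha^mu; the squared error should then decay
# like t^{-mu}.
mu = 1.0
n = np.arange(1, 5001)
lam = 1.0 / n**2                     # spectral values of L*L (assumed)
x_dag = n**(-(mu + 0.5))             # gives e(lambda) of the order lambda^mu

for t in (1e1, 1e2, 1e3, 1e4):
    # Showalter's flow acts spectrally as xi_n(t) = (1 - exp(-lam_n t)) x_n,
    # so the error coefficients are exp(-lam_n t) x_n.
    err = np.sum((np.exp(-lam * t) * x_dag) ** 2)
    print(f"t={t:.0e}  error^2={err:.3e}  error^2 * t^mu={err * t**mu:.3f}")
# The last column settling near a constant indicates the rate t^{-mu}.
```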
The Heavy Ball Method

As a second example we consider the heavy ball dynamics, Equation 1.4, that is, Equation 1.2 with N = 2 and the constant coefficient a_1(t) = b > 0. According to Proposition 3.1, its solution is determined by the solutions ρ̃(·; λ) of the initial value problems

∂_tt ρ̃(t; λ) + b ∂_t ρ̃(t; λ) + λ ρ̃(t; λ) = 0, ρ̃(0; λ) = 1, ∂_t ρ̃(0; λ) = 0, (5.1)

see Figure 2. In particular, the solution of Equation 1.4 is given by

ξ(t; ỹ) = ∫_0^∞ (1 − ρ̃(t; λ))/λ dE_λ L*ỹ,

where A ↦ E_A denotes the spectral measure of L*L.

Proof: The characteristic equation of Equation 5.1 is z² + bz + λ = 0
and has the solutions −β_±(λ), where we set

β_±(λ) := b/2 ± √(b²/4 − λ). (5.3)

Thus, for λ < b²/4, we have the solution ρ̃(t; λ) = e^{−bt/2}(C_1(λ) cosh(√(b²/4 − λ) t) + C_2(λ) sinh(√(b²/4 − λ) t)); for λ > b²/4, we get the oscillating solution ρ̃(t; λ) = e^{−bt/2}(C_1(λ) cos(√(λ − b²/4) t) + C_2(λ) sin(√(λ − b²/4) t)); and for λ = b²/4, we have ρ̃(t; λ) = e^{−bt/2}(C_1(λ) + C_2(λ)t). Plugging in the initial condition ρ̃(0; λ) = 1, we find that C_1(λ) = 1 for all λ > 0, and the initial condition ∂_t ρ̃(0; λ) = 0 then implies C_2(λ) = b/(2√(b²/4 − λ)) for λ < b²/4, C_2(λ) = b/(2√(λ − b²/4)) for λ > b²/4, and C_2(λ) = b/2 for λ = b²/4. Moreover, ρ̃ is smooth and the unique solution of Equation 5.1.

To see that this solution gives rise to a regularisation method as introduced in Definition 2.1, we first verify that the function λ ↦ ρ̃(t; λ), which corresponds to the error function r̃_α in Definition 2.1, is non-negative and monotonically decreasing for sufficiently small values of λ, as required for r̃_α in Definition 2.1 (ii).
• We remark that the function appearing in this representation is non-negative and fulfils, for arbitrary τ > 0, the corresponding monotonicity estimate, since tanh(z) ≤ z for all z ≥ 0. Thus, writing the function ρ̃ for λ ∈ (0, b²/4), with the function β_− given by Equation 5.3, in this form, we conclude that the function λ ↦ ρ̃(t; λ) is non-negative and monotonically decreasing on (0, b²/4).
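For reference, the three regimes of ρ̃ can be evaluated as follows (a sketch based on the case distinction above; the closed forms are the standard solutions of the damped oscillator ODE, and the parameter b = 2 is an arbitrary choice):

```python
import numpy as np

# Heavy ball filter from the proof above: rho_tilde(t; lambda) in the three
# regimes lambda < b^2/4, lambda = b^2/4, lambda > b^2/4.
def rho_tilde(t, lam, b=2.0):
    disc = b * b / 4.0 - lam
    if disc > 0:                      # overdamped: two real decay rates
        beta_m = b / 2.0 - np.sqrt(disc)   # beta_-(lambda), Equation 5.3
        beta_p = b / 2.0 + np.sqrt(disc)   # beta_+(lambda)
        return (beta_p * np.exp(-beta_m * t)
                - beta_m * np.exp(-beta_p * t)) / (beta_p - beta_m)
    if disc == 0:                     # critically damped
        return np.exp(-b * t / 2.0) * (1.0 + b * t / 2.0)
    omega = np.sqrt(-disc)            # underdamped: oscillating decay
    return np.exp(-b * t / 2.0) * (np.cos(omega * t)
                                   + b / (2.0 * omega) * np.sin(omega * t))

# rho_tilde(0) = 1 and d/dt rho_tilde(0) = 0 in all three regimes:
print(rho_tilde(0.0, 0.5), rho_tilde(0.0, 1.0), rho_tilde(0.0, 2.0))
```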
In a next step, we introduce the function P̃(t; ·) as a counterpart to the upper bound R̃_α and show that it fulfils the properties required in Definition 2.1 (iii).

Proof: Since ρ̃(t; λ) = P̃(t; λ) for λ ≤ b²/4 and every t > 0, we only need to consider the case λ > b²/4. Using that |cos(z)| ≤ 1 and |sin(z)| ≤ |z| for all z ∈ R, we find with β_+ as in Equation 5.3, for every λ > b²/4 and every t > 0, that |ρ̃(t; λ)| ≤ e^{−bt/2}(1 + bt/2) = P̃(t; λ).
Using further that t ↦ P̃(t; λ) is strictly decreasing, see Lemma 5.4, we obtain for every α > 0 the envelope R̃_α required in Definition 2.1 (iii). To be able to apply Theorem 2.21 for the regularisation method generated by (r_α)_{α>0} from Equation 5.9 to the common convergence rates ϕ^H_µ and ϕ^L_µ, it remains to show that these are compatible with (r_α)_{α>0}.

Lemma 5.8
The functions ϕ^H_µ and ϕ^L_µ defined in Example 2.5 are for all µ > 0 compatible with the regularisation method (r_α)_{α>0} defined by Equation 5.9 in the sense of Definition 2.9.
Proof: We know from Corollary 2.12 that we only need to prove the statement for ϕ^H_µ for every µ > 0. The function R̃_α defined in Equation 5.10 fulfils, according to Lemma 5.5, for arbitrary Λ > 0 the corresponding bound with the function Ψ_Λ given by Equation 5.6. Since z ↦ Ψ²_Λ(z) is monotonically decreasing and integrable, the compatibility follows.

We can therefore apply Theorem 2.21 to the regularisation method induced by Equation 1.4, which is the regularisation method generated by the functions (r_α)_{α>0} defined in Equation 5.9, and the convergence rate functions ϕ^H_µ or ϕ^L_µ for arbitrary µ > 0. Thus, although the functions t ↦ ρ̃(t; λ) and λ ↦ ρ̃(t; λ) are not monotonic, we obtain optimal convergence rates of the regularisation method under variational source conditions such as in Equation 2.28.
If we formulate it with the stronger standard source condition, see Proposition 2.22, we can reproduce a result similar to [33, Theorem 5.1].

Corollary 5.9
Let y ∈ R(L) be given such that the corresponding minimum norm solution x† ∈ X, fulfilling Lx† = y and ‖x†‖ = inf{‖x‖ | Lx = y}, satisfies for some µ > 0 the source condition x† ∈ R((L*L)^{µ/2}). Then, if ξ is the solution of the initial value problem in Equation 1.4,

(i) there exists a constant C_1 > 0 such that ‖ξ(t; y) − x†‖² ≤ C_1 t^{−µ} for all t > 0;

(ii) there exists a constant C_2 > 0 such that inf_{t>0} sup_{ỹ∈B̄_δ(y)} ‖ξ(t; ỹ) − x†‖² ≤ C_2 δ^{2µ/(µ+1)} for all δ > 0;

and (iii) there exists a constant C_3 > 0 such that ‖Lξ(t; y) − y‖² ≤ C_3 t^{−(µ+1)} for all t > 0.

Proof: The proof follows exactly the lines of the proof of Corollary 4.5, where the compatibility of ϕ^H_µ is now shown in Lemma 5.8 and we have here the slightly different scaling between the time t and the regularisation parameter α, see Equation 5.9.

The Vanishing Viscosity Flow
We consider now the dynamical method Equation 1.2 for N = 2 with the variable coefficient a_1(t) = b/t for some parameter b > 0, that is, Equation 1.5. According to Proposition 3.1, the solution of Equation 1.5 is defined via the spectral integral in Equation 3.4 of ρ(t; λ) = (1 − ρ̃(t; λ))/λ, where ρ̃ solves for every λ ∈ (0, ∞) the initial value problem

∂_tt ρ̃(t; λ) + (b/t) ∂_t ρ̃(t; λ) + λ ρ̃(t; λ) = 0, ρ̃(0; λ) = 1, ∂_t ρ̃(0; λ) = 0. (6.1)

Lemma 6.1 The solution of Equation 6.1 is given by ρ̃(t; λ) = u(t√λ) with

u(τ) := 2^κ Γ(κ + 1) τ^{−κ} J_κ(τ), κ := (b − 1)/2, (6.2)

where Γ is the gamma function and J_ν denotes the Bessel function of the first kind of order ν ∈ R. See Figure 3 for a sketch of the graph of the function u.
Proof: We rescale Equation 6.1 by switching to the function

v(τ) := τ^κ ρ̃(σ_λ τ; λ) (6.3)

with some parameters σ_λ ∈ (0, ∞) and κ ∈ R. We use Equation 6.1 to replace the second derivative of ρ̃, which, after writing ∂_t ρ̃ and ρ̃ via Equation 6.4 and Equation 6.3 in terms of the function v, becomes a second order differential equation for the function v. Choosing now κ = ½(b − 1), so that b − 2κ = 1, and σ_λ = 1/√λ, we end up with Bessel's differential equation

τ² v''(τ) + τ v'(τ) + (τ² − κ²) v(τ) = 0,

for which every solution can be written as v(τ) = C_{1,κ} J_κ(τ) + C_{2,κ} Y_κ(τ) for some constants C_{1,κ}, C_{2,κ} ∈ R, where J_ν and Y_ν denote the Bessel functions of first and second kind of order ν ∈ R, respectively; see, for example, [1, Chapter 9.1].
We can therefore write the solution ρ̃ as

ρ̃(t; λ) = (t√λ)^{−κ} (C_{1,κ} J_κ(t√λ) + C_{2,κ} Y_κ(t√λ)). (6.5)

To determine the constants C_{1,κ} and C_{2,κ} from the initial conditions, we remark that the Bessel functions have, for all κ ∈ R \ (−N) and all n ∈ N, asymptotically for τ → 0 the behaviour

J_κ(τ) ≃ (τ/2)^κ / Γ(κ + 1), Y_n(τ) ≃ −(Γ(n)/π)(2/τ)^n for n ≥ 1, Y_0(τ) ≃ (2/π) log(τ/2). (6.6)

We consider the cases κ ≥ 0 and κ ∈ (−1/2, 0) separately.
• In particular, the relations in Equation 6.6 imply that the last terms in Equation 6.5 (those containing Y_κ) behave, with τ = t√λ, asymptotically for τ → 0 as follows: for κ = 0 they diverge logarithmically, because of the third relation in Equation 6.6; for κ ∈ N they diverge like τ^{−2κ}, because of the second relation in Equation 6.6; and for κ ∈ (0, ∞) \ N they likewise diverge like τ^{−2κ}, because of the first relation in Equation 6.6.
Thus, the last terms in Equation 6.5 diverge for every κ ≥ 0 as t → 0.
Since the first terms in Equation 6.5 converge, according to the first relation in Equation 6.6, for t → 0, the initial condition ρ̃(0; λ) = 1 can only be fulfilled if the coefficients C_{2,κ}, κ ≥ 0, in front of the singular terms are all zero, so that we have ρ̃(t; λ) = C_{1,κ} (t√λ)^{−κ} J_κ(t√λ). Furthermore, the initial condition ρ̃(0; λ) = 1 implies, according to the first relation in Equation 6.6, that C_{1,κ} = 2^κ Γ(κ + 1) for all κ ≥ 0, which gives the representation of Equation 6.2 for the solution ρ̃.
It remains to check that also the initial condition ∂_t ρ̃(0; λ) = 0 is fulfilled for all κ ≥ 0, which again follows directly from the first relation in Equation 6.6.

[Figure 3. Graph of the function u, defined in Equation 6.2, which gives the solution ρ̃ of Equation 6.1 via ρ̃(t; λ) = u(t√λ).]

As in the heavy ball method, the function ρ̃ is not monotonic (in either variable), so that we use the envelope of the regularisation method to obtain the optimal convergence rates.
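The function u is straightforward to evaluate numerically (a sketch using SciPy's Bessel functions; the choice b = 3, and hence κ = 1, is an assumption made for illustration):

```python
import numpy as np
from scipy.special import jv, gamma

# Evaluates the vanishing viscosity filter from Lemma 6.1:
# u(tau) = 2^kappa Gamma(kappa + 1) tau^{-kappa} J_kappa(tau),
# kappa = (b - 1)/2, with u(0) = 1 by the series expansion of J_kappa.
b = 3.0
kappa = (b - 1.0) / 2.0

def u(tau):
    tau = np.asarray(tau, dtype=float)
    val = (2**kappa * gamma(kappa + 1)
           * jv(kappa, tau) / np.maximum(tau, 1e-300)**kappa)
    return np.where(tau > 0, val, 1.0)

print(u([0.0, 1.0, 5.0, 20.0]))   # decaying oscillation around zero
```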

Corollary 6.2
The unique solution ξ : [0, ∞) × Y → X of the vanishing viscosity flow, Equation 1.5, which is twice continuously differentiable with respect to t, is given by

ξ(t; ỹ) = ∫_0^∞ (1 − u(t√λ))/λ dE_λ L*ỹ,

where the function u is defined by Equation 6.2.
Proof: We have already seen in Lemma 6.1 that Equation 6.1 has the unique solution ρ̃ given by ρ̃(t; λ) = u(t√λ). To apply Proposition 3.1, it is thus enough to show that ρ̃ is smooth.
As before, we also verify that the classical convergence rate functions ϕ^H_µ and ϕ^L_µ are compatible with the regularisation method (r_α)_{α>0}. In contrast to Showalter's method and the heavy ball method, the compatibility of ϕ^H_µ only holds up to a certain saturation value of the parameter µ.

Lemma 6.7
The functions ϕ^H_µ for all µ ∈ (0, b/2) and the functions ϕ^L_µ for all µ > 0, as defined in Example 2.5, are compatible with the regularisation method (r_α)_{α>0} defined by Equation 6.8 in the sense of Definition 2.9.
Proof: As before, because of Corollary 2.12 it is enough to check this for the functions ϕ^H_µ, µ ∈ (0, b/2). The function R̃_α defined in Equation 6.9 fulfils, according to Lemma 6.4, that there exists a constant C > 0 with R̃²_α(λ) ≤ C² τ_b^{−b} (λ^µ/α^µ)^{−b/(2µ)}, which is Equation 2.16 with the compatibility function F_µ(z) = C² τ_b^{−b} z^{−b/(2µ)}. It remains to check that F_µ : [1, ∞) → R is integrable, which is the case for µ < b/2.
We can therefore apply Theorem 2.21 to the regularisation method generated by the functions (r_α)_{α>0} defined in Equation 6.8 and the convergence rates ϕ^H_µ, µ ∈ (0, b/2), and ϕ^L_µ, µ > 0. Using that we have by construction x_α(ỹ) = ξ(τ_b/√α; ỹ), see Equation 6.10 below, this gives us equivalent characterisations for convergence rates of the flow ξ of Equation 1.5. As before for Showalter's method and the heavy ball method, we formulate the resulting convergence rates under the stronger, but more commonly used, standard source condition, see Proposition 2.22.

Corollary 6.8 Let y ∈ R(L) be given such that the corresponding minimum norm solution x† ∈ X, fulfilling Lx† = y and ‖x†‖ = inf{‖x‖ | Lx = y}, satisfies for some µ ∈ (0, b/2) the source condition x† ∈ R((L*L)^{µ/2}). Then, if ξ is the solution of the initial value problem in Equation 1.5, (i) there exists a constant C_1 > 0 such that ‖ξ(t; y) − x†‖² ≤ C_1 t^{−2µ} for all t > 0.

Let us finally compare this with the optimality results for convex functionals from the literature.

(i) In the above referenced papers, optimality is considered with respect to all possible smooth and convex functionals J, while in our work optimality is considered with respect to all possible variations of y only. The papers [29,7,4] consider a finite dimensional setting where J maps a subset of a finite dimensional space R^d into the extended reals.
(ii) The second difference in the optimality results is that we primarily consider the optimal convergence rate of ‖ξ(t) − x†‖ for t → ∞ and not of J(ξ(t)) → min_{x∈X} J(x); that is, we are considering rates in the domain of L, while in the referenced papers convergence in the image domain is considered.
Consequently, we get rates for the squared residual (which is the quantity J(ξ(t)) considered in the referenced papers) that are based on optimal rates (in the sense of this paper) for ‖ξ(t) − x†‖ → 0. The presented rates in the image domain are, however, not necessarily optimal.
Nevertheless, it is very interesting to note that the two cases b ≥ 3 and 0 < b < 3, referred to as the heavy and low friction cases, do not result in a different analysis in our paper, in contrast to, for instance, [4]. This is of course not a contradiction, because we consider a different notion of optimality.

Conclusions
This paper shows that the dynamical flows under consideration provide optimal regularisation methods (in the sense explained in Section 2). We proved optimal convergence rates of the solutions of the flows to the minimum norm solution for t → ∞, and we also provided convergence rates for the residuals of the regularised solutions.
We observed that the vanishing viscosity method, the heavy ball dynamics, and Showalter's method provide optimal reconstructions at different times. In fact, for a fair numerical comparison of the three methods, one should compare the results of Showalter's method and the heavy ball dynamics, respectively, at time t_0² with the vanishing viscosity flow at time t_0.

Acknowledgements
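To illustrate this time scaling (an informal sketch; the spectral value, the times, and the parameter b = 3 are arbitrary choices, and the heavy ball filter is evaluated on its overdamped branch):

```python
import numpy as np
from scipy.special import jv, gamma

# Error factors of the three flows at a single spectral value lambda,
# evaluating Showalter and heavy ball at time t0**2 and the vanishing
# viscosity flow at time t0, as suggested above.
lam, b = 1e-3, 3.0
kappa = (b - 1.0) / 2.0

def showalter(t):
    return np.exp(-lam * t)

def heavy_ball(t):                    # overdamped branch, lambda < b^2/4
    d = np.sqrt(b * b / 4.0 - lam)
    bm, bp = b / 2.0 - d, b / 2.0 + d
    return (bp * np.exp(-bm * t) - bm * np.exp(-bp * t)) / (bp - bm)

def viscosity(t):                     # u(t sqrt(lambda)) from Lemma 6.1
    tau = t * np.sqrt(lam)
    return 2**kappa * gamma(kappa + 1) * jv(kappa, tau) / tau**kappa

for t0 in (10.0, 30.0, 100.0):
    print(f"t0={t0:6.1f}  Showalter(t0^2)={showalter(t0**2):.2e}  "
          f"heavy ball(t0^2)={heavy_ball(t0**2):.2e}  "
          f"viscosity(t0)={abs(viscosity(t0)):.2e}")
```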
• RB acknowledges support from the Austrian Science Fund (FWF) within the project I2419-N32 (Employing Recent Outcomes in Proximal Theory Outside the Comfort Zone). • PE and OS are supported by the Austrian Science Fund (FWF), with SFB F68, project F6804-N36 (Quantitative Coupled Physics Imaging) and project F6807-N36 (Tomography with Uncertainties).
• OS acknowledges support from the Austrian Science Fund (FWF) within the national research network Geometry and Simulation, project S11704 (Variational Methods for Imaging on Manifolds) and I3661-N27 (Novel Error Measures and Source Conditions of Regularization Methods for Inverse Problems).