Computable upper error bounds for Krylov approximations to matrix exponentials and associated ϕ-functions

An a posteriori estimate for the error of a standard Krylov approximation to the matrix exponential is derived. The estimate is based on the defect (residual) of the Krylov approximation and is proven to constitute a rigorous upper bound on the error, in contrast to existing asymptotical approximations. It can be computed economically in the underlying Krylov space. In view of time-stepping applications, assuming that the given matrix is scaled by a time step, it is shown that the bound is asymptotically correct (with an order related to the dimension of the Krylov space) for the time step tending to zero. This means that the deviation of the error estimate from the true error tends to zero faster than the error itself. Furthermore, this result is extended to Krylov approximations of ϕ-functions and to improved versions of such approximations. The accuracy of the derived bounds is demonstrated by examples and compared with different variants known from the literature, which are also investigated more closely. Alternative error bounds are tested on examples, in particular a version based on the concept of effective order. For the case where the matrix exponential is used in time integration algorithms, a step size selection strategy is proposed and illustrated by experiments.


Introduction
We consider Krylov approximations to the matrix exponential function for the purpose of the solution of a linear, homogeneous system of differential equations

u′(t) = M u(t),  u(0) = v ∈ C^n.

The complex-valued matrix M commonly results from the discretization of a partial differential equation. In this work we present new results for precise a posteriori error estimation, which also extend to the evaluation of so-called ϕ-functions. The application of these estimates for the purpose of time propagation is also discussed and illustrated. Theoretical results are verified by numerical experiments, which are classified into Hermitian (dissipative), skew-Hermitian (Schrödinger-type) and general non-normal problems.

Overview of existing approaches and results
The approximate evaluation of large matrix exponential functions is a topic which has been extensively treated in the numerical analysis literature; for basic references see e.g. [18,35]. A standard approach is to project the given matrix M onto a low-dimensional Krylov space via the Arnoldi or Lanczos iteration, and to directly exponentiate the projected small matrix. A first mention of the Lanczos approach can be found in [42], where it is also recognized that, for the method to perform satisfactorily, the time steps have to be controlled. However, the control mechanism from [42] is not very elaborate and is based on a series expansion of the error, which is only valid in the asymptotic regime, see for instance [39]. For discretizations of parabolic problems, [17] uses an error estimator to choose the step size; this approach is improved in [48] and has been generalized in [34]. Notably, in the latter reference a strict error bound is used to estimate the time step instead of asymptotic techniques. It is argued in [34] that this strategy performs better than [33], which in turn performs better than [42].
A first systematic study of Krylov-based methods for the matrix exponential function was given in [45]. The error is analyzed theoretically, yielding both a priori and computable a posteriori estimates. The analysis there relies on approximation theory and yields a priori error bounds which are asymptotically optimal in the dimension of the Krylov subspace in important situations. The analysis moreover implies correction schemes to lift the convergence order which are cheap to compute based on the already available information. The error expansion also suggests a posteriori error estimators resorting to the leading error term. This approach relies on the assumption of the sufficiently rapid decay of the series representation of the error. A recent generalization of this work together with a more rigorous justification is given in [30]. For early studies of a priori error estimates see also [10,12].
A thorough theoretical analysis of the error of Krylov methods for the exponential of a Hermitian or skew-(anti-) Hermitian matrix was given in [24]. The analysis derives an asymptotic error expansion and shows superlinear error decay in the dimension m of the approximation subspace for sufficiently large m. These results are further improved in [4]. In [24], a posteriori error estimation is also discussed. This topic is furthermore addressed in [31]. There, the Krylov approximation method is interpreted as a Galerkin method, whence an error bound can be obtained from an error representation for this variational approximation. This yields a computable estimate via a quadrature approximation of the error integral involving the defect of the numerical approximation. The a priori error analysis reveals a step-size restriction for the convergence of the method, which is less stringent when the subspace dimension is larger.
Further work in the direction of controlling the Lanczos process through information gained from the defect is given in [7]. The defect is a scalar multiple of the successive Krylov vector arising in the iteration and can be evaluated efficiently. If the error is approximated by a Galerkin approach, the resulting estimator corresponds to the difference of two Lanczos iterates. For the purpose of practical error estimation, in [7] it is seen as preferable to continue the original Krylov process. Some other defect-based upper bounds for the error of the matrix exponential are given in [30], including a closer analysis of the error estimate of [45]. These results still require some a priori information on the matrix spectrum.
Various improved methods for computing the matrix exponential function are given in the literature, for example restarted methods, deflated restarting methods or quadrature-based restarting methods, see [1,15,16].
It has also been advocated in [49] to use preconditioning in the Lanczos method by a shifted inverse in order to get a good approximation of the leading invariant subspaces. The shift-and-invert approach (a specific choice to construct a rational Krylov subspace) for the matrix exponential function was introduced earlier in [37]. However, the choice of the shift is critical for the success of this procedure. This strategy amounts to a transformation of the spectrum which grants a convergence speed which is independent of the norm of the given matrix. In [49], a posteriori error estimation based on the asymptotical expansion of the error is advocated as well. We note that our results do not immediately carry over to the shift-and-invert approach, see Remark 3.

Overview of present work In Sect. 2 we introduce the Krylov approximation and the integral representation of the approximation error in terms of its defect. In Sect. 3 we derive a new computable upper bound for the error by using data available from the Krylov process with negligible additional computational effort (Theorem 1). This upper bound is cheap to evaluate and update on the fly during the Lanczos iteration. It is also asymptotically correct, i.e., for t → 0 the error of the error estimator tends to zero faster asymptotically than the error itself. In Sect. 4 these results are extended to the case where the Krylov approach is employed to approximate the ϕ-functions of matrices (generalizing the exponential function), see Theorem 2. In Sect. 5, improved approximations derived from a corrected Krylov process [45] are discussed, and corresponding error estimators are analyzed, including an asymptotically correct true upper bound on the error (Theorem 3). This approach can be used to increase the order, but it has the drawback of violating mass conservation. In Proposition 6 error estimates are particularized to the Hermitian case. Another view on defect-based error estimation is presented in Sect. 6.
Section 7 is devoted to the practical application of the various error estimators for the control of the time step t, including smaller substeps Δt where indicated.
In Sect. 8 we present numerical results for a finite difference discretization of the  free Schrödinger equation, a Hubbard model of solar cells, the heat equation, and a convection-diffusion problem, illustrating our theoretical results. Additional practical aspects are also investigated: A priori estimates and the role of restarting are discussed in particular in the context of practical step-size adaptation. Finally, we demonstrate the computational efficiency of our adaptive strategy.

Problem setting, Krylov approximation, and defect-based representation of the approximation error
We discuss the approximation of the matrix exponential with step size t applied to an initial vector v ∈ C^n,

E(t)v = e^{σ t A} v,  A ∈ C^{n×n},  σ ∈ C.  (2.1)

To simplify the notation we assume |σ| = 1 and ‖v‖_2 = 1 without loss of generality. In many relevant applications (Schrödinger-type problems) a complex prefactor is applied to the matrix A.
The parameter σ is introduced here to separate the prefactor of the matrix A. The standard notation for Schrödinger-type problems is obtained in (2.1) with σ = −i and a Hermitian matrix A. For such problems our notation is helpful to simplify the construction of the Krylov subspace.
The exponential E(t) = e^{σ t A} satisfies the matrix differential equation

E′(t) = σ A E(t),  E(0) = I.

We assume that μ_2(σ A) ≤ 0, where μ_2(σ A) denotes the logarithmic norm of σ A, or equivalently, W(σ A) ⊆ C^-, where W(σ A) denotes the field of values of σ A; we will refer to this assumption as the nonexpansive case. Following [23, Theorem 10.11] and associated references we conclude that μ_2(σ A) ≤ 0 implies ‖E(t)‖_2 ≤ 1 for t ≥ 0. This is essentially a technical assumption, and most of our theoretical results carry over to a more general setting, in particular if a priori information about μ_2(σ A) is available, such that E(t) can be estimated as ‖E(t)‖_2 ≤ e^{t μ_2(σ A)}.
For the skew-Hermitian case with σ = −i we write E(t) = e^{−i t A}. In this case, E(t) represents a unitary evolution, i.e., ‖E(t)‖_2 = 1.
Krylov subspaces and associated identities The numerical approximation of (2.1) considered here (see (2.6) below) is based on the conventional Krylov subspace

K_m(A, v) = span{v, A v, …, A^{m−1} v}.

First, an orthonormal basis of K_m(A, v) is obtained by the well-known Arnoldi iteration, see [46]. This produces a basis matrix V_m ∈ C^{n×m} satisfying V_m^* V_m = I_{m×m}, and an upper Hessenberg matrix T_m ∈ C^{m×m} such that the Krylov identity

A V_m = V_m T_m + τ_{m+1,m} v_{m+1} e_m^*  (2.2)

is valid, with τ_{m+1,m} ∈ R_+ and v_{m+1} ∈ C^n with ‖v_{m+1}‖_2 = 1.
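The Arnoldi construction and the Krylov identity (2.2) can be illustrated by a small numerical sketch. This is our own illustration, not part of the algorithmic developments of this paper; all names and the random Hermitian test matrix are our choices (NumPy):

```python
import numpy as np

def arnoldi(A, v, m):
    """Arnoldi iteration: returns V_m, T_m, tau_{m+1,m} and v_{m+1} as in (2.2)."""
    n = len(v)
    V = np.zeros((n, m + 1), dtype=complex)
    T = np.zeros((m + 1, m), dtype=complex)
    V[:, 0] = v / np.linalg.norm(v)
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):                 # modified Gram-Schmidt
            T[i, j] = np.vdot(V[:, i], w)
            w -= T[i, j] * V[:, i]
        T[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / T[j + 1, j]
    return V[:, :m], T[:m, :m], T[m, m - 1].real, V[:, m]

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 30)); A = (A + A.T) / 2   # Hermitian test matrix
v = rng.standard_normal(30)
m = 6
Vm, Tm, tau, vnext = arnoldi(A, v, m)

# check the Krylov identity (2.2): A V_m = V_m T_m + tau_{m+1,m} v_{m+1} e_m^*
em = np.zeros(m); em[-1] = 1.0
assert np.allclose(A @ Vm, Vm @ Tm + tau * np.outer(vnext, em))
assert np.allclose(Vm.conj().T @ Vm, np.eye(m))        # orthonormal basis
```

For a Hermitian A the computed T_m is (numerically) real symmetric tridiagonal, in accordance with the Lanczos case discussed below.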

Remark 1
We are assuming that the Arnoldi iteration is executed up to the desired dimension m. Then, by construction, all lower diagonal entries of T_m are positive [46]. If this is not the case, i.e., if a breakdown occurs, it is known that this breakdown is lucky, i.e., the approximation (2.6) below obtained in the step before the breakdown is already exact, see [45].
For the case of a Hermitian matrix A the Krylov subspace can be constructed using the Lanczos iteration, which is a special case of the Arnoldi iteration, resulting in a tridiagonal matrix T m ∈ R m×m . In the following we discuss the general case and comment on the case of a Hermitian matrix A whenever appropriate.
The following identities hold true due to the upper Hessenberg [tridiagonal] structure of T_m together with (2.2):

A^j v = V_m T_m^j e_1  for j = 0, …, m−1,  (2.3)

and

e_m^* T_m^j e_1 = 0 for j = 0, …, m−2,  e_m^* T_m^{m−1} e_1 = ∏_{j=1}^{m−1} τ_{j+1,j} > 0,  (2.4)

see for instance [10, Theorem 2] or [45]. Furthermore, let

γ_m := τ_{m+1,m} e_m^* T_m^{m−1} e_1 = ∏_{j=1}^{m} τ_{j+1,j} > 0,  (2.5)

where the claimed identity also follows from the upper Hessenberg [tridiagonal] structure of T_m.

Krylov approximation The standard Krylov approximation to E(t)v is

E(t)v ≈ V_m e^{σ t T_m} e_1.  (2.6)

We denote the corresponding error operator by L_m(t), with

L_m(t)v = E(t)v − V_m e^{σ t T_m} e_1.

Defect-based integral representation of the approximation error We define the defect

D_m(t)v := σ A V_m e^{σ t T_m} e_1 − (d/dt) V_m e^{σ t T_m} e_1 = σ τ_{m+1,m} (e_m^* e^{σ t T_m} e_1) v_{m+1}.  (2.7)

Here, e_m = (0, …, 0, 1)^* ∈ C^m, and in the sequel we also denote e_1 = (1, 0, …, 0)^* ∈ C^m. Then, L_m(t)v and D_m(t)v are related via the differential equation

(d/dt) L_m(t)v = σ A L_m(t)v + D_m(t)v,  L_m(0)v = 0,

whence

L_m(t)v = ∫_0^t E(t−s) D_m(s)v ds,  (2.8)

and

‖D_m(t)v‖_2 = τ_{m+1,m} |e_m^* e^{σ t T_m} e_1|.  (2.9)

Asymptotically for t → 0,

e_m^* e^{σ t T_m} e_1 = ((σ t)^{m−1}/(m−1)!) ∏_{j=1}^{m−1} τ_{j+1,j} + O(t^m),  (2.10)

hence

‖L_m(t)v‖_2 = O(t^m).  (2.11)

We can also characterize the asymptotically leading term of the error:

Proposition 1 For any A ∈ C^{n×n} the error L_m(t)v satisfies the asymptotic relation

L_m(t)v = ((σ t)^m/m!) γ_m v_{m+1} + R_{m+1}(t),  (2.12)

with Taylor remainder R_{m+1}(t) = O(t^{m+1}).
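The defect-based representation (2.8) can be checked numerically. The following sketch is our own construction (assuming a random Hermitian test matrix A and σ = −i; exponentials are evaluated via eigendecompositions, the defect integral by a composite trapezoidal rule):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, t, sigma = 40, 6, 0.3, -1j
A = rng.standard_normal((n, n)); A = (A + A.T) / 2     # Hermitian test matrix
v = rng.standard_normal(n); v /= np.linalg.norm(v)

# Arnoldi iteration (Lanczos would suffice for Hermitian A)
V = np.zeros((n, m + 1), dtype=complex); T = np.zeros((m + 1, m), dtype=complex)
V[:, 0] = v
for j in range(m):
    w = A @ V[:, j]
    for i in range(j + 1):
        T[i, j] = np.vdot(V[:, i], w); w -= T[i, j] * V[:, i]
    T[j + 1, j] = np.linalg.norm(w); V[:, j + 1] = w / T[j + 1, j]
Vm, Tm, tau, vnext = V[:, :m], T[:m, :m].real, T[m, m - 1].real, V[:, m]

lam, U = np.linalg.eigh(A)          # exact propagator E(s) via eigendecomposition
mu, W = np.linalg.eigh(Tm)          # T_m is real symmetric tridiagonal here
e1 = np.zeros(m); e1[0] = 1.0
Ev = lambda s, x: U @ (np.exp(sigma * s * lam) * (U.conj().T @ x))
expT = lambda s: W @ (np.exp(sigma * s * mu) * (W.T @ e1))

err = Ev(t, v) - Vm @ expT(t)       # true error L_m(t) v
delta = lambda s: expT(s)[m - 1]    # scalar defect factor e_m^* e^{sigma s T_m} e_1

# trapezoidal discretization of L_m(t)v = int_0^t E(t-s) D_m(s)v ds, see (2.7)-(2.8)
N = 800; s = np.linspace(0.0, t, N + 1); h = t / N
g = np.array([delta(si) * Ev(t - si, vnext) for si in s])
integral = sigma * tau * h * (0.5 * g[0] + g[1:-1].sum(axis=0) + 0.5 * g[-1])
assert np.linalg.norm(err - integral) < 1e-6
```

The agreement (up to quadrature error) illustrates that the full approximation error is carried by the single direction v_{m+1}, weighted by the scalar defect.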

Remark 2
The Taylor remainder R_{m+1} in (2.12) can be specified in a more explicit way showing its dependence on m.

An upper error bound for the nonexpansive case in (2.1)
For the nonexpansive case we have ‖E(t−s)‖_2 ≤ 1 for 0 ≤ s ≤ t, and (2.8) implies

‖L_m(t)v‖_2 ≤ ∫_0^t ‖D_m(s)v‖_2 ds.

With ‖v_{m+1}‖_2 = 1, and together with (2.9) we obtain

δ_m(t) := e_m^* e^{σ t T_m} e_1,  (3.1a)

‖L_m(t)v‖_2 ≤ τ_{m+1,m} ∫_0^t |δ_m(s)| ds.  (3.1b)

This estimate is also given in [31, Section III.2] and appeared earlier in [13, Subsection 2.2]. Of course, the integral in (3.1b) cannot be computed exactly. In [31] it is proposed to use numerical quadrature to approximate the integral in (3.1b). In contrast, our aim here is to derive a computable upper bound. We proceed in two steps.

Analytic matrix function via interpolation on the spectrum To approximate the error integral in (3.1b) we use the representation of matrix exponentials via Hermite interpolation of the scalar exponential function on the spectrum of the matrix T_m, see [23, Chap. 1]: If μ_1, …, μ_r (r ≤ m) denote the distinct eigenvalues of T_m and n_j is the dimension of the largest Jordan block associated with μ_j, then

e^{σ t T_m} = p_t(T_m),  (3.2)

where p_t(λ) is the Hermite interpolant of degree ≤ m−1 of the function f(λ) = e^{σ t λ} at the eigenvalues μ_j (with multiplicities n_j). For a general matrix, the degree of p_t may be smaller than m−1. However, in our context a special case occurs: Since the lower diagonal entries of T_m do not vanish, T_m is nonderogatory, i.e., for each eigenvalue μ_j the associated eigenspace is one-dimensional, see [27, Section 3.1]. Then, Σ_{j=1}^r n_j = m, which implies that the degree of p_t is exactly m−1.
In the following we denote the full sequence of the m eigenvalues of T_m by λ_1, …, λ_m. By applying basic properties of the Krylov decomposition and the imposed conditions on the numerical range of A we obtain

spec(σ T_m) ⊆ W(σ T_m) ⊆ W(σ A) ⊆ C^-.  (3.3)

The following proposition is partially related to [9, Sect. 3] or [30]. Here, divided differences have to be understood in the general sense, i.e., in the confluent sense if multiple eigenvalues occur; for the detailed definition and properties see [23, Section B.16].
Proposition 2 Let T_m ∈ C^{m×m} be an upper Hessenberg matrix with eigenvalues λ_1, …, λ_m and spec(σ T_m) ⊆ C^-. Then the function δ_m(t) defined as in (3.1a), i.e., δ_m(t) = e_m^* e^{σ t T_m} e_1, satisfies

|δ_m(t)| ≤ (t^{m−1}/(m−1)!) ∏_{j=1}^{m−1} τ_{j+1,j},  t ≥ 0.  (3.5)

Proof We proceed from the Newton representation of the interpolant p_t(λ) from (3.2),

p_t(λ) = Σ_{j=0}^{m−1} ([λ_1, …, λ_{j+1}] f) ∏_{k=1}^{j} (λ − λ_k),  f(λ) = e^{σ t λ},  (3.4)

with divided differences [λ_1, …, λ_{j+1}] f. With ω_j := e_m^* ∏_{k=1}^{j} (T_m − λ_k I) e_1 we obtain

δ_m(t) = e_m^* p_t(T_m) e_1 = Σ_{j=0}^{m−1} ([λ_1, …, λ_{j+1}] f) ω_j.

Due to the upper Hessenberg structure of T_m, (2.4) and the definition of γ_m, it is obvious that the ω_j satisfy

ω_j = 0 for j = 0, …, m−2,  ω_{m−1} = ∏_{j=1}^{m−1} τ_{j+1,j}.

According to [23, (B.28)] the divided difference can be estimated by

|[λ_1, …, λ_m] f| ≤ (1/(m−1)!) max_{λ ∈ conv{λ_1, …, λ_m}} |f^{(m−1)}(λ)| ≤ t^{m−1}/(m−1)!,

since |f^{(m−1)}(λ)| = t^{m−1} |e^{σ t λ}| ≤ t^{m−1} for σ λ ∈ C^-, which implies the estimate (3.5) for δ_m(t).
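Proposition 2 can be tested numerically. In this sketch (our own construction) we take a random real symmetric tridiagonal T_m with positive subdiagonal entries and σ = −i, so that spec(σ T_m) lies on the imaginary axis, i.e., in the closed left half-plane:

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(2)
m, sigma = 7, -1j
alpha = rng.standard_normal(m)           # diagonal of T_m
beta = rng.uniform(0.5, 2.0, m - 1)      # positive subdiagonal entries tau_{j+1,j}
Tm = np.diag(alpha) + np.diag(beta, -1) + np.diag(beta, 1)

mu, W = np.linalg.eigh(Tm)
e1 = np.zeros(m); e1[0] = 1.0
delta = lambda t: (W @ (np.exp(sigma * t * mu) * (W.T @ e1)))[m - 1]

# bound (3.5): |delta_m(t)| <= t^{m-1}/(m-1)! * prod tau_{j+1,j}
prod_tau = np.prod(beta)
for t in [0.1, 0.5, 1.0, 5.0]:
    bound = t ** (m - 1) / factorial(m - 1) * prod_tau
    assert abs(delta(t)) <= bound * (1 + 1e-9)
```

Note that the bound holds uniformly in t here, while for large t it becomes very pessimistic; this matches the asymptotic (small-t) character of the resulting error estimates.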
Error estimate and asymptotical correctness Now we apply Proposition 2 in the context of our Krylov approximation.

Theorem 1 (Computable upper bound) For the nonexpansive case the error L_m(t)v of the Krylov approximation (2.6) to E(t)v satisfies

‖L_m(t)v‖_2 ≤ (t^m/m!) γ_m,  (3.6)

and this bound is asymptotically correct for t → 0,
with τ_{m+1,m} from (2.2) and γ_m from (2.5).
Proof We proceed from (3.1). For δ_m defined in (3.1a), Proposition 2 implies

|δ_m(s)| ≤ (s^{m−1}/(m−1)!) ∏_{j=1}^{m−1} τ_{j+1,j},

and this gives an upper bound for the error integral (3.1b):

‖L_m(t)v‖_2 ≤ τ_{m+1,m} ∫_0^t |δ_m(s)| ds ≤ τ_{m+1,m} (∏_{j=1}^{m−1} τ_{j+1,j}) ∫_0^t (s^{m−1}/(m−1)!) ds = (t^m/m!) γ_m,

which completes the proof. Proposition 2 is applied here in the nonexpansive case; note that spec(σ T_m) ⊆ W(σ A) ⊆ C^-.

The upper bound (3.6) corresponds to the 2-norm of the leading error term (2.12) according to Proposition 1. It is easily computable from the Krylov decomposition (2.2). We denote the error estimate given by (3.6) as

Err_a := (t^m/m!) γ_m.  (Err_a)
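The bound (3.6) can be verified against the true error on a small example. The following sketch is our own construction, assuming a random Hermitian test matrix with σ = −i (nonexpansive case); exponentials are evaluated via eigendecompositions:

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(3)
n, m, sigma = 50, 6, -1j
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
v = rng.standard_normal(n); v /= np.linalg.norm(v)

# Lanczos/Arnoldi iteration (real arithmetic suffices for Hermitian A)
V = np.zeros((n, m + 1)); T = np.zeros((m + 1, m))
V[:, 0] = v
for j in range(m):
    w = A @ V[:, j]
    for i in range(j + 1):
        T[i, j] = V[:, i] @ w; w -= T[i, j] * V[:, i]
    T[j + 1, j] = np.linalg.norm(w); V[:, j + 1] = w / T[j + 1, j]
Vm, Tm, tau = V[:, :m], T[:m, :m], T[m, m - 1]

gamma = tau * np.prod(np.diag(Tm, -1))   # gamma_m = prod_{j=1}^m tau_{j+1,j}, see (2.5)
lam, U = np.linalg.eigh(A)
mu, W = np.linalg.eigh(Tm)
e1 = np.zeros(m); e1[0] = 1.0

for t in [0.05, 0.2, 0.5]:
    exact = U @ (np.exp(sigma * t * lam) * (U.T @ v))
    krylov = Vm @ (W @ (np.exp(sigma * t * mu) * (W.T @ e1)))
    err = np.linalg.norm(exact - krylov)
    Err_a = t ** m / factorial(m) * gamma    # computable upper bound (3.6)
    assert err <= Err_a
```

All quantities entering Err_a (τ_{m+1,m} and the subdiagonal of T_m) are available during the iteration, so the bound can indeed be updated on the fly.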

Remark 3 In [49, Section 4] a defect-based error formulation is given for the shift-
and-invert Krylov approximation of the matrix exponential function. In contrast to the standard Krylov method, the defect is not of order m − 1 for t → 0 there. Hence, our new results do not directly apply to shift-and-invert Krylov approximations. A study of a posteriori error estimates for the shift-and-invert approach is a topic of future investigations.

Krylov approximation to ϕ-functions
As another application we consider the so-called ϕ-functions, with power series representation

ϕ_p(z) = Σ_{k=0}^∞ z^k/(k+p)!,  p ≥ 0.  (4.1a)

We have ϕ_0(z) = e^z, and for p ≥ 1 the integral representation

ϕ_p(z) = (1/(p−1)!) ∫_0^1 e^{(1−θ) z} θ^{p−1} dθ.  (4.1b)

As the matrix exponential, ϕ-functions of matrices also appear in a wide range of applications, such as exponential integrators, see for instance [2,25,26,39,47]. Krylov approximation is a common technique to evaluate ϕ-functions of matrices applied to a starting vector,

ϕ_p(σ t A)v ≈ V_m ϕ_p(σ t T_m) e_1.  (4.2)

Since ϕ-functions are closely related to the matrix exponential, our ideas can be applied to these as well. We use the following notation for the error in the ϕ-functions:

L_m^p(t)v := ϕ_p(σ t A)v − V_m ϕ_p(σ t T_m) e_1.  (4.3)

With (4.3) we generalize the previously used notation: L_m(t) = L_m^0(t).

Theorem 2 The error of the Krylov approximation (4.2) to ϕ_p(σ t A)v satisfies

L_m^p(t)v = ((σ t)^m/(m+p)!) γ_m v_{m+1} + O(t^{m+1}).  (4.4a)

Furthermore, in the nonexpansive case its norm is bounded by

‖L_m^p(t)v‖_2 ≤ (t^m/(m+p)!) γ_m,  (4.4b)

and this bound is asymptotically correct for t → 0.
Proof For p = 0 the result directly follows from Propositions 1, 3 and Theorem 1. We now assume p ≥ 1. Via the series representation (4.1a) of ϕ_p we can determine the leading term of the error in an analogous way as in Proposition 1: With (2.3)–(2.5),

L_m^p(t)v = Σ_{k≥m} ((σ t)^k/(k+p)!) (A^k v − V_m T_m^k e_1) = ((σ t)^m/(m+p)!) γ_m v_{m+1} + O(t^{m+1}),

which proves (4.4a). Furthermore, proceeding from (4.1b) we obtain

L_m^p(t)v = (1/(p−1)!) ∫_0^1 θ^{p−1} L_m((1−θ) t)v dθ,

with the error L_m(t)v for the matrix exponential case. Now we apply Theorem 1 to obtain

‖L_m^p(t)v‖_2 ≤ (γ_m/m!) (1/(p−1)!) ∫_0^1 θ^{p−1} ((1−θ) t)^m dθ = (t^m/(m+p)!) γ_m,

which proves (4.4b).

Corrected Krylov approximation for the exponential and ϕ-functions
Let us recall the well-known error representation given in [45]:

Proposition 4 (see [45]) The error of the Krylov approximation (2.6) admits the series representation

L_m(t)v = σ τ_{m+1,m} Σ_{j=1}^∞ t^j (e_m^* ϕ_j(σ t T_m) e_1) (σ A)^{j−1} v_{m+1}.
In [45] it is stated that, typically, the first term of the sum given in Proposition 4 is already a good approximation to L_m(t)v. Analogously to [45, Section 5.2] we use the notation Err_1 for the norm of this term,

Err_1 := t τ_{m+1,m} |e_m^* ϕ_1(σ t T_m) e_1|.  (Err_1)

In [30] it is even shown that Err_1 is an upper bound up to a factor depending on spectral properties of the matrix A. For the case of Hermitian σ A we show ‖L_m(t)v‖_2 ≤ Err_1 in Proposition 6 below.
In Remark 4 below we show that Err_1 is also an asymptotically correct approximation for the error norm (in the sense of Proposition 3). Furthermore, the error estimate Err_1 is computable at nearly no extra cost, see [45]. The augmented matrices

T_m^+ := [ T_m 0 ; τ_{m+1,m} e_m^* 0 ] ∈ C^{(m+1)×(m+1)},  (5.2a)

V_m^+ := (V_m, v_{m+1}) ∈ C^{n×(m+1)}  (5.3)

can be used to evaluate the error estimate Err_1 or a corrected Krylov approximation in the form

E(t)v ≈ V_m^+ e^{σ t T_m^+} e_1,

for which the first term of the error expansion according to Proposition 4 vanishes, see [45]. For the error of the corrected Krylov approximation we use the notation

L_m^+(t)v := E(t)v − V_m^+ e^{σ t T_m^+} e_1.

For general ϕ-functions we obtain an error representation similar to Proposition 4 and a corrected Krylov approximation for ϕ-functions. The corrected Krylov approximation to ϕ_p(σ t A)v is given in [47, Theorem 2]:

ϕ_p(σ t A)v ≈ V_m^+ ϕ_p(σ t T_m^+) e_1,

with T_m^+ and V_m^+ given in (5.2a) and (5.3). The error of the corrected Krylov approximation is denoted by

L_m^{p,+}(t)v := ϕ_p(σ t A)v − V_m^+ ϕ_p(σ t T_m^+) e_1.

Proposition 5 The errors of the standard and the corrected Krylov approximation to ϕ_p(σ t A)v admit the series representations

L_m^p(t)v = σ τ_{m+1,m} Σ_{j=p+1}^∞ t^{j−p} (e_m^* ϕ_j(σ t T_m) e_1) (σ A)^{j−p−1} v_{m+1},  (5.5a)

L_m^{p,+}(t)v = σ τ_{m+1,m} Σ_{j=p+2}^∞ t^{j−p} (e_m^* ϕ_j(σ t T_m) e_1) (σ A)^{j−p−1} v_{m+1}.  (5.5b)

The following remark will be used later on.

Remark 4
From the representation (4.1a) for the ϕ_j together with (2.3) and (2.5) we observe for j ≥ 0

e_m^* ϕ_j(σ t T_m) e_1 = ((σ t)^{m−1}/(m−1+j)!) ∏_{k=1}^{m−1} τ_{k+1,k} + O(t^m),  (5.6)

and we conclude that the asymptotically leading order term of L_m^p(t)v for t → 0 is obtained from the leading term (j = p+1) of the series (5.5a):

L_m^p(t)v = ((σ t)^m/(m+p)!) γ_m v_{m+1} + O(t^{m+1}).  (5.7a)

Analogously we obtain the asymptotically leading order term of L_m^{p,+}(t)v for t → 0 from the leading term (j = p+2) of the series (5.5b):

L_m^{p,+}(t)v = ((σ t)^{m+1}/(m+p+1)!) γ_m A v_{m+1} + O(t^{m+2}).  (5.7b)

The asymptotically leading terms in (5.7a) and (5.7b) can be used as error estimators:

Err_1 = t τ_{m+1,m} |e_m^* ϕ_{p+1}(σ t T_m) e_1|  (5.8a)

for the standard Krylov approximation, and

Err_1 = t² τ_{m+1,m} |e_m^* ϕ_{p+2}(σ t T_m) e_1| ‖A v_{m+1}‖_2  (5.8b)

for the corrected Krylov approximation. The error estimators (5.8a) and (5.8b) are already suggested in [39,47]. We will refer to them as Err_1 in the context of the ϕ-functions with standard and corrected Krylov approximation, generalizing the corresponding quantities for the exponential case p = 0.
We also obtain true upper bounds for the matrix exponential ( p = 0) and general ϕ-functions with p ≥ 1.

Theorem 3 The error of the corrected Krylov approximation to ϕ_p(σ t A)v satisfies

L_m^{p,+}(t)v = ((σ t)^{m+1}/(m+p+1)!) γ_m A v_{m+1} + O(t^{m+2}).  (5.9a)

Furthermore, in the nonexpansive case its norm is bounded by

‖L_m^{p,+}(t)v‖_2 ≤ (t^{m+1}/(m+p+1)!) γ_m ‖A v_{m+1}‖_2,  (5.9b)

and this bound is asymptotically correct for t → 0.
Proof Applying (5.6) (with j = p+2) to (5.7b) shows (5.9a). From Proposition 5 we observe

L_m^{p,+}(t)v = σ t · σ τ_{m+1,m} Σ_{j=p+2}^∞ t^{j−p−1} (e_m^* ϕ_j(σ t T_m) e_1) (σ A)^{j−p−2} (A v_{m+1}),

i.e., L_m^{p,+}(t)v corresponds to the series of an error L_m^{p+1} with v_{m+1} replaced by A v_{m+1}. Using the integral representation analogously as in the proof of Theorem 2 for L_m^{p+1}, with norm inequalities (note the nonexpansive case) and Proposition 2, we obtain

‖L_m^{p,+}(t)v‖_2 ≤ (t^{m+1}/(m+p+1)!) γ_m ‖A v_{m+1}‖_2,

which proves (5.9b). Proposition 2 is applied here in the nonexpansive case, see also the proof of Theorem 1.
If the error estimate (5.9b) is to be evaluated, the effort of computing ‖A v_{m+1}‖_2 is comparable to one additional step of the Krylov iteration.
As mentioned before, we can also show that for Hermitian σ A the estimate Err_1 gives a true upper bound:

Proposition 6 For the nonexpansive case with σ = 1 and a Hermitian matrix A we obtain

‖L_m^p(t)v‖_2 ≤ t τ_{m+1,m} e_m^* ϕ_{p+1}(t T_m) e_1 = Err_1,  (5.10a)

and for the corrected approximation

‖L_m^{p,+}(t)v‖_2 ≤ t² τ_{m+1,m} (e_m^* ϕ_{p+2}(t T_m) e_1) ‖A v_{m+1}‖_2.  (5.10b)

Proof For a Hermitian matrix A we obtain a symmetric, tridiagonal matrix T_m with distinct, real eigenvalues via the Lanczos iteration, see [27, Chap. 3.1]. By Proposition 2 we observe

δ_m(t) = ([λ_1, …, λ_m] e^{t ·}) ∏_{j=1}^{m−1} τ_{j+1,j} > 0  for t > 0,

since divided differences of the exponential function over real nodes are positive. We continue with (5.10a) in the case p = 0:

‖L_m(t)v‖_2 ≤ τ_{m+1,m} ∫_0^t δ_m(s) ds = t τ_{m+1,m} e_m^* ϕ_1(t T_m) e_1 = Err_1.

For the case p ≥ 1 we start analogously to Theorem 2. Using definition (4.1b) for the ϕ-functions and resorting to the case p = 0 we find

‖L_m^p(t)v‖_2 ≤ (1/(p−1)!) ∫_0^1 θ^{p−1} (1−θ) t τ_{m+1,m} e_m^* ϕ_1((1−θ) t T_m) e_1 dθ.

Evaluation of the integral yields

‖L_m^p(t)v‖_2 ≤ t τ_{m+1,m} e_m^* ϕ_{p+1}(t T_m) e_1.

This shows (5.10a). To show (5.10b) we start analogously to Theorem 3: Using |δ_m(s)| = δ_m(s) and evaluating the inner integral by the ϕ_1 function, we obtain a bound of the same form with an additional factor ‖A v_{m+1}‖_2. Evaluation of the remaining integral, analogously to the case (5.10a), completes the proof.

Defect-based quadrature error estimates revisited
The term on the right-hand side of (2.12) is a computable error estimate, which has been investigated more closely in Sect. 3. It can also be interpreted in an alternative way. To this end we again proceed from the integral representation (2.8),

L_m(t)v = ∫_0^t E(t−s) D_m(s)v ds.  (6.1)

Due to (2.4) we have δ_m^{(k)}(0) = 0 for k = 0, …, m−2, and the same is true for the integrand in (6.1). Analogously as in [3], this allows us to approximate (6.1) by a Hermite quadrature formula, (6.2), whose value coincides with the leading error term in (2.12). This means that the quadrature approximation (6.2) approximates the leading error term in an asymptotically correct way. From (6.2), (2.9) and (3.1a) we obtain the computable error estimate

‖L_m(t)v‖_2 ≈ (t^m/m!) τ_{m+1,m} |δ_m^{(m−1)}(t)|.  (6.3)

The quadrature error in (6.2) is O(t^{m+1}). It is useful to argue this also in a direct way: By construction, the Hermite quadrature formula underlying (6.2) is of order m, and its error (6.4) has a Peano kernel representation (cf. also [3]). This shows that, indeed, the quadrature error (6.4) is O(t^{m+1}).
Furthermore, a quadrature formula of order m+1 can be constructed by including an additional evaluation of the derivative of the defect. A routine calculation shows that the corresponding quadrature error depends on d_{m+1}. This may be considered as an improved error estimate which can be evaluated efficiently: with the solution in the Krylov subspace, e^{σ t T_m} e_1, and δ_m(t) = e_m^* e^{σ t T_m} e_1 = (e^{σ t T_m} e_1)_m, we can compute the derivative of the defect at s = t from the last row of T_m. Also longer expansions may be considered. This alternative way of computing improved error estimates is worth investigating but will not be pursued further here.
Quadrature estimate for (3.1b) revisited We proceed from (3.1b), which is valid for the nonexpansive case. In [31] it is suggested to use the right-endpoint rectangle rule as a practical approximation to the integral in (3.1b),

‖L_m(t)v‖_2 ≈ t τ_{m+1,m} |δ_m(t)|,  (6.6)

or alternatively the Simpson rule, which is also suggested in [50]. The error estimate in (6.6) is also referred to as the generalized residual estimate in [7,25], and similar error estimates also appeared earlier in [45]. In [13, eq. (32)] an a priori upper bound on the integral in (3.1b) is obtained by t max_{s∈[0,t]} |δ_m(s)|. Applying Hermite quadrature to (3.1b) also directly leads to the error estimate (6.3). For a better understanding of the approximation (6.6) we consider the effective order of |δ_m(t)| as a function of t. Let us denote f(t) := |δ_m(t)| and assume f(t) > 0 in a sufficiently small interval (0, T]. For the Hermitian case this assumption is fulfilled for all t > 0, see Proposition 6. The effective order of the function f(t) can be understood as the slope of the double-logarithmic function x ↦ log f(e^x). We denote the order by

ρ(t) := t f′(t)/f(t),  (6.7)

and obtain ρ(0+) = m−1. Integration and application of the mean value theorem shows the existence of t* ∈ [0, t] with

∫_0^t f(s) ds = t f(t)/(ρ(t*) + 1).

With the plausible assumption that the order is bounded by 0 ≤ ρ(t*), this yields

τ_{m+1,m} ∫_0^t |δ_m(s)| ds ≤ t τ_{m+1,m} |δ_m(t)|.  (6.8)

This shows that under such an assumption the generalized residual estimate (6.6) gives an upper bound on the error. With the assumption 0 ≤ ρ(t), the defect |δ_m(t)| (also called the residual in the literature) is monotonically increasing, and the upper bound suggested in [13, eq. (32)] is equivalent to the generalized residual estimate. However, in contrast to (6.3), the error estimate (6.6) is not asymptotically correct for t → 0. In the following remark we suggest a practical approach to tighten the generalized residual estimate retaining the property of an upper bound in (6.8).

Remark 5
With ρ(0+) = m − 1 and the assumptions that the effective order is slowly decreasing locally at t = 0 and sufficiently smooth, we suggest choosing m̃ = ρ(t) for a step of size t to improve the quadrature based estimate.
We will refer to this as the effective order quadrature estimate. Substituting f(t) = |δ_m(t)| = ((e^{σ t T_m} e_1)_m · conj((e^{σ t T_m} e_1)_m))^{1/2} in the ansatz (6.7), we obtain a computable formula for the effective order ρ(t),

ρ(t) = t Re(δ_m′(t) · conj(δ_m(t))) / |δ_m(t)|²,  δ_m′(t) = σ (T_m e^{σ t T_m} e_1)_m.  (6.10)

Due to the Hessenberg structure of T_m, (T_m e^{σ t T_m} e_1)_m only involves the components (e^{σ t T_m} e_1)_{m−1} and (e^{σ t T_m} e_1)_m; the computation of (e^{σ t T_m} e_1)_{m−1} usually comes hand in hand with the computation of δ_m(t) = (e^{σ t T_m} e_1)_m. For a numerical implementation of (6.10) we suggest computing e^{σ t T_m} e_1 by a Taylor or Padé approximation. In the limit t → 0 this choice of quadrature is equivalent to the Hermite quadrature and, therefore, asymptotically correct.
Up to now we have referred to the effective order of the defect |δ_m(t)|. For t → 0 the effective order of the error is given by ρ(t) + 1.
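The effective order can be illustrated numerically. In this sketch (our own construction, with a random symmetric tridiagonal T_m and σ = −i), ρ(t) is computed as the slope of log f versus log t by a centered difference, and approaches m − 1 for small t:

```python
import numpy as np

rng = np.random.default_rng(6)
m, sigma = 6, -1j
alpha = rng.standard_normal(m)
beta = rng.uniform(0.5, 2.0, m - 1)
Tm = np.diag(alpha) + np.diag(beta, -1) + np.diag(beta, 1)
mu, W = np.linalg.eigh(Tm)
e1 = np.zeros(m); e1[0] = 1.0
f = lambda t: abs((W @ (np.exp(sigma * t * mu) * (W.T @ e1)))[m - 1])  # |delta_m(t)|

def rho(t, h=1e-3):
    """Effective order: slope of the double-logarithmic function, cf. (6.7)."""
    return (np.log(f(t * (1 + h))) - np.log(f(t * (1 - h)))) / np.log((1 + h) / (1 - h))

assert abs(rho(0.02) - (m - 1)) < 0.1   # close to the asymptotic order m-1
assert abs(rho(0.1) - (m - 1)) < 0.5    # slowly deviating away from t = 0
```

In an actual implementation one would rather use the closed formula (6.10) with the analytic derivative δ_m′(t); the finite-difference version above serves only as a cross-check.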

The matrix exponential as a time integrator
For simplicity we assume the nonexpansive case of (2.1) in this section.
We recall from [24] that superlinear convergence as a function of m, the dimension of the underlying Krylov space, sets in for

t ‖A‖_2 ≲ m.  (7.1)

This relation also affects the error considered as a function of time t. Equation (7.1) can be seen as a very rough estimate for a choice of t which leads to a systematic error and convergence behavior. Only for special classes of problems, for instance symmetric negative definite matrices, can the relation (7.1) be weakened, see [4,24] for details. In general a large time step t would necessitate a large m or a restart of the Krylov method. For larger dimensional problems memory requirements can limit the choice of m and make a restart necessary. Considering global computational cost it may also be favorable to use a moderate value of m in combination with restarts. Even if increasing m results in a larger time step t, the increase in computational cost can lead to a decrease of total performance in some cases. This issue is relevant particularly for rather large choices of m, especially if computational cost scaling with m² or worse becomes noticeable. We further discuss effects of computer arithmetic on the Krylov approximation of the matrix exponential in Sect. 8 without going into details.
For the matrix exponential seen as a time propagator, a simple restart is possible. The following procedure has been introduced in [47] and is recapitulated here to fix the notation.
We split the time range [0, t] into N subintervals,

0 = t_0 < t_1 < ⋯ < t_N = t,  Δt_j = t_j − t_{j−1}.  (7.2)

The exact solution at time t_j is denoted by v^{[j]} = E(t_j)v. For simplicity we assume that the dimension m of the Krylov subspace is fixed over the substeps. We obtain approximations w^{[j]} to v^{[j]} by applying the Krylov approximation (2.6) with starting vector w^{[j−1]} on each subinterval, with w^{[0]} = v. The error matrix in the j-th step is denoted by L_m^{[j]}(Δt_j), and the accumulated error satisfies

E(t_N)v − w^{[N]} = Σ_{j=1}^N E(t_N − t_j) L_m^{[j]}(Δt_j) w^{[j−1]}.

Recall our premise that E(·) is nonexpansive and assume that the local error is bounded by

‖L_m^{[j]}(Δt_j) w^{[j−1]}‖_2 ≤ Δt_j tol.  (7.3)

Here, ‖L_m^{[j]}(Δt_j) w^{[j−1]}‖_2 denotes the truncation error of a single substep and is studied in the first part of this paper. With (7.3), the accumulated error is bounded by Σ_j Δt_j tol = t · tol. We now apply local error estimates to predict acceptable time steps.
Step size control For a single substep, the error estimate (Err_a) suggests a step size to satisfy a given error tolerance tol as

Δt_j = (tol · m!/γ_m^{[j]})^{1/m}.  (7.4)

For a local error as in (7.3), we replace tol by (Δt_j tol) in (7.4) and obtain

Δt_j = (tol · m!/γ_m^{[j]})^{1/(m−1)}.  (7.5)

We remark that Δt_j can be computed together with the construction of the Krylov subspace; τ_{m+1,m} and γ_m^{[j]} are directly available from the Arnoldi (or Lanczos) data. The error estimator Err_1 and the estimates given in Sect. 6 cannot be inverted directly to predict the step size. Computing a feasible step size is still possible via heuristic step size control. This approach will be formulated for a general error estimate Err.
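The direct step size prediction from the estimate Err_a (cf. (7.4) with the local tolerance Δt·tol) can be sketched as follows; γ_m and tol are hypothetical values of the kind delivered by the Lanczos/Arnoldi data:

```python
from math import factorial

m, gamma, tol = 6, 3.7e2, 1e-8     # hypothetical Krylov data and tolerance

# solve Err_a(dt) = dt^m * gamma / m! == dt * tol  for dt
dt = (tol * factorial(m) / gamma) ** (1.0 / (m - 1))
Err_a = dt ** m * gamma / factorial(m)
assert abs(Err_a - dt * tol) <= 1e-12 * dt * tol   # local tolerance is met exactly
```

This inversion is what makes Err_a particularly convenient for step size control, in contrast to estimators such as Err_1.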
Ideas of heuristic step size control are given in [20] in general and in [47] for Krylov approximations to the matrix exponential. A simple controller updates the step size via

Δt_j = Δt_{j−1} (Δt_{j−1} tol / Err^{[j−1]})^{1/m}.  (7.7)

In (7.7) we only need the evaluation of the error estimate for the previously computed step with step size Δt_{j−1}. In the Expokit package [47] the heuristic step size control is used similarly to (7.7), and an initial step size Δt_1 is chosen by an a priori formula, which we recall for comparison in numerical examples. In many cases the construction of the Krylov subspace, which is independent of the step size, contributes the largest part of the computational cost. In this case we can improve the choice of Δt_j relatively cheaply in an iterative manner before continuing to time step j + 1:

Δt_{j,l+1} = Δt_{j,l} (Δt_{j,l} tol / Err^{[j,l]})^{1/m},  l = 0, 1, 2, ….  (7.9)

The aim of the iteration (7.9) is to determine a step size Δt_{j,∞} with Err^{[j,∞]} = Δt_{j,∞} tol, see (7.3). The convergence behavior of iteration (7.9) depends on the structure of the corresponding error estimate. The idea of the heuristic step size control is based on the asymptotic order of the error for Δt → 0, which in (7.7) and (7.9) is assumed to be m, see (2.11). By substituting the asymptotic order m by the effective order ρ(Δt) + 1, which is introduced in (6.10), the iteration (7.9) could be improved for a step size Δt away from the asymptotic regime. In our practical examples this iteration does not seem to be sensitive with respect to the effective order of the error and converges in a small number of steps using the asymptotic order m.
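The fixed-point iteration (7.9) can be sketched with a model error estimate Err(Δt) = C·Δt^m of asymptotic order m; C and tol here are hypothetical values:

```python
m, C, tol = 6, 2.5e3, 1e-8
Err = lambda dt: C * dt ** m          # model error estimate of asymptotic order m

dt = 1.0                              # deliberately bad initial guess
for _ in range(40):
    dt = dt * (dt * tol / Err(dt)) ** (1.0 / m)    # iteration (7.9)

# the fixed point satisfies Err(dt) = dt * tol, i.e. dt = (tol/C)^{1/(m-1)}
dt_star = (tol / C) ** (1.0 / (m - 1))
assert abs(dt - dt_star) < 1e-12 * dt_star
```

In logarithmic variables the map contracts with factor 1/m per step, which explains the fast convergence observed in practice.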
For the following remarks on T_m we neglect the index j in T_m^{[j]} to simplify the notation. In the case of a Hermitian matrix A the matrix T_m is symmetric, tridiagonal and real-valued, which allows cheap and robust computation of its eigenvalue decomposition. The eigenvalue decomposition of T_m is independent of the step size Δt and allows cheap evaluation of e^{σ Δt T_m} e_1 or ϕ_1(σ Δt T_m) e_1 and of the corresponding error estimates for multiple choices of Δt.
For a non-Hermitian matrix A, computing Err^{[j,l]} for multiple choices of l, hence different step sizes Δt_{j,l}, only leads to slightly larger computational cost, which is usually negligible.

Numerical considerations and examples
We give an illustration of our theoretical results for two different skew-Hermitian problems in Sect. 8.1, a Hermitian problem in Sect. 8.2, and a non-normal problem in Sect. 8.3. We also compare the performance of different error estimates for practical step size control (Sect. 7) in Sect. 8.1. To show that our error estimate (3.6) is efficient in practice we also compare it with results delivered by the standard package Expokit [47] and a priori error estimates.

The skew-Hermitian case
For our tests we use different types of matrices.

Free Schrödinger equation We consider a finite difference discretization of the free Schrödinger equation i ∂_t ψ = −Δψ; this is of the form (2.1) with σ = −i and a Hermitian matrix A given by the discretized Laplacian.

Discrete Hubbard model For the description of the Hubbard model we employ a self-contained notation. The Hubbard model first appears in [28] and was further used in many papers and books, e.g., [32,44]. The Hubbard model is used to describe electron density on a given number of sites, which correspond to a Wannier discretization of orbitals, and spin up or down. We consider the following Hubbard Hamiltonian, in second quantization and without chemical potential:

H = Σ_{i,j} Σ_σ v_{ij} c_{jσ}^† c_{iσ} + U Σ_{j,σ} n̂_{jσ} n̂_{jσ′},  (8.2)

where i, j sum over the number of sites n_sites and the spins σ, σ′ ∈ {↑, ↓}, where σ′ is the opposite spin to σ. The entries v_{ij} with i, j = 1, …, n_sites describe electron hopping from site i to j. In (8.2), the notation c_{jσ}^† c_{iσ} describes the second quantization operators, and n̂_{jσ} = c_{jσ}^† c_{jσ} is the occupation number operator. For details on the notation in (8.2) we can recommend several references, e.g., [28,29,32,44].
For our tests we model 8 electrons at 8 sites (n sites = 8), with spin up and down possible at each site; this leads to 16 possible single-particle states. Such an electron distribution is referred to as half-filled in the literature. We further restrict our model by fixing the number of electrons with spin up and with spin down to n sites /2 each. This leads to n = (binomial(8, 4)) 2 = 4900 occupation states, which form a discrete basis. For the numerical implementation of the basis we use 16-bit integers, in which each bit describes a position that is occupied if the bit equals 1 and empty otherwise. The set of occupation states can be ordered by the values of these integers, which leads to a unique representation of the Hubbard Hamiltonian (8.2) by a matrix H ∈ C n×n . Such an implementation of the Hubbard Hamiltonian is also described in [29, Section 3].
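The bit-coded basis described above can be sketched as follows (an illustrative reimplementation with hypothetical helper names; the reference implementation is the one described in [29, Section 3]): each spin species occupies 8 bits, states with exactly 4 bits set are enumerated, and ordering by integer value gives the unique indexing.

```python
from itertools import combinations
from math import comb

n_sites, n_part = 8, 4  # half filling with fixed spin-up/spin-down count

def occupation_states(n_sites, n_part):
    """All n_sites-bit integers with exactly n_part bits set, sorted;
    bit j set <=> site j occupied."""
    return sorted(sum(1 << j for j in occ)
                  for occ in combinations(range(n_sites), n_part))

up = occupation_states(n_sites, n_part)   # 70 = binomial(8, 4) states per spin
# a full basis state is a pair (up-state, down-state);
# index = i_up * len(up) + i_down gives the unique matrix ordering
n_basis = len(up) ** 2
assert n_basis == comb(8, 4) ** 2 == 4900
```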
In our test setting we use U = 5 and parameter-dependent electron hoppings v j, j+1 (ω) = v j+1, j (ω) = sin ω for j = 1, . . . , 7, and v i j = 0 otherwise.
For this choice of v i j (ω) we obtain a Hermitian matrix H ω ∈ C n×n with 43,980 nonzero entries (for a generic choice of ω) and spec(H ω ) ⊆ (−19.1, 8.3); this spectral inclusion is independent of ω.
A relevant application where the Hubbard Hamiltonian (8.2) is of importance is the simulation of oxide solar cells with the goal of finding candidates for new materials promising a gain in the efficiency of the solar cell, see [21]. The study of solar cells considers time-dependent electron hoppings v i j = v i j (t) to model time-dependent potentials which lead to Hamiltonian matrices H (t). The time-dependent Hamiltonian can be parameterized via ω. Time propagation of a linear, non-autonomous ODE system can be approximated by Magnus-type integrators which are based on one or more evaluations of matrix exponentials applied to different starting vectors at several times t, see for instance [5,6]. Our test setting for the Hubbard Hamiltonian with arbitrary ω is then obtained by (2.1) with the matrix A = H ω as described above and σ = −i.
In the following Sect. 8.1 we focus on the skew-Hermitian case. For tests on the Hermitian case see Sect. 8.2 below. We observe that the error estimate Err 1 is a good approximation to the error, but it is not an upper bound in general. In contrast, Err a is a proven upper error bound. Up to round-off error, for m = 10 we observe the correct asymptotic behavior of Err a and Err 1 . For larger choices of m the asymptotic regime starts at time steps for which the error is already close to round-off precision. Therefore, for larger choices of m, the Krylov approximation, as a time integrator, cannot achieve its full order for typical time steps in double precision.

Verification of upper error bound
The matrix (8.1) has been scaled such that spec(H ) ⊆ (0, 1) and ‖H‖ 2 ≈ 1. In accordance with (7.1), stagnation of the error is observed for times t ≳ m, see Fig. 1.
We verify the error estimates in the skew-Hermitian setting of the free Schrödinger Eq. (8.1) for the standard Krylov approximation of the ϕ 1 function in Fig. 3 and for the corrected Krylov approximation of the matrix exponential function in Fig. 4. In Fig. 3 the error estimator Err 1 refers to formula (5.8a) and Err a shows the upper error bound (4.4b) from Theorem 2, both for the case p = 1. In Fig. 4, Err 1 is from formula (5.8b) and Err a denotes the upper error bound (5.9b) from Theorem 3, both for the case p = 0. Both estimates are asymptotically correct, whereas the improved quadrature (6.5) is slightly better for larger time steps t, with the drawback of one additional matrix-vector multiplication. (See Remark 7 below for the cost efficiency of more expensive error estimates.) Figure 6 refers to the generalized residual estimate (6.6), to estimates based on the effective order quadrature according to Remark 5, and to the Hermite quadrature (6.3). For our test problems the assumptions from Sect. 6 on the defect and its effective order are satisfied for a significant range of values of t. We also observe that the inequalities (6.8) are satisfied. The effective order and Hermite quadrature estimates behave in an asymptotically correct way, while the generalized residual estimate leads to an upper error bound which is, however, not sharp for t → 0.
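The effective order itself can be estimated generically from defect norms at two nearby step sizes (a standard finite-difference model of a local order, shown here for illustration; it is not the specific formula (6.10) of the paper):

```python
import numpy as np

def effective_order(defect_norm, t, h=1e-3):
    """Estimate the effective (local) order rho(t) of a quantity behaving
    like C * t**rho, via rho ≈ d log(defect) / d log(t), using evaluations
    at t and t*(1+h)."""
    d1, d2 = defect_norm(t), defect_norm(t * (1.0 + h))
    return np.log(d2 / d1) / np.log(1.0 + h)

# toy defect with known order 10 (as for a Krylov space of dimension m = 10)
rho = effective_order(lambda t: 3.0 * t**10, t=0.1)
assert abs(rho - 10.0) < 1e-6
```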
For the skew-Hermitian case we use σ = −i and T m ∈ R m×m in (6.10) to obtain the effective order. For computing the effective order we only consider time steps with ρ(t) > 0 for which ρ indeed appears to be monotonically decreasing over the computed discrete time steps. This restriction is compatible with our assumptions in Sect. 6.

Corrected Krylov approximation and mass conservation We remark that error estimates for the corrected Krylov approximation usually require one additional matrix-vector multiplication; applying a standard Krylov approximation of dimension m + 1 then seems to be the more favorable choice in our approach to error estimation. Moreover, in the skew-Hermitian case the standard Krylov approximation of the matrix exponential conserves the mass, in contrast to the corrected Krylov approximation. Whether this is a real drawback of the corrected Krylov approximation depends on the emphasis placed on mass conservation. In the following examples we focus on the standard Krylov approximation, with some exceptions which serve for comparisons with the original Expokit code, which is based on the corrected Krylov approximation.
In exact arithmetic we obtain mass conservation in the skew-Hermitian case: for ‖v‖ 2 = 1 and the standard Krylov approximation S m (t)v we have

‖S m (t)v‖ 2 = ‖V m e −it T m e 1 ‖ 2 = ‖e −it T m e 1 ‖ 2 = 1, (8.3)

since e −it T m is unitary for the real symmetric matrix T m . The requirement V m * V m = I is essential to obtain mass conservation in (8.3). In computer arithmetic the loss of orthogonality of the Krylov basis V m has been studied earlier, see also [40]. To preserve the property of mass conservation, a reorthogonalization, see [43], may be advisable in this case.
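The role of orthogonality for mass conservation can be checked numerically with a minimal Lanczos sketch (an illustration with full reorthogonalization; `lanczos` is a hypothetical helper, not the reference implementation):

```python
import numpy as np
from scipy.linalg import expm

def lanczos(H, v, m, reorth=True):
    """Lanczos process for Hermitian H: returns an orthonormal basis V_m of
    the Krylov space K_m(H, v) and the diagonal/off-diagonal entries
    (alpha, beta) of the real symmetric tridiagonal T_m."""
    n = len(v)
    V = np.zeros((n, m), dtype=complex)
    alpha, beta = np.zeros(m), np.zeros(m - 1)
    V[:, 0] = v / np.linalg.norm(v)
    for j in range(m):
        w = H @ V[:, j]
        alpha[j] = np.vdot(V[:, j], w).real
        w = w - alpha[j] * V[:, j]
        if j > 0:
            w = w - beta[j - 1] * V[:, j - 1]
        if reorth:  # full reorthogonalization against all previous vectors
            w = w - V[:, :j + 1] @ (V[:, :j + 1].conj().T @ w)
        if j < m - 1:
            beta[j] = np.linalg.norm(w)
            V[:, j + 1] = w / beta[j]
    return V, alpha, beta

rng = np.random.default_rng(0)
B = rng.standard_normal((50, 50))
H = (B + B.T) / 2                        # Hermitian (here: real symmetric)
v = rng.standard_normal(50); v = v / np.linalg.norm(v)
V, alpha, beta = lanczos(H, v, 10)
T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
u = V @ expm(-1j * 0.5 * T)[:, 0]        # S_m(t) v with sigma = -i, t = 0.5
# orthonormality of V_m implies norm (mass) conservation in (8.3)
assert np.allclose(V.conj().T @ V, np.eye(10), atol=1e-12)
assert abs(np.linalg.norm(u) - 1.0) < 1e-12
```

Dropping the reorthogonalization step reproduces the gradual loss of orthogonality discussed in the text for longer Lanczos runs in computer arithmetic.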
Krylov approximation of the matrix exponential in computer arithmetic It has been shown in [11,13,19] that a priori error estimates for the Krylov approximation of the matrix exponential remain valid when the effects of computer arithmetic are taken into account. Such results imply that, in general, convergence of the Krylov approximation in computer arithmetic is not precluded and round-off errors are not critical. In practice, round-off errors may in some cases delay convergence, which can make a reorthogonalization relevant. Stability of the Krylov approximation has been discussed by many authors, see also [38], but is not further discussed here in detail.
In the next paragraph we will give an argument, following [13], that the a posteriori error estimates which are the topic of this work are robust with respect to round-off errors.
We recall that the Krylov subspace constructed in computer arithmetic satisfies the Krylov identity (2.2) with a small perturbation, see also [40] for the Lanczos case and [8,51] for the Arnoldi case, which can both be extended to complex-valued problems using results from [22]. Following results from [13] we conclude that a small perturbation of the Krylov identity leads to a small perturbation of the defect (residual) δ m (t) in (3.1a) and the integral representation of the error in (3.1b). Thus the error estimates given in Sect. 6 remain stable with respect to round-off.
We further use that, by construction, the computed T m is still upper Hessenberg with positive subdiagonal entries and, in the Lanczos case, also real-valued and symmetric. Then, following Proposition 6 in the Hermitian (Lanczos) case, the integral representation of the error in (3.1b) results in the upper error bound Err 1 , which is not critically affected by round-off errors. For the upper bound Err a we additionally assume that the spectral properties of (3.4) still hold mutatis mutandis under a small perturbation, see [41] for such results in the Lanczos case, to obtain stability of this upper error bound under round-off as well.
Numerical tests for step size control The idea of choosing discrete time steps for the Krylov approximation is described in Sect. 7. The following tests are applied to the matrix exponential of the Hubbard Hamiltonian. We first clarify the notation used for our test setting.
Expokit and its adaptation The original Expokit code uses the corrected Krylov approximation with heuristic step size control and an error estimator based on the error expansion (5.1), see [47, Algorithm 3.2] for details. Since the standard Krylov approximation is not part of the Expokit package, we have slightly adapted the code and its error estimate such that the standard Krylov approximation is used. With this adapted Expokit our comparison can be drawn with the standard Krylov approximation, which may in some cases be the method of choice, as discussed above.
Step size based on Err a In another test code the upper error bound Err a from Theorem 1 is used. With Err a we obtain proven upper bounds on the error and reliable step sizes (7.5).
By gen.res, eff.o.quad, and Err 1 we refer to the generalized residual estimate (6.6), the effective order quadrature (6.9), and the error estimate Err 1 , respectively. Because these error estimates cannot be inverted directly, we need to apply heuristic ideas for the step size control, see (7.7). In addition, we use the iteration (7.9) to improve the step sizes. For the test problems we have solved, iteration (7.9) converges in fewer than 2 iterations for m = 10 and fewer than 5 iterations for m = 30. We simply choose N j = 5 for our tests.
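The heuristic step size iteration can be sketched generically (a minimal model in the spirit of (7.7)/(7.9), assuming the common local error model err(Δt) ≈ C Δt^q; the precise formulas are those of Sect. 7, and `heuristic_step` is a hypothetical name):

```python
def heuristic_step(err_est, dt0, tol, q, n_iter=5, safety=0.9):
    """Iteratively refine a step size for an error estimate err_est(dt)
    assumed to behave like C * dt**q (q related to the Krylov dimension m).
    Fixed-point idea: dt <- safety * dt * (tol / err_est(dt))**(1/q)."""
    dt = dt0
    for _ in range(n_iter):
        err = err_est(dt)
        if err == 0.0:
            break
        dt *= safety * (tol / err) ** (1.0 / q)
    return dt

# toy error model of order q = 10 (as for a Krylov space of dimension m = 10)
dt = heuristic_step(lambda s: s**10, dt0=1.0, tol=1e-8, q=10)
assert dt**10 <= 1e-8   # the resulting step meets the tolerance
```

For an exactly known error model the iteration converges in one step; for realistic estimates a few iterations suffice, matching the small iteration counts reported above.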
The a priori estimates (7.8), [24, Theorem 4] and [34, eq. (20)] are given in the corresponding references. Formula (7.8), taken from the Expokit code, directly provides a step size. In [34, eq. (20)] the computation of the step size is described. For the error estimate given in [24, Theorem 4] we apply a Newton iteration to determine an appropriate step size. For the tests on the Hubbard model we use λ max − λ min = 27.4, in accordance with the spectral bounds given in the description of the Hubbard Hamiltonian.
In Remark 7 below we also investigate the following variants: Step size based on Err + a By Err + a we denote the upper error bound for the corrected Krylov approximation as given in Theorem 3 with p = 0. The corresponding step size is given by (7.6). By i.H.quad we refer to the improved Hermite quadrature (6.5). Similarly to other quadrature error estimates we use heuristic step size control and iteration (7.9) to determine adequate step sizes.

Remark 6
In the Expokit code the step sizes are rounded to 2 digits in every step. Rounding the step size can lead to excessively large errors in some steps. This makes it necessary to include safety parameters in Expokit, which in turn slow down the performance of the code. It seems advisable to avoid any kind of rounding of step sizes.
In Table 1 we compare the total propagated time t for the Krylov approximation with m = 10 and m = 30 after N = 10 steps obtained with the different step size control strategies. For the local error we choose the tolerance tol = 10 −8 . The original Expokit code seems to give larger step sizes, but it also uses a larger number of matrix-vector multiplications, see Remark 7. The error estimate Err a leads to optimal step sizes for m = 10 and close to optimal step sizes for m = 30; for any choice of m it gives reliable step sizes. The generalized residual estimate overestimates the error, and therefore its step sizes are smaller. The effective order quadrature and Err 1 give optimal step sizes. Under the assumptions from Sect. 6 (which apply to our test examples), the generalized residual estimate and the effective order quadrature give reliable step sizes. For the error estimate Err 1 we do not have results on the reliability of the step sizes, since Err 1 does not lead to an upper bound of the error in general. The tested a priori estimates (7.8), [24, Th. 4], and [34, (20)] overestimate the error and lead to overly cautious step size choices. For all tested versions the accumulated error L m (see (7.2)) satisfies ‖L m v‖ 2 /t ≤ tol.
Apart from step size control, the upper error bound Err a can be used on the fly to test whether the dimension of the Krylov subspace is already large enough to solve the problem in a single time step with the required accuracy. For our test problems this stopping criterion is applied to the Err a estimate. We refer to Table 2, in which we observe the Krylov method with error estimate Err a to stop after 17 steps instead of computing the full Krylov subspace of dimension 30. In comparison, the original Expokit package needs a total of 62 matrix-vector multiplications.

Table 1: The displayed step size t is the sum of N = 10 substeps computed by the different versions of step size control described above.

Table 2: With a test setting similar to Table 1, we now compute up to a fixed time t, using the tolerance tol = 10 −8 and m = 30. For this problem the stopping criterion described in the text significantly reduces the number of matrix-vector multiplications used with the estimate Err a .
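The on-the-fly stopping criterion can be sketched generically in Python (an illustration only: as a stand-in for Err a , whose formula is given in the theorems cited above, we use the classical residual-type estimate h m+1,m |e m T e tT m e 1 |; `krylov_expm_adaptive` is a hypothetical name):

```python
import numpy as np
from scipy.linalg import expm

def krylov_expm_adaptive(A, v, t, tol, m_max=30):
    """Arnoldi approximation of exp(t*A) v, enlarging the Krylov space until
    a residual-type error estimate drops below tol (a stand-in for the
    upper bound Err_a of the text)."""
    n = len(v)
    beta = np.linalg.norm(v)
    V = np.zeros((n, m_max + 1), dtype=complex)
    H = np.zeros((m_max + 1, m_max), dtype=complex)
    V[:, 0] = v / beta
    for m in range(1, m_max + 1):
        w = A @ V[:, m - 1]
        for i in range(m):               # modified Gram-Schmidt
            H[i, m - 1] = np.vdot(V[:, i], w)
            w = w - H[i, m - 1] * V[:, i]
        H[m, m - 1] = np.linalg.norm(w)
        E = expm(t * H[:m, :m])          # small m x m exponential
        est = abs(H[m, m - 1]) * abs(E[m - 1, 0]) * beta
        if est <= tol or H[m, m - 1] == 0.0:
            return beta * (V[:, :m] @ E[:, 0]), m, est
        V[:, m] = w / H[m, m - 1]
    return beta * (V[:, :m_max] @ E[:, 0]), m_max, est

rng = np.random.default_rng(1)
A = -np.diag(np.linspace(0.0, 1.0, 200))   # scaled dissipative test spectrum
v = rng.standard_normal(200)
y, m_used, est = krylov_expm_adaptive(A, v, 1.0, 1e-10)
exact = expm(1.0 * A) @ v
assert np.linalg.norm(y - exact) < 1e-6
```

The point of the criterion is visible in `m_used`: the iteration stops as soon as the estimate meets the tolerance, instead of always building the full subspace of dimension `m_max`.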

Remark 7
Error estimates for the corrected Krylov approximation, or improved error estimates such as the improved Hermite quadrature (6.5), require additional matrix-vector multiplications. Instead of investing computational effort in improving the error estimate, one may as well increase the dimension of the standard Krylov subspace. For comparison we test the original Expokit code, the corrected Krylov approximation with error estimate Err + a , and the improved Hermite quadrature (6.5), each with Krylov subspace dimension m − 1. Table 3 shows that a standard Krylov approximation of dimension m leads to better results, although all considered versions use the same number of matrix-vector multiplications. Since the reliability of error estimates such as Err a has been demonstrated earlier, it appears that additional cost to improve the error estimate is not justified.

The Hermitian case
To obtain a more complete picture, we also briefly consider the case of a Hermitian matrix A = H with σ = −1 in (2.1). Such a model is typical of the discretization of a parabolic PDE; the results may therefore depend on the regularity of the initial data, which is chosen to be random in our experiments. For the heat equation, with H given in (8.1), we can again verify the error estimates, see Fig. 7. In comparison to the skew-Hermitian case, we do not observe a large time regime in which the error is of the asymptotic order m. As shown in Proposition 6, we do obtain an upper error bound using Err 1 for the heat equation.
Similarly to the skew-Hermitian case, we can also apply the effective order quadrature according to Remark 5 in the Hermitian case, using σ = −1 and T m ∈ R m×m in (6.10) to obtain the effective order. For computing the effective order we only consider time steps with ρ(t) > 0 for which ρ indeed appears to be monotonically decreasing over the computed discrete time steps. This restriction is compatible with our assumptions in Sect. 6.
The non-normal case

The error estimates Err a and Err 1 are compared with the exact error norm ‖L m (t)v‖ 2 in Fig. 8 for the case (8.6) and in Fig. 9 for the case (8.7). As shown in Theorem 1, the error estimate Err a constitutes an upper error bound. The error estimate Err 1 gives a good approximation of the error but has not been proven to give an upper bound in general.
Compared to (8.7), the spectrum for (8.6) is closer to the Hermitian case. The spectrum for (8.7), on the other hand, is dominated by large imaginary parts similarly as in the skew-Hermitian case.
In Fig. 8 we observe effects similar to the Hermitian case: the asymptotic order m of the error does not hold over a large time regime, and the error estimate Err a is not as sharp as in the skew-Hermitian case. In Fig. 9, on the other hand, we observe that the performance of the error estimates is closer to the skew-Hermitian case; accordingly, the upper error bound Err a is sharp for a larger range of time steps. As already observed for the Hermitian and skew-Hermitian cases, the error of the Krylov approximation is closer to its asymptotic order m for smaller choices of m.

Summary and outlook
We have studied a new reliable error estimate Err a for Krylov approximations to the matrix exponential and to ϕ-functions. This error estimate constitutes an upper bound on the error, and it can be computed on the fly at nearly no additional cost. The Krylov process can be stopped as soon as the error estimate satisfies a given tolerance. Err a is asymptotically correct for t → 0 and very tight in the asymptotic regime. Our numerical experiments illustrate that the asymptotic regime is more relevant in the skew-Hermitian case (compared to the Hermitian case) and for smaller choices of m and of the tolerance. The non-normal examples appear to lie between the skew-Hermitian and Hermitian cases. In our numerical experiments the defect (residual) is seen to behave nicely close to the asymptotic regime, and the generalized residual estimate is observed to constitute an upper bound. The generalized residual estimate can be tightened by applying an effective order quadrature.
For the Hermitian case we have shown that the error estimate Err 1 constitutes an upper bound and, compared to other error estimates, seems to be the most appropriate choice for Hermitian problems.
Step size control for a simple restarted scheme is an important application. The upper error bound Err a is an appropriate tool for this task, since the optimal step size for a given tolerance can be computed from it directly. This is not the case for other error estimates for the Krylov approximation, which usually require heuristic schemes to compute optimal step sizes in the restarting approach. We have shown that such heuristically chosen step sizes can be cheaply improved by applying the heuristic in an iterative manner. The use of a priori bounds is also suboptimal in most cases.