Rigorous and effective a-posteriori error bounds for nonlinear problems—application to RB methods

Quantifying the error that is induced by numerical approximation techniques is an important task in many fields of applied mathematics. Two characteristic properties of error bounds that are desirable are reliability and efficiency. In this article, we present an error estimation procedure for general nonlinear problems and, in particular, for parameter-dependent problems. With the presented auxiliary linear problem (ALP)-based error bounds and corresponding theoretical results, we can prove large improvements in the accuracy of the error predictions compared with existing error bounds. The application of the procedure in parametric model order reduction setting provides a particularly interesting setup, which is why we focus on the application in the reduced basis framework. Several numerical examples illustrate the performance and accuracy of the proposed method.


Introduction
A-posteriori error estimates are important tools in many disciplines of applied mathematics. For example, they are required for assessing the quality of numerical approximations and to guarantee their feasibility in the corresponding scenario. Popular examples for the application of error bounds are adaptive refinement strategies, where error estimates are used to judge whether the spatial or temporal discretization should be refined to improve the quality of the approximation, see for example [1,9,17]. Two desirable properties of such bounds are rigorosity, i.e. the error bound should be a valid upper bound, and effectivity, i.e. the factor of overestimation should be computable. In the context of the finite element method (FEM), those properties are also referred to as reliability and efficiency.
An area where error bounds are of utmost importance is reduced order modeling. Faced with the computational complexity involved with solving high-dimensional systems of equations arising for example for highly accurate discretizations of partial differential equations (PDEs), several techniques have emerged that try to tackle this challenge by reducing the dimension of the problem. This is typically done by a projection of the problem onto a low-dimensional subspace that contains enough information about the solution to the problem. The projection then yields a problem of low-dimension which can be solved with low computational complexity and which yields an approximation to the high-dimensional solution. The important question that should then be answered is how far the approximation is from the true solution.
To this end, one typically employs a-posteriori error estimates which in the ideal case deliver a rigorous upper bound that does not deviate too much from the true error. Giving a complete overview over the available methods and corresponding results for the error estimation is out of scope of this paper. Instead, we refer to [3] and [4] for recent overviews of model (order) reduction in the parameteric and nonparametric cases. Based on these techniques, approximate solutions can be calculated cheaply and in a computationally efficient manner. One framework that is particularly suitable for parametric problems is the reduced basis (RB) method. The essential idea of RB methods is to identify low-dimensional subspaces in the high-dimensional solution spaces by exploring the parameter domain with so-called greedy algorithms. In this article, we will demonstrate that classical error bounds which are well-established within RB methods can be significantly improved by introducing an auxiliary linear problem and corresponding RB approximation. By the proposed procedure, we are able to reach optimal effectivities of almost 1 in many examples. Additionally, the necessity for good lower bounds on the inf-sup constant can be alleviated, which allows the use of rougher (and thus computationally less expensive) lower bounds. Furthermore, the quality of the error bound can be tuned according to the application requirements.
In this paper, we improve a residual-based error estimation technique that has been used frequently during the last decades. We illustrate the essential idea of the improvement that enables highly accurate error estimates by the following simple example: Consider the vector-valued equation Ax = f for A = 1 0 10 1 and f = 0 1 . This equation has the unique solution x * = 0 1 . Assume we have a numerical scheme that is able to produce the approximate solutionx ∈ R 2 with sayx = 1.01x * . This results in a very low error (in the Euclidean norm) of only x − x * 2 = 0.01. Usually the true solution x * is not available for the evaluation of the error, which is why we are interested in finding rigorous upper bounds to the norm of the true error e :=x − x * . The straightforward procedure for doing this is to define the residual r := Ax − f and to derive the equation Ae = r for the error e. It then directly follows e 2 ≤ A −1 2 r 2 ≈ 10.1 · 0.01 ≈ 0.101, which is an overestimation of factor ≈ 10, which is already quite large in this small example. To obtain more accurate error bounds in this linear setting, we start again with the equation for the error Ae = r. It is an interesting observation that by solving this equation exactly, the error can be calculated exactly, i.e. with no overestimation. Unfortunately this is often too expensive in applications since this essentially adds the complexity of solving the original problem again (at least in the linear case). The central idea now is to not solve the error equation Ae = r exactly but by another computationally efficient method that produces approximate solutions. If we assume to have a numerical scheme that is able to calculate an approximate error e rapidly, we can make use of the triangle inequality and deduce the upper bound e 2 ≤ e 2 + e − e 2 . The second term can be estimated similarly to the first error bound by introducing a second residual R := A e − r, from which we then obtain the final bound Returning to the toy example and assuming an approximation e = 1.01e, we can evaluate equation (1) and obtain e 2 ≤ 0.01111, which gives an overestimation factor of approximately 1.111. Hence, the error estimate is improved by a factor of about 10.
In this paper, we show how the idea behind this very simple example can be generalized to a large class of linear and nonlinear problems, especially in the context of RB methods. To this end, we introduce a generic nonlinear error estimate in Section 2. We discuss the application in the RB context in Section 3 and provide several numerical examples in Section 4. Finally, we present a conclusion and an outlook in Section 5.

Rigorous and effective error bounds
We first clarify the setting used throughout this article. In what follows, we always assume X and Y to be Banach spaces with norms · X and · Y , respectively. The set of all bounded linear operators from X to Y is denoted as Throughout this article, we consider continuously differentiable mappings G ∈ C 1 (X, Y ) and are interested in solving the problem Find x ∈ X such that G(x) = 0. (P) An element x * ∈ X is called (true) solution to the problem (P), if G(x * ) = 0.
In the remainder of this article, we always assume that at least one solution exists. We are interested in estimating the error e :=x − x * between a true solution and a suitable approximationx ∈ X by means of reliable a-posteriori error bounds, which can be represented by functions : X → R with the property A general framework for providing such error estimates can be found in [7]. However, the results presented in that reference often lead to quite large overestimations of the true error. The quality of the upper bound can be quantified in terms of the so-called effectivity, which is defined as By its definition, it is clear that for reliable (i.e. rigorous) error estimates, it always holds eff(x) ≥ 1. Ideally we aim for error bounds that provide effectivities close to one as we then get almost exact error predictions.

Rigorous, effective, and computable a-posteriori error estimates with effectivity bounds
In this section, we refine the results derived in [7] and show how significant improvements can be achieved. We want to emphasize that these derivations are independent of model order reduction but apply to any kind of approximation procedure. To this end, let us assume that the Fréchet-derivative DG|x of G at the approximate solutionx defines an invertible linear operator from X to Y . Based on this derivative, we then define the following three quantities, where B α (x) = {x ∈ X| x −x X ≤ α} denotes the closed ball in X with radius α aroundx (local nonlinearity indicator).
Due to the assumption on G, DG|x is bounded and thus DG| −1 x is also bounded according to the bounded inverse theorem and the above quantities are well-defined. Based on these quantities, we are able to prove the following fundamental error estimate.
Theorem 1 (Rigorous a-posteriori error estimation) Letx ∈ X be an approximate solution and assume that DG|x : X → Y is invertible. Let the validity criterion holds. Then the problem G(x) = 0 has a unique solution x * ∈ X in the closed ball B 2 (x) (x) and the following upper bound for the error e =x − x * ∈ X holds Proof We present the full proof of the theorem in Appendix A.
Remark 1 Note that similar bounds have been derived by various authors [20,21,23]. However, in the bounds in literature known to us, the nonsplit residual (x) is replaced by the upper bound which we call split residual for obvious reasons. As we have seen in the introduction and as we will see in the numerical results, this splitting can induce a very large overestimation. This is not the case when the quantity (x) or other, more accurate approximations to it, are used. Hence, the results in this article improve all the aforementioned existing results.
The quantity L(α) can be seen as a measure for the nonlinearity of the problem in the vicinity of the approximate solution. In particular, for (affine) linear problems, we immediately get L(α) = 0 (and τ (x) = 0, i.e. unconditional validity) and hence even exact error predictions as stated in the following corollary.

Corollary 1 (Exact error prediction for linear problems) Let G be affine linear in x. Then it holds
. We further infer If the local nonlinearity indicator L(α) does not vanish but satisfies L(α) ≤ Cα for some constant C > 0, the constant in front of the nonsplit residual in (4) can be improved as follows: Lemma 1 Let the assumptions of Theorem 1 hold and let L(α) ≤ Cα for some C > 0. If the modified validity criterion is satisfied, then the problem G(x) = 0 has a unique solution x * ∈ X and the error e =x − x * is bounded by Proof We present the full proof of the lemma in Appendix B.
In (6), we recover the multiplicative structure of error bounds for RB methods for quadratic nonlinearities, e.g. [23] except for different leading factors. In particular, we omit the stability factor which arises when the split split is used as a bound to the nonsplit residual .
As it was motivated in the introduction, the key quantity to assess the quality of the error bound is the effectivity eff(x). In order to make quantitative statements of the effectivity for the error bound in the general nonlinear case, we assume that the function DG| −1 x G(·) is locally Lipschitz-continuous aroundx. By this, we mean that there exists a constant C G (x) ≥ 0 such that it holds Based on this property, we are able to prove an estimate for the effectivity for locally Lipschitz-continuous problems.
x (G(·)) be locally Lipschitz-continuous aroundx with constant C G (x) and let the error estimate from Theorem 1 holds true. Then it holds Proof The proof follows directly from the fact that G(x * ) = 0 and Note that the effectivity estimate agrees with the result stated in Corollary 1. Indeed, for linear problems, we immediately observe τ (x) = 0 and C G (x) = 1 which results in eff(x) = 1.
It is noteworthy that the local Lipschitz-continuity assumption is satisfied for a large class of problems. One class that is of particular interest in applications are quadratic problems such as the Navier-Stokes equation, Burgers equation, the algebraic Riccati equation (ARE), or nonlinear reaction-diffusion equations. The following proposition provides a bound on C G (x) in this case.
Proposition 1 (Local Lipschitz-continuity for quadratic problems) Let G be quadratic, i.e. there exists a y 0 ∈ Y , A ∈ L (X, Y ) and a continuous bilinear mapping .
where c B := sup x,x ∈X\{0} Proof It holds By direct computation, we get the Taylor-like expansion and therefore, Taking the norm on both sides, applying the definition of the continuity constant c B and using the triangle inequality, we get Finally, applying the bound given in (4) gives the desired result.
In the infinite-dimensional settings, the calculation of the involved quantities is often not possible while in the finite-dimensional case, it can be computationally demanding or even infeasible. This is particularly true for very high-dimensional settings arising for example from semi-discretized PDEs. Instead, one often only has computable upper bounds to the quantities, i.e.
In this case, Theorem 1 remains valid with the replaced quantities: Theorem 2 (Computable error bound and effectivity estimate) Letx ∈ X be an approximate solution and assume that DG|x is invertible. Let the validity criterion holds. Then the problem G(x) = 0 has a unique solution x * ∈ X in the ball B 2 ub (x) (x) and the upper bound holds Furthermore, if there exists a constant C (x) > 0 such that ub (x) ≤ C (x) (x) and DG| −1 x (G(·)) is locally Lipschitz-continuous aroundx with constant C G (x), the effectivity estimate where eff ub (x) is the effectivity for the error bound ub (x) holds.
Proof The first statement (10) follows identical to Theorem 1. For proving the additional effectivity estimate, we infer from which we can proceed similar to Lemma 2.
Similar to Lemma 1, the error bound and effectivity bound can be improved if L ub (α) ≤ Cα for some C > 0 and if the modified validity criterion 4γ ub (x)C ub (x) ≤ 1 holds.

Reaching high effectivities through auxiliary linear problems
In this section, we will see how a very sharp bound for (x) can be obtained with low additional computational overhead. As it was motivated in the introduction by a simple two-dimensional linear problem, the effectivity of the a-posteriori error bound deteriorates by a large factor if the calculation of (x) is split according to (5). Thus, the key towards highly effective (i.e. eff(x) ≈ 1) error bounds lies in finding highly effective approximations or bounds to (x).
We first observe that the value of (x) can be calculated exactly by solving the following linear system for E(x) ∈ X, which then gives (x) = E(x) X by definition. While in the linear example, i.e. for linear G, this equation calculates the exact error, this is no longer the case for nonlinear G. We therefore refer to (11) as the auxiliary linear problem (ALP) and will consequently denote the here presented error bounds as ALP-based error bounds. Although linear problems of the form (11) are often relatively easy to solve, it can be prohibitive to do so in high-dimensional or multi-query scenarios.
To obtain a computationally efficient scheme, instead of requiring the true solution E(x), we assume to have a suitable method that can be used to calculate an approximate solution E(x) ∈ X.
Remark 2 One technique that is often applied to solve nonlinear problems of the form G(x) = 0 is the Newton-iteration. Based on an initial guess x 0 ∈ X, the following procedure is performed in an iterative manner: Hence, we can see that the computation of E(x) is equivalent to performing one step in the Newton-iteration and E(x) can be considered as a quasi Newton update. Thus, the additional computational effort can either be used for improved error quantification-as in our case-or improved approximation quality by settingx = The strength of the proposed method lies in the fact that we can easily derive a rigorous bound for the quantitiy (x), when an approximation E(x) is available: holds true.
Proof The proof is a straightforward application of the triangle inequality. It holds , we make use of the linearity of the ALP and obtain the relation Provided that an efficient scheme for the approximation of E(x) exists, the computational overhead for the calculation of ub is not very large as it only requires the calculation of E(x) X and R(x) Y . Many iterative solvers for large-scale linear systems provide the residual of the equation as an abortion criterion which can be directly used for the calculation of the residual norm R(x) Y . Furthermore, no additional quantities are required: In particular, γ ub (x) has to be calculated anyway for the evaluation of the error bound.
At this point, we want to emphasize that our numerical examples reveal very accurate error predictions when using ub (x) from Theorem 2 with the choice ub (x) according to Lemma 3. One possible explanation for this observation can be deduced from and the fact that ub (x) is a very accurate estimate of (x). In contrast to the original splitting of (x) in (5), the splitting in (13) does not deteriorate the bound ub (x) significantly since R(x) Y is often much smaller than E(x) X . To quantify this observation rigorously, we use the following lemma.
Lemma 4 (Relation of (x) and ub (x)) Assume Then the following inequality holds true for ub chosen as in (12).
Proof Note that the following proof is similar to a proof for the effectivity of relative RB error bounds [18]: The proof follows with From the triangle inequality and (13), we infer Inserting this twice into (14) yields the final result.

Highly accurate error bounds in the reduced basis context
In this section, we apply the proposed error bound within the RB framework. In particular, we explain how the a-posteriori error bound derived in Section 2 can be applied to parametric and nonlinear problems within the RB context.

Parametric nonlinear problems and the reduced basis method
In the following, we consider parametric problems. To this end, let μ ∈ P be a parameter vector where P ⊂ R P for P ∈ N is a compact set of admissible parameters. The problems that we are interested in take the form for the parameter-dependent operator G(·; μ) : X → Y . In the following, we always assume that for every parameter μ ∈ P at least one solution exists. The idea behind RB methods is to determine a low-dimensional subspace X N ⊂ X with N = dim(X N ) dim(X) = d ≤ ∞ and to find approximate solutions in this subspace by solving an N-dimensional so-called reduced problem. To illustrate the procedure, we equip the approximation space X N with a reduced basis {φ 1 , . . . , φ N } ⊂ X, of linearly independent basis elements φ i ∈ X. We then define the approximationx(μ) ∈ X N viâ where the coefficient functions x N,i : P → R are called reduced coordinates of the reduced coordinate vector ∈ R N and where we introduce := (φ 1 , . . . , φ N ) as the row vector of basis functions. By restricting the set of possible solutions of the problem (P (μ)) to the subspace X N and by projecting the residual G(x(μ); μ)) to another low-dimensional subspace Y N of dimension N, we arrive at the reduced problem: where the reduced problem is given as Here, Y N : Y → Y N denotes a projection onto the subspace Y N , which we equip with a basis {ψ 1 , . . . , ψ N } ⊂ Y . This procedure is commonly referred to as Petrov-Galerkin projection and it is widely used for projection-based model order reduction (MOR) methods. The solvability of (P N (μ)) is typically ensured by a careful construction of the spaces X N and Y N . In the following, we always assume that all problems are solvable, i.e. in particular we can compute true solutions x * (μ) ∈ X and approximationŝ x(μ) ∈ X N for any parameter μ ∈ P. But as mentioned above, we do not require uniqueness.

Effective error prediction for the RB method
Given an approximate solutionx(μ) ∈ X N to a true solution x * (μ) ∈ X, the fundamental question arises whether the norm of the error e(μ) :=x(μ) − x * (μ) can be quantified rigorously and with good effectivity. To give a positive answer to this question, we apply Theorem 1. In the parametric setting, we are challenged with the requirement of calculating the following parameter-dependent quantities efficiently, where we often omit the explicit dependency onx(μ) for the sake of readability: Since a direct calculation of these quantities is often too expensive, we employ rapidly computable and (in the ideal case) rigorous upper bounds similar to the nonparametric case Although all quantities are important, we will primarily focus on the efficient calculation of ub (μ) and provide comments about the role of the other quantities in the subsequent section. We recall that (μ) can be calculated explicitly by solving the following parametric linear equation for E(μ) ∈ X and by computing (μ) = E(μ) X . Lemma 3 shows how an upper bound for (μ) can be calculated based on an approximation E(μ) ∈ X of the solution E(μ). The idea to obtain such approximations in the context of RB methods is to employ another Petrov-Galerkin projection of the parametric ALP (P E (μ)) for a different pair of subspaces consisting of linearly independent basis functions and define the ansatz and project the ALP (P E (μ)) analogously to the original problem Note that this equation is of dimension M and can be solved efficiently, provided M is sufficiently small. To be able to state a rigorous upper bound ub (μ) for (μ), we define the residual R(μ) ∈ Y of the approximation of the ALP as We then get from Lemma 3 the upper bound Based on this, we denote as ub (μ) the parametric computable error bound stemming from Theorem 2 (10) where we use ub (μ) given by (15).

Improvement of classical RB bounds for linear elliptic problems
The RB method is classically applied in the context of parametric PDEs. In this section, we recall the basic error estimation results for linear elliptic problems and relate them to the bound presented in the previous section.
Let X be suitable Hilbert (function) space and consider the following weak formulation of a parameterized PDE: We assume a(·, ·; μ) : X×X → R to be a continuous bilinear form and f (·; μ) ∈ X , where X denotes the dual space of X. We further assume the following essential properties that ensure the well-posedness of (16) for any μ ∈ P and for each 0 = v ∈ X, there exists a u ∈ X such that a(u, v; μ) = 0. Provided these assumptions hold true, it is a well-known result that there exists a unique solution u * (μ) ∈ X to the problem (16) (cf. [6]). This problem fits in the general framework by setting Y := X and G(·; μ) : X → Y via Let us now assume that an RB approximationx(μ) ∈ X N for some suitable subspace X N with N = dim(X N ) d is given. Then, the classical relation between the error e(μ) :=x(μ) − x * (μ) ∈ X and the residual of the approximation is established via the the norm of the residual G(x(μ); μ) and reads as follows In this setting, the equation 1 β(μ) = γ (μ) relates the inf-sup constant to the stability constant in the abstract formulation in this paper. Recall that for linear problems, we have Hence, by not splitting the calculation of the residual and by directly applying an approximation scheme to (μ), we can expect more accurate error predictions.
To apply the improved error estimation technique, we setup the ALP, whose weak form in the linear case is given via Since this equation is as expensive as the original problem, we perform the additional RB approximation of the ALP according to the framework described in the previous section. To this end, we assume to have another subspace Recall that this is an M-dimensional equation that can be solved rapidly, similar to the RB approximation of the main problem. Based on the approximate solution, we then get according to Lemma 3 the error bound where R(μ) ∈ Y is the Riesz-representative of the residual of (17), when the approximation e(μ) replaces the true error e(μ).
Often, the calculation of the inf-sup constant β(μ) poses many difficulties when it comes to an efficient implementation. As a remedy, one often employs pessimistic lower bounds β lb (μ) to β(μ) that can be calculated rapidly. Such lower bounds can, for example, be computed by employing standard estimation techniques in the RB framework such as the min-θ scheme or the successive constraint method (SCM) (cf. [12,18]). However, they are often either not applicable, computationally involved or deliver highly imprecise results that render the classical RB error bounds useless. The following example demonstrates the influence of β(μ) onto the classical and improved error bound and shows the benefit of using the results presented in this article. By using the upper bound ub (μ) with ub (μ) and a lower bound of the infsup constant β lb (μ) := β(μ) λ with a parameter λ ≥ 1, we get the following estimates when using the classical RB bound and the improved version presented in this paper.
Since we expect that R(μ) Y G(x(μ); μ) Y , severe underestimations, i.e. assuming large λ, of the inf-sup constant have less impact in the nonsplit bound ub (μ). In particular, this property might be useful in cases for which a (probably pessimistic) lower bound β(μ) ≥β > 0 for all μ ∈ P is available, making expensive estimation techniques for those stability constants superfluous. We demonstrate the effect of this behavior in our numerical examples in Section 4.

Basis generation and offline/online efficient implementation
The essential idea of RB methods is to split the calculation into a potentially expensive offline phase where precomputations are performed which then allow a rapid online phase. During the offline step, the first task is to construct the subspaces, which in our case means finding a suitable basis for X N , Y N . To avoid technical difficulties and to ease the following, we only construct the ansatz space X N and set Y N = (X N ) .
While there are many ways to determine suitable subspaces (cf. [10]), we focus on snapshot-based techniques. In this case, the subspace is contained in the span of several true solutions, i.e. X N ⊂ span({x(μ 1 ), . . . , x(μ N )}) for suitable μ 1 , . . . , μ N ∈ P. Two popular techniques with this respect are the proper orthogonal decomposition (POD) method [25] and the class of greedy algorithms [24]. The first extracts the relevant information from a given set of solutions based on an eigenvalue decomposition of the empirical correlation operator of a set of snapshots S := {x(μ 1 ), . . . , x(μ N )} (cf. [10]). Greedy procedures, on the other hand, determine the parameter whose solution should be used to enhance the space based on its current approximation quality. In this section, we focus on the latter; however, we also make use of the first in our numerical section. The general structure of the greedy algorithm is as follows and the pseudocode is given in Algorithm 1: Starting from an initial subspace X 0 ⊂ X and a finite training set P train ⊂ P, the maximum approximation error is sought by evaluating an error indicator δ(·; μ) : X → R ≥0 for all reduced solutions with parameters in the training set P train . The subspace is then extended with the element that delivers the maximum error estimate and the loop continues until a prescribed tolerance is met. By this, we get an iterative scheme where the subspace is extended in each iteration. Algorithm 1: Greedy algorithm(P train , ρ, δ, X 0 ). Data: Training set P train , greedy tolerance ρ, error indicator δ, initial subspace X 0 Result: Subspace X N . 1 while max μ∈P train δ(x * N (μ); μ) > ρ do 2 Set μ * := arg max μ∈P train δ(x * N (μ); μ); 3 Solve full problem G(x; μ * ) = 0 for x * (μ * ) ∈ X; 4 Extend subspace X N+1 := X N span(x * (μ * )); 5 Increment N := N + 1;

end
For the approximation of the ALP, we have to identify another pair of subspaces X E M and Y E M . To this end, we proceed in an analogous fashion, i.e. we also restrict ourself to the case Y E M = (X E M ) ; however, instead of solving the full problem (P (μ)), we now solve the ALP (P E (μ)) in each greedy iteration. Additionally, the error indicator and tolerance have to be chosen in a sensible manner. In our case, we use the error indicator δ(Ê(μ)) := E RB (μ) = R E (μ) β lb (μ) which is a suitable (and even rigorous) choice. Ideally, one would like to build an error space X E M in each iteration of Algorithm 1 to use the here proposed improved error estimators in the basis building process, i.e. one would need to perform a (nested) double greedy algorithm. However, this proves to be highly computationally expensive since the error space X E M depends on the approximation space X N . Thus, we defer to a sequential computation of the spaces which makes the construction of both spaces computationally feasable. The pseudocode for this sequential double greedy algorithm is given in Algorithm 2.
Furthermore, this dependency of the error space X E M on the approximation space X N limits the usefulness of our proposed bound as an error indicator for the greedy algorithm since the space X E M has to be rebuild in every iteration.
Algorithm 2: Sequential Double Greedy algorithm(P train , P E train , ρ, ρ E , δ, δ E , X 0 , X E 0 ). Data: Training sets P train , P E train , greedy tolerances ρ, ρ E , error indicators δ, δ E , initial subspaces X 0 , X E 0 Result: Subspaces X N and X E M . 1 while max μ∈P train δ(x * N (μ); μ) > ρ do 2 Set μ * := arg max μ∈P train δ(x * N (μ); μ); 3 Solve full problem G(x; μ * ) = 0 for x * (μ * ) ∈ X; 4 Extend subspace X N+1 := X N span(x * (μ * )); span(E(μ * )); 11 Increment M := M + 1; 12 end Remark 4 1. We want to note that in contrast to our improved bound which is less sensitive to underestimation in the inf-sup constant β lb (μ), the chosen coarse error indicator δ might result in a less efficient approximation space. 2. The computational efficiency of our a-posteriori estimator is directly influenced by dim(X E M ). Thus, if one wants to achieve a fast online phase including error estimation, one might have to make sacrifices in the quality of the error space such that the computational overhead is comparable with the computational demands required for solving the reduced problem (P N (μ)).
Finally, we shortly address how an efficient online phase can be achieved: The classical assumption that is made in this respect is the parameter separability of the problem. Given the parametric problem G(·; μ) = 0, we assume that it can be decomposed into an expansion of the form i.e. it consists of parameter-dependent coefficient functions q : P → R and parameter-independent operators G q : X → Y . In cases where no such decomposi-tion is present, the (discrete) empirical interpolation method can be employed (cf. [8,16]). This property carries over to G N as Thus, one can easily assemble the reduced system by precomputing G N,q during the offline-phase. In the same way, the above property is inherited by DG and thus the reduced error problem can be handled analogously.

Numerical examples
In this section, we evaluate the proposed a-posteriori error estimation theory in the context of the RB method. The first example is a well-known thermal-block test case, modeling a parametric heat conduction problem on the unit square. Here we will see that by making use of the proposed method, we are able to reach excellent effectivities in any norm that we consider. The second example shows the application of the framework to a nonlinear finite-dimension problem that stems from a semidiscretized parametric PDE with nonvariational finite difference (FD) discretization. Finally, in the last example, we consider a parametric ARE, i.e. a parametric nonlinear matrixvalued equation. All examples are implemented in the toolbox RBmatlab 1 and were run on a machine with an Intel Core i7-6700 CPU with 16GB RAM in MATLAB 2017a.

Standard linear test case: thermal block model
The thermal block example is a well-known test example in the RB community (cf. [10,18]). It consists of a steady linear heat equation on the unit square = (0, 1) 2 , which is divided into B := B 1 · B 2 subblocks, where B 1 , B 2 ∈ N denote the number of subblocks per dimension. We denote the subblocks by i for i = 1, . . . B, counted rowwise starting from the left bottom. We prescribe a unit flux into the domain on the bottom boundary, which is denoted as N,1 with unit outward normal n(ξ ), where ξ ∈ indicates the spatial variable. The left and right boundary part N,0 is insulated, which is modeled by a zero Neumann boundary condition and the top Dirichlet boundary D has constant 0 temperature. A schematic drawing of the domain is provided in Fig. 1a. The parametric PDE for the temperature field u(·; μ) : → R for this example is given as . The parametric domain for this problem is given as We equip the space X with the norm x X = ∇x L 2 ( ) . It is a well-known fact that for every μ ∈ P, this problem possesses a unique solution u * (μ) ∈ X.
For the first tests, we pick B 1 = B 2 = 3, i.e. we have in total 9 parameters and μ max = 10. For the truth-approximation, we apply a finite-element approximation of the PDE with piecewise linear elements resulting in a d = 3 721 dimensional problem. The generation of the basis for the subspace X N for the RB-approximation of the problem is performed with a standard greedy procedure, see also Section 3.4. For this, we define the residual-based error estimator δ(u N (μ)) : . Since we take the norms in the space H 1 D ( ), we can define a lower bound to the inf-sup constant as β lb (μ) := min i=1,...,B μ i . The basis is constructed on a finite training set consisting of |P train | = 1 000 random elements chosen uniformly from P. We fix the tolerance ρ = 10 −3 which yields a basis for the approximation of size N = 62.
For the construction of the approximation space X E M for the approximation of the ALP DG| u N (μ) (E(μ)) = G(u N (μ); μ), where u N (μ) ∈ X N is the RB-approximation of the solution, we perform another greedy procedure, i.e. we apply Algorithm 2. In particular, we again invoke the standard residual-based error estimator applied to the ALP and define where u N (μ) is the RB approximation obtained from the 62-dimensional subspace and E(μ) is the current approximation of the ALP for the parameter μ. We run the greedy algorithm for the ALP with the very low tolerance ρ E = 10 −8 . Furthermore, we reuse the same 1 000 parameters that were chosen for the greedy procedure for the main problem. The basis construction results in a subspace X E M of dimension M = 118. Note that due to linearity of the problem the ALP corresponds to solving another thermal block problem with a distributed source term that is given by the residual G(u N (μ); μ). Figure 1b shows the decay of the error indicators for the main problem and the approximation of the ALP. We infer that the initial error for the ALP that is measured by the error indicator δ E is very low and in the magnitude of the true error, which is why we had to set the tolerance to ρ E = 10 −8 .
In the following, we compare the improved error estimation techniques that are presented in this paper to the standard error bounds that are very widely used in the RB context. As a first test, we use the H 1 D ( )-norm for the evaluation of the error bound and pick 20 random test parameters for the evaluation. For this test, we calculate the exact value of the stability constant γ (μ) by solving a ddimensional eigenvalue problem. Clearly this is not online efficient but the purpose of the first test is to solely demonstrate the improved quality of the error estimates. The results are presented in Fig. 2: We show the absolute value of the true error u N (μ) − u * (μ) H 1 D ( ) as well as the standard RB-error bound RB Recall that the latter choice corresponds to the split bound. The results in Fig. 2 show a very large overestimation in the range of 10−100 for all test parameters. To demonstrate the improved error estimation and the arbitrarily high effectivity in the linear case, the error bound ub (μ) with the improved approximation of (μ) is shown in the same figure. To this end, we choose decreasing tolerances ρ E and pick the subspace for E(μ) according to these tolerances. The plot clearly shows that an increasing dimension M improves the quality of the error  {74, 101, 118}), we get almost exact error prediction. In Fig. 3, we plot the true effectivities eff(μ) along with the effectivity predictions from Lemma 4. Starting from ρ E = 10 −5 , the bound on the effectivity is applicable whereas for ρ E = 10 −7 , the predicted effectivity is close to the actual effectivity. The inapplicability of the effectivity bound can also be concluded from the results depicted in Figs. 4 and 5, where the approximation of the nonsplit residual E X and weighted error residual γ R E Y is depicted for basis sizes N ∈ {40, 62} and and error space sizes M ∈ {20, 40, 60, 80, 100, 118}. To this end, we recall that the effectivity bound holds only if the quotient γ R E Y E X is bounded by 1 2 . We also note a rapid decline in the absolute value of the quotient which in turn translates to a effectivity (bound) close to 1, as can be seen in Fig. 3.
For the next test, we pick a larger test set P test ⊂ P consisting of 100 randomly chosen parameters. We compare the error estimation for the H 1 D ( ), L 2 ( ) and the so-called energy norm · μ , which is defined as u μ := √ a(u, u;μ) for u ∈ X and a fixed parameterμ ∈ P. We pick the parameterμ = (1, 2, 1, 2, . . . , 1) T . It as well-known fact that the error estimation in the energy norm delivers very accurate results, which is also visible in Table 1 where the maximum and mean effectivity for all three norms are provided. The first row corresponds to the standard RB bound. The column entitled λ shows the factor by which we overestimate γ (μ), i.e. we pick the upper bound γ ub (μ) := λγ (μ) for the calculations. In all cases, we observe a decay for decreasing tolerances ρ E , i.e. richer subspaces X E M for the ALP. In particular, for the largest basis (ρ E = 10 −8 ) and for λ = 1, we get exact error prediction over the whole parameter test set in the H 1 D ( ) and energy norm. As expected, the influence of large overestimations of γ (μ) has much less impact on the improved norm in all three cases. Recall that for the classical RB bound, the scaling directly enters in the bound, i.e. λ = 100 means an additional degradation in the effectivity of factor 100 whereas we observe only factor 1.2 − 38, depending on the chosen norm. Please note that in general, the residuals are only elements of (H 1 D ) , depending on the source term in the heat equation, and therefore the use of the L 2 ( ) norm is not possible. However, in our case, (no source term) and in the discrete setting, it is still applicable and was chosen to illustrate that the proposed method can be used for a variety of different norms.
Finally, we want to investigate the relation between the size of X N and X M which is needed to achieve a prescribed effectivity. For this purpose, we run the sequential double greedy algorithm (Algorithm 2) for N ∈ {10, 20, 30, 40, 50, 60}. We then select the basis size M of X M in such a way that the effectivity of the error bound ub is smaller than 8,4,2, and 1.1 on a test set of 50 randomly chosen parameters. The results are depicted in Fig. 6. For the cases eff ≤ 4, 2, 1.1, we    Fig. 7. We observe an overhead between 3 and 7 which decreases for increasing basis size M. This stems from the fact that for the basis sizes M used in this context, the computational cost of ub is dominated by the assembly of the reduced ALP. The relative impact of the assembly then decreases as the basis sizes N and M grow in value which in turn leads to a decrease in the overall relative computational overhead.

Nonlinear finite-dimensional parametric problems
The second example stems from the finite-difference discretization of a nonlinear reaction-diffusion-advection equation. The infinite-dimensional description for this problem is given by defining for ξ ∈ : The parameter for this example stems from the set μ = (μ 1 , μ 2 ) T ∈ [0.1, 1]× [1,10], where μ 1 describes the diffusivity of the problem and μ 2 changes the influence of the nonlinearity. The right-hand side function (source term) is given as f (ξ) := sin(ξ π ) 2 for ξ ∈ . The PDE is discretized in space with a simple finite-difference scheme with upwind flux and results in a d = 400 dimensional nonlinear problem of the form In this case, we pick the finite-dimensional spaces X = Y = R d and use the standard Euclidean norm for quantifying the error. We construct a subspace of dimension N = 6 by calculating snapshots for 100 random parameters chosen uniformly from P and by extracting the basis through the POD procedure. Note that this does not yield very accurate results for the RB-approximation but suffices to show the benefit of the improved error estimation theory presented in this paper. Figure 8a shows the solution to the full problem and to the reduced problem for three different parameters. For evaluating the error bound, we have to calculate DG|x (μ) , which yields where (a • b) i := a i b i for a, b ∈ R d denotes the componentwise product. From the explicit formula, we immediately get L(α; μ) ≤ 2μ 2 α =: L ub (α). In all of the following examples, the value of γ (μ) = DG(·; μ)| −1 x is calculated exactly by solving a high-dimensional eigenvalue problem.
For the application of the error bound, we have to construct a subspace X E M . To this end, we calculate solutions to the high-dimensional ALP for 50 random parameters a b Fig. 8 a Example solutions for three different parameters. b Results for increasing basis size from the training set and extract a basis by means of POD. For evaluating the bound, we calculate the error estimates for test parameters P test consisting of 200 parameters chosen randomly from the parameter domain. We then vary the size of the RB space for the approximation of the ALP and evaluate the mean and maximum effectivity of the valid error estimations over the whole parameter test set P test . The results are presented in Fig. 8b. Note that a size of M = 0 corresponds to the split upper bound and represents the result that is classically used for error estimation in the RB context. Increasing the size again reveals a very accurate error prediction uniformly over the parameter space. Recall that in the nonlinear problem, the bound is only applicable if τ (μ) ≤ τ ub (μ) ≤ 1, where τ ub (μ) is defined in (9). To show the benefit of using the improved error bound, the column entitled with "% valid" shows the fraction of valid error predictions. We observe that by increasing the dimension of X E M , the fraction increases from 57 to 78%. Note that the validity criterion is not always satisfied since the RB-approximation is too coarse. Hence, we cannot expect valid error estimations for all parameters unless we build richer subspaces X N . Once again we want to highlight the fact that for the RB-approximation of the ALP, the solution to an M-dimensional linear problem is required.

Parametric algebraic Riccati equations
The ARE is a nonlinear matrix-valued equation with many applications in systems theory such as optimal (feedback) control or optimal state estimation, see [14]. For X := {A ∈ R d×d | A = A T } =: R d×d sym and A, B X := trace(A T B), we define the mapping G(·; μ) : X → X via G(P (μ); μ) := A(μ) T P (μ) + P (μ)A(μ) − P (μ)F (μ)P (μ) + Q(μ), (18) where A(μ) ∈ R d×d and F (μ), Q(μ) ∈ R d×d sym with F (μ) and Q(μ) being positive-semidefinite matrices. It is a well-known fact that this equation has a unique (stabilizing) solution P * (μ) ∈ R d×d sym (i.e. the eigenvalues of (A(μ) − F (μ)P (μ)) have negative real part and P (μ) is positive-semidefinite) provided the matrix A(μ), F (μ), and Q(μ) satisfy specific conditions [15]. Often, in high-dimensional applications, the solution matrices P * (μ) are of low numerical rank, meaning they can be efficiently approximated by low-rank factorizations of the form P * (μ) ≈ Z * (μ)Z * (μ) T for Z * (μ) ∈ R d×K with K d (cf. [2]). This special structure is exploited in [20] in a parametric setting, where the low-rank factor greedy algorithm (LRFG) is introduced to make use of the special structure and to construct a suitable subspace for the RB-approximation of the ARE. This is done by defining the N-dimensional subspace X N := {V P N V T | P N ∈ R N×N sym } ⊂ X for a suitable basis matrix V ∈ R d×N which is then used for the approximation via the Galerkinprojection V T G(V P N (μ)V T ; μ)V = 0. It can be shown that P N (μ) solves an N-dimensional ARE that can be solved very efficiently for low N. Based on the lowdimensional matrix P N (μ), we then define the approximationP (μ) := V P N (μ)V T and the error e(μ) =P (μ) − P * (μ). For measuring the error, we pick the spectral norm in the space X. The application of the error bound requires the quantities L(α; μ) and DG(·; μ) −1 L (X,X) . Due to the quadratic nature of the ARE, the derivative is readily calculated as DG(·; μ)|P (μ) for some matrix P ∈ R d×d sym . Furthermore, an upper bound for L(α; μ) can be derived The linearization DG(·; μ)|P (μ) of the ARE results in a Lyapunov operator and the norm of its inverse is well studied (cf. [22]). For the following calculations, we use the exact value for γ (μ), which can be obtained by solving a high-dimensional Lyapunov equation and by taking the norm of the solution (cf. [13]). We test the error estimation procedure by applying the RB-ARE method to a mathematical model of an optimal cooling process for steel profiles arising in rolling mills. The original model is a nonlinear heat equation for the temperature distribution of the cross-section of the steel profile with boundary control. In the technical application, the natural cooling process is supported by spraying cooling fluids onto the surface. The control objective is to optimally balance between a fast cooling process and an even temperature distribution. This is necessary to avoid deformations, brittleness, and other undesirable effects. A detailed explanation of the model and corresponding optimal control problem can be found in [5]. The resulting optimal control problem takes the form subject to Mż(t) = Nz(t) + H u(t), y(t; μ) = Cz(t), t ≥ 0, where we introduce parameter-dependent weights for the cost functional by setting Q(μ) = I 6 μ Q and R(μ) = I 7 μ R with μ Q ∈ [10 −4 , 0.1] and μ R ∈ [10 −4 , 1] and I n denoting the n-dimensional identity matrix. The unusual overbars are used to prevent notation doubles. The system matrices are given as M, N ∈ R d×d , H ∈ R d×7 , and C ∈ R 6×d with d = 20 209. It is a well-known fact that the solution to this optimal control problem is given by finding the stabilizing solution P * (μ) to the ARE (18)  can be downloaded from the MORWiki. 2 The high dimension of the parametric ARE raises the need for efficient techniques for its solution. Hence, we apply the RB technique to the ARE. For testing the error bound, we construct an N = 137 dimensional subspace X N ⊂ R d×d sym by running the LRFG algorithm on a test set consisting of 900 training parameters that were chosen from a grid consisting of 30 × 30 logarithmically distributed points in the parameter domain. Recall that the equation under consideration is matrix-valued. Hence, the reduction from d to N provides a huge computational benefit and speed-up. The basis generation for the ALP is performed by calculating the solutions E(μ) to the ALP DG(·; μ)|P (μ) (E(μ)) = G(P (μ); μ) for a prescribed set of 50 parameters chosen randomly from the parameter domain. The basis is then extracted via a POD of the columns of the matrix S := [E(μ 1 ), . . . , E(μ 50 )] with a prescribed tolerance ρ E for the "energy" that is contained in the singular values σ k , i.e. we extract l basis elements with l := arg min j ∈{0,...,d} We pick ρ E = 10 −3 , which results in an M = 189 dimensional subspace for the ALP. For details about the basis generation, we refer to [19].
For testing the error bound, we pick a test set P test ⊂ P consisting of 100 parameters chosen randomly from the parameter domain. Figure 9a shows the evaluation of the maximum effectivity for the full error bound (μ), the upper bound ub (μ) with using ub (μ), and the split bound split (μ). First of all, we observe a very good agreement of the true error and the full error bound, where no additional upper bounds for the quantities are employed. The mean overestimation of the nonsplit upper bound is relatively low which indicates good error prediction. However, when going from the nonsplit bound to the split bound, we see a large gap between the results. These results show that the improved error estimation presented in this paper is vital to get accurate predictions of the true error. As a last test, we explore the influence of the dimension of the subspace onto the quality of the error estimate. To this end, we vary the tolerance ρ E for the construction of X E M , determine the corresponding subspaces, calculate the error bound for those subspaces, and plot the worst-case (maximum) effectivity along with the dimension M in Fig. 9b. Of course, the dimension of the subspace grows with the extraction of more information up to M = 254. On the other hand, the maximum effectivity decreases to an almost perfect error prediction for ρ E = 10 −6 with a maximum effectivity of only 2.6. a b Fig. 9 a Mean and maximum effectivity for error bounds for the ARE. b Dimension of the subspace X E M (bars) and maximum effectivity (line) for varying ρ E

Conclusion and outlook
In this article, we presented a novel improvement of error bounding techniques for problems which can be described as a zero value problem for differentiable operators over two Banach spaces. This was achieved by introducing and solving an auxiliary problem which counteracts the often severe overestimation that occurs when applying standard error bounding techniques and resulted in the here presented ALP-based error bounds. The resulting a-posteriori error bound shows significant improvement in its effectivity. Furthermore, the quality of the error prediction can be tuned by choosing richer subspaces for the approximation of E. The technique was then applied in the context of RB methods, where comparisons with standard error estimates were studied. Numerical examples for both, linear and nonlinear problems, highlight the benefits of the presented technique and show that we can reach effectivities that are very close to one in all examples.
Future work will study the application of the here presented method to time dependent-problems both continuous and discrete in time, as well as the applicability as an effective estimator for adaptive approximation schemes. A comparison of the proposed method to [11], especially in the case of linear problems, might be insightful. permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. However, the bound can be slightly refined by considering for x ∈ M from which we get for x =x ∈ M the final estimate