1 Introduction

When dealing with non-local operators, low-rank approximation methods are emerging as a method of choice, both for theoretical analysis and as a foundation for constructing high-performance numerical algorithms. We concentrate on non-local operators defined as solutions to the continuous time Lyapunov operator equation with a rank-one operator as the right-hand side coefficient. This equation can be formally written as

$$\begin{aligned} AX+XA=bb'~. \end{aligned}$$
(1.1)

Here we assume that A is an unbounded (differential) positive definite self-adjoint operator in a Hilbert space and \(b'\) is a functional on \({{\mathscr {X}}}={{\,\mathrm{dom}\,}}(A^{1/2})\), the domain of the square root of A. A typical example of the operator A would be a reaction-diffusion operator in \({\mathscr {H}}=L^2(\Omega )\). The formal expression (1.1) can be justified in the framework of Gelfand triplets \({\mathscr {X}}\subset {\mathscr {H}}\subset {\mathscr {W}}\), where \({\mathscr {W}}={\mathscr {X}}'\) is the topological dual of \({\mathscr {X}}\) and \(\Vert \cdot \Vert _{\bullet }\), \(\bullet ={\mathscr {X}},{\mathscr {H}},{\mathscr {W}}\), denote the corresponding norms [20]. We use the notation \(b^*\) to denote the functional \(b^*:{\mathscr {H}}\rightarrow {\mathbb {C}}\) and \(b'\) to denote the functional \(b':{\mathscr {X}}\rightarrow {\mathbb {C}}\). Note that \({\mathscr {H}}\) and \({\mathscr {X}}\) do not have the same scalar product, hence the difference in notation. The notation \(b'\in {\mathscr {X}}'\) implies that we have not identified \({\mathscr {X}}\) and \({\mathscr {X}}'\), whereas the notation \(b^*\) assumes that \({\mathscr {H}}\) has been identified with its dual.

Under the assumption of the positive definiteness of A, the equation (1.1) has a unique positive solution \(X\in {\mathscr {L}}({\mathscr {H}})\). Furthermore, the fact that the operator \(bb'\) is of rank one implies that X is contained in every Schatten ideal \({\mathscr {S}}_p\), \(p\in {\mathbb {N}}\), see [26]. In particular, this implies that the trace \({{\,\mathrm{tr}\,}}(X)\) is finite. This will be the quantity of interest which we will monitor in the presented numerical experiments. The physical relevance of \({{\,\mathrm{tr}\,}}(X)\) stems from the fact that it represents the total output energy of the formal Cauchy system

$$\begin{aligned}&\dot{x}=-Ax,\\&x(0)=b. \end{aligned}$$

To see this, notice that the solution of this initial value problem is given by \(x(t)=\exp (-tA)b\), so that

$$\begin{aligned}&E^2=\int _0^\infty ~\Vert x(t)\Vert ^2~dt=\int _0^\infty ~(x(t),x(t))~dt\\&\quad \qquad =\int _0^\infty ~{{\,\mathrm{tr}\,}}(x(t)x(t)^*)~dt\\&\quad \qquad ={{\,\mathrm{tr}\,}}\left( \int _0^\infty (\exp (-tA)b)(\exp (- tA)b)^*~dt\right) ={{\,\mathrm{tr}\,}}(X) \end{aligned}$$

follows. A more common application of the Lyapunov equation is in the study of the model order reduction for the linear control systems \(\dot{x}(t)=-Ax(t)+bu(t)\), by means of balanced truncation [1, 22]. We use precisely this context to justify the term loading for the function b.
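This identity can be checked numerically. The sketch below uses a random symmetric positive definite matrix as a stand-in for the (discretized) operator A and compares \({{\,\mathrm{tr}\,}}(X)\) with a quadrature approximation of the output energy; the matrix, the load, and the quadrature grid are illustrative choices, not taken from the paper.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

rng = np.random.default_rng(0)
n = 30
Q = rng.standard_normal((n, n))
A = Q @ Q.T / n + np.eye(n)        # symmetric positive definite stand-in for A
b = rng.standard_normal(n)

# solve A X + X A = b b^T  (SciPy's convention is A X + X A^H = Q)
X = solve_continuous_lyapunov(A, np.outer(b, b))

# output energy of x' = -A x, x(0) = b, via spectral calculus and trapezoids
w, V = np.linalg.eigh(A)
c = V.T @ b
ts = np.linspace(0.0, 10.0, 20001)
energy = (c[:, None] ** 2 * np.exp(-2.0 * w[:, None] * ts)).sum(axis=0)
h = ts[1] - ts[0]
E2 = h * (energy.sum() - 0.5 * (energy[0] + energy[-1]))   # trapezoid rule

print(np.trace(X), E2)             # the two numbers agree to quadrature accuracy
```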

By \(\Vert X\Vert =\sqrt{{\text {spr}}(X^*X)}\), where \({\text {spr}}\) denotes the spectral radius of an operator, we denote the operator norm of X. The main goal of this paper is to construct a low-rank approximation \(Z_r=\sum _{j=1}^r z_jz_j^*\) of X using a computationally efficient approximation method. More precisely, for a given tolerance \(\tau >0\) we seek a small integer r and an operator \(Z_r\) such that \(\Vert Z_r-X\Vert \le \tau \) and \(|{{\,\mathrm{tr}\,}}(Z_r)-{{\,\mathrm{tr}\,}}(X)|\le \tau \).

Let \(\mu _1(H)\ge \mu _2(H)\ge \cdots \) denote the eigenvalues, counted according to multiplicity, of a compact self-adjoint operator H. For the solution X of (1.1) assume \(\mu _r(X)>\mu _{r+1}(X)\) for some \(r\in {\mathbb {N}}\). Weyl's theorem [6, 36] for compact operators now implies

$$\begin{aligned} |\mu _i(X)-\mu _i(Z_r)|\le \Vert X-Z_r\Vert , \;\; i=1,\ldots ,r~. \end{aligned}$$

Let now \(P_{{\mathscr {E}}_r}\) be the orthogonal projection onto the eigensubspace belonging to the r largest eigenvalues of X and let \(P_{\mathscr {Z}}\) be the orthogonal projection onto the subspace \({\mathscr {Z}}=\text {span}\{z_1,\ldots ,z_r\}\). The Davis-Kahan \(\sin \Theta \) theorem [8] yields the estimate

$$\begin{aligned} \Vert P_{{\mathscr {E}}_r}-P_{{\mathscr {Z}}}\Vert \le \frac{1}{\rho _r({\mathscr {Z}},X)-\mu _{r+1}(X)}\Vert X-Z_r\Vert , \end{aligned}$$

where \(\rho _r({\mathscr {Z}},X)\) is the Rayleigh Quotient, formally defined below. In view of this, the space \({\mathscr {Z}}=\text {span}\{z_1,\ldots ,z_r\}\) can be seen as an approximation to the eigenspace belonging to the r largest eigenvalues of X. In this paper we interpret the low-rank approximation problem for the Lyapunov equation as a subspace approximation problem and use this context to assess the quality of the constructed vectors \(z_i\), \(i=1,\ldots ,r\). The algorithm which we present is formulated for divergence form operators posed in a polygonal planar domain. The vectors \(z_j\) are constructed as elements of a low-dimensional space of continuous piecewise polynomial functions [23]. We present our results, as far as possible, in their abstract form and specialize to particular divergence form operators when presenting numerical examples. The main theme of the paper is the interplay of low-rank approximation methods based on spectral calculus [11] with the utilization of the regularity of the eigenvectors of X in the construction of higher order adaptive approximation methods [2, 23].

1.1 The novelty of the paper

We construct an a posteriori error estimator for an approximation to the solution of the Lyapunov equation for a divergence form operator posed in a polygonal domain in \({\mathbb {R}}^2\). The estimator is based on the auxiliary subspace technique from [3]. We first show that the estimator can be approximated by refining the auxiliary subspace. This yields our first practical algorithm. Then, we present an analysis of the estimator based on the perturbation theory for Rayleigh Quotients from [37], combined with an eigenvalue saturation assumption from [25]. This yields an alternative computable error indicator. We justify the use of the eigenvalue saturation assumption by a perturbation argument, showing that the solution X is a small perturbation of an operator with A-analytic eigenvectors. We test the estimator on a sequence of graded meshes and observe that it correctly indicates the portions of the domain where the mesh should be refined, while not flagging those portions which are irrelevant for the particular solution. We also observe exponential convergence in the square root of the number of degrees of freedom, even in the case of non-convex polygonal domains, as one expects when approximating piecewise analytic functions in a domain in \({\mathbb {R}}^2\).

1.2 The outline of the paper

The rest of the paper is structured as follows. In Sect. 2 we introduce the basic notation and present the motivating example. In Sect. 3 we review background results on approximation methods for the Lyapunov equation and present the result on approximate A-analyticity of the eigenvectors of the solution operator. We then introduce the basic conventions of hp-finite element approximation theory in Sect. 4. In Sect. 5 we present an a posteriori error indicator and a framework for the analysis of its reliability. We then present a greedy hp-refinement strategy based on the error balancing approach. We continue with further numerical experiments in Sect. 6. A sketch of the theoretical result for the case in which the forcing \(b\in {\mathscr {X}}'\) satisfies \(\Vert A^{-\alpha }b\Vert <\infty \) for some \(0\le \alpha <1/2\) is given in “Appendix A”.

2 Basic definitions and a motivating example

In this section we present a motivating numerical example and introduce the basic notation and results which are needed to interpret it. This will in particular include the introduction of the regularity classes associated with the operator A and a review of the perturbation analysis of the Rayleigh Quotient.

Let us recall the following definitions of the regularity classes defined by the positive definite operator A, see [24, 30]. Following [30, Section 7.4], a vector is called a smooth (\(C^\infty \)) vector of the operator A if it is an element of the set \({\mathscr {A}}_\infty =\cap _{n\in {\mathbb {N}}}{{\,\mathrm{dom}\,}}(A^n)\). A vector \(x\in {\mathscr {A}}_\infty \) is called an A-bounded vector if there exists a constant \(B_x>0\) such that

$$\begin{aligned} \Vert A^nx\Vert \le B_x^n,\qquad n\in {\mathbb {N}}_0~, \end{aligned}$$

and we write \(x\in {\mathscr {A}}_b\). A vector \(x\in {\mathscr {A}}_\infty \) is called an A-analytic vector if there exists a constant \(C_x>0\) such that

$$\begin{aligned} \Vert A^nx\Vert \le C_x^n n!,\qquad n\in {\mathbb {N}}_0~, \end{aligned}$$

and we write \(x\in {\mathscr {A}}_a\). It holds that \({\mathscr {A}}_b\subset {\mathscr {A}}_a\subset {\mathscr {A}}_\infty \). In particular, eigenvectors of the operator A are A-bounded vectors, since for a normalized eigenvector with \(Au=\lambda u\) we have \(\Vert A^nu\Vert =\lambda ^n\). The vector \(v=\exp (-tA)b\), for \(b\in {\mathscr {H}}\) and \(t>0\), is an example of an A-analytic vector, see [30, Example 7.5].
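For \(v=\exp (-tA)b\) the A-analyticity can even be made quantitative: since \(s^n e^{-ts}\le n!/t^n\) for \(s\ge 0\), one has \(\Vert A^nv\Vert \le (1/t)^n\, n!\,\Vert b\Vert \). A minimal numerical check, with a random symmetric positive definite matrix standing in for A (an illustrative choice, not from the paper):

```python
import math
import numpy as np

rng = np.random.default_rng(1)
n, t = 40, 0.5
Q = rng.standard_normal((n, n))
A = Q @ Q.T + np.eye(n)                  # symmetric positive definite stand-in
w, V = np.linalg.eigh(A)
b = rng.standard_normal(n)
c = V.T @ b

# ratios ||A^k exp(-tA) b|| / (t^{-k} k! ||b||) should stay below 1
ratios = []
for k in range(1, 12):
    Akv = V @ (w ** k * np.exp(-t * w) * c)      # A^k exp(-tA) b via spectral calculus
    ratios.append(np.linalg.norm(Akv) * t ** k / (math.factorial(k) * np.linalg.norm(b)))
print(max(ratios))                               # < 1
```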

Let now \(\Omega \subset {\mathbb {R}}^2\) be a bounded domain and let A be a divergence type positive definite (elliptic and self-adjoint) operator in \({\mathscr {H}}=L^2(\Omega )\). Then we have a more detailed description of the regularity spaces. In the case in which A is a divergence form operator with analytic coefficients, b is also analytic, and \(\Omega \) has an at least \(C^2\) regular boundary, an A-analytic function is a (real) analytic function in the classical sense; in particular the function \(v=A^{-1}b\) is analytic in the interior of \(\Omega \), see [24]. In the case in which A is a divergence type operator with piecewise analytic coefficients, b is piecewise analytic, and the boundary of \(\Omega \) is also piecewise analytic, the solution \(v=A^{-1}b\) is still infinitely differentiable. Further, the function v can be represented as a sum of an analytic function and a function whose singularities are concentrated at the corners. So even though we cannot control the growth of the derivatives in the classical sense, the solution is contained in all weighted Sobolev spaces, where the weighting function is the distance of a point to the corners of the domain [23].

The two prototype domains with the associated low-rank approximations to the solution of the associated Lyapunov equation are presented in the following examples. For the coefficients of the Lyapunov equation \(AX+XA=bb^*\) we choose \(b(x_1,x_2)\) to be the bell curve concentrated at (1/2, 1/2):

$$\begin{aligned} b(x_1,x_2) = \exp \left( {-\delta \left( x_1-\frac{1}{2}\right) ^2-\delta \left( x_2-\frac{1}{2}\right) ^2}\right) , \;\;\delta >0 \end{aligned}$$

and we take A to be the Laplace operator with the Dirichlet boundary conditions in the corresponding domain \(\Omega \).

Fig. 1

A representative example: An elliptic domain with an exponential bell curve concentrated at (1/2, 1/2). The columns of X can be plotted and the first one is illustrated. In this case the operator has numerical rank = 2, and the corresponding modes are shown. The dimension of the finite element space is \(10^4\)

Fig. 2

A representative example: A dumbbell domain with an exponential bell curve concentrated at (1/2, 1/2). The columns of X can be plotted and the first one is illustrated. In this case the operator has numerical rank \( = 2\), and the corresponding modes are shown. The dimension of the finite element space is \(10^4\)

Example 2.1

(Ellipse) Let us choose the computational domain \(\Omega \) as an ellipse. The operator A in the Lyapunov equation (1.1) is taken to be the Laplace operator with the zero Dirichlet boundary conditions and we set \(\delta =5\) for the load vector b. This is the regularity setting as in the classical paper by Nelson [24] and it ensures that A-analytic vectors are real analytic functions. The solution X is remarkably of (numerical) rank \(= 2\). See Fig. 1 for illustrations of the loading, the first column of X, and two eigenmodes of X.

Example 2.2

(Dumbbell A) Let us consider a classical Laplace dumbbell problem with computational domain \(\Omega = ([0,2.4]\times [0,1]){\setminus } (([1,1.4]\times [0,0.3]) \cup ([1,1.4]\times [0.7,1]))\). We choose the operator A in the Lyapunov equation (1.1) to be the Laplace operator with the zero Dirichlet boundary conditions and set \(\delta =50\) for the load vector b. The solution X is again of (numerical) rank \(= 2\) and we see that the vectors \(z_1\) and \(z_2\) are highly regular away from the corners of the domain. See Fig. 2 for illustrations of the loading, the first column of X, and two eigenmodes of X.
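The rapid eigenvalue decay behind these observations is easy to reproduce. The sketch below discretizes the Dirichlet Laplacian by finite differences on the unit square (a simpler domain than in the two examples above, chosen here only for brevity), solves the Lyapunov equation with the same bell-curve load, and counts the numerical rank:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

m = 20                                          # interior grid points per direction
h = 1.0 / (m + 1)
T = (2 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)) / h ** 2
A = np.kron(T, np.eye(m)) + np.kron(np.eye(m), T)   # 2D Dirichlet Laplacian, n = 400

x = np.linspace(h, 1.0 - h, m)
X1, X2 = np.meshgrid(x, x, indexing="ij")
b = np.exp(-5 * (X1 - 0.5) ** 2 - 5 * (X2 - 0.5) ** 2).ravel()

X = solve_continuous_lyapunov(A, np.outer(b, b))
mu = np.linalg.eigvalsh(X)[::-1]
num_rank = int(np.sum(mu > 1e-6 * mu[0]))       # numerical rank at relative tol 1e-6
print(num_rank)                                  # far smaller than n = 400
```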

We will analyze the eigenvalues of a positive compact operator H using variational techniques. Let \(x\in {\mathscr {H}}{\setminus }\{0\}\) be given; then

$$\begin{aligned} \rho (x,H)=\frac{(x,Hx)}{(x, x)} \end{aligned}$$

is the Rayleigh Quotient of the vector x for the operator H. In the case in which we are given an r-dimensional subspace \({\mathscr {Z}}\subset {\mathscr {H}}\) we define the Rayleigh Quotients

$$\begin{aligned} \rho _i(\mathscr {Z},H)=\max _{\begin{array}{c} \mathscr {S}\subset \mathscr {Z}\\ \dim \mathscr {S}=i \end{array}}\min _{x\in \mathscr {S}{\setminus }\{0\}}\frac{(x,Hx)}{(x, x)}~. \end{aligned}$$

Obviously, for an r-dimensional subspace \({\mathscr {Z}}\) we have \(\mu _i(H)\ge \rho _i({\mathscr {Z}},H)\), \(i=1,\ldots ,r\). We will now review some basic results on the Rayleigh Quotient analysis from [15, 17, 37]. Let \(H:{\mathscr {H}}\rightarrow {\mathscr {H}}\) be a positive compact operator and let u, \(\Vert u\Vert =1\), be an eigenvector, so that \(Hu=\mu u\), \(\mu \in \mathrm {Spec}(H)\). For a given non-zero vector \(\psi \in {\mathscr {H}}\) we have the estimate

$$\begin{aligned} |\mu -\rho (\psi ,H)|\le \mu _1(H)\sin ^2(\angle \{\psi ,u\})~. \end{aligned}$$
(2.1)

This estimate is very accurate for Rayleigh Quotients \(\rho (\psi ,H)\) which are close to \(\mu _1(H)\). This is precisely the setting in which we expect to find ourselves. This bound has a subspace extension which can be used to treat eigenvalues with higher multiplicities or clusters of eigenvalues, [15, 18]. We measure the distance between finite dimensional subspaces \({\mathscr {X}}\) and \({\mathscr {Y}}\) using the concept of the principal angle. Let \(P_{\mathscr {X}}\) and \(P_{\mathscr {Y}}\) be the orthogonal projections onto \({\mathscr {X}}\) and \({\mathscr {Y}}\), respectively. The vector of the squared sines \(\sin ^2_p\Theta ({\mathscr {X}},{\mathscr {Y}})\) of the principal angles between \({\mathscr {X}}\) and \({\mathscr {Y}}\) is defined using the eigenvalues of the positive self-adjoint operator \(S=I-P_{\mathscr {X}}P_{\mathscr {Y}}P_{\mathscr {X}}\) as \(\sin ^2_p\Theta _i({\mathscr {X}},{\mathscr {Y}})=\mu _i(S)\). Since the operator S is self-adjoint, we define other trigonometric functions of the principal angles using spectral calculus. We define the squared sine of the maximal principal angle as \(\sin ^2\Theta ({\mathscr {X}},{\mathscr {Y}})=\Vert I-P_{\mathscr {X}}P_{\mathscr {Y}}P_{\mathscr {X}}\Vert \).
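As a sanity check on these definitions, the eigenvalues of \(S=I-P_{\mathscr {X}}P_{\mathscr {Y}}P_{\mathscr {X}}\), compressed to \({\mathscr {X}}\) (on the orthogonal complement of \({\mathscr {X}}\) the operator S acts as the identity), can be compared with the principal angles returned by SciPy; the random subspaces below are purely illustrative:

```python
import numpy as np
from scipy.linalg import qr, subspace_angles

rng = np.random.default_rng(2)
n, r = 12, 3
Qx, _ = qr(rng.standard_normal((n, r)), mode="economic")   # orthonormal basis of X
Qy, _ = qr(rng.standard_normal((n, r)), mode="economic")   # orthonormal basis of Y
Py = Qy @ Qy.T

# compression of S = I - Px Py Px to range(Px); its eigenvalues are sin^2 Theta_i
S_r = np.eye(r) - Qx.T @ Py @ Qx
sin2 = np.sort(np.linalg.eigvalsh(S_r))[::-1]

theta = subspace_angles(Qx, Qy)          # principal angles, descending order
print(np.allclose(sin2, np.sin(theta) ** 2))
```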

Under the assumption that \(\mu _r(H)>\mu _{r+1}(H)\), the crudest estimate from [18, Theorem 2.2] reads

$$\begin{aligned} \sum _{i=1}^r\mu _i(H)-\sum _{i=1}^r\rho _i({\mathscr {Z}},H)\le \mu _1(H)\sum _{i=1}^r\sin ^2_p\Theta _i({\mathscr {E}}_r,{\mathscr {Z}}) \le \mu _1(H)r\sin ^2\Theta ({\mathscr {E}}_r,{\mathscr {Z}})~. \end{aligned}$$

Here \({\mathscr {E}}_r\) denotes the eigensubspace associated to the eigenvalues \(\mu _i(H)\), \(i=1,\ldots ,r\). This estimate implies, using [16, Lemma 5.5] and [35],

$$\begin{aligned} \sum _{i=1}^r\mu _i(H)-\sum _{i=1}^r\rho _i({\mathscr {Z}},H)\le \mu _1(H)\sum _{i=1}^r\Vert v_i-z_i\Vert ^2, \end{aligned}$$
(2.2)

where \(Hv_i=\mu _i(H)v_i\) and the \(v_i\) form an orthonormal set of vectors.
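The first of the two estimates above is easy to test numerically. In the sketch below \(\rho _i({\mathscr {Z}},H)\) is computed as the i-th eigenvalue of the compression of H to \({\mathscr {Z}}\); the operator and the trial subspace are illustrative choices of this sketch:

```python
import numpy as np
from scipy.linalg import qr, subspace_angles

rng = np.random.default_rng(3)
n, r = 20, 4
mu = np.concatenate([np.linspace(2.0, 1.5, r), np.linspace(0.5, 0.1, n - r)])
Q, _ = qr(rng.standard_normal((n, n)))
H = Q @ np.diag(mu) @ Q.T                   # positive operator with a gap mu_r > mu_{r+1}
Er = Q[:, :r]                               # eigenspace of the r largest eigenvalues

# trial subspace Z: perturbed eigenvectors, re-orthonormalized
Z, _ = qr(Er + 0.05 * rng.standard_normal((n, r)), mode="economic")
rho = np.sort(np.linalg.eigvalsh(Z.T @ H @ Z))[::-1]    # rho_i(Z, H)

lhs = mu[:r].sum() - rho.sum()
theta = subspace_angles(Er, Z)
rhs = mu[0] * np.sum(np.sin(theta) ** 2)
print(0.0 <= lhs <= rhs)
```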

3 The Lyapunov equation

In this section we will review the low-rank approximation estimates for the solution X of the Lyapunov equation (1.1). Also, we will review basic results on the projection based approximation methods for the Lyapunov equation.

Given a Gelfand triple \({\mathscr {X}}\subset {\mathscr {H}}\subset {\mathscr {W}}\) of Hilbert spaces, where \({\mathscr {W}}= {\mathscr {X}}'\) is the dual space to \({\mathscr {X}}\), we consider an unbounded operator A such that its range is in \({\mathscr {W}}\) and its domain of definition is given by \({\mathscr {X}}= {{{\,\mathrm{dom}\,}}_{\mathscr {W}}(A)=\{x\in {\mathscr {H}}~:~\Vert Ax\Vert _{\mathscr {W}}<\infty \}}\). We let \(A':{\mathscr {W}}\rightarrow {\mathscr {X}}\subset {\mathscr {W}}\) denote the dual operator to A in the duality pairing \(\left<\cdot ,\cdot \right>=\left<\cdot ,\cdot \right>_{{\mathscr {W}}\times {\mathscr {X}}}\). Moreover, we consider a (not necessarily bounded) linear operator \(B: {\mathscr {U}}\rightarrow {\mathscr {W}}\) for a Hilbert space \({\mathscr {U}}\) with inner product \(\big (\cdot ,\cdot \big )_{\mathscr {U}}\).

The operators A and B give rise to the Lyapunov operator equation in a linear operator X:

$$\begin{aligned} AX+XA'=-BB', \end{aligned}$$
(3.1)

which formally stands for the variational formulation

$$\begin{aligned} \big< AXz_1,z_2\big>_{{\mathscr {W}}\times {\mathscr {X}}}+\big < XA'z_1, z_2\big >_{{\mathscr {W}}\times {\mathscr {X}}}={\mathfrak {b}}(z_1,z_2),\qquad z_1, z_2\in {\mathscr {X}}, \end{aligned}$$
(3.2)

with the sesquilinear form \({\mathfrak {b}}(z_1,z_2)= -\big ( B'z_1,B' z_2\big )_{{\mathscr {U}}}\). We refer to, e.g., [7, 14, 29] for a more detailed discussion of this equation.

We now consider the situation when A is positive definite self-adjoint on \({\mathscr {H}}\), and in addition has a compact resolvent. We choose \({\mathscr {W}}= {\mathscr {H}}_{-1/2}\), which is equipped with the scalar product \((\cdot , A^{-1}\cdot ) = (A^{-1/2} \cdot , A^{-1/2}\cdot )\), and \({\mathscr {X}}= {\mathscr {H}}_{1/2} = {{\,\mathrm{dom}\,}}_{{\mathscr {H}}}(A^{1/2})\).

Additionally, we assume that the product \(A^{-1/2}B\) is bounded. This is equivalent to the assumption that

$$\begin{aligned} b(\psi ,\phi )=-{\mathfrak {b}}(A^{-1/2}\psi , A^{-1/2}\phi ) \end{aligned}$$

is everywhere defined and bounded on \({\mathscr {H}}\). The substitutions \(\psi =A^{1/2}z_1\) and \(\phi =A^{1/2}z_2\) then allow us to turn (3.2) into the equivalent equation

$$\begin{aligned} \big (A^{1/2}\psi , XA^{-1/2}\phi \big )+\big (XA^{-1/2}\psi , A^{1/2}\phi \big )= b(\psi ,\phi ), \qquad \psi ,\phi \in {\mathscr {X}}. \end{aligned}$$
(3.3)

3.1 Approximate A-analyticity of the eigenvectors

A result of [11] shows that, in the case in which \(b\in {\mathscr {H}}\) and A has a compact resolvent, the following estimate holds

$$\begin{aligned} \Vert X-\sum _{p=-k}^k~\omega _p\exp (-t_pA)b(\exp (-t_pA)b,\cdot )\Vert \le \frac{C_{St}\Vert b\Vert ^2}{2\lambda _1}\exp (-\pi \sqrt{k})~. \end{aligned}$$
(3.4)

The constant \(C_{St}\) can be bounded to be less than 3, and given \(k\in {\mathbb {N}}\) the weights \(\omega _p\) and the nodes \(t_p\) are given by an explicit formula (see [11])

$$\begin{aligned}&t_p=\log (\exp (p\pi /\sqrt{k})+\sqrt{1+\exp (2p\pi /\sqrt{k})})/(2\lambda _1 ),\nonumber \\&\omega _p=\pi /\big (2\sqrt{k(1+\exp (-2p\pi /\sqrt{k}))}\,\lambda _1\big )~, \end{aligned}$$
(3.5)

for \(p=-k,\ldots ,k\) and \(\lambda _1\) denoting the smallest eigenvalue of A. This construction can still be performed under the weaker assumption that only \(\Vert A^{-\alpha }b\Vert <\infty \) for some \(\alpha \), \(0\le \alpha <1/2\). Under this assumption one needs to modify \(\omega _p\) and \(t_p\) to reflect the changed asymptotic (as \(t\rightarrow 0\)) behavior of \(\exp (-tA)b\) and the fact that we are using a sinc quadrature formula for the sector rather than for the strip as was used in [11], see [27]. For further details of this construction and for ramifications in the context of this paper, see “Appendix A”. The assumption \(\Vert A^{-\alpha }b\Vert <\infty \) also covers the standard boundary control setting for the Lyapunov equation, see [20, 27].

The ramifications of this result are twofold. First, it indicates that X can be represented by a sum of rank-one operators with an error that decays exponentially in the square root of the number of terms in the sum. Equally important is the second consequence: the range of the operator

$$\begin{aligned} X_k=\sum _{p=-k}^k~\omega _p\big (\exp (-t_pA)b\big )(\exp (-t_pA)b,\cdot ) \end{aligned}$$
(3.6)

is finite dimensional and it is spanned by the A-analytic vectors \(v_p=\exp (-t_pA)b\).
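The quadrature (3.6) can be tried out directly. The sketch below uses a finite difference Dirichlet Laplacian on (0, 1) as a stand-in for A and evaluates \(\exp (-t_pA)b\) by spectral calculus; the weights are written out via the substitution \(t={{\,\mathrm{arcsinh}\,}}(e^{s})/(2\lambda _1)\) underlying (3.5), so the precise constants should be regarded as an assumption of this sketch:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

n = 60
h = 1.0 / (n + 1)
A = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h ** 2   # 1D Laplacian
xs = np.linspace(h, 1.0 - h, n)
b = np.exp(-5 * (xs - 0.5) ** 2)

X = solve_continuous_lyapunov(A, np.outer(b, b))    # reference solution
w, V = np.linalg.eigh(A)
lam1, c = w[0], V.T @ b

errs = []
for k in [4, 9, 16, 25]:
    hk = np.pi / np.sqrt(k)
    Xk = np.zeros_like(X)
    for p in range(-k, k + 1):
        tp = np.log(np.exp(p * hk) + np.sqrt(1.0 + np.exp(2 * p * hk))) / (2 * lam1)
        wp = np.pi / (2 * np.sqrt(k * (1.0 + np.exp(-2 * p * hk))) * lam1)
        v = V @ (np.exp(-tp * w) * c)               # exp(-t_p A) b
        Xk += wp * np.outer(v, v)
    errs.append(np.linalg.norm(X - Xk, 2))
print(errs)    # decays roughly like exp(-pi sqrt(k))
```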

Proposition 3.1

Assume A is self-adjoint and positive definite and let \(\tau >0\) be given. Let X denote the unique self-adjoint and positive solution of (3.2). Then there exist numbers r and k such that

$$\begin{aligned} |{{\,\mathrm{tr}\,}}(X)-\sum _{i=1}^r\mu _i(X_k)|\le \tau ~{{\,\mathrm{tr}\,}}(X) \end{aligned}$$

and the eigenvectors \({\hat{u}}_i\ne 0\), \(X_k{\hat{u}}_i=\mu _i(X_k){\hat{u}}_i\), are A-analytic.

Proof

Since X is trace class, there exists a number r such that \(|{{\,\mathrm{tr}\,}}(X)-\sum _{i=1}^r\mu _i(X)|\le (\tau {{\,\mathrm{tr}\,}}(X))/2\). Further, let \(k\in {\mathbb {N}}\) be the smallest k such that

$$\begin{aligned} \Vert X-X_k\Vert \le \frac{C_{St}\Vert b\Vert ^2}{2\lambda _1}\exp (-\pi \sqrt{k})\le \frac{\tau }{2r}{{\,\mathrm{tr}\,}}(X) \end{aligned}$$

then

$$\begin{aligned}&|{{\,\mathrm{tr}\,}}(X)-\sum _{i=1}^r\mu _i(X_k)|=|{{\,\mathrm{tr}\,}}(X)-\sum _{i=1}^r\mu _i(X)+ \sum _{i=1}^r\mu _i(X)-\sum _{i=1}^r\mu _i(X_k)|\\&\quad \le (\tau {{\,\mathrm{tr}\,}}(X))/2 + \sum _{i=1}^r|\mu _i(X)-\mu _i(X_k)|\\&\quad \le (\tau {{\,\mathrm{tr}\,}}(X))/2+(\tau {{\,\mathrm{tr}\,}}(X))/2~. \end{aligned}$$

The last inequality follows from Weyl's theorem, which implies

$$\begin{aligned} |\mu _i(X)-\mu _i(X_k)|\le \Vert X-X_k\Vert \le \tau /(2r){{\,\mathrm{tr}\,}}(X)~, \end{aligned}$$

from which the final estimate follows by summing over i. \(\square \)

We can interpret (3.4) as a substitute regularity result. If we consider the operator X as a perturbation of the operator \(X_k\), then with a tolerance decaying exponentially in \(\sqrt{k}\) the eigenvectors of X are close to the \({\mathscr {A}}_a\) eigenvectors of the operator \(X_k\). This in turn implies that a high order method, such as an hp-adaptive finite element method for operators posed in \({\mathscr {H}}=L^2(\Omega )\), might be able to exploit this regularity to construct a high performance solver which mixes low-rank numerical linear algebra with full finite element piecewise polynomial adaptivity. Also, this indicates that we might view the low-rank approximation task as the task of computing an approximation to the r dominant eigenvalues of the operator X.
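The last remark is cheap to exploit in practice: for a low-rank factor Z with \(X\approx ZZ^*\), the non-zero eigenvalues of \(ZZ^*\) coincide with those of the small Gram matrix \(Z^*Z\), so the dominant eigenvalues are available without ever forming the large operator. A minimal illustration (the factor is random, purely for demonstration):

```python
import numpy as np

rng = np.random.default_rng(4)
n, r = 500, 6
Z = rng.standard_normal((n, r))                  # low-rank factor, X ~ Z Z^T

small = np.linalg.eigvalsh(Z.T @ Z)[::-1]        # eigenvalues of the r x r Gram matrix
big = np.linalg.eigvalsh(Z @ Z.T)[::-1][:r]      # dominant eigenvalues of the n x n operator
print(np.allclose(small, big))
```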

3.2 Galerkin approximation of the Lyapunov equation

We will first present abstract approximation results. Let A be a positive definite and self-adjoint operator with a compact resolvent. Let us further assume that we have constructed a sequence of finite dimensional subspaces \({\mathscr {V}}_s\subset {\mathscr {X}}={{\,\mathrm{dom}\,}}(A^{1/2})\), \(s\in {\mathbb {N}}\), such that \({\mathscr {V}}_{s_1}\subset {\mathscr {V}}_{s_2}\) for \(s_1<s_2\) and the orthogonal projections \(P_s\) onto \({\mathscr {V}}_s\) converge strongly to the identity operator as \(s\rightarrow \infty \).

According to [20, Section 4.1.2 and Section 5], the Galerkin projection \(A_s:{\mathscr {V}}_s\rightarrow {\mathscr {V}}_s\) is given by the formula

$$\begin{aligned} A_s=(A^{1/2}P_s)^*(A^{1/2}P_s). \end{aligned}$$

It holds that

$$\begin{aligned} \Vert A_s^{-1}P_s v-A^{-1}v\Vert _{{{\,\mathrm{dom}\,}}(A^{1/2})}&\le 2\Vert (I-P_s)A^{-1}v\Vert _{{{\,\mathrm{dom}\,}}(A^{1/2})}~, \end{aligned}$$
(3.7)

and furthermore \(\Vert A_s^{-1}P_s-A^{-1}\Vert \rightarrow 0\). For more on the convergence of discrete operator approximations see [20, Section 4] and classical references [6, 28]. With this notation we define the operator \(X_s:{\mathscr {V}}_s\rightarrow {\mathscr {V}}_s\)—in a generic situation when instead we have some finite element subspace \({\mathscr {V}}\) we tacitly write \(X_{\mathscr {V}}\)—as the solution of the finite dimensional operator equation

$$\begin{aligned} A_s X_s+X_s A_s=-(P_s b)(P_s b)^*~. \end{aligned}$$
(3.8)

Under additional assumptions on the uniformity of the sequence \(P_s\), a general result from [20, Theorem 4.1.4.1] states that \(\Vert X_sP_s- X\Vert \rightarrow 0\). The precise formulation of the convergence result is quite technical. Intuitively, it can be condensed to checking that the Galerkin projections \(A_s\) are uniformly coercive in s and that the orthogonal projections onto \({\mathscr {V}}_s\) converge strongly to the identity in a monotone way (in the Loewner order) and at a guaranteed rate. An example of spaces for which these assumptions hold are the spaces associated with a hierarchical finite element scheme defined on a quasi-uniform grid. For further details see [20, Section 5.2], where the convergence rates are established for the specific case when \({\mathscr {V}}_s\) are spaces of piecewise linear functions and A is the Laplace operator posed in a polygonal domain. Since both \(X_s \) and X are bounded operators, norm convergence implies that the eigenvalues converge with multiplicity and that the associated spectral projections converge in norm.

For practical computations we typically do not have access to an orthonormal basis of the space \({\mathscr {V}}_s \). One either has to solve a linear system in order to compute the action of the orthogonal projection \(P_s \) or seek a computationally more efficient and stable alternative. The problem (3.8) can simply be rephrased as a generalized Lyapunov equation, which is obtained by dropping the requirement of access to an orthonormal basis of \({\mathscr {V}}_s \). We pay for this flexibility by the introduction of the Gram matrix \(M_s \) of the chosen (non-orthonormal) basis of \({\mathscr {V}}_s \). The matrix \(M_s \) is called the mass matrix and the equation (3.8) takes the form of the generalized Lyapunov equation

$$\begin{aligned} K_s Y_s M_s +M_s Y_s K_s =-(M_s {\underline{b}})(M_s {\underline{b}})^*~. \end{aligned}$$
(3.9)

Here \(K_s \) and \(M_s \) are the finite element stiffness and mass matrices and \({\underline{b}}\) is the representation of the load vector b in the chosen finite element basis of \({\mathscr {V}}_s \), see [21, 33]. Let us note that the finite dimensional operator \(X_s \) is represented by the pencil \((M_s Y_s M_s ,M_s )\), and so the generalized eigenvalues of the pencil coincide with the eigenvalues of \(X_s\).

The generalized Lyapunov equation (3.9) can be efficiently solved by a projection iterative method. Our method of choice is the projection onto the extended Krylov subspace generated by A and b. This is implemented in MATLAB as the kpik algorithm of Simoncini [32]. See also [19] for a way to solve such operator equations even when the systems are so large that computing the action of \(A^{-1}\) by a sparse direct solver is not feasible.
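For moderate problem sizes, reducing (3.9) to a standard Lyapunov equation via a Cholesky factorization \(M_s=LL^T\) is a convenient dense alternative to kpik. The sketch below applies it to the piecewise linear FEM matrices of \(-u''\) on (0, 1); the substitution \(Y=L^{-T}ZL^{-1}\) is an implementation choice of this sketch, not the method used in the paper:

```python
import numpy as np
from scipy.linalg import cholesky, solve_continuous_lyapunov, solve_triangular

def generalized_lyapunov(K, M, b):
    """Solve K Y M + M Y K = -(M b)(M b)^T by substituting Y = L^{-T} Z L^{-1},
    where M = L L^T (dense sketch; for large systems a Krylov method is used)."""
    L = cholesky(M, lower=True)
    # Kt = L^{-1} K L^{-T}
    Kt = solve_triangular(L, solve_triangular(L, K, lower=True).T, lower=True).T
    bt = L.T @ b
    Z = solve_continuous_lyapunov(Kt, -np.outer(bt, bt))
    Linv = solve_triangular(L, np.eye(L.shape[0]), lower=True)
    return Linv.T @ Z @ Linv

# piecewise linear FEM matrices for -u'' on (0,1) with zero Dirichlet data
n = 40
h = 1.0 / (n + 1)
K = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h
M = (4 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)) * h / 6
xs = np.linspace(h, 1.0 - h, n)
b = np.exp(-5 * (xs - 0.5) ** 2)

Y = generalized_lyapunov(K, M, b)
residual = K @ Y @ M + M @ Y @ K + np.outer(M @ b, M @ b)
print(np.linalg.norm(residual))   # a small residual confirms the reduction
```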

3.3 Measuring the residual of a low-rank approximation of a Lyapunov equation

Given \(z_i\in {{\mathscr {X}}}\), let \({\hat{Y}}_{r}=\sum _{i=1}^rz_iz_i^*\) be as before. Its Lyapunov equation residual is the sesquilinear form

$$\begin{aligned}&{\mathfrak {r}}(\psi ,\phi )=\sum _{i=1}^r\Big [\big (\psi , A^{1/2}z_i)(A^{-1/2}z_i,\phi )+(A^{-1/2}z_i,\psi )(A^{1/2}z_i, \phi )\Big ]\\&\quad \qquad \qquad -(\psi ,A^{-1/2}b)(A^{-1/2}b,\phi ), \end{aligned}$$

for \(\psi ,\phi \in {\mathscr {H}}\). This form is bounded on \({\mathscr {H}}\), it is of rank at most \(2r+1\), and there exists a unique operator \(R({\hat{Y}}_{r})\) such that

$$\begin{aligned} {\mathfrak {r}}(\psi ,\phi )=(\psi ,R({\hat{Y}}_{r})\phi )~. \end{aligned}$$

Subsequently, the solution \(Y=X-X_r\) of the equation

$$\begin{aligned} \big (A^{1/2}\psi , YA^{-1/2}\phi )+(YA^{-1/2}\psi , A^{1/2}\phi )= {\mathfrak {r}}(\psi ,\phi ) \end{aligned}$$

can be estimated by

$$\begin{aligned} \Vert X-X_r\Vert \le \frac{1}{2}\sup _{\psi \ne 0,\phi \ne 0}\frac{|{\mathfrak {r}}(\psi ,\phi )|}{\Vert \psi \Vert \Vert \phi \Vert }=\frac{1}{2}\Vert R(X_r)\Vert ~. \end{aligned}$$

Furthermore, \(Y=X-X_r\) is contained in every Schatten ideal, since X is trace class and \(X_r\) is of finite rank.
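In matrix form the bound above reads \(\Vert X-X_r\Vert \le \tfrac{1}{2}\Vert A^{1/2}X_rA^{-1/2}+A^{-1/2}X_rA^{1/2}-A^{-1/2}bb^*A^{-1/2}\Vert \), which can be checked directly on a small example (random data, fractional powers evaluated by spectral calculus; an illustration, not the paper's computation):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

rng = np.random.default_rng(5)
n, r = 40, 4
Q = rng.standard_normal((n, n))
A = Q @ Q.T + n * np.eye(n)                      # symmetric positive definite
b = rng.standard_normal(n)
X = solve_continuous_lyapunov(A, np.outer(b, b))

mu, U = np.linalg.eigh(X)
mu, U = mu[::-1], U[:, ::-1]
Xr = (U[:, :r] * mu[:r]) @ U[:, :r].T            # best rank-r approximation of X

w, V = np.linalg.eigh(A)
Ah = V @ (np.sqrt(w)[:, None] * V.T)             # A^{1/2}
Ahi = V @ ((1.0 / np.sqrt(w))[:, None] * V.T)    # A^{-1/2}
R = Ah @ Xr @ Ahi + Ahi @ Xr @ Ah - np.outer(Ahi @ b, Ahi @ b)

lhs = np.linalg.norm(X - Xr, 2)
rhs = 0.5 * np.linalg.norm(R, 2)
print(lhs <= rhs)
```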

Let us now formulate an approximation result which may serve as a means of constructing an approximation to the size of the residual.

Proposition 3.2

Let \(z\in {{\mathscr {X}}}\) be given and let the sequence of orthogonal projections \(P_s\) be such that \(P_sz=z\) for all \(s>0\) and \(P_s\rightarrow I\) strongly. Define

$$\begin{aligned} {\mathfrak {r}}_s(\psi ,\phi )=\big (\psi , A^{1/2}_sz)(A^{-1/2}_sz,\phi )+(A^{-1/2}_sz,\psi )(A^{1/2}_sz, \phi )-(\psi ,A_s^{-1/2}b_s)(A_s^{-1/2}b_s,\phi ) \end{aligned}$$

then \(\Vert R_s(zz^*)-R(zz^*)\Vert \rightarrow 0\) as \(s\rightarrow \infty \). The operator \(R_s(zz^*)\) is a bounded operator representing the form \({\mathfrak {r}}_s\) and \(b_s=P_sb\).

Proof

Given \(x,y\in {\mathscr {H}}\) it follows that

$$\begin{aligned} \Vert xx^*-yy^*\Vert \le 2\max \{\Vert x\Vert ,\Vert y\Vert \}\Vert x-y\Vert \end{aligned}$$
(3.10)

and so convergence of x to y in norm implies the norm convergence of \(xx^*\) to \(yy^*\). Now,

$$\begin{aligned} {\mathfrak {r}}_s(\psi ,\phi )=(\psi , \big [(A^{1/2}_sz)(A^{-1/2}_sz)^*+ (A^{-1/2}_sz)(A^{1/2}_sz)^*-(A_s^{-1/2}{b_s})(A_s^{-1/2}{b_s})^*\big ]\phi ) \end{aligned}$$

and the conclusion follows from the norm resolvent convergence of \(A_s\) to A and the estimate (3.10) for the norm convergence of the rank one operators. \(\square \)

4 p and hp finite element discretization

Fig. 3

Geometrically graded mesh. At every reentrant corner the mesh has been graded by applying the replacement rule \(\ell \) times. Here \(\ell = 8\)

As mentioned above, it is crucial to use high order FEM to exploit the regularity of the solution. In this section we give a general overview of the hp solver at our disposal. We emphasize that the techniques described here can be applied in a general curvilinear setting without any modification. However, we state the families of finite element spaces only in the case of domains partitioned into triangles and quadrilaterals.

Let \(\Omega \subset {\mathbb {R}}^2\) be an open, bounded domain with Lipschitz boundary \(\partial \Omega \), such that \({\mathscr {X}}\subset H^1(\Omega )\), and let \({\mathscr {T}}=\{T\}\) be a conforming partition of \(\Omega \) into convex (curvilinear) triangles and quadrilaterals, which we call a mesh or triangulation, see Fig. 3. We do not impose any restriction on the number of curved edges. Any curved elements are handled using standard blending function techniques (cf. [34]).

For a given element T and non-negative integer m, we define the local polynomial space \({\mathbb {Q}}_m(T)\) as follows. If T is a triangle, then \({\mathbb {Q}}_m(T)\) consists of the polynomials of total degree \(\le m\), so \(\dim {\mathbb {Q}}_m(T)=(m+2)(m+1)/2\). If T is a quadrilateral, then \({\mathbb {Q}}_m(T)\) consists of polynomials of degree \(\le m\) in each variable, so \(\dim {\mathbb {Q}}_m(T)=(m+1)^2\).

For a given triangulation, \({\mathscr {T}}\), let \({\mathbf {p}}:{\mathscr {T}}\rightarrow {\mathbb {N}}\) be a function that assigns a positive integer to each element \(T\in {\mathscr {T}}\). This map is called a p-vector. We define the corresponding finite element space

$$\begin{aligned} {\mathscr {V}}={\mathscr {V}}({\mathscr {T}},{\mathbf {p}})=\{v\in {{\mathscr {X}}}:\,v_{\vert _{T}}\in {\mathbb {Q}}_{{\mathbf {p}}(T)}(T) \text{ for } \text{ all } T\in {\mathscr {T}}\}~. \end{aligned}$$
(4.1)

We note that \({\mathscr {V}}\subset C({{\overline{\Omega }}})\).

Let \({\mathscr {F}}=\{{\mathscr {T}}_\ell \}\) be a family of nested meshes obtained from successive refinements of an initial coarse mesh, where the index \(\ell \ge 0\) refers to a refinement level. To account for possible singularities at the non-convex corners of the boundary, we apply a geometric grading of element sizes toward the singular points that takes this a priori knowledge into account [31, Section 4.5]. Beginning with a coarse mesh \({\mathscr {T}}_0\) in which the vertex graph distance between singular points (i.e., the minimal number of edges in a path connecting these points) is at least two, the mesh grading is implemented using element-level replacement rules employing an exact geometry description, as described in [13]. The element layers are created by nested application of the replacement rules on every element touching a singular point. At each step, only the elements touching the singular point that were created at the previous step are refined, which makes the bookkeeping of the layers simple. This is illustrated in Fig. 3. Notice that these replacement rules need not be the same at different levels, since a rule for a quadrilateral element may result in a triangle touching the singular point.

Given such a family of meshes, we distinguish two families of finite element spaces defined on them. We refer to the first as the p-method family because it uses a fixed polynomial degree for every element in the mesh. For this family, the polynomial degree p is chosen and applied to each element in the pth mesh in the family, \({\mathscr {T}}_p\in {\mathscr {F}}\), i.e. \({\mathbf {p}}(T)=p\) for all \(T\in {\mathscr {T}}_p\). We denote the finite element spaces in this family by \({\mathscr {V}}_{1,p}\), and use \(1\le p\le 8\) for our experiments. We note that the spaces are nested, \({\mathscr {V}}_{1,p}\subset {\mathscr {V}}_{1,p+1}\). We refer to the second family as the hp-family because it uses variable polynomial degrees in the mesh. For the second family, given a polynomial degree p, the mesh \({\mathscr {T}}_p\) is chosen as in the first family, but polynomial degrees are no longer assigned uniformly throughout the mesh. All elements touching a singular point are assigned polynomial degree 1, the next layer of elements is assigned polynomial degree 2, and so on, until polynomials of degree p are reached at the pth layer. Any elements that are more than p layers away from all singular points are also assigned polynomial degree p. The initial mesh and refinement scheme ensure that there is no ambiguity in how polynomial degrees are assigned to each element. We denote the finite element spaces in this family by \({\mathscr {V}}_{2,p}\), using \(1\le p\le 12\) for our experiments. As before, the spaces are nested, \({\mathscr {V}}_{2,p}\subset {\mathscr {V}}_{2,p+1}\), and we also note that \({\mathscr {V}}_{2,p}\subset {\mathscr {V}}_{1,p}\).
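The layerwise degree assignment of the hp-family can be written down explicitly. The sketch below (our own convention: elements carry a 0-based layer index, with layer 0 denoting an element touching a singular point) reproduces the rule 1, 2, …, p described above:

```python
def hp_degree(layer: int, p: int) -> int:
    """Degree assigned to an element of the hp-family by its refinement layer.

    layer = 0 denotes an element touching a singular point (degree 1);
    elements at least p - 1 layers away receive the full degree p.
    The 0-based layer index is our own bookkeeping convention.
    """
    return min(layer + 1, p)


degrees = [hp_degree(layer, p=4) for layer in range(7)]  # 1, 2, 3, 4, 4, 4, 4
```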

For the hp-family it is necessary to construct the local basis functions in such a way that varying the local polynomial order still results in a continuous formulation. We distinguish between three types of polynomial functions on an element: vertex functions, which vanish on all vertices except one; edge functions, which vanish on all edges except one; and element functions (interior bubble functions), which vanish on all edges. On the global (mesh) level, vertex functions are supported in the patch of elements sharing that vertex, edge functions are supported in the (one or two) elements sharing an edge, and element functions are supported in a single element. It is this distinction between the types of polynomial functions that enables one to build elements in which the degrees of the element functions may differ from those of the edge functions, and the degree used on one edge may differ from that used on another. In fact, this is precisely what is done in the hp-family, \({\mathscr {V}}_{2,p}\), to allow for variable p(T). In particular, when T and \(T'\) are adjacent elements whose assigned polynomial degrees differ by one, say \(p(T)=m\) and \(p(T')=m+1\), the polynomial degree of the edge functions associated with their shared edge is taken to be \(m+1\).

5 A greedy auxiliary subspace mesh refinement

In what follows we will assume that we have a sequence of spaces \({\mathscr {V}}_s={\mathscr {V}}_{l_s,p_s}\) such that \({\mathscr {V}}_{s_1}\subset {\mathscr {V}}_{s_2}\) for \(s_1<s_2\) and \(P_s\rightarrow I\) strongly as \(s\rightarrow \infty \). Further, we will assume that A is a divergence-type operator with analytic coefficients, posed in a polygonal domain \(\Omega \). We will use the notation X for the solution of the Lyapunov equation (3.2), and \(X_s\) will be the solution of the Lyapunov equation projected onto the subspace \({\mathscr {V}}_s\). Finally, \(X_k\) will denote the operator (3.6), which is a function of the operator A.

In this section we present an error indicator which will be shown to converge to the error estimate in the sense of Proposition 3.2. We will also show what role the approximate A-analyticity of the eigenvectors of X might play. Also note that Proposition 3.2 gives a computable approximation of the error estimator. Namely, the operator \(R_s(Z_r)\) is representable as a short sum of rank one operators, and so its maximal singular value can be efficiently computed by a standard iterative procedure. However, we will now take an alternative avenue; before we do so, let us present an extension of Proposition 3.1.
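The observation that the maximal singular value of a short sum of rank one operators can be computed by a standard iterative procedure is easy to illustrate. The following NumPy sketch (entirely our own, with synthetic random factors standing in for the factors of \(R_s(Z_r)\)) runs a power iteration using only the low-rank factors, never forming the full matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

# synthetic factors: R = V @ W.T is a short sum of r rank one operators
n, r = 500, 4
V = rng.standard_normal((n, r))
W = rng.standard_normal((n, r))


def apply_R(x):
    # R x, computed without ever forming the n-by-n matrix
    return V @ (W.T @ x)


def apply_Rt(x):
    # R^T x
    return W @ (V.T @ x)


# power iteration on R^T R yields the maximal singular value of R
x = rng.standard_normal(n)
for _ in range(500):
    x = apply_Rt(apply_R(x))
    x /= np.linalg.norm(x)
sigma_max = np.linalg.norm(apply_R(x))

# dense reference value, used here for checking only
sigma_ref = np.linalg.svd(V @ W.T, compute_uv=False)[0]
```

Each iteration costs only \(O(nr)\) operations, which is what makes the estimator cheap to evaluate.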

Theorem 5.1

Let \(r\in {\mathbb {N}}\) be given such that \(\mu _r(X)>\mu _{r+1}(X)\) and let \(b\in {\mathscr {H}}\). Then for \({\mathscr {Z}}=\text {span}\{z_1,\ldots ,z_r\}\subset {\mathscr {V}}\) and \(z_i\), \(i=1,\ldots ,r\) pairwise orthogonal and of norm one, we have, for \(i=1,\ldots , r\), the estimate

$$\begin{aligned} |\mu _i(X)-\rho (z_i,X_k)|\le \frac{C_{St}\Vert b\Vert ^2}{2\lambda _1}\exp (-\pi \sqrt{k})+\frac{\Vert b\Vert ^2(1+C_{St}\exp (-\pi \sqrt{k}))}{2\lambda _1}\Vert {\tilde{u}}_i-z_i\Vert ^2~. \end{aligned}$$

Here \({\tilde{u}}_i\), \(\Vert {\tilde{u}}_i\Vert =1\), are A-analytic eigenvectors such that \(\mu _i(X_k)=({\tilde{u}}_i,X_k{\tilde{u}}_i)\). Further, we have

$$\begin{aligned}&\Vert \mu _i(X)u_iu_i^*-\rho (z_i,X_k)z_iz_i^*\Vert \le \frac{C_{St}\Vert b\Vert ^2}{2\lambda _1}\exp (-\pi \sqrt{k})\\&\quad +\frac{\Vert b\Vert ^2(1+C_{St}\exp (-\pi \sqrt{k}))}{2\lambda _1}\left[ \Vert {\tilde{u}}_i-z_i\Vert ^2+\Vert {\tilde{u}}_i-z_i\Vert \right] ~. \end{aligned}$$

Proof

The proof follows by combining Proposition 3.1 and estimate (2.2). Recall that \(X_k\) is a positive self-adjoint operator, and then using (2.2) compute

$$\begin{aligned}&|\mu _i(X)-\rho (z_i,X_k)|\le |\mu _i(X)-\mu _i(X_k)|\\&\quad \quad +|\mu _i(X_k)-\rho (z_i,X_k)|\\&\quad \le \frac{C_{St}\Vert b\Vert ^2}{2\lambda _1}\exp (-\pi \sqrt{k})+ \Vert X_k\Vert \Vert {\tilde{u}}_i-z_i\Vert ^2~. \end{aligned}$$

To finish the proof we need to estimate \(\Vert X_k\Vert \) by an a priori bound. To this end, let \(\psi _i\) be an orthonormal system of eigenvectors of the self-adjoint and positive operator A. Then, see also [11, Equation (3.2)],

$$\begin{aligned} \Vert X_k\Vert \le \Vert X\Vert +\Vert X_k-X\Vert \end{aligned}$$

and

$$\begin{aligned} |(\psi _i,X\psi _j)|=\frac{|(\psi _i,b)||(b,\psi _j)|}{\lambda _i+\lambda _j}\le \frac{|(\psi _i,b)||(b,\psi _j)|}{2\lambda _1} \end{aligned}$$

follows. We now compute

$$\begin{aligned} \Vert X\Vert \le \sqrt{{{\,\mathrm{tr}\,}}(X^*X)}\le \frac{1}{2\lambda _1}\Vert b\Vert ^2 \end{aligned}$$

and then conclude

$$\begin{aligned} \Vert X_k\Vert \le \frac{1}{2\lambda _1}\Vert b\Vert ^2(1+C_{St}\exp (-\pi \sqrt{k}))~. \end{aligned}$$

The last inequality of the theorem follows by combining the estimate for \(|\mu _i(X)-\rho (z_i,X_k)|\) and the estimate (3.10). \(\square \)
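The a priori bound \(\Vert X\Vert \le \Vert b\Vert ^2/(2\lambda _1)\) used in this proof can be checked in a finite-dimensional analogue. The sketch below is our own verification, not part of the paper's solver: it uses SciPy's dense Lyapunov solver and a random symmetric positive definite stand-in for A:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

rng = np.random.default_rng(1)

# a finite-dimensional stand-in for A: symmetric positive definite
n = 40
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)          # eigenvalues >= n > 0
b = rng.standard_normal((n, 1))

# solve A X + X A = b b^T; SciPy's solver handles a x + x a^H = q,
# so we pass a = -A and q = -b b^T
X = solve_continuous_lyapunov(-A, -(b @ b.T))

lam1 = np.linalg.eigvalsh(A)[0]               # smallest eigenvalue of A
bound = np.linalg.norm(b) ** 2 / (2 * lam1)   # the a priori bound
norm_X = np.linalg.norm(X, 2)                 # spectral norm of X
```

The inequality \(\Vert X\Vert _2\le \Vert X\Vert _F=\sqrt{{{\,\mathrm{tr}\,}}(X^*X)}\) makes the check slightly stronger than the bound in the proof.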

This theorem indicates that, in the case of A being a divergence-type operator with analytic coefficients posed in a polygonal domain, the functions \({\tilde{u}}_i\) will be infinitely differentiable and also A-analytic. For such functions there exist constants \(C_i>0\) and \(\gamma _i>0\) such that

$$\begin{aligned} \min \{\Vert {\tilde{u}}_i-z\Vert ~:~z\in {\mathscr {V}}_{l_s,p_s}\}\le C_i\exp (-\gamma _i\sqrt{\dim ({\mathscr {V}}_{l_s,p_s})})~. \end{aligned}$$
(5.1)

Subsequently, applying the second inequality of Theorem 5.1 r times, one concludes that there exists a rank r approximation of X whose eigenvectors belong to the space \({\mathscr {V}}_{l_s,p_s}\) and whose eigenvalues approximate the eigenvalues of the operator X, in the sense of Theorem 5.1, with an a priori estimate of the form (5.1).

We now present an analysis of our approximation of the a posteriori error estimator. It is based on the norm convergence of the solutions of the projected Lyapunov equations. Given that the operator A is self-adjoint and that the sequence of subspaces is monotonic, we conclude, based on [20, Section 4.1.2 and Section 5], that \(\Vert X-X_s\Vert \rightarrow 0\). Then a simple calculation shows that for a given \(s_1\) and for each \(\varepsilon >0\) there exists \(s_2>s_1\) such that

$$\begin{aligned} \Vert X-X_{s_1}\Vert \le (1+\varepsilon )\Vert X_{s_2}-X_{s_1}\Vert . \end{aligned}$$

In the case in which \(X_{s_i}=\sum _{j=1}^r z^{(s_i)}_j(z^{(s_i)}_j)^*\), \(i=1,2\), we have, using (3.10), the estimate

$$\begin{aligned}&\Vert X-X_{s_1}\Vert \le (1+\varepsilon )\Vert X_{s_2}-X_{s_1}\Vert \nonumber \\&\quad \le (1+\varepsilon )\max \{\Vert z^{(s_i)}_j\Vert ~:~i=1,2,j=1,\ldots ,r\} \sum _{j=1}^r\Vert z^{(s_2)}_j-z^{(s_1)}_j\Vert ~. \end{aligned}$$
(5.2)

In the case of A being a divergence-type operator and \({\mathscr {V}}_{s}={\mathscr {V}}_{l_s,p_s}\), we might take \({\mathscr {V}}_{s_1}={\mathscr {V}}_{l_{s_1},p_{s_1}}\) and \({\mathscr {V}}_{s_2}={\mathscr {V}}_{l_{s_1},p_{s_2}}\). That is, we keep the refinement level \(l_{s_1}\) constant and study pure p-refinement for improving the auxiliary subspace.

We now argue that, instead of constructing the auxiliary subspace adaptively, we opt for a fixed refinement obtained by uniformly increasing the polynomial degree by two. This argument will be partly heuristic.

More refined information on the error can be obtained if one interprets the low-rank approximation task in the context of spectral approximations, as was done in Proposition 3.1. Recall the estimate (2.1), which according to [15, 18]—in the case in which \(\psi \) and u are of norm one—implies

$$\begin{aligned} |\mu -\rho (\psi ,H)|\le \mu _1\Vert \psi -u\Vert ^2~. \end{aligned}$$
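For the top eigenvalue, this quadratic dependence of the Rayleigh quotient error on \(\Vert \psi -u\Vert \) can be verified directly (a self-contained sketch of our own, with a random symmetric positive semidefinite matrix standing in for the compact operator H):

```python
import numpy as np

rng = np.random.default_rng(2)

# a symmetric positive semidefinite stand-in for the compact operator H
n = 30
B = rng.standard_normal((n, n))
H = B @ B.T

vals, vecs = np.linalg.eigh(H)
mu1 = vals[-1]                      # top eigenvalue
u = vecs[:, -1]                     # its unit eigenvector

# a normalized trial vector close to u, sign-aligned with u
psi = u + 0.05 * rng.standard_normal(n)
psi /= np.linalg.norm(psi)
if psi @ u < 0:
    psi = -psi

rho = psi @ H @ psi                 # Rayleigh quotient
lhs = abs(mu1 - rho)                # eigenvalue error
rhs = mu1 * np.linalg.norm(psi - u) ** 2
```

The sign alignment matters: eigenvectors are defined only up to sign, and \(\Vert \psi -u\Vert \) should measure the angle between the subspaces.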

Since we typically do not have access to the vector u, we will use the auxiliary subspace technique from [3, 12] in combination with the saturation assumption to approximate \(\Vert u-\psi \Vert \). Let us demonstrate the auxiliary subspace error estimation for the interpolation error. Let \({\mathscr {V}}\subset {\mathscr {V}}^{aux}\subset {\mathscr {H}}\) be two finite element spaces and let \({\mathscr {H}}\) be an appropriate Sobolev space. Note that in this section \(u^{aux}\) will not denote the duality pairing for u, but rather an element of \({\mathscr {V}}^{aux}\) associated with some function u. Then for the finite element approximation \(\psi \in {\mathscr {V}}\) of the solution \(u\in {\mathscr {H}}\) we can define the approximate error function \(\varepsilon \approx u-\psi \) by projecting the error \(u-\psi \) onto the space \({\mathscr {V}}^{aux}_{\ominus }={\mathscr {V}}^{aux}\ominus {\mathscr {V}}\). With this we immediately get the efficiency bound \(\Vert \varepsilon \Vert \le \Vert u-\psi \Vert \), and the reliability bound is obtained as a combination of the strong Cauchy inequality for the subspace \({\mathscr {V}}^{aux}_{\ominus }\) and the saturation assumption for the subspace \({\mathscr {V}}^{aux}\). In the context of the eigenvalue problem, a saturation assumption for analyzing the eigenvalue approximation error has been used in [25]. We will adapt this assumption to the case of a compact operator. Assuming we are interested in the top r eigenvalues of some compact operator H, the saturation assumption holds if there is a constant \(0\le \gamma <1\) such that

$$\begin{aligned} \mu _i(H)-\rho _i({\mathscr {V}}^{aux},H)\le \gamma (\rho _i({\mathscr {V}}^{aux},H)-\rho _i({\mathscr {V}},H)),\qquad i=1,\ldots ,r~. \end{aligned}$$

It was then shown in [25] that this is equivalent to stating

$$\begin{aligned} (\mu _i(H)-\rho _i({\mathscr {V}},H))\le \frac{1}{1-\gamma }(\rho _i({\mathscr {V}}^{aux},H)-\rho _i({\mathscr {V}},H)). \end{aligned}$$
(5.3)

This statement is adapted here from the case of an unbounded operator with a compact resolvent, which was considered in [25].

Remark 5.1

For practical hp-finite element computations we choose, as the spaces \({\mathscr {V}}\) and \({\mathscr {V}}^{aux}\), the given hp-space and the hp-space obtained on the same mesh by increasing the polynomial degree by two.

Proposition 5.1

Let the saturation assumption hold for the eigenvalues of some positive compact operator H. Then

$$\begin{aligned} \mu _i(H)-\rho _i({\mathscr {V}},H)\le \frac{\mu _1(H)}{1-\gamma }\Vert {\hat{u}}^{aux}_{i}-{\hat{u}}_i\Vert ^2 \end{aligned}$$

where \({\hat{u}}^{aux}_i\), \(\Vert {\hat{u}}^{aux}_i\Vert =1\) is a vector such that \(\rho _i({\mathscr {V}}^{aux},H)=({\hat{u}}^{aux}_i,H{\hat{u}}^{aux}_i)\) and \({\hat{u}}_{i}\), \(\Vert {\hat{u}}_{i}\Vert =1\) is such that it verifies \(\rho _i({\mathscr {V}},H)=({\hat{u}}_i,H{\hat{u}}_i)\).

Proof

The result of Knyazev [37] implies

$$\begin{aligned} \big |\rho _i({\mathscr {V}}^{aux},H)-\rho _i({\mathscr {V}},H)\big |&\le \rho _1({\mathscr {V}}^{aux},H)\Vert {\hat{u}}^{aux}_i-{\hat{u}}_i\Vert ^2\nonumber \\&\le \mu _1(H)\Vert {\hat{u}}^{aux}_i-{\hat{u}}_i\Vert ^2 \end{aligned}$$
(5.4)

where \({\hat{u}}_{i}\), \(\Vert {\hat{u}}_{i}\Vert =1\) is such that \(\rho _i({\mathscr {V}},H)=({\hat{u}}_i,H{\hat{u}}_i)\). The statement of the theorem follows when one combines (5.4) with (5.3). \(\square \)

Fig. 4

The approximate error function. \(\varepsilon _1={\hat{u}}^{aux}_1-{\hat{u}}_1\), with \({\hat{u}}^{aux}_1\) and \({\hat{u}}_1\) at \(p=6\) and \(p=4\), respectively. Elemental \(L^2\)-error of \(\varepsilon _1\) distributed over the mesh, relative scale over [0, 1]

Our greedy strategy for mesh refinement for such operators is to refine those triangles on which the restrictions of the functions \(\varepsilon _i={\hat{u}}^{aux}_i-{\hat{u}}_i\), \(i=1,\ldots ,r\), are above a given threshold. One sees in Fig. 4 how \(\varepsilon _1\) picks out the elements near the first two reentrant corners for refinement, but not the second two.
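In pseudocode terms, the greedy strategy reduces to a maximum-based marking rule. A minimal sketch (the data layout and the threshold parameter theta are our own assumptions; in practice the elemental errors come from the hp-solver):

```python
import numpy as np


def mark_elements(elem_errors, theta=0.5):
    """Greedy marking: flag elements whose elemental L2-error of the
    approximate error functions eps_i exceeds a fraction theta of the
    largest elemental error.

    elem_errors: array of shape (num_elements, r); entry (T, i) holds
    ||eps_i||_{L2(T)}.  Returns the indices of the marked elements.
    """
    per_element = elem_errors.max(axis=1)   # worst error over i = 1..r
    return np.flatnonzero(per_element >= theta * per_element.max())


# four elements, r = 2 approximate error functions
errs = np.array([[0.9, 0.1], [0.02, 0.05], [0.6, 0.7], [0.01, 0.01]])
marked = mark_elements(errs, theta=0.5)     # marks elements 0 and 2
```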

Corollary 5.1

Let \(r\in {\mathbb {N}}\) be given such that \(\mu _r(X)>\mu _{r+1}(X)\), let \(b\in {\mathscr {H}}\) and \(k\in {\mathbb {N}}\) be given, and let \(0\le \gamma <1\) be the saturation constant for \(X_k\). Then for \({\mathscr {Z}}=\text {span}\{z_1,\ldots ,z_r\}\subset {\mathscr {V}}\) with \(z_i\), \(i=1,\ldots ,r\), pairwise orthogonal and of norm one, we have, for \(i=1,\ldots ,r\), the estimate

$$\begin{aligned} |\mu _i(X)-\rho (z_i,X_k)|\le \frac{C_{St}\Vert b\Vert ^2}{2\lambda _1}\exp (-\pi \sqrt{k})+\frac{\Vert b\Vert ^2(1+C_{St}\exp (-\pi \sqrt{k}))}{2\lambda _1(1-\gamma )} \Vert {\tilde{u}}'_i-z_i\Vert ^2~. \end{aligned}$$

Here \({\tilde{u}}'_i\), \(\Vert {\tilde{u}}'_i\Vert =1\), are such that \(\rho _i({\mathscr {V}}^{aux},X_k)=\rho ({\tilde{u}}'_i,X_k)\).

Proof

Let \(0\le \gamma <1\) be the saturation constant for \(X_k\), then

$$\begin{aligned} |\mu _i(X)-\rho (z_i,X_k)|&\le |\mu _i(X)-\mu _i(X_k)|+|\mu _i(X_k)-\rho (z_i,X_k)|\\&\le |\mu _i(X)-\mu _i(X_k)|+\frac{\Vert X_k\Vert }{1-\gamma }\Vert {\tilde{u}}'_i-z_i\Vert ^2~. \end{aligned}$$

The proof follows using the estimate from Theorem 5.1. \(\square \)

Remark 5.2

Let us first address the assumptions of Proposition 5.1 and Corollary 5.1. Recall that \(X_k\) is a function of the operator A. The eigenvalue saturation assumption with \(0\le \gamma <1\) for the operator \(H=X_k\), in the case in which A is a divergence-type operator with analytic coefficients, can be justified in the same way as was done in [25]. Namely, Neymeyer argues in [25, Section 4] that the saturation assumption with \(\gamma <1\) holds for the Laplace eigenvalue problem with \({\mathscr {V}}\) chosen as the space of piecewise linear functions and \({\mathscr {V}}^{aux}\) as the space of piecewise quadratic functions. The argument rests on the fact that eigenfunctions of the Laplace operator are harmonic, and a harmonic function whose restriction to an open set of positive measure is zero is the zero function. Analogously, the eigenfunctions of the operator \(X_k\) are analytic functions; this follows from Nelson [24] for this particular class of operators, and so the same argument holds. Note that this does not necessarily imply that \(1/(1-\gamma )\) is small, only that it is finite. A further, more detailed analysis would be needed to assess the size of \(\gamma \) or to prove that it only depends on the shape regularity of the triangulation and possibly on the polynomial degree. For similar considerations in the context of the eigenfunction approximations of an unbounded operator see [10].

Note that for \({\tilde{u}}'_i\) we substitute \({\hat{u}}^{aux}_i\), the i-th eigenvector of the solution of the Lyapunov equation projected onto \({\mathscr {V}}^{aux}\). We could, in principle, quantify the error \(\Vert {\hat{u}}^{aux}_i-{\tilde{u}}'_i\Vert \) by a direct perturbation analysis. The operators \(X_k\) are given by an explicit formula, and the action of \(X_k\) on a vector is in principle computable using contour integration techniques from [4, 5, 9]. This analysis would, however, be quite technical, would require additional technical apparatus, and the resulting estimates would likely be unnecessarily pessimistic. For instance, the estimates from [4, 5] are only valid for quasi-uniformly refined meshes and low order finite elements. They also depend on the approximability of the loading vector b, even though the eigenvectors of the operator \(X_k\) are A-analytic for any \(b\in {\mathscr {H}}\). An extension to higher order finite elements is plausible, but would require extended technical work which is beyond the scope of this paper. Even then, the constant \(1/(1-\gamma )\) can potentially be very large and the overall estimate would be quite pessimistic. Instead, we opted to make a heuristic choice and monitor only \(\Vert {\hat{u}}^{aux}_i-{\hat{u}}_i\Vert \). We will report on numerical experiments in which we compare \(\Vert {\hat{u}}^{aux}_i-{\hat{u}}_i\Vert ^2\) with the error in the i-th eigenvalue. Note also that Proposition 3.2 gives further justification, or an alternative interpretation, for estimating the residual from an auxiliary subspace. We could compute the norm \(\Vert R_{s_2}(\sum _{i=1}^r{\hat{u}}_i{\hat{u}}_i^*)\Vert \) by a data sparse SVD computation. Instead of going down this avenue, we observe that \(\sum _{i=1}^r\Vert {\hat{u}}^{aux}_i-{\hat{u}}_i\Vert \) is an estimator of this norm, and this is the approach which we take (Fig. 5).

Fig. 5

Convergence of \(\Vert {\hat{u}}^{aux}_{1}-{\hat{u}}_1\Vert ^2\) (solid line) and \(|\mu _1(X)-\mu _1(X_r)|\) (dashed line), as \(p=1,\dots , 6\), on a graded mesh with constant p at \(\ell = 4\)

6 Numerical experiments

We will now present an example with a loading b which is not a separable function. Bearing in mind the a priori error estimate (5.1), we present convergence plots in which the logarithm of the error is on the y axis and \(\sqrt{N}\), \(N=\dim {\mathscr {V}}_{l,p}\), is on the x axis. A straight line in this coordinate system implies convergence of order \(O(\exp (-\gamma \sqrt{N}))\) for some \(\gamma >0\). We will see that such convergence is indeed observed, albeit with a smaller \(\gamma \), even for the domain with reentrant corners and a less regular b.
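The reading rule for these plots can be checked on synthetic data: errors of the form \(C\exp (-\gamma \sqrt{N})\) appear as an exact straight line with slope \(-\gamma \) in the \((\sqrt{N},\log )\) coordinates. In the sketch below the values of C and \(\gamma \) are made up for illustration:

```python
import numpy as np

# synthetic errors obeying err = C * exp(-gamma * sqrt(N))
gamma, C = 0.35, 2.0
N = np.array([100.0, 400.0, 900.0, 1600.0, 2500.0, 3600.0])
err = C * np.exp(-gamma * np.sqrt(N))

# a linear fit in the (sqrt(N), log err) coordinates recovers -gamma
slope, intercept = np.polyfit(np.sqrt(N), np.log(err), 1)
```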

Example 6.1

(Dumbbell B) Let us consider a classical Laplace dumbbell problem with the computational domain \(\Omega = ([0,2.4]\times [0,1]){\setminus } (([1,1.4]\times [0,0.3]) \cup ([1,1.4]\times [0.7,1]))\). We choose the operator A in the Lyapunov equation

$$\begin{aligned} AX+XA^*=-bb^* \end{aligned}$$

to be the Laplace operator with the zero Dirichlet boundary conditions. The function \(b(x_1,x_2)\) is taken to be the indicator function:

$$\begin{aligned} b(x_1,x_2) = {\text {ind}}\{|x_1-1/2|+|x_2-0.45| \le 1/4\}. \end{aligned}$$

Remarkably, the solution X is of numerical rank 2. See Fig. 6 for illustrations of the function b, the first column of X, and the two dominant eigenmodes of X.
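The loading is straightforward to code; note that its support is a diamond (an \(\ell ^1\)-ball) of radius 1/4 centered at \((1/2, 0.45)\), which lies inside the left bell of the dumbbell:

```python
def b(x1: float, x2: float) -> float:
    # indicator of the diamond |x1 - 1/2| + |x2 - 0.45| <= 1/4
    return 1.0 if abs(x1 - 0.5) + abs(x2 - 0.45) <= 0.25 else 0.0
```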

In our numerical experiments we quantify more precisely the performance of the p- and hp-discretizations of Examples 2.2 (Dumbbell A) and 6.1 (Dumbbell B). In both examples the configuration is exactly the same except for the loading which is smooth in Example 2.2 and discontinuous and not symmetric with respect to the domain in Example 6.1. The two quantities of interest are the sum of eigenvalues of the solution X and the rank of X.

The mesh grading strategy described in Sect. 4 is used in two different sets of experiments for both examples. First, we consider levels \(\ell =0,\dots ,8\), and for every \(\ell \) we compute the solutions for all (constant) \(p=1,\dots ,8\). Here \(\ell =0\) means that the background mesh is used without any refinements; this is sometimes referred to as the “pure p-version” approach. Second, we compute a proper hp-sequence where, for \(\ell =1,\dots ,11\), we compute the p-vector \({\mathbf {p}}\) using maximal \(p = \ell + 1\). The final solution of the hp-sequence is taken as the reference solution.

For every individual experiment the quantities of interest have been computed.

6.1 Convergence in eigenvalues

The observed convergence of the relative error in the sum of eigenvalues \(\sum _i \lambda _i\) is illustrated in Figs. 7 and 8. In both cases an overall picture over the set of experiments is given, with a detail plot indicating the region where the loss of convergence rate is observed. As expected, the hp-discretization is the most efficient one in both cases. Interestingly, the effect of the singularities becomes especially evident if one focuses on the levels \(\ell =0\) and \(\ell =2\): if the geometric grading is not taken to a sufficiently high level, there is a loss of convergence rate. On the other hand, for the p-discretization there appears to be an optimal level (here \(\ell = 4\)) beyond which the observed rate does not increase, yet the constant does. This is the reason why the \(p=6,8\) graphs have been omitted for Dumbbell B.

Fig. 6

A dumbbell domain with an indicator function loading. The columns of X can be plotted, and the first one is illustrated. In this case the operator has numerical rank 2, and the corresponding modes are shown

6.2 Asymptotic behavior of the numerical rank

A more unusual error measure is the observed numerical rank of the solution. This is meant in the sense of Proposition 3.1 with \(\tau =10^{-8}\). At first one could suspect that the results of Tables 1 and 2 are a simple consequence of keeping the tolerances of the kpik-algorithm constant even as the dimensions of the cases increase. However, by comparing the observed ranks with the numbers of degrees of freedom in Figs. 7 and 8 it is clear that this connection does not explain these results. The connection between the levels \(\ell \) and the polynomial order p indicates that the key here is the accurate capturing of the effects due to the singular points. Tying in nicely with the discussion above, \(\ell = 4\) again appears to be the level where the singularities are first captured.
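For concreteness, one way to evaluate such a truncation threshold is sketched below. The relative scaling by the largest eigenvalue is our own assumption made for illustration; the precise definition of the numerical rank remains that of Proposition 3.1:

```python
import numpy as np


def numerical_rank(eigvals, tau=1e-8):
    """Count eigenvalues above tau relative to the largest one.

    This relative scaling is one plausible reading of the truncation
    threshold tau and is our own assumption; the precise definition is
    the one of Proposition 3.1.
    """
    ev = np.sort(np.abs(np.asarray(eigvals, dtype=float)))[::-1]
    return int(np.count_nonzero(ev > tau * ev[0]))


# a synthetic spectrum with a clear numerical-rank cut-off
lam = [1.0, 0.5, 1e-3, 1e-12, 1e-15]
r = numerical_rank(lam)
```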

Fig. 7

Dumbbell A: Effect of the mesh grading. Relative error in the sum of eigenvalues versus the number of DOFs. For every indicated level \(\ell \) the error is computed for a constant \(p=1,\dots ,8\). The reference solution (solid black line) is the hp-sequence with the proper p-vector \({\mathbf {p}}\) for levels \(\ell =1,\dots ,11\)

Fig. 8

Dumbbell B: Effect of the mesh grading. Relative error in the sum of eigenvalues versus the number of DOFs. For every indicated level \(\ell \) the error is computed for a constant \(p=1,\dots ,8\). The reference solution (solid black line) is the hp-sequence with the proper p-vector \({\mathbf {p}}\) for levels \(\ell =1,\dots ,11\)

It is clear that for the discontinuous loading (Dumbbell B) the observed ranks are slightly higher than those for Dumbbell A with smooth loading.

Table 1 Dumbbell A: Observed ranks
Table 2 Dumbbell B: Observed ranks

For a qualitative view of the reduction in rank, Figs. 9 and 10 each show two sets of eigenmodes from the hp-sequence, corresponding to \(\ell = 3\) and 4. Considering the 3D-plots of Fig. 2 for Dumbbell A, one can see that the superfluous modes (Fig. 9c, d) have features that are ultimately subsumed into the second mode. The explanation here is that the singularities pollute the solution, and this results in modes with very small eigenvalues. For Dumbbell B the situation is even more interesting. In fact, both of the final two modes have ghost modes at \(\ell = 3\). Here the eigenvalues are

$$\begin{aligned} \lambda _{\ell =3}^B = \{1.4\times 10^{-3},1.1\times 10^{-4},6.7\times 10^{-6},1.2\times 10^{-6},3.3\times 10^{-7},8.5\times 10^{-8}\}, \end{aligned}$$

and

$$\begin{aligned} \lambda _{\ell =4}^B = \{1.5\times 10^{-3},2.1\times 10^{-5}\}, \end{aligned}$$

indicating that this “summation of modes” is reflected in the eigenvalues as well.

Fig. 9

Dumbbell A: Decreasing rank: Qualitative view via plots from the hp-sequence. As indicated in Table 1b, the numerical rank changes from 4 to 2 as the level \(\ell \) changes from 3 to 4. As the corner singularities are better captured the two superfluous modes are subsumed into the second mode

Fig. 10

Dumbbell B: Decreasing rank: Qualitative view via plots from the hp-sequence. As indicated in Table 2b, the numerical rank changes from 4 to 2 as the level \(\ell \) changes from 3 to 4. As the corner singularities are better captured the first two modes and the following four at \(\ell = 3\) are subsumed into the first and second mode at \(\ell = 4\), respectively

7 Conclusions

In this paper we have presented the approximate regularity structure, controlled by a threshold on the numerical rank, of the solution operator of the operator Lyapunov equation. We point out the following consequence of the use of the threshold parameter to define the numerical rank: the dominant eigenvalues of the solution X can be seen to form a cluster. We are not computing the average of this cluster as is done in the work of Osborn [28]. We instead compute and track the asymptotic behavior of the sum of the clustered eigenvalues, as is done more generally in the majorisation estimates from [17]. In this sense our high order greedy adaptivity strategy constructs a subspace which captures the trace (not the multiplicity) better than a standard approximation which is oblivious of the regularity of the eigenfunctions associated to the cluster of eigenvalues forming the trace. The subspace which we constructed is much smaller than the one assumed in Theorem 5.1. Yet, even with such a crude approach we still observed exponential (but slower than optimal) convergence in the number of finite element degrees of freedom, due to the robust regularity structure of the approximate operators \(X_k\). Future work will focus on more singular right-hand sides (such as boundary forcing) and on a tighter analysis of the regularity structure of the eigenfunctions of the solution operator X.