1 Introduction and Set-up of the Problem

The question of ‘Krylov solvability’ of an inverse linear problem is an operator-theoretic question, with deep-rooted implications in numerics and scientific computing among other fields, that in fairly abstract terms is formulated as follows.

A linear operator A acting on a real or complex Hilbert space \({\mathcal {H}}\), and a vector \(g\in {\mathcal {H}}\) are given such that A is closed and densely or everywhere defined on \({\mathcal {H}}\), and g is an A-smooth vector in the range of A, i.e.,

$$\begin{aligned} g \in \mathrm {ran} A\cap C^\infty (A) \end{aligned}$$
(1.1)

where \(C^\infty (A)\) is the space of elements of \({\mathcal {H}}\) simultaneously belonging to all the domains of the natural powers of A,

$$\begin{aligned} C^\infty (A) := \bigcap _{k\in {\mathbb {N}}}{\mathcal {D}}(A^k). \end{aligned}$$
(1.2)

Clearly A-smoothness is an automatic condition if A is bounded. Associated with A and g one has the ‘Krylov subspace’

$$\begin{aligned} {\mathcal {K}}(A,g) := \mathrm {span} \{ A^kg\,|\,k\in {\mathbb {N}}_0\}, \end{aligned}$$
(1.3)

as well as the inverse linear problem induced by A with datum g, namely the problem of finding solution(s) \(f\in {\mathcal {D}}(A)\) such that

$$\begin{aligned} Af = g. \end{aligned}$$
(1.4)

The problem (1.4) is said to be ‘Krylov-solvable’ if for some solution f one has

$$\begin{aligned} f \in \overline{{\mathcal {K}}(A,g)}, \end{aligned}$$
(1.5)

in which case f is also referred to as a ‘Krylov solution’.

In short, Krylov solvability for the problem (1.4) is the possibility of having solution(s) f for which there are approximants, in the Hilbert norm, consisting of finite linear combinations of vectors \(g,Ag,A^2g,A^3g,\dots \).

This explains the deep conceptual relevance of Krylov solvability in scientific computing: knowing a priori whether or not an inverse problem is Krylov-solvable allows one to decide whether to treat the problem numerically by means of one of the vast class of so-called Krylov-subspace methods [19, 26], searching for approximants to the exact solution(s) over the span of explicit trial vectors \(g,Ag,A^2g,A^3g,\dots \).
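
In finite dimensions this procedure is elementary; the following minimal Python sketch (with ad-hoc random data, purely for orientation and not taken from the cited literature) builds the trial vectors \(g,Ag,\dots ,A^{N-1}g\), orthonormalises them, and measures how well the exact solution is captured by their span:

import numpy as np

# Minimal sketch: orthogonal projection of the exact solution onto the span
# of the Krylov vectors g, Ag, ..., A^{N-1}g. Sizes and data are ad-hoc.
rng = np.random.default_rng(0)
n, N = 50, 15
A = np.eye(n) + 0.3 * rng.standard_normal((n, n)) / np.sqrt(n)   # toy operator
f_exact = rng.standard_normal(n)
g = A @ f_exact
K = np.column_stack([np.linalg.matrix_power(A, k) @ g for k in range(N)])
Q, _ = np.linalg.qr(K)                    # orthonormal basis of K_N(A,g)
f_proj = Q @ (Q.T @ f_exact)              # best approximant of f within the span
print(np.linalg.norm(f_exact - f_proj))   # small: f is well approximated by Krylov vectors

The projection here plays the role of the abstract closure in (1.5): the inverse problem is Krylov-solvable precisely when such approximants can be made arbitrarily good.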

In fact, Krylov subspace methods are efficient numerical schemes for finite-dimensional inverse linear problems, counted even among the ‘Top 10 Algorithms’ of the 20th century [7, 10]; their finite-dimensional framework is by now classical and deeply understood (see, e.g., the monographs [19, 26] or also [25]). They are naturally exported to the infinite-dimensional case (\(\dim {\mathcal {H}}=\infty \)), although the latter is less systematically studied and is better understood through special sub-classes of interest [5, 6, 8, 15, 17, 18, 21, 29]. Of course we refer here to the circumstance when (1.4) is genuinely infinite-dimensional, meaning not only that \(\dim {\mathcal {H}}=\infty \), but also (see, e.g., [27, Sect. 1.4]) that A is not reduced to \(A=A_1\oplus A_2\) by an orthogonal direct sum decomposition \({\mathcal {H}}={\mathcal {H}}_1\oplus {\mathcal {H}}_2\) with \(\dim {\mathcal {H}}_1<\infty \), \(\dim {\mathcal {H}}_2=\infty \), and \(A_2={\mathbb {O}}\) (for otherwise the effective problem would deal with a finite matrix \(A_1\)).

Clearly, Krylov solvability is non-trivial whenever \({\mathcal {K}}(A,g)\) admits a non-trivial orthogonal complement in \({\mathcal {H}}\).

Thus, for example, for the (everywhere defined and bounded) multiplication operator \(M:L^2[1,2]\rightarrow L^2[1,2]\), \(\psi \mapsto x\psi \), and for the function \(g=\mathbf {1}\) (the constant function with value 1), \({\mathcal {K}}(M,\mathbf {1})\) is the space of polynomials on [1, 2], hence it is dense in \(L^2[1,2]\): the solution to \(Mf=\mathbf {1}\), which is explicitly \(f(x)=x^{-1}\), obviously belongs to \(\overline{{\mathcal {K}}(M,\mathbf {1})}\). On the other hand, for the (everywhere defined and bounded) right-shift operator R on \(\ell ^2({\mathbb {Z}})\) defined with respect to the canonical orthonormal basis \((e_n)_{n\in {\mathbb {Z}}}\) by \(e_n\mapsto e_{n+1}\), and for the vector \(g=e_{1}\), one has \(\overline{{\mathcal {K}}(R,e_{1})}=\mathrm {span}\{e_0,e_{-1},e_{-2},\dots \}^\perp \): the problem \(Rf=e_1\) is solved by \(f=e_0\) which does not belong to \(\overline{{\mathcal {K}}(R,e_{1})}\).
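
Both behaviours are easily reproduced numerically; in the following hedged sketch (grid, truncation window, and number of Krylov vectors are arbitrary choices) the multiplication operator is discretised as pointwise multiplication on a grid, while the right shift is truncated to the indices \(-50,\dots ,50\):

import numpy as np

# Part 1: multiplication by x on L^2[1,2], discretised on an ad-hoc grid; the
# solution f(x) = 1/x is well captured by the span of the monomials x^k = M^k 1.
x = np.linspace(1.0, 2.0, 400)
f = 1.0 / x                                        # exact solution of M f = 1
K = np.column_stack([x**k for k in range(12)])     # Krylov vectors M^k g
Q, _ = np.linalg.qr(K)
print(np.linalg.norm(f - Q @ (Q.T @ f)))           # tiny: Krylov-solvable

# Part 2: right shift on l^2(Z), truncated to indices -50..50; the solution
# e_0 of R f = e_1 is orthogonal to the Krylov subspace span{e_1, e_2, ...}.
n = 101
R = np.eye(n, k=-1)                                # R e_j = e_{j+1}
e = lambda j: np.eye(n)[:, 50 + j]                 # canonical vector e_j
K = np.column_stack([np.linalg.matrix_power(R, k) @ e(1) for k in range(30)])
Q, _ = np.linalg.qr(K)
print(np.linalg.norm(Q.T @ e(0)))                  # 0: no Krylov solution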

If the operator A is non-injective and hence the inverse problem (1.4) admits a multiplicity of solutions, it may also well happen that some are Krylov solutions and others are not (Example 1(iv) below).

These considerations suggest that the general issue of Krylov solvability can be posed from various specific perspectives, such as:

  • Given explicit A and g, to decide about the existence or the uniqueness of a Krylov solution to the inverse problem (1.4);

  • To identify classes of operators A that induce Krylov-solvable problems (1.4) irrespective of the choice of the datum g [as long as g satisfies the basic condition (1.1)];

  • To qualify ‘intrinsic’ mechanisms of Krylov solvability through general conditions on A and g.

At present the above conceptual programme appears to be only partially developed.

Given the iterative nature of Krylov subspace methods, most of the related literature is concerned with the fundamental issue of convergence of the Krylov approximants to a solution f (an ample overview of which can be found, for instance, in the monographs [11, 23, 26]). It is clear, however, as shown by the above simple example on the inverse problem \(Rf=e_1\), that the question of Krylov solvability is equally fundamental in deciding when to attack an inverse problem by means of computational methods that make use of Krylov approximants.

Motivated by the implications in numerical analysis as well as by abstract operator-theoretic interest, in the recent work [6] in collaboration with P. Novati we discussed the question of Krylov solvability when in the inverse problem (1.4) \(\dim {\mathcal {H}}=\infty \) and A is bounded, with a special focus on normal and, in particular, on self-adjoint operators. Certain operator-theoretic mechanisms were identified (which we called ‘Krylov reducibility’ and ‘triviality of the Krylov intersection’, among others) that account for how, at a ‘structural’ level, Krylov solvability occurs and when the Krylov solution is unique. Section 2 of the present work reviews those findings that are relevant for the subsequent discussion.

Along a parallel route, in [5] we studied the convergence of a popular Krylov subspace algorithm for the inverse problem (1.4), the so-called method of conjugate gradients, in the generalised setting when A is unbounded. In view of the conceptual programme above, [5] can be regarded as a first step towards the study of the Krylov solvability of (1.4) in the unbounded case (Sect. 3 here accounts for that perspective), with the two-fold limitation, however, that A has to be self-adjoint and non-negative (as required in conjugate gradient methods), and that Krylov solvability emerges only as a by-product result with no explicit insight into the operator-theoretic mechanism behind it.

The present work aims at pushing our programme further by discussing Krylov solvability for a fairly general class of unbounded As and with a focus on the same structural mechanisms previously identified in the bounded case.

In Sect. 4 we present our first result, which answers (in the affirmative) the question of Krylov solvability when A is a generic (possibly unbounded) self-adjoint or skew-adjoint operator.

Then in Sect. 5 we proceed to the general case when A is densely defined and closed on \({\mathcal {H}}\). Here we identify new obstructions to Krylov solvability, which are not present in the bounded case. A most serious one is the somewhat counterintuitive phenomenon of ‘Krylov escape’, namely the possibility that vectors of \(\overline{{\mathcal {K}}(A,g)}\) that also belong to the domain of A are mapped by A outside of \(\overline{{\mathcal {K}}(A,g)}\), whereas obviously \(A{\mathcal {K}}(A,g)\subset {\mathcal {K}}(A,g)\). From a perspective that in fact we are not carrying over here, one might observe that the possibility of Krylov escape adds further complication to the unbounded operator counterpart of the celebrated invariant subspace problem [30], at least when \(g\ne 0\) and g is not a cyclic vector for A, hence \(\overline{{\mathcal {K}}(A,g)}\) is a proper closed subspace of \({\mathcal {H}}\).

In Sect. 5 we also determine that if the closures of \({\mathcal {K}}(A,g)\) in the Hilbert space norm and in the stronger A-graph norm are the same (up to intersection with \({\mathcal {D}}(A)\)), an occurrence that we named ‘Krylov-core condition’, then Krylov escape is actually prevented.

This leads us to Sect. 6, where we generalise to the case of densely defined and closed As the previously known picture of Krylov solvability when A was bounded. In particular, we demonstrate that under assumptions like the Krylov-core condition (and, more generally, lack of Krylov escape) the intrinsic mechanisms of Krylov reducibility and triviality of the Krylov intersection play a completely analogous role as compared to the bounded case.

Last, in Sect. 7 we re-consider the (unbounded) self-adjoint scenario, which from the practical point of view is already settled in Sect. 4, investigating Krylov solvability from the perspective of the abstract operator-theoretic mechanisms mentioned above. Notably, this perspective also raises interesting open questions. Indeed, whereas we can prove that self-adjoint operators do satisfy the Krylov-core condition and are Krylov-reducible for a distinguished dense set of A-smooth vectors g’s, and that for the same choice of g the subspace \(\overline{{\mathcal {K}}(A,g)}\) is naturally isomorphic to \(L^2({\mathbb {R}},\mathrm{d}\mu _g^{(A)})\) (here \(\mu _g^{(A)}\) is the scalar spectral measure), we cannot decide whether Krylov escape is prevented for every self-adjoint A and A-smooth g (which is remarkable, as by other means we know that \(Af=g\) is Krylov solvable at least for a dense set of smooth vectors g). This certainly indicates a future direction of investigation.

General notation Besides further notation that will be declared in due time, we shall keep the following convention. \({\mathcal {H}}\) denotes a complex Hilbert space with norm \(\Vert \cdot \Vert _{{\mathcal {H}}}\) and scalar product \(\langle \cdot ,\cdot \rangle \), anti-linear in the first entry and linear in the second. Bounded operators on \({\mathcal {H}}\) shall be tacitly understood as linear and everywhere defined: they naturally form a space, denoted with \({\mathcal {B}}({\mathcal {H}})\), with Banach norm \(\Vert \cdot \Vert _{\mathrm {op}}\), the customary operator norm. \({\mathbb {1}}\) and \({\mathbb {O}}\) denote, respectively, the identity and the zero operator, meant as finite matrices or infinite-dimensional operators depending on the context. An upper bar denotes the complex conjugate \(\overline{z}\) when \(z\in {\mathbb {C}}\), and the norm closure \(\overline{{\mathcal {V}}}\) of the span of the vectors in \({\mathcal {V}}\) when \({\mathcal {V}}\) is a subset of \({\mathcal {H}}\). For \(\psi ,\varphi \in {\mathcal {H}}\), by \(|\psi \rangle \langle \psi |\) and \(|\psi \rangle \langle \varphi |\) we shall denote the \({\mathcal {H}}\rightarrow {\mathcal {H}}\) rank-one maps acting respectively as \(f\mapsto \langle \psi , f\rangle \,\psi \) and \(f\mapsto \langle \varphi , f\rangle \,\psi \) on generic \(f\in {\mathcal {H}}\). For identities such as \(\psi (x)=\varphi (x)\) in \(L^2\)-spaces, the standard ‘for almost every x’ declaration will be tacitly understood.

2 The Bounded Case

Krylov solvability of the inverse problem (1.4) when \(\dim {\mathcal {H}}=\infty \), when A is (everywhere defined and) bounded, and \(g\in \mathrm {ran} A\), may hold or fail in a variety of situations.

Example 1

  1. (i)

    The multiplication operator \(M_z:L^2(\Omega )\rightarrow L^2(\Omega )\), \(f\mapsto zf\), where \(\Omega \subset {\mathbb {C}}\) is a bounded open subset separated from the origin, say, \(\Omega =\{z\in {\mathbb {C}}\,|\,|z-2|<1\}\), is a normal bounded bijection on \(L^2(\Omega )\), and the solution to \(M_zf=g\) for given \(g\in L^2(\Omega )\) is the function \(f(z)=z^{-1}g(z)\). Moreover, \({\mathcal {K}}(M_z,g)=\{p\,g\,|\,p\text { a polynomial in z on }\Omega \}\). One can see that \(f\in \overline{{\mathcal {K}}(M_z,g)}\) and hence the problem \(M_zf=g\) is Krylov-solvable. Indeed, \(\Omega \ni z\mapsto z^{-1}\) is holomorphic and hence is realised by a uniformly convergent power series (e.g., the Taylor expansion of \(z^{-1}\) around \(z=2\)). If \((p_n)_n\) is such a sequence of polynomial approximants, then

    $$\begin{aligned} \Vert f-p_n g\Vert _{L^2(\Omega )}&= \Vert (z^{-1}-p_n)g\Vert _{L^2(\Omega )} \\&\leqslant \Vert z^{-1}-p_n\Vert _{L^\infty (\Omega )}\Vert g\Vert _{L^2(\Omega )} \;\xrightarrow []{n\rightarrow \infty }\;0. \end{aligned}$$
  2. (ii)

    The left-shift operator L on \(\ell ^2({\mathbb {N}}_0)\), defined as usual on the canonical basis \((e_n)_{n\in {\mathbb {N}}_0}\) by \(Le_{n+1}=e_n\), is bounded, not injective, and with range \(\mathrm {ran}L=\ell ^2({\mathbb {N}}_0)\). A solution to \(Lf=g\) with \(g:=\sum _{n\in {\mathbb {N}}_0}\frac{1}{n!}e_n\) is \(f=\sum _{n\in {\mathbb {N}}_0}\frac{1}{n!}e_{n+1}\). Moreover, \({\mathcal {K}}(L,g)\) is dense in \(\ell ^2({\mathbb {N}}_0)\) and therefore f is a Krylov solution. To see the density of \({\mathcal {K}}(L,g)\): the vector \(e_0\) belongs to \(\overline{{\mathcal {K}}(L,g)}\) because

    $$\begin{aligned} \Vert k!\, L^k g-e_0\Vert _{\ell ^2}^2&= \Vert \textstyle (1,\frac{1}{k+1}, \frac{1}{(k+2)(k+1)},\cdots )-(1,0,0,\dots )\Vert _{\ell ^2}^2 \\&= \sum _{n=1}^\infty \Big (\frac{k!}{(n+k)!}\Big )^2\;\xrightarrow []{\;k\rightarrow \infty \;}\;0. \end{aligned}$$

    As a consequence, \( (0,\textstyle \frac{1}{k!},\frac{1}{(k+1)!}, \frac{1}{(k+2)!},\cdots )=L^{k-1}g-\frac{1}{(k-1)!}\,e_0\in \overline{{\mathcal {K}}(L,g)}\), therefore the vector \(e_1\) too belongs to \(\overline{{\mathcal {K}}(L,g)}\), because

    $$\begin{aligned} \Vert k!\,\big (L^{k-1}g-\tfrac{1}{(k-1)!}\,e_0\big )-e_1\Vert _{\ell ^2}^2 = \sum _{n=1}^\infty \Big (\frac{k!}{(n+k)!}\Big )^2\;\xrightarrow []{\;k\rightarrow \infty \;}\;0. \end{aligned}$$

    Repeating inductively the above two-step argument proves that every \(e_n\in \overline{{\mathcal {K}}(L,g)}\), whence the cyclicity of g (see also the numerical sketch after this example).

  3. (iii)

    The right-shift operator on \(\ell ^2({\mathbb {Z}})\),

    $$\begin{aligned} {\mathcal {R}} = \sum _{n\in {\mathbb {Z}}}|e_{n+1}\rangle \langle e_n|, \end{aligned}$$
    (2.1)

    is a normal, bounded bijection, and the solution to \({\mathcal {R}}f = e_2\) is \(f=e_1\). However, f is not a Krylov solution, for \(\overline{{\mathcal {K}}({\mathcal {R}},e_2)}=\overline{\mathrm {span}\{e_2,e_3,\dots \}}\). The problem \({\mathcal {R}}f=e_2\) is not Krylov-solvable.

  4. (iv)

    Let A be a bounded injective operator on a Hilbert space \({\mathcal {H}}\) with cyclic vector \(g\in \mathrm {ran}A\) and let \(\varphi _0\in {\mathcal {H}}{\setminus }\{0\}\). Let \(f\in {\mathcal {H}}\) be the solution to \(Af=g\). The operator \(\widetilde{A}:=A\oplus |\varphi _0\rangle \langle \varphi _0|\) on \(\widetilde{{\mathcal {H}}}:={\mathcal {H}}\oplus {\mathcal {H}}\) is bounded. One solution to \(\widetilde{A}\widetilde{f}=\widetilde{g}:=g\oplus 0\) is \(\widetilde{f}=f\oplus 0\) and \(\widetilde{f}\in {\mathcal {H}}\oplus \{0\}=\overline{{\mathcal {K}}(\widetilde{A},\widetilde{g})}\). Another solution is \(\widetilde{f}_\xi =f\oplus \xi \), where \(\xi \in {\mathcal {H}}{\setminus }\{0\}\) and \(\xi \perp \varphi _0\). Clearly, \(\widetilde{f}_\xi \notin \overline{{\mathcal {K}}(\widetilde{A},\widetilde{g})}\).

  5. (v)

    If V is the Volterra operator on \(L^2[0, 1]\) and \(g(x) = \frac{1}{2}x^2\), then \(f(x) = x\) is the unique solution to \(Vf=g\). On the other hand, \({\mathcal {K}}(V,g)\) is spanned by the monomials \(x^2,x^3,x^4,\dots \), whence

    $$\begin{aligned} {\mathcal {K}}(V,g) = \{x^2p(x)\,|\, p\text { is a polynomial on } [0,1]\}. \end{aligned}$$

    Therefore \(f\notin {\mathcal {K}}(V,g)\), because \(f(x)=x^2\cdot \frac{1}{x}\) and \(\frac{1}{x}\) is not a polynomial (it is not even in \(L^2[0,1]\)). Yet, \(f\in \overline{{\mathcal {K}}(V,g)}\), because in fact \({\mathcal {K}}(V,g)\) is dense in \(L^2[0,1]\). Indeed, if \(h\in {\mathcal {K}}(V,g)^\perp \), then \(0=\int _0^1\overline{h(x)}\,x^2p(x)\,\mathrm{d}x\) for any polynomial p; the \(L^2\)-density of polynomials on [0, 1] implies necessarily that \(x^2h=0\), whence also \(h=0\); this proves that \({\mathcal {K}}(V,g)^\perp =\{0\}\) and hence \(\overline{{\mathcal {K}}(V,g)}=L^2[0,1]\).
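
As a quick numerical sanity check of the limit used in Example 1(ii) above (with the series truncated at n = 200, an arbitrary cut-off):

from math import factorial

# Numerical check of the limit in Example 1(ii):
# sum_{n>=1} (k!/(n+k)!)^2 -> 0 as k -> infinity.
for k in (1, 5, 10, 20):
    s = sum((factorial(k) / factorial(n + k)) ** 2 for n in range(1, 200))
    print(k, s)   # decays like (k+1)^{-2}, the size of the leading term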

Example 1(iii) shows, in particular, that even apparently stringent assumptions on A such as the simultaneous occurrence of normality, injectivity, and bounded everywhere defined inverse do not ensure, in general, that the solution f to \(Af=g\), for given \(g\in {\mathcal {H}}\), is a Krylov solution.

A partial yet fairly informative comprehension of the general bounded scenario was recently reached in the work [6], where it was shown that Krylov solvability is intrinsically related to certain operator-theoretic mechanisms that we briefly review here.

Definition 2.1

For a given Hilbert space \({\mathcal {H}}\) let \(A\in {\mathcal {B}}({\mathcal {H}})\) and \(g\in \mathrm {ran} A\).

  1. (i)

    The orthogonal decomposition

    $$\begin{aligned} {\mathcal {H}}= \overline{{\mathcal {K}}(A,g)}\,\oplus \,{\mathcal {K}}(A,g)^\perp \end{aligned}$$
    (2.2)

    is called the Krylov decomposition of \({\mathcal {H}}\) relative to A and g.

  2. (ii)

    An operator \(T\in {\mathcal {B}}({\mathcal {H}})\) is said to be \({\mathcal {K}}(A,g)\)-reduced when both \(\overline{{\mathcal {K}}(A,g)}\) and \({\mathcal {K}}(A,g)^\perp \) are invariant under T. Such a requirement, since T is bounded, is equivalent to \(T=T|_{\overline{{\mathcal {K}}(A,g)}}\oplus T|_{{\mathcal {K}}(A,g)^\perp }\), namely T is reduced with respect to the Krylov decomposition (2.2) [27, Prop. 1.15].

  3. (iii)

    The subspace

    $$\begin{aligned} {\mathcal {I}}(A,g) := \overline{{\mathcal {K}}(A,g)} \,\cap \, (A\,{\mathcal {K}}(A,g)^\perp ) \end{aligned}$$
    (2.3)

    is called the Krylov intersection for the given A and g.

Krylov reducibility is inspired by the straightforward observation that

$$\begin{aligned} A\,{\mathcal {K}}(A,g)\,\subset \,{\mathcal {K}}(A,g) ,\qquad A^*\,{\mathcal {K}}(A,g)^\perp \,\subset \,{\mathcal {K}}(A,g)^\perp , \end{aligned}$$
(2.4)

whence also

$$\begin{aligned} A\,\overline{{\mathcal {K}}(A,g)}\,\subset \,\overline{{\mathcal {K}}(A,g)}. \end{aligned}$$
(2.5)

Thus, \(\overline{{\mathcal {K}}(A,g)}\) is always A-invariant, and when so too is \({\mathcal {K}}(A,g)^\perp \) one says that A is Krylov-reducible. Evidently, any bounded self-adjoint operator A is \({\mathcal {K}}(A,g)\)-reduced and it is easy to construct non-self-adjoint A’s that are \({\mathcal {K}}(A,g)\)-reduced as well [6, Example 2.3].

It is also clear that if A is \({\mathcal {K}}(A,g)\)-reduced, then in particular the Krylov intersection is trivial: \({\mathcal {I}}(A,g)=\{0\}\). That the converse is not true in general is easily seen already at the finite-dimensional level (with obvious infinite-dimensional generalisation) in [6, Example 2.5].
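
These notions are conveniently explored numerically; the following sketch (a toy block-diagonal matrix with the datum supported in the first block, all sizes arbitrary) exhibits an operator that is \({\mathcal {K}}(A,g)\)-reduced, and certifies the triviality of the Krylov intersection (2.3) via the largest principal-angle cosine between \(\overline{{\mathcal {K}}(A,g)}\) and \(A\,{\mathcal {K}}(A,g)^\perp \), which stays strictly below 1 precisely when the two subspaces meet only at \(\{0\}\):

import numpy as np

# Toy block-diagonal A with datum g supported in the first block (ad-hoc sizes):
# generically K(A,g) fills the first block, its complement is A-invariant, and
# the Krylov intersection (2.3) is trivial.
rng = np.random.default_rng(1)
n1, n2 = 8, 5
A = np.zeros((n1 + n2, n1 + n2))
A[:n1, :n1] = rng.standard_normal((n1, n1)) + n1 * np.eye(n1)
A[n1:, n1:] = rng.standard_normal((n2, n2)) + n2 * np.eye(n2)
g = np.concatenate([rng.standard_normal(n1), np.zeros(n2)])

K = np.column_stack([np.linalg.matrix_power(A, k) @ g for k in range(n1)])
QK, _ = np.linalg.qr(K)                  # orthonormal basis of K(A,g)
Qp = np.eye(n1 + n2)[:, n1:]             # here K(A,g)^perp = {0} (+) H_2
W, _ = np.linalg.qr(A @ Qp)              # orthonormal basis of A K(A,g)^perp
cosines = np.linalg.svd(QK.T @ W, compute_uv=False)
print(cosines.max())                     # < 1 (here ~0): I(A,g) = {0}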

A significant property is the following.

Proposition 2.2

([6, Prop. 3.4].) For a given Hilbert space \({\mathcal {H}}\) let \(A\in {\mathcal {B}}({\mathcal {H}})\) be injective and \(g\in \mathrm {ran} A\). Let \(f\in {\mathcal {H}}\) satisfy \(Af=g\).

  1. (i)

    If \({\mathcal {I}}(A,g)=\{0\}\), then \(f\in \overline{{\mathcal {K}}(A,g)}\).

  2. (ii)

    Assume further that A is invertible with everywhere defined, bounded inverse on \({\mathcal {H}}\). Then \(f\in \overline{{\mathcal {K}}(A,g)}\) if and only if \({\mathcal {I}}(A,g)=\{0\}\).

\({\mathcal {K}}(A,g)\)-reducibility of A is a special case of triviality of \({\mathcal {I}}(A,g)\), and is therefore sufficient to ensure the Krylov solvability for \(Af=g\). This is the case for any self-adjoint A, as already observed. More generally:

Proposition 2.3

([6, Prop. 2.4].) For a given Hilbert space \({\mathcal {H}}\) let \(A\in {\mathcal {B}}({\mathcal {H}})\) and \(g\in \mathrm {ran} A\). Assume further that A is normal. Then A is \({\mathcal {K}}(A,g)\)-reduced if and only if \(A^*g\in \overline{{\mathcal {K}}(A,g)}\), in which case the associated inverse problem (1.4) is Krylov-solvable.

On the other hand, there are also inverse problems that are Krylov-solvable because they have a trivial Krylov intersection, without being Krylov-reduced. An obvious example is the problem in [6, Example 2.5]. Even though the operator in [6, Example 2.5] is not normal, one can find analogous examples also in the relevant class of bounded, injective, normal operators [6, Example 3.8].

This discussion gives strong evidence that the triviality of the Krylov intersection is the correct mechanism capturing the emergence of Krylov solvability.

In fact, the triviality of the Krylov intersection ensures also the existence of a Krylov solution.

Proposition 2.4

[6, Prop. 3.4 and 3.9]. For a given Hilbert space \({\mathcal {H}}\) let \(A\in {\mathcal {B}}({\mathcal {H}})\) and \(g\in \mathrm {ran} A\). If \({\mathcal {I}}(A,g)=\{0\}\), then there exists \(f\in \overline{{\mathcal {K}}(A,g)}\) such that \(Af=g\).

In turn, whereas not all bounded normal inverse problems are Krylov solvable, as seen above, normality ensures that the Krylov solution, if it exists, is unique.

Proposition 2.5

[6, Prop. 3.10]. For a given Hilbert space \({\mathcal {H}}\) let \(A\in {\mathcal {B}}({\mathcal {H}})\) and \(g\in \mathrm {ran} A\). If A is normal, then there exists at most one \(f\in \overline{{\mathcal {K}}(A,g)}\) such that \(Af=g\). More generally, the same conclusion holds if A is bounded with \(\ker A\subset \ker A^*\).

Corollary 2.6

If \(A\in {\mathcal {B}}({\mathcal {H}})\) is self-adjoint, then the inverse problem \(Af=g\) with \(g\in \mathrm {ran}A\) admits a unique Krylov solution.

3 The Positive Self-adjoint Case: Conjugate Gradients

While in Sect. 2 we surveyed our current knowledge of Krylov solvability for inverse problems induced by bounded operators, let us now enter the scenario that is the object of the present work, namely Krylov solvability for the problem (1.4) when the operator A is possibly unbounded.

Prior to discussing the unbounded case in fairly wide generality (Sect. 5), we find it instructive to analyse, in this and the following section, a distinguished class of unbounded inverse problems that are relevant in applications, the self-adjoint inverse problems.

Of course, Corollary 2.6 already provides a complete (and affirmative) answer to the question of Krylov solvability when A is bounded and self-adjoint. Therefore, although the discussion of this and the following section covers the bounded case as well, the perspective is actually on the unbounded case.

In this section, in particular, we consider the inverse problem (1.4) when A is a (possibly unbounded) self-adjoint and non-negative operator: \(A=A^*\geqslant {\mathbb {O}}\).

This is in fact an extremely relevant case in applications, for it is the setting of the ample class of popular Krylov-subspace-based algorithms for the numerical solution to (1.4) collectively known as the ‘method of conjugate gradients’ (also referred to as CG). CG was first proposed in 1952 by Hestenes and Stiefel [16] and since then, together with its related derivatives (e.g., conjugate gradients method on the normal equations (CGNE), least-square QR method (LSQR), etc.), it has been widely studied in the finite-dimensional setting (see the monographs [19, 26, 28]) and also, though to a lesser extent, in the infinite-dimensional Hilbert space setting.

For the purposes of the present discussion, let us briefly recall what the algorithm consists of in the special version that most evidently manifests its nature of a Krylov subspace algorithm.

Associated to the inverse problem (1.4), with \(A=A^*\geqslant {\mathbb {O}}\) and g satisfying (1.1), one has the solution manifold

$$\begin{aligned} {\mathcal {S}}(A,g) := \{f\in {\mathcal {D}}(A)\,|\,Af=g\}. \end{aligned}$$
(3.1)

As \({\mathcal {S}}(A,g)\) is a non-empty convex subset of \({\mathcal {H}}\), which is also closed owing to the self-adjointness and hence closedness of A, the projection map \(P_{\mathcal {S}}:{\mathcal {H}}\rightarrow {\mathcal {S}}(A,g)\) is unambiguously defined and produces, for generic \(x\in {\mathcal {H}}\), the point in \({\mathcal {S}}(A,g)\) closest to x. The A-smoothness of g makes the definition (1.3) of the Krylov subspace \({\mathcal {K}}(A,g)\) well posed, and next to it one can also consider the N-th order subspaces

$$\begin{aligned} {\mathcal {K}}_N(A,g) := \mathrm {span} \{g,Ag,\dots ,A^{N-1}g\},\qquad N\in {\mathbb {N}}. \end{aligned}$$
(3.2)

The CG method, in the special version that we are reviewing now, then consists of producing ‘iterates’ \(f_N\in {\mathcal {K}}_N(A,g)\) by means of the minimisation

$$\begin{aligned} f_N := \mathop {\mathrm{argmin}}\limits _{h\in {\mathcal {K}}_N(A,g)} \big \langle (h-P_{\mathcal {S}}h),A(h-P_{\mathcal {S}}h)\big \rangle , \qquad N\in {\mathbb {N}}. \end{aligned}$$
(3.3)

The term ‘iterates’ reminds us that there are implementations of CG, equivalent to (3.3), which produce the \(f_N\)’s iteratively, with no reference to the a priori knowledge of \(P_{\mathcal {S}}\), and which are hence clearly suited for numerics [19, 26].

Clearly, if for some N one has \(f_N\in {\mathcal {S}}(A,g)\), then \(A f_N=g\): the algorithm has come to convergence in a finite number of steps. This is the case when \(\dim {\mathcal {H}}<\infty \). When instead \(\dim {\mathcal {H}}=\infty \), the generic behaviour of the CG iterates is to get asymptotically closer and closer to the solution manifold, thus providing approximate solutions to the inverse problem (1.4).

It is worth mentioning that all iterates (3.3) have the same projection onto the solution manifold, more precisely [5, Prop. 2.1],

$$\begin{aligned} P_{\mathcal {S}}\,f_N = P_{\mathcal {S}}\,\mathbf {0}\;=:f^\circ \qquad \forall N\in {\mathbb {N}}, \end{aligned}$$
(3.4)

that is, the point of \({\mathcal {S}}(A,g)\) closest to each \(f_N\) is the projection onto \({\mathcal {S}}(A,g)\) of the zero vector of \({\mathcal {H}}\). Since, by linearity of A, \({\mathcal {S}}(A,g)\) is in fact an affine space, \(f^\circ \) is the minimal norm solution to \(Af=g\).
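
For illustration, the following hedged Python sketch (toy singular non-negative matrix; spectrum, sizes, and tolerance are arbitrary choices, and this is not the general algorithm of [5]) runs a plain Hestenes-Stiefel CG iteration from the zero vector and confirms that the iterates converge to the minimal norm solution \(f^\circ \):

import numpy as np

# Plain Hestenes-Stiefel CG started from the zero vector, on a toy singular
# non-negative matrix: the iterates converge to the minimal-norm solution,
# matching (3.4)-(3.5).
rng = np.random.default_rng(2)
n = 60
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
d = np.concatenate([np.zeros(5), rng.uniform(0.5, 2.0, n - 5)])
A = U @ np.diag(d) @ U.T                 # A = A^T >= 0 with 5-dim kernel
c = rng.standard_normal(n)
c[:5] = 0.0                              # no component along ker A
f_min = U @ c                            # the minimal-norm solution
g = A @ f_min                            # compatible datum, g in ran A

x = np.zeros(n); r = g.copy(); p = r.copy()
for _ in range(2 * n):
    Ap = A @ p
    alpha = (r @ r) / (p @ Ap)
    x = x + alpha * p
    r_new = r - alpha * Ap
    if np.linalg.norm(r_new) < 1e-12:
        break
    p = r_new + ((r_new @ r_new) / (r @ r)) * p
    r = r_new
print(np.linalg.norm(x - f_min))         # ~0: CG picks out f°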

When \(\dim {\mathcal {H}}=\infty \), the convergence theory \(f_N\rightarrow f^\circ \) has been studied over the last five decades, both in the scenario where A is bounded with everywhere-defined bounded inverse [8, 9, 15], or at least with bounded inverse on its range [17], and in the scenario where A is bounded with possibly unbounded inverse on its range [4, 11, 14, 17, 20, 21, 22]. Recently, the general unbounded-A case was covered too [5]. For concreteness, when g is a quasi-analytic vector (recall that the set \(\mathcal {D}^{qa}(A)\) of quasi-analytic vectors for A is dense, [27, Sect. 7.4]), one has the following.

Theorem 3.1

For a given Hilbert space \({\mathcal {H}}\) let \(A=A^*\geqslant {\mathbb {O}}\) and let \(g\in \mathrm {ran} A\cap \mathcal {D}^{qa} (A)\). Then the inverse problem \(Af=g\) is Krylov solvable, and in particular one has

$$\begin{aligned} \lim _{N\rightarrow \infty }\big \Vert f_N-f^\circ \big \Vert = 0 \end{aligned}$$
(3.5)

along the sequence of the conjugate gradient iterates \(f_N\in {\mathcal {K}}_N(A,g)\) defined in (3.3), where \(f^\circ \) is the minimal norm solution to the considered inverse problem.

Theorem 3.1 is the special case of a much wider class of convergence results for CG that for the (non-negative, self-adjoint) bounded-A case were proved in full completeness by Nemirovskiy and Polyak [21, 22], and for the (non-negative, self-adjoint) unbounded-A case were proved in our recent work [5]. (For the reader’s reference, Theorem 3.1 is the special case of [5, Theorem 2.4] when, in the notation therein, \(f^{[0]}=\mathbf {0}\), \(\sigma =0\), \(\xi =1\).)

This establishes Krylov solvability in the framework of unbounded, non-negative, self-adjoint inverse problems.

4 The General Self-adjoint and Skew-Adjoint Case

In this section we present our first main result, that in practice extends Corollary 2.6 to the whole classes of (possibly unbounded) self-adjoint or skew-adjoint operators.

As such, this also generalises the conjugate-gradient-based Krylov solvability statement of Theorem 3.1 established for (possibly unbounded) non-negative, self-adjoint inverse problems. The reason why we dealt first with the CG-analysis of Sect. 3 is that our next Theorem 4.1 is in fact based on the special non-negative case of Theorem 3.1.

Theorem 4.1

For a given Hilbert space \({\mathcal {H}}\) let A be a self-adjoint (\(A^*=A\)) or skew-adjoint (\(A^*=-A\)) operator on \({\mathcal {H}}\) and let \(g\in \mathrm {ran} A\cap \mathcal {D}^{qa} (A)\). Then there exists a unique solution f to \(Af=g\) such that \(f\in \overline{{\mathcal {K}}(A,g)}\). Thus, the inverse problem \(Af=g\) is Krylov-solvable.

Proof

Existence. In the self-adjoint case, since \(A^2\) is self-adjoint and \(A^2\geqslant {\mathbb {O}}\), Theorem 3.1 implies that there exists \(f\in \overline{{\mathcal {K}}(A^2,Ag)} \subset \overline{{\mathcal {K}}(A,g)}\) such that \(A^2f=Ag\). Analogously, in the skew-adjoint case, since \(-A^2\) is self-adjoint and \(-A^2\geqslant {\mathbb {O}}\), then there exists \(f\in \overline{{\mathcal {K}}(-A^2,-Ag)}=\overline{{\mathcal {K}} (A^2,Ag)}\subset \overline{{\mathcal {K}}(A,g)}\) such that \(-A^2f=-Ag\).

In either case, \(f\in \overline{{\mathcal {K}}(A,g)}\), \(f\in {\mathcal {D}}(A^2)\subset {\mathcal {D}}(A)\), and \(A^2f=Ag\), equivalently, \(A(Af-g)=0\). This shows that \(Af-g\in \ker A\).

On the other hand, both Af and g belong to \(\mathrm {ran}A\), whence \(Af-g\in \mathrm {ran}A\subset (\ker A^*)^\perp =(\ker A)^\perp \), where the last identity is clearly valid both in the self-adjoint and in the skew-adjoint case.

Then necessarily \(Af-g=0\), which proves that \(f\in \overline{{\mathcal {K}}(A,g)}\) is a solution to the considered inverse problem.

Uniqueness. If \(f_1,f_2\in \overline{{\mathcal {K}}(A,g)}\) and \(Af_1=g=Af_2\), then \(f_1-f_2\in \ker A\cap \overline{{\mathcal {K}}(A,g)}\). Moreover, \(\ker A=\ker A^*\) and \(\overline{{\mathcal {K}}(A,g)}\subset \overline{\mathrm {ran}A}\). Therefore, \(f_1-f_2\in \ker A^*\cap \overline{\mathrm {ran}A}=\{0\}\), since \(\ker A^*=(\mathrm {ran}A)^\perp \). Thus, \(f_1=f_2\). \(\square \)

Remark 4.2

In view of the discussion in Sect. 3, the proof of Theorem 4.1 shows that the actual Krylov-solution f to \(Af=g\) is the minimal norm solution to \(A^2f=Ag\) and admits approximants \(f_N\), with \(\Vert f_N-f\Vert _{{\mathcal {H}}}\rightarrow 0\) as \(N\rightarrow \infty \), defined by

$$\begin{aligned} f_N := \mathop {\mathrm{argmin}}\limits _{h\in {\mathcal {K}}_N(A^2,Ag)} \big \Vert A(h-f)\big \Vert _{{\mathcal {H}}}^2. \end{aligned}$$

Thus, the iterates of the CG algorithm applied to the auxiliary problem \(A^2f=Ag\) (interpreted as \(-A^2f=-Ag\) in the skew-adjoint case) converge precisely to the Krylov solution to \(Af=g\).
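
As a hedged numerical companion to this remark (toy sign-indefinite spectrum; all sizes and tolerances are arbitrary choices), one can run the same CG iteration on the auxiliary non-negative problem \(A^2f=Ag\) and check that the output solves the original problem:

import numpy as np

# Toy self-adjoint indefinite A (ad-hoc spectrum): CG applied to the
# non-negative auxiliary problem A^2 f = Ag recovers a solution of Af = g.
rng = np.random.default_rng(3)
n = 80
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
d = rng.choice([-1.0, 1.0], n) * rng.uniform(0.5, 2.0, n)
A = U @ np.diag(d) @ U.T                 # A = A^T, sign-indefinite, injective
g = A @ rng.standard_normal(n)
B, b = A @ A, A @ g                      # B = A^2 >= 0, datum b = Ag

x = np.zeros(n); r = b.copy(); p = r.copy()
for _ in range(3 * n):
    Bp = B @ p
    alpha = (r @ r) / (p @ Bp)
    x = x + alpha * p
    r_new = r - alpha * Bp
    if np.linalg.norm(r_new) < 1e-12:
        break
    p = r_new + ((r_new @ r_new) / (r @ r)) * p
    r = r_new
print(np.linalg.norm(A @ x - g))         # ~0: x solves the original problem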

Remark 4.3

Unbounded skew-adjoint inverse problems are intimately related to inverse problems induced by so-called Friedrichs operators. These constitute a class of elliptic, parabolic, and hyperbolic differential operators, which can also be characterised as abstract operators on a Hilbert space \({\mathcal {H}}\) [1,2,3, 12], having the typical (but not the only) form \(T=A+C\) where \(A^*=-A\) and \(C\in {\mathcal {B}}({\mathcal {H}})\). Theorem 4.1 is applicable when C is skew-adjoint itself.

5 New Phenomena in the General Unbounded Case: ‘Krylov Escape’, Generalised Krylov Reducibility, Generalised Krylov Intersection

Let us start in this section the analysis of Krylov solvability of the inverse problem (1.4), under the working condition (1.1), when the (possibly unbounded) operator A is densely defined and closed in \({\mathcal {H}}\), without necessarily being self-adjoint.

A number of substantial novelties, due to domain issues, emerge in this case as compared to the bounded case discussed in Sect. 2.

The first unavoidable difference concerns the invariance of \(\overline{{\mathcal {K}}(A,g)}\) (resp., \({\mathcal {K}}(A,g)^\perp \)) under the action of A (resp., of \(A^*\)). Indeed, the inclusions (2.4) certainly cannot be valid in general, because the above subspaces may well not be included, respectively, in \({\mathcal {D}}(A)\) and \({\mathcal {D}}(A^*)\).

Example 2

The ‘quantum mechanical creation operator’

$$\begin{aligned} A&= -\frac{\mathrm{d}}{\mathrm{d}x}+x \\ {\mathcal {D}}(A)&= \big \{h\in L^2({\mathbb {R}}) \,|\,-h'+xh\in L^2({\mathbb {R}})\big \} \end{aligned}$$

is densely defined, unbounded, and closed, and has the well-known property that

$$\begin{aligned} \psi _{n+1} = \frac{1}{\sqrt{2(n+1)}}\,A\psi _n,\qquad n\in {\mathbb {N}}_0, \end{aligned}$$

where \((\psi _n)_{n\in {\mathbb {N}}_0}\) is the orthonormal basis of \(L^2({\mathbb {R}})\) of the Hermite functions \(\psi _n(x)=c_n H_n(x) e^{-x^2/2}\) (here \(c_n\) is a normalisation factor and \(H_n\) is the n-th Hermite polynomial). In particular, each \(\psi _n\) is a \(C^\infty (A)\)-function. Choosing \(g=\psi _1\) evidently yields \(\overline{{\mathcal {K}}(A,g)}=\mathrm {span}\{\psi _0\}^\perp \). But there are \(L^2\)-functions orthogonal to \(\psi _0\) that do not belong to \({\mathcal {D}}(A)\).
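
The ladder relation above can be checked numerically; in the following sketch (grid and test index chosen arbitrarily) the n-th Hermite function is evaluated via numpy's physicists' Hermite polynomials and A is applied by finite differences:

import numpy as np
from numpy.polynomial.hermite import hermval
from math import factorial, pi, sqrt

# Check of psi_{n+1} = A psi_n / sqrt(2(n+1)) for A = -d/dx + x,
# via second-order finite differences on an ad-hoc grid.
x = np.linspace(-8.0, 8.0, 4001)
h = x[1] - x[0]

def psi(n):
    # n-th Hermite function; c selects the physicists' polynomial H_n
    c = np.zeros(n + 1)
    c[n] = 1.0
    return hermval(x, c) * np.exp(-x**2 / 2) / sqrt(2**n * factorial(n) * sqrt(pi))

n = 3
Apsi = -np.gradient(psi(n), h) + x * psi(n)
print(np.max(np.abs(Apsi - sqrt(2 * (n + 1)) * psi(n + 1))))  # ~0, up to FD error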

It is then clear that only the possible invariance of \(\overline{{\mathcal {K}}(A,g)}\cap {\mathcal {D}}(A)\) under A and of \({\mathcal {K}}(A,g)^\perp \cap {\mathcal {D}}(A^*)\) under \(A^*\) makes sense in general.

This naturally leads one to consider the operators \(A|_{\overline{{\mathcal {K}}(A,g)}\cap {\mathcal {D}}(A)}\) (the so-called ‘part of A on \(\overline{{\mathcal {K}}(A,g)}\)’) and \(A^*|_{{\mathcal {K}}(A,g)^\perp \cap {\mathcal {D}}(A^*)}\) (the ‘part of \(A^*\) on \({\mathcal {K}}(A,g)^\perp \)’). Notably, when A is unbounded (and hence \({\mathcal {D}}(A)\) is a proper dense subspace of \({\mathcal {H}}\)), neither of the two is densely defined in \({\mathcal {H}}\), unless \(\overline{{\mathcal {K}}(A,g)}={\mathcal {H}}\), as their domain is by construction the intersection of a proper dense and a proper closed subspace. Obviously, instead, \(A|_{\overline{{\mathcal {K}}(A,g)} \cap {\mathcal {D}}(A)}\) is densely defined in the Hilbert space \(\overline{{\mathcal {K}}(A,g)}\).

Lemma 5.1

For a given Hilbert space \({\mathcal {H}}\) let A be a densely defined operator on \({\mathcal {H}}\) and let \(g\in C^\infty (A)\). Then

$$\begin{aligned} A^*\big ( {\mathcal {K}}(A,g)^\perp \cap {\mathcal {D}}(A^*) \big ) \;\subset \; {\mathcal {K}}(A,g)^\perp . \end{aligned}$$
(5.1)

Proof

Let \(z\in {\mathcal {K}}(A,g)^\perp \cap {\mathcal {D}}(A^*)\). For arbitrary \(h\in \overline{{\mathcal {K}}(A,g)}\) let \((h_n)_{n\in {\mathbb {N}}}\) be a sequence in \({\mathcal {K}}(A,g)\) of norm-approximants of h. Then each \(Ah_n\in {\mathcal {K}}(A,g)\), and therefore

$$\begin{aligned} \langle h, A^*z\rangle = \lim _{n\rightarrow \infty }\langle h_n, A^*z\rangle = \lim _{n\rightarrow \infty }\langle Ah_n,z\rangle = 0, \end{aligned}$$

thus proving (5.1). \(\square \)

The counterpart inclusion to (5.1), namely \(A(\overline{{\mathcal {K}}(A,g)}\cap {\mathcal {D}}(A)) \subset \overline{{\mathcal {K}}(A,g)}\) when \(\overline{{\mathcal {K}}(A,g)}\) is only a proper closed subspace of \({\mathcal {H}}\), turns out to be considerably less trivial. In fact, counterintuitive as it may appear, A may indeed map vectors from \(\overline{{\mathcal {K}}(A,g)}\cap {\mathcal {D}}(A)\) outside of \(\overline{{\mathcal {K}}(A,g)}\). In the present context, we shall refer to this phenomenon, which has no analogue in the bounded case, as ‘Krylov escape’.

Example 3

(Krylov escape) Let \({\mathcal {H}}'\) be a Hilbert space and \(T'\) be a self-adjoint operator in \({\mathcal {H}}'\) having a cyclic vector \(g'\), meaning that there exists \(g'\in C^\infty (T')\) such that \(\overline{{\mathcal {K}}(T',g')}={\mathcal {H}}'\). (It is straightforward to construct many explicit examples for such a choice.) For any 1-dimensional vector space \({\mathcal {H}}_0\), say, \({\mathcal {H}}_0=\mathrm {span}\{e_0\}\), set

$$\begin{aligned} {\mathcal {H}}&:={\mathcal {H}}_0\oplus {\mathcal {H}}' \\ T&:={\mathbb {O}}\oplus T' \\ g&:=\mathbf {0}\oplus g'. \end{aligned}$$

(The last condition is just an identification of g as an element of \({\mathcal {H}}\).) Thus, T is a self-adjoint operator in \({\mathcal {H}}\) such that \(Te_0=\mathbf {0}\) and \(Tx'=T'x'\) \(\forall x'\in {\mathcal {D}}(T')\), and moreover \(\overline{{\mathcal {K}}(T,g)}={\mathcal {H}}'\). (Now the closure is taken with respect to \({\mathcal {H}}\).) Furthermore, let \(x_0\in {\mathcal {H}}\) be such that \(x_0\in {\mathcal {H}}'{\setminus }{\mathcal {D}}(T')\) (such an \(x_0\) exists as long as \(T'\) is unbounded, which we assume). Then set

$$\begin{aligned} {\mathcal {D}}(A)&:= {\mathcal {D}}(T)\dotplus \mathrm {span}\{x_0\} \\ Ax_0&:= e_0 \\ Ax&:= Tx\qquad \forall x\in {\mathcal {D}}(T). \end{aligned}$$

A is meant to be defined by the above identities and extended by linearity on the whole \({\mathcal {D}}(T)\dotplus \mathrm {span} \{x_0\}\). The operator A is densely defined in \({\mathcal {H}}\) by construction.

  • Closedness. Let us check that if \(x_n+\mu _n x_0\rightarrow v\) and \(A(x_n+\mu _n x_0)\rightarrow w\) in \({\mathcal {H}}\) as \(n\rightarrow \infty \) for some vectors \(v,w\in {\mathcal {H}}\), where \((x_n)_{n\in {\mathbb {N}}}\) is a generic sequence in \({\mathcal {D}}(T)\) and \((\mu _n)_{n\in {\mathbb {N}}}\) is a generic sequence in \({\mathbb {C}}\), then \(v\in {\mathcal {D}}(A)\) and \(Av=w\). First, we observe that it must be \(\mu _n\rightarrow \mu \) for some \(\mu \in {\mathbb {C}}\), for otherwise there would be no chance for the vectors \(A(x_n+\mu _n x_0)=Tx_n+\mu _n e_0\) to converge as assumed, because \(Tx_n\perp \mu _n e_0\). Next, since \(\mu _n\rightarrow \mu \) and \(x_n+\mu _n x_0\rightarrow v\), then necessarily \(x_n\rightarrow x\) for some \(x\in {\mathcal {H}}\) satisfying \(x+\mu x_0=v\); analogously, since \(Tx_n+\mu _n e_0=A(x_n+\mu _n x_0)\rightarrow w\), then \(Tx_n\rightarrow y\) for some \(y\in {\mathcal {H}}\) satisfying \(y+\mu e_0=w\). As by construction T is self-adjoint and hence closed, then necessarily \(x\in {\mathcal {D}}(T)\) and \(Tx=y\). In turn, this implies that \(v\in {\mathcal {D}}(A)\) and \(Av=w\). The conclusion is that A is closed.

  • Occurrence of Krylov escape. By the construction above, \(\overline{{\mathcal {K}}(A,g)}={\mathcal {H}}' =\mathrm {span}\{e_0\}^\perp \). Let us focus on the vector \(e_0\). On the one hand, \(e_0=Ax_0\) and \(x_0\) by definition belongs both to \(\overline{{\mathcal {K}}(A,g)}\) and to \({\mathcal {D}}(A)\). Therefore \(e_0\in A(\overline{{\mathcal {K}}(A,g)}\cap {\mathcal {D}}(A))\). On the other hand, however, \(e_0\in ({\mathcal {H}}')^\perp ={\mathcal {K}} (A,g)^\perp \), whence \(e_0\notin \overline{{\mathcal {K}}(A,g)}\). This provides a counterexample of a densely defined closed operator A in \({\mathcal {H}}\) such that the inclusion

    $$\begin{aligned} A\big (\overline{{\mathcal {K}}(A,g)}\cap {\mathcal {D}}(A)\big ) \;\subset \;\overline{{\mathcal {K}}(A,g)} \end{aligned}$$

    is violated.

Owing to the possible occurrence of the Krylov escape phenomenon, the invariance of \(\overline{{\mathcal {K}}(A,g)}\cap {\mathcal {D}}(A)\) under A requires additional assumptions. A reasonable one is to assume further that the operator \(A|_{\overline{{\mathcal {K}}(A,g)} \cap {\mathcal {D}}(A)}\) and its restriction \(A|_{{\mathcal {K}}(A,g)}\) are in a sense as close as possible. To this aim, let us observe first the following.

Lemma 5.2

For a given Hilbert space \({\mathcal {H}}\) let A be a densely defined and closed operator on \({\mathcal {H}}\) and let \(g\in C^\infty (A)\). Then

  1. (i)

    The operator \(A|_{\overline{{\mathcal {K}}(A,g)}\cap {\mathcal {D}}(A)}\) is closed;

  2. (ii)

    The operator \(A|_{{\mathcal {K}}(A,g)}\) is closable.

Proof

Obviously \(A|_{{\mathcal {K}}(A,g)}\subset A|_{\overline{{\mathcal {K}}(A,g)}\cap {\mathcal {D}}(A)}\), in the sense of operator inclusion, so part (ii) follows at once from part (i). In turn, part (i) is true as is always the case when one restricts a closed operator to the intersection of its domain with a closed subspace: the restriction operator too is closed. Explicitly, let \(((x_n,Ax_n))_{n\in {\mathbb {N}}}\) be an arbitrary \({\mathcal {H}}\oplus {\mathcal {H}}\)-convergent sequence in the graph of \(A|_{\overline{{\mathcal {K}}(A,g)}\cap {\mathcal {D}}(A)}\), that is, for some \(x,y\in {\mathcal {H}}\) one has \(\overline{{\mathcal {K}}(A,g)} \cap {\mathcal {D}}(A)\ni x_n\rightarrow x\) and \(Ax_n\rightarrow y\) in \({\mathcal {H}}\). Then, by closedness of A, \(x\in {\mathcal {D}}(A)\) and \(Ax=y\). Moreover, since \(x_n\in \overline{{\mathcal {K}}(A,g)}\) \(\forall n\), also for the limit point one has \(x\in \overline{{\mathcal {K}}(A,g)}\). Thus, \(x\in \overline{{\mathcal {K}}(A,g)}\cap {\mathcal {D}}(A)\). This shows that the pair (x, y) belongs to the graph of \(A|_{\overline{{\mathcal {K}}(A,g)}\cap {\mathcal {D}}(A)}\), which is therefore closed in \({\mathcal {H}}\oplus {\mathcal {H}}\). \(\square \)

Remark 5.3

With a completely analogous argument one shows that the operator \(A^*|_{{\mathcal {K}}(A,g)^\perp \cap {\mathcal {D}}(A^*)}\) is closed too.

It is then natural to consider the case when the operator closure of \(A|_{{\mathcal {K}}(A,g)}\) is precisely \(A|_{\overline{{\mathcal {K}} (A,g)}\cap {\mathcal {D}}(A)}\).

Definition 5.4

For a given Hilbert space \({\mathcal {H}}\) let A be a densely defined and closed operator on \({\mathcal {H}}\) and let \(g\in C^\infty (A)\). Then the pair \((A,g)\) is said to satisfy the ‘Krylov-core condition’ when the subspace \({\mathcal {K}}(A,g)\) is a core for \(A|_{\overline{{\mathcal {K}}(A,g)}\cap {\mathcal {D}}(A)}\). Explicitly, this is the requirement that

$$\begin{aligned} \overline{A|_{{\mathcal {K}}(A,g)}} = A|_{\overline{{\mathcal {K}}(A,g)}\cap {\mathcal {D}}(A)} \end{aligned}$$
(5.2)

in the sense of operator closure, equivalently, it is the requirement that \({\mathcal {K}}(A,g)\) is dense in \(\overline{{\mathcal {K}}(A,g)}\cap {\mathcal {D}}(A)\) in the graph norm \(\Vert h\Vert _A:=(\Vert h\Vert _{{\mathcal {H}}}^2+\Vert Ah\Vert _{{\mathcal {H}}}^2)^{\frac{1}{2}}\):

$$\begin{aligned} \overline{{\mathcal {K}}(A,g)}^{\Vert \,\Vert _A} = \overline{{\mathcal {K}} (A,g)}\cap {\mathcal {D}}(A). \end{aligned}$$
(5.3)

Remark 5.5

By closedness of A, the inclusion

$$\begin{aligned} \overline{{\mathcal {K}}(A,g)}^{\Vert \,\Vert _A}\;\subset \;\overline{{\mathcal {K}}(A,g)}\cap {\mathcal {D}}(A) \end{aligned}$$
(5.4)

is always true, as one sees reasoning as for Lemma 5.2.

Example 4

Let \({\mathcal {H}}=\ell ^2({\mathbb {N}}_0)\) and, in terms of the canonical orthonormal basis \((e_n)_{n\in {\mathbb {N}}_0}\), let A be the densely defined and closed operator defined by

$$\begin{aligned} {\mathcal {D}}(A)&:= \Big \{x\equiv (x_n)_{n\in {\mathbb {N}}_0} \,\Big |\,\sum _{n=0}^\infty (n+1)^2|x_n|^2<+\infty \Big \} \\ Ax&:= (0,x_0,2x_1,3x_2,\dots )\qquad \text {for }x\in {\mathcal {D}}(A) \end{aligned}$$

(thus, in particular, \(Ae_n=(n+1) e_{n+1}\) for any \(n\in {\mathbb {N}}_0\)). Obviously, since \(A^ke_1=(k+1)!\,e_{k+1}\), we have \({\mathcal {K}}(A,e_1)=\mathrm {span}\{e_1,e_2,e_3,\dots \}\) and \(\overline{{\mathcal {K}}(A,e_1)}=\{e_0\}^\perp \). Let

$$\begin{aligned} x := (0,x_1,x_2,x_3,x_4,\dots )\quad \text {with} \quad \sum _{n=1}^\infty (n+1)^2|x_n|^2<+\infty \end{aligned}$$

be a generic vector in \(\overline{{\mathcal {K}} (A,e_1)}\cap {\mathcal {D}}(A)\) and for each \(N\in {\mathbb {N}}\) let

$$\begin{aligned} x^{(N)} := (0,x_1^{(N)},\dots ,x_N^{(N)},0,0,\dots ) \quad \text {with}\quad x_n^{(N)}=x_n+\frac{1}{n^2N}. \end{aligned}$$

Then \(x^{(N)}\in {\mathcal {K}}(A,e_1)\) and

$$\begin{aligned} \big \Vert x-x^{(N)}\big \Vert _A^2&= \sum _{n=1}^\infty (1+(n+1)^2)\big |x_n-x_n^{(N)}\big |^2 \\&= \frac{1}{\,N^2}\sum _{n=1}^N\frac{n^2+2n+2}{n^4}+\sum _{n=N+1}^\infty (1+(n+1)^2)|x_n|^2 \xrightarrow []{N\rightarrow \infty }\;0, \end{aligned}$$

whence \(x\in \overline{{\mathcal {K}}(A,e_1)}^{\Vert \,\Vert _A}\). Thus, \(\overline{{\mathcal {K}}(A,e_1)}^{\Vert \,\Vert _A}\supset \overline{{\mathcal {K}} (A,e_1)}\cap {\mathcal {D}}(A)\), which together with (5.4) shows that the pair \((A,e_1)\) does satisfy the Krylov-core condition (5.3).
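
Example 4 can also be explored numerically; with an ad-hoc tail-decaying choice of x in the domain, the graph-norm error computed above indeed vanishes as N grows (the series is truncated at 2000 entries in this sketch):

import numpy as np

# Numerical companion to Example 4: the graph-norm error ||x - x^(N)||_A^2 -> 0,
# for an ad-hoc x with x_0 = 0 and sum (n+1)^2 |x_n|^2 < infinity.
x = np.zeros(2000)
x[1:] = 1.0 / np.arange(1, 2000) ** 2.5

def graph_err_sq(N):
    n = np.arange(1, len(x))
    head = ((1 + (n[:N] + 1.0) ** 2) / n[:N] ** 4).sum() / N**2
    tail = ((1 + (n[N:] + 1.0) ** 2) * x[1 + N:] ** 2).sum()
    return head + tail

for N in (10, 100, 1000):
    print(N, graph_err_sq(N))                # decays roughly like 1/N^2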

The Krylov-core condition is indeed sufficient to finally ensure that A maps \(\overline{{\mathcal {K}}(A,g)}\cap {\mathcal {D}}(A)\) into \(\overline{{\mathcal {K}}(A,g)}\).

Lemma 5.6

For a given Hilbert space \({\mathcal {H}}\) let A be a densely defined and closed operator on \({\mathcal {H}}\) and let \(g\in C^\infty (A)\).

  1. (i)

    One has the inclusion

    $$\begin{aligned} A\big ( \overline{{\mathcal {K}}(A,g)}\cap {\mathcal {D}}(A)\big ) \;\subset \;\overline{A{\mathcal {K}}(A,g)} \end{aligned}$$
    (5.5)

    if and only if A and g satisfy the Krylov-core condition (5.2).

  2. (ii)

    In particular, under the Krylov-core condition one has

    $$\begin{aligned} A\big ( \overline{{\mathcal {K}}(A,g)}\cap {\mathcal {D}}(A)\big ) \;\subset \; \overline{{\mathcal {K}}(A,g)}. \end{aligned}$$
    (5.6)

Proof

Clearly (5.6) follows at once from (5.5) as \(\overline{A{\mathcal {K}}(A,g)}\subset \overline{{\mathcal {K}}(A,g)}\). Let us then focus on the proof of part (i). Assume that (5.2) is satisfied and let \(z\in \overline{{\mathcal {K}}(A,g)}\cap {\mathcal {D}}(A)\), the domain of \(A|_{\overline{{\mathcal {K}}(A,g)}\cap {\mathcal {D}}(A)}\). Since the latter operator is the closure of \(A|_{{\mathcal {K}}(A,g)}\), there exists \((z_n)_{n\in {\mathbb {N}}}\) in \({\mathcal {K}}(A,g)\) such that \(z_n\rightarrow z\) and \(A|_{{\mathcal {K}}(A,g)}z_n\rightarrow A|_{\overline{{\mathcal {K}}(A,g)}\cap {\mathcal {D}}(A)} z\), i.e., \(Az_n\rightarrow Az\). This shows that \(Az\in \overline{A{\mathcal {K}}(A,g)}\). For the converse implication, assume now that \(z\in \overline{{\mathcal {K}}(A,g)}\cap {\mathcal {D}}(A)\) and that \(A|_{\overline{{\mathcal {K}}(A,g)}\cap {\mathcal {D}}(A)}\) is a proper closed extension of \(\overline{A|_{{\mathcal {K}} (A,g)}}\). This means that whatever sequence \((z_n)_{n\in {\mathbb {N}}}\) in \({\mathcal {K}}(A,g)\) of norm-approximants of z is considered, one cannot have \(Az_n\rightarrow Az\), because this would mean \(\overline{A|_{{\mathcal {K}} (A,g)}}z_n\rightarrow A|_{\overline{{\mathcal {K}}(A,g)}\cap {\mathcal {D}}(A)} z\), that is, \(\overline{A|_{{\mathcal {K}}(A,g)}}z =A|_{\overline{{\mathcal {K}}(A,g)}\cap {\mathcal {D}}(A)} z\), contrary to the assumption \(\overline{A|_{{\mathcal {K}} (A,g)}}\varsubsetneq A|_{\overline{{\mathcal {K}}(A,g)} \cap {\mathcal {D}}(A)}\). Thus, Az cannot have norm-approximants in \(A{\mathcal {K}}(A,g)\) and hence (5.5) cannot be valid. \(\square \)

Remark 5.7

In general \(\overline{A{\mathcal {K}}(A,g)}\varsubsetneq \overline{{\mathcal {K}}(A,g)}\): in Example 2, for instance, \(\overline{A{\mathcal {K}}(A,g)}=\mathrm {span} \{\psi _0,\psi _1\}^\perp \), whereas \(\overline{{\mathcal {K}}(A,g)} =\mathrm {span}\{\psi _0\}^\perp \).

In the framework of the preceding discussion the two key notions of Krylov reducibility and Krylov intersection introduced in the bounded case in Definition 2.1 can be now generalised to the unbounded case.

Definition 5.8

For a given Hilbert space \({\mathcal {H}}\) let A be a densely defined and closed operator on \({\mathcal {H}}\) and let \(g\in C^\infty (A)\).

  1. (i)

    A is said to be \({\mathcal {K}}(A,g)\)-reduced (in the generalised sense), for short Krylov-reduced, when

    $$\begin{aligned} A\big ( \overline{{\mathcal {K}}(A,g)}\cap {\mathcal {D}}(A)\big )&\subset \overline{{\mathcal {K}}(A,g)} \\ A\big ( {\mathcal {K}}(A,g)^\perp \cap {\mathcal {D}}(A)\big )&\subset {\mathcal {K}}(A,g)^\perp . \end{aligned}$$
    (5.7)

  2. (ii)

    The subspace

    $$\begin{aligned} {\mathcal {I}}(A,g) := \overline{{\mathcal {K}}(A,g)}\,\cap \, A \big ({\mathcal {K}}(A,g)^\perp \cap {\mathcal {D}}(A)\big ) \end{aligned}$$
    (5.8)

    is called the (generalised) Krylov intersection for the given A and g.

As was the case for Definition 2.1, it is clear from (5.7) and (5.8) that also in the generalised sense of Definition 5.8 Krylov reducibility implies triviality of the Krylov intersection.

Remark 5.9

The first of the two requirements (5.7) for Krylov-reducibility is precisely the lack of Krylov escape (5.6). Thus, this condition is met if, for instance, A and g satisfy the Krylov-core condition (Lemma 5.6).

Remark 5.10

Krylov reducibility in the generalised sense (5.7) for an unbounded operator differs from Krylov reducibility in the bounded case, which is formulated as

$$\begin{aligned} A\overline{{\mathcal {K}}(A,g)}&\subset \overline{{\mathcal {K}}(A,g)} \\ A{\mathcal {K}}(A,g)^\perp&\subset {\mathcal {K}}(A,g)^\perp , \end{aligned}$$

in that when A is bounded the subspaces \(\overline{{\mathcal {K}}(A,g)}\) and \({\mathcal {K}}(A,g)^\perp \) are reducing for A, and hence, with respect to the Krylov decomposition \({\mathcal {H}}= \overline{{\mathcal {K}}(A,g)}\oplus {\mathcal {K}}(A,g)^\perp \), the operator A decomposes as \(A=A|_{\overline{{\mathcal {K}}(A,g)}}\oplus A|_{{\mathcal {K}}(A,g)^\perp }\). If instead A is unbounded and Krylov-reduced, it is false in general that \(A=A|_{\overline{{\mathcal {K}}(A,g)}\cap {\mathcal {D}}(A)}\oplus A|_{{\mathcal {K}}(A,g)^\perp \cap {\mathcal {D}}(A)}\).

6 Krylov Solvability in the General Unbounded Case

In this section we examine counterparts of Krylov solvability of the inverse problem (1.4) when the operator A is densely defined and closed.

A crucial role in this matter turns out to be played by the lack of Krylov escape, namely the property (5.6), a feature that was automatically present in the bounded case. A first instance is the following technical lemma, which will be useful in a moment.

Lemma 6.1

For a given Hilbert space \({\mathcal {H}}\) let A be a densely defined and closed operator on \({\mathcal {H}}\), let \(g\in \mathrm {ran}(A)\cap C^\infty (A)\), and let \(f\in {\mathcal {D}}(A)\) satisfy \(Af=g\). If in addition the pair \((A,g)\) satisfies the Krylov-core condition and \(f\in \overline{{\mathcal {K}}(A,g)}\), then

$$\begin{aligned} \overline{A {\mathcal {K}}(A,g)} = \overline{{\mathcal {K}}(A,g)}. \end{aligned}$$
(6.1)

(We already observed in Remark 5.7 that in general \(\overline{A {\mathcal {K}}(A,g)}\varsubsetneq \overline{{\mathcal {K}}(A,g)}\).)

Proof of Lemma 6.1

By assumption, \(f\in \overline{{\mathcal {K}}(A,g)} \cap {\mathcal {D}}(A)\), which is the same as \(f\in \overline{{\mathcal {K}}(A,g)}^{\Vert \Vert _A}\), owing to the Krylov-core condition. Thus there exists a sequence \((f_n)_{n\in {\mathbb {N}}}\) in \({\mathcal {K}}(A,g)\) such that \(f_n\xrightarrow []{\Vert \,\Vert _A}f\) as \(n\rightarrow \infty \), in particular \(Af_n\xrightarrow []{\Vert \,\Vert }Af=g\), which implies that g belongs to \(\overline{A{\mathcal {K}}(A,g)}\). Clearly all vectors \(Ag,A^2g,A^3g,\dots \) belong to the same space too. Therefore,

$$\begin{aligned} \mathrm {span}\{A^kg\,|\,k\in {\mathbb {N}}_0\} \;\subset \;\overline{A{\mathcal {K}}(A,g)}, \end{aligned}$$

so that \(\overline{{\mathcal {K}}(A,g)}\subset \overline{A{\mathcal {K}}(A,g)}\). The opposite inclusion is trivial, hence (6.1) follows. \(\square \)

For convenience we shall denote by \(P_{{\mathcal {K}}}\) the orthogonal projection \(P_{{\mathcal {K}}}:{\mathcal {H}}\rightarrow {\mathcal {H}}\) onto the Krylov subspace \(\overline{{\mathcal {K}}(A,g)}\).

In the unbounded injective case, the triviality of the Krylov intersection still implies Krylov solvability, under additional assumptions that are automatically satisfied if A is bounded. The first requires that the orthogonal projection onto \(\overline{{\mathcal {K}}(A,g)}\) of the solution manifold \({\mathcal {S}}(A,g)\) be entirely contained in the domain of A. The other requirement is the lack of Krylov escape.

In fact, Proposition 2.2 admits the following analogues.

Proposition 6.2

For a given Hilbert space \({\mathcal {H}}\) let A be a densely defined and closed operator on \({\mathcal {H}}\), let \(g\in \mathrm {ran}(A)\cap C^\infty (A)\), and let \(f\in {\mathcal {D}}(A)\) satisfy \(Af=g\). Assume furthermore that

  1. (a)

    A is injective;

  2. (b)

    \(A(\overline{{\mathcal {K}}(A,g)}\cap {\mathcal {D}}(A)) \subset \overline{{\mathcal {K}}(A,g)}\) (this assumption holds true, for example, if the pair \((A,g)\) satisfies the Krylov-core condition, Lemma 5.6);

  3. (c)

    \(P_{{\mathcal {K}}}f\in {\mathcal {D}}(A)\);

  4. (d)

    \({\mathcal {I}}(A,g)=\{0\}\),

or also, assume the more stringent assumptions

  1. (a)

    A is injective;

  2. (b’)

    A is \({\mathcal {K}}(A,g)\)-reduced;

  3. (c)

    \(P_{{\mathcal {K}}}f\in {\mathcal {D}}(A)\).

Under such assumptions, \(f\in \overline{{\mathcal {K}}(A,g)}\).

Proof

By assumption (c), \(P_{\mathcal {K}}f\in \overline{{\mathcal {K}} (A,g)}\cap {\mathcal {D}}(A)\); by assumption (b), \(AP_{\mathcal {K}}f\in \overline{{\mathcal {K}}(A,g)}\). Then \(A({\mathbb {1}}-P_{\mathcal {K}})f=g-AP_{\mathcal {K}}f \in \overline{{\mathcal {K}}(A,g)}\). On the other hand, again by assumption (c), \(({\mathbb {1}}-P_{\mathcal {K}})f \in {\mathcal {D}}(A)\), whence \(A({\mathbb {1}}-P_{\mathcal {K}})f \in A( {\mathcal {K}}(A,g)^\perp \cap {\mathcal {D}}(A))\). Thus, \(A({\mathbb {1}}-P_{\mathcal {K}})f\in {\mathcal {I}}(A,g)\). By assumptions (a) and (d), then \(f=P_{\mathcal {K}}f\). \(\square \)

Proposition 6.3

For a given Hilbert space \({\mathcal {H}}\) let A be a densely defined and closed operator on \({\mathcal {H}}\), let \(g\in \mathrm {ran}(A)\cap C^\infty (A)\), and let \(f\in {\mathcal {D}}(A)\) satisfy \(Af=g\). Assume furthermore that

  1. (a)

    A is invertible with everywhere defined, bounded inverse on \({\mathcal {H}}\);

  2. (b)

    the pair \((A,g)\) satisfies the Krylov-core condition.

Under such assumptions, if \(f\in \overline{{\mathcal {K}}(A,g)}\), then \({\mathcal {I}}(A,g)=\{0\}\).

Proof

Let \(z\in {\mathcal {I}}(A,g)\). Then \(z=Aw\) for some \(w\in {\mathcal {K}}(A,g)^\perp \cap {\mathcal {D}}(A)\), and \(z\in \overline{{\mathcal {K}}(A,g)}\). Owing to Lemma 6.1, \(\overline{{\mathcal {K}}(A,g)} =\overline{A{\mathcal {K}}(A,g)}\), hence there is a sequence \((v_n)_{n\in {\mathbb {N}}}\) in \({\mathcal {K}}(A,g)\) such that \(Av_n\rightarrow z=Aw\) as \(n\rightarrow \infty \). Then also \(v_n\rightarrow w\), because \(\Vert v_n-w\Vert _{{\mathcal {H}}}\leqslant \Vert A^{-1}\Vert _{\mathrm {op}}\Vert Av_n-z\Vert _{{\mathcal {H}}}\). Since \(v_n\perp w\) for each n, then

$$\begin{aligned} 0 = \lim _{n\rightarrow \infty }\Vert v_n-w\Vert _{{\mathcal {H}}}^2 = \lim _{n\rightarrow \infty } \big (\Vert v_n\Vert _{{\mathcal {H}}}^2+\Vert w\Vert _{{\mathcal {H}}}^2\big ) = 2\Vert w\Vert _{{\mathcal {H}}}^2, \end{aligned}$$

whence \(w=0\) and also \(z=Aw=0\). \(\square \)

When A is not injective, from the perspective of the Krylov-solvability of the inverse problem (1.4) one immediately makes two observations. First, if \(g=0\), then trivially \({\mathcal {K}}(A,g)=\{0\}\) and therefore the Krylov space does not capture any of the non-zero solutions to (1.4), which all belong to \(\ker A\). Second, if \(g\ne 0\) and therefore the problem of Krylov-solvability is non-trivial, it is natural to ask whether a Krylov solution exists and whether it is unique.

The following two Propositions provide answers to these questions.

Proposition 6.4

For a given Hilbert space \({\mathcal {H}}\) let A be a densely defined operator on \({\mathcal {H}}\) and let \(g\in \mathrm {ran}(A)\cap C^\infty (A)\). If \(\ker A\subset \ker A^*\) (in particular, if A is normal), then there exists at most one \(f\in \overline{{\mathcal {K}}(A,g)} \cap {\mathcal {D}}(A)\) such that \(Af=g\).

Proof

The argument is analogous to that used for the corresponding statement of Theorem 4.1. If \(f_1,f_2\in \overline{{\mathcal {K}}(A,g)}\) and \(Af_1=g=Af_2\), then \(f_1-f_2\in \ker A\cap \overline{{\mathcal {K}}(A,g)}\). By assumption, \(\ker A\subset \ker A^*\), and moreover obviously \(\overline{{\mathcal {K}}(A,g)}\subset \overline{\mathrm {ran}A}\). Therefore, \(f_1-f_2\in \ker A^*\cap \overline{\mathrm {ran}A}=\{0\}\), whence \(f_1=f_2\). \(\square \)

Proposition 6.5

For a given Hilbert space \({\mathcal {H}}\) let A be a densely defined and closed operator on \({\mathcal {H}}\), let \(g\in \mathrm {ran}(A)\cap C^\infty (A)\), and let \(f\in {\mathcal {D}}(A)\) satisfy \(Af=g\). Assume furthermore that

  (a) \(A(\overline{{\mathcal {K}}(A,g)}\cap {\mathcal {D}}(A)) \subset \overline{{\mathcal {K}}(A,g)}\);

  (b) \(P_{{\mathcal {K}}}f\in {\mathcal {D}}(A)\);

  (c) \({\mathcal {I}}(A,g)=\{0\}\),

or, alternatively, assume the more stringent assumptions

  (a’) A is \({\mathcal {K}}(A,g)\)-reduced;

  (b’) \(P_{{\mathcal {K}}}f\in {\mathcal {D}}(A)\).

Then there exists \(f_\circ \in \overline{{\mathcal {K}} (A,g)}\cap {\mathcal {D}}(A)\) such that \(Af_\circ =g\).

Proof

Let f be a solution to \(Af=g\) (it certainly exists, for \({\mathcal {S}}(A,g)\) is non-empty). Reasoning as in the proof of Proposition 6.2: \(P_{\mathcal {K}}f\in \overline{{\mathcal {K}}(A,g)}\cap {\mathcal {D}}(A)\) (by (b)), \(AP_{\mathcal {K}}f\in \overline{{\mathcal {K}}(A,g)}\) (by (a)), \(({\mathbb {1}}-P_{\mathcal {K}})f\in {\mathcal {D}}(A)\) (by (b)), whence \(A({\mathbb {1}}-P_{\mathcal {K}})f\in A( {\mathcal {K}}(A,g)^\perp \cap {\mathcal {D}}(A))\) and also \(A({\mathbb {1}}-P_{\mathcal {K}})f=g-AP_{\mathcal {K}}f \in \overline{{\mathcal {K}}(A,g)}\). Thus, \(A({\mathbb {1}} -P_{\mathcal {K}}) f\in {\mathcal {I}} (A,g)\), whence \(g=Af=AP_{\mathcal {K}}f\) (by (c)). Set

$$\begin{aligned} f_\circ := P_{\mathcal {K}}f \in \overline{{\mathcal {K}}(A,g)}\cap {\mathcal {D}}(A). \end{aligned}$$

Now, if \(g=0\), then \(\overline{{\mathcal {K}}(A,g)}=\{0\}\), whence \(f_\circ =0\): this (trivial) solution to the corresponding inverse problem is indeed a Krylov-solution. If instead \(g\ne 0\), then necessarily \(f_\circ \ne 0\). In either case \(Af_\circ =g\). \(\square \)
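To visualise Proposition 6.5 when A is not injective, here is a minimal sketch with a hypothetical \(3\times 3\) diagonal matrix: a generic solution f carries a kernel component, and projecting it onto \(\overline{{\mathcal {K}}(A,g)}\) produces the Krylov solution \(f_\circ \).

```python
import numpy as np

# hypothetical toy data for Proposition 6.5
A = np.diag([0.0, 1.0, 2.0])          # non-injective: ker A = span{e0}
g = np.array([0.0, 1.0, 0.0])         # g = A e1, so g lies in ran(A)

f = np.array([3.0, 1.0, 0.0])         # a solution carrying a kernel component
assert np.allclose(A @ f, g)

# Krylov space: here A^k g = g for all k, so K(A,g) = span{g}
P_K = np.outer(g, g) / (g @ g)        # orthogonal projection onto K(A,g)
f_circ = P_K @ f                      # f_circ = P_K f, the Krylov candidate
print(A @ f_circ, g)                  # A f_circ = g: f_circ is a Krylov solution
```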

7 The Self-adjoint Case Revisited: Structural Properties

We return in this section to the general question of Krylov-solvability for an inverse problem of the form \(Af=g\) when \(A=A^*\).

More precisely, whereas the analysis of Sect. 4 (Theorem 4.1) already provides an affirmative answer, based on conjugate gradient arguments, it does not explain how the operator A and the datum g behave in the self-adjoint case with respect to the abstract operator-theoretic mechanisms for Krylov-solvability identified in Sect. 6 (Krylov-reducibility, triviality of the Krylov intersection, stability of the solution manifold inside \({\mathcal {D}}(A)\) under the projection \(P_{\mathcal {K}}\)).

Of course it is straightforward to observe that if \(A=A^*\) and \(g\in C^\infty (A)\), then the second of the two conditions (5.7) for Krylov reducibility is automatically true (owing to (5.1)) and therefore \({\mathcal {I}}(A,g)\) is always trivial.

Unlike the bounded case, however, for A to be \({\mathcal {K}}(A,g)\)-reduced no Krylov escape may occur, namely A must also satisfy the first of the two conditions (5.7); and we have already observed (Remark 5.9) that an assumption on A and g such as the Krylov-core condition would indeed prevent the Krylov escape phenomenon.

However relevant such issues are to understanding, from a more abstract perspective, why a self-adjoint inverse problem is 'structurally' Krylov-solvable, and however deceptively simple the underlying mathematical questions appear, to our knowledge no complete answer is available, either in the affirmative (i.e., a proof) or in the negative (a counterexample), to the following questions:

(Q1):

When \(A=A^*\) and \(g\in C^\infty (A)\), is it true that \(\overline{{\mathcal {K}}(A,g)}^{\Vert \,\Vert _A} =\overline{{\mathcal {K}}(A,g)}\cap {\mathcal {D}}(A)\), i.e., is the Krylov-core condition satisfied by the pair (A, g)?

(Q2):

When \(A=A^*\) and \(g\in C^\infty (A)\), is it true that \( A\big (\overline{{\mathcal {K}}(A,g)}\cap {\mathcal {D}} (A)\big )\subset \overline{{\mathcal {K}}(A,g)}\), i.e., is A \({\mathcal {K}}(A,g)\)-reduced in the generalised sense?

(Clearly in (Q2) it is tacitly understood that \({\mathcal {K}}(A,g)\) is not dense in \({\mathcal {H}}\).)

We can provide a partial answer in a vast class of cases, namely whenever the vector g is ‘bounded’ for A.

Let us recall (see, e.g., [27, Sect. 7.4]) that a \(C^\infty \)-vector \(g\in {\mathcal {H}}\) for a linear operator A on a Hilbert space \({\mathcal {H}}\) is bounded for A when there is a constant \(B_g>0\) such that

$$\begin{aligned} \Vert A^n g\Vert _{{\mathcal {H}}} \leqslant B_g^n\qquad \forall n\in {\mathbb {N}}_0. \end{aligned}$$
(7.1)

As is well known (see, e.g., [27, Lemma 7.13]), the vector space of bounded vectors for a self-adjoint operator A is dense in \({\mathcal {H}}\).
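For instance, if A is (a finite truncation of) a multiplication operator with unbounded eigenvalues and the unit vector g is supported on finitely many eigenmodes, then g is bounded for A, with \(B_g\) equal to the largest \(|\lambda |\) on the support of \(\mu _g^{(A)}\). A quick numerical sanity check of this hypothetical example:

```python
import numpy as np

# hypothetical toy data: diagonal surrogate of A with eigenvalues 1..100
lam = np.arange(1.0, 101.0)
g = np.zeros(100); g[:5] = 1.0
g /= np.linalg.norm(g)                # unit vector supported where |lambda| <= 5

for n in (1, 5, 10, 30):
    norm_n = np.linalg.norm(lam**n * g)      # = || A^n g ||
    print(n, float(norm_n ** (1.0 / n)))     # n-th roots increase towards B_g = 5
```

The printed n-th roots approach 5, so (7.1) holds with \(B_g=5\) even though the full spectrum is much larger.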

Theorem 7.1

For a given Hilbert space \({\mathcal {H}}\) let A be a self-adjoint operator on \({\mathcal {H}}\) and let \(g\in {\mathcal {H}}\) be a bounded vector for A. Then the pair (A, g) satisfies the Krylov-core condition and consequently A is \({\mathcal {K}}(A,g)\)-reduced in the generalised sense of Definition 5.8.

As observed already (Remark 5.5), the thesis follows from the inclusion

$$\begin{aligned} \overline{{\mathcal {K}}(A,g)}^{\Vert \,\Vert _A} \supset \overline{{\mathcal {K}}(A,g)}\cap {\mathcal {D}}(A) \end{aligned}$$
(7.2)

that we shall prove now.

Proof of Theorem 7.1

Let \(x\in \overline{{\mathcal {K}}(A,g)}\cap {\mathcal {D}}(A)\). We need to exhibit a sequence \((x_n)_{n\in {\mathbb {N}}}\) in \({\mathcal {K}}(A,g)\) such that \(x_n\xrightarrow []{\;\Vert \,\Vert _A\;}x\) as \(n\rightarrow \infty \). Since \(x\in \overline{{\mathcal {K}}(A,g)}\), there surely exists a sequence \((p_n(A)g)_{n\in {\mathbb {N}}}\), for some polynomials \(p_n\), that converges to x in the \({\mathcal {H}}\)-norm, although a priori not in the stronger \(\Vert \,\Vert _A\)-norm.

First we show that one can refine and 'regularise' such a sequence with respect to the action of A in order to strengthen the convergence. Explicitly, we show that up to taking a subsequence of the \(p_n\)'s, one has

$$\begin{aligned} \big \Vert x- e^{-A^2/(nB_g)}p_n(A)g\big \Vert _A\;\xrightarrow []{\;n\rightarrow \infty \;}\;0, \end{aligned}$$
(i)

where \(B_g\) is the constant of (7.1).

To this aim, we split

$$\begin{aligned} \big \Vert x- e^{-A^2/(nB_g)}p_n(A)g\big \Vert _A&\leqslant \big \Vert x- e^{-A^2/(nB_g)}x\big \Vert _A \\&\quad +\big \Vert e^{-A^2/(nB_g)}(x-p_n(A)g)\big \Vert _A \end{aligned}$$

and observe that by dominated convergence

$$\begin{aligned} \big \Vert x- e^{-A^2/(nB_g)}x\big \Vert _A^2 = \int _{\mathbb {R}}(1 + \lambda ^2) \,|1-e^{-\lambda ^2/(nB_g)}|^2\,\mathrm{d}\mu _{x}^{(A)}(\lambda )\;\xrightarrow []{n\rightarrow \infty }\;0, \end{aligned}$$

where \(\mu _{x}^{(A)}\) is the scalar spectral measure for A relative to the vector x. A suitable integrable majorant for the dominated convergence argument is \(4(1 +\lambda ^2)\), whose \(\mu _{x}^{(A)}\)-integrability follows from the assumption that \(x\in {\mathcal {D}}(A)\). As for the second summand, one has

$$\begin{aligned} \big \Vert e^{-A^2/(nB_g)}(x-p_n(A)g)\big \Vert _A^2&= \big \Vert ({\mathbb {1}}+A^2)^{\frac{1}{2}}e^{-A^2/(nB_g)}(x-p_n(A)g)\big \Vert _{{\mathcal {H}}}^2 \\&\leqslant \big \Vert ({\mathbb {1}}+A^2)^{\frac{1}{2}}e^{-A^2/(nB_g)} \big \Vert _{\mathrm {op}}^2\,\Vert x-p_n(A)g\Vert _{{\mathcal {H}}}^2. \end{aligned}$$

By assumption \(\Vert x-p_n(A)g\Vert _{{\mathcal {H}}}\xrightarrow {n\rightarrow \infty } 0\), whereas \(\Vert ({\mathbb {1}}+A^2)^{\frac{1}{2}}e^{-A^2/(nB_g)} \Vert _{\mathrm {op}}=\Vert (1+\lambda ^2)^{1/2}e^{-\lambda ^2/(nB_g)} \Vert _{L^\infty ({\mathbb {R}})}\) is finite for each n but diverges as \(n\rightarrow \infty \) if A is unbounded. Up to passing to a convenient subsequence of the \(p_n\)'s, which we shall denote again by \((p_n)_{n\in {\mathbb {N}}}\), it is always possible to make the vanishing of \(\Vert x-p_n(A)g\Vert _{{\mathcal {H}}}\) sufficiently fast so as to compensate the divergence of \(\Vert ({\mathbb {1}}+A^2)^{\frac{1}{2}}e^{-A^2/(nB_g)}\Vert _{\mathrm {op}}\) and make their product eventually vanish.

Clearly the new sequence \((e^{-A^2/(nB_g)}p_n(A)g)_{n \in {\mathbb {N}}}\) has the expected \(\Vert \,\Vert _A\)-convergence to x, but a priori does not belong to \({\mathcal {K}}(A,g)\) any longer. Now we claim that there exists a monotone increasing sequence of integers \((N_n)_{n\in {\mathbb {N}}}\), with \(N_n\rightarrow \infty \) as \(n\rightarrow \infty \), such that the vectors

$$\begin{aligned} x_n := \sum _{k=0}^{N_n}\frac{(-1)^k}{\,k!(nB_g)^k}\, A^{2k} \, p_n(A)\,g \in {\mathcal {K}}(A,g) \end{aligned}$$

satisfy

$$\begin{aligned} \big \Vert e^{-A^2/(nB_g)}p_n(A)g-x_n\big \Vert _A\;\xrightarrow []{\;n\rightarrow \infty \;}\;0, \end{aligned}$$
(ii)

whence also, combining (i) and (ii), the thesis \(x_n\xrightarrow []{\;\Vert \,\Vert _A\;}x\).

To prove the claim, for each \(n\in {\mathbb {N}}\) and \(\lambda \in {\mathbb {R}}\) we use the notation

$$\begin{aligned} p_n(\lambda ) = \sum _{\ell =0}^{D_n} a_\ell ^{(n)} \lambda ^\ell , \end{aligned}$$

where \(a_\ell ^{(n)}\in {\mathbb {C}}\) and \(D_n\in {\mathbb {N}}\) with \(a_{D_n}^{(n)}\ne 0\) (meaning that \(\mathrm {deg}\,p_n=D_n\)). Let us focus on the second summand in the r.h.s. of the identity

$$\begin{aligned} \big \Vert e^{-A^2/(nB_g)}p_n(A)g-x_n\big \Vert _A^2&= \big \Vert e^{-A^2/(nB_g)}p_n(A)g-x_n\big \Vert _{{\mathcal {H}}}^2 \\&\quad +\big \Vert A(e^{-A^2/(nB_g)}p_n(A)g-x_n)\big \Vert _{{\mathcal {H}}}^2 \end{aligned}$$

(the argument for the first summand is the very same), and let us re-write

$$\begin{aligned}&A\big (e^{-A^2/(nB_g)}p_n(A)g-x_n\big )\\&\quad = \sum _{\ell =0}^{D_n} a_\ell ^{(n)}\Big (e^{-A^2/(nB_g)}-\sum _{k=0}^{N_n}\frac{(-1)^k}{k!} \Big (\frac{A^2}{nB_g}\Big )^k \Big ) A^{\ell +1}g \end{aligned}$$

for some \(N_n\) to be determined.

Thus,

$$\begin{aligned}&\big \Vert A\big (e^{-A^2/(nB_g)}p_n(A)g-x_n\big )\big \Vert _{{\mathcal {H}}} \\&\quad \leqslant \sum _{\ell =0}^{D_n}\big |a_\ell ^{(n)}\big | \,\Big \Vert \Big (e^{-A^2/(nB_g)}-\sum _{k=0}^{N_n}\frac{(-1)^k}{k!} \Big (\frac{A^2}{nB_g}\Big )^k \Big ) A^{\ell +1}g\Big \Vert _{{\mathcal {H}}}. \end{aligned}$$

Now, for each \(\ell \) one has

$$\begin{aligned}&\Big \Vert \Big (e^{-A^2/(nB_g)}-\sum _{k=0}^{N_n}\frac{(-1)^k}{k!} \Big (\frac{A^2}{nB_g}\Big )^k \Big ) A^{\ell +1}g\Big \Vert _{{\mathcal {H}}}^2 \\&\quad = \int _{{\mathbb {R}}}\Big | e^{-\lambda ^2/(nB_g)} -\sum _{k=0}^{N_n}\frac{(-1)^k}{k!}\Big (\frac{\lambda ^2}{nB_g}\Big )^k \Big |^2 \lambda ^{2(\ell +1)}\,\mathrm{d}\mu _{g}^{(A)}(\lambda ) \\&\quad = \int _{{\mathbb {R}}}\Big | \sum _{k=N_n+1}^{\infty }\frac{(-1)^k}{k!} \Big (\frac{\lambda ^2}{nB_g}\Big )^k \Big |^2 \,\lambda ^{2(\ell +1)} \,\mathrm{d}\mu _{g}^{(A)}(\lambda ) \\&\quad \leqslant \int _{{\mathbb {R}}}\Big |\frac{1}{(N_n+1)!} \Big (\frac{\lambda ^2}{nB_g}\Big )^{N_n+1}\Big |^2\,\lambda ^{2(\ell +1)} \,\mathrm{d}\mu _{g}^{(A)}(\lambda ) \\&\quad = \frac{1}{((N_n+1)!\,(nB_g)^{N_n+1})^2}\,\Vert A^{N_n+\ell +2}g\Vert _{{\mathcal {H}}}^2 \\&\quad \leqslant \Big (\frac{B_g^{\ell +1}}{(N_n+1)!\,n^{N_n+1}}\Big )^2, \end{aligned}$$

where the last inequality above is due to assumption (7.1), whereas the previous inequality follows from the standard Lagrange estimate of the remainder in Taylor's formula, \(\big |e^{-t}-\sum _{k=0}^{N}\frac{(-t)^k}{k!}\big |\leqslant \frac{t^{N+1}}{(N+1)!}\) for \(t\geqslant 0\), applied with \(t=\lambda ^2/(nB_g)\).

Combining the last two estimates one finds

$$\begin{aligned} \big \Vert A\big (e^{-A^2/(nB_g)}p_n(A)g-x_n\big )\big \Vert _{{\mathcal {H}}} \leqslant \sum _{\ell =0}^{D_n}\big |a_\ell ^{(n)}\big | \,\frac{B_g^{\ell +1}}{(N_n+1)!\,n^{N_n+1}}. \end{aligned}$$

Each \(\ell \)-th term of the sum above depends on n. We now show that there is a suitable sequence \((N_n)_{n\in {\mathbb {N}}}\) in \({\mathbb {N}}\), with \(N_n\rightarrow \infty \) as \(n\rightarrow \infty \), such that for every \(\ell \in {\mathbb {N}}_0\) and for every \(n\in {\mathbb {N}}\) with \(n\geqslant \max \{1,B_g\}\) one has

$$\begin{aligned} \big |a_\ell ^{(n)}\big |\,\frac{B_g^{\ell +1}}{(N_n+1)!\,n^{N_n+1}} \leqslant \frac{1}{n\,(\ell +1)^2}. \end{aligned}$$

Indeed, let \((N_n)_{n\in {\mathbb {N}}}\) be a generic monotone increasing and divergent sequence of natural numbers and let us refine it as follows. First, let us take a subsequence, for convenience denoted again by \((N_n)_{n\in {\mathbb {N}}}\), such that \(N_n\geqslant D_n\). Thus,

$$\begin{aligned} \big |a_\ell ^{(n)}\big |\,\frac{B_g^{\ell +1}}{(N_n+1)!\,n^{N_n+1}} \leqslant \frac{\big |a_\ell ^{(n)}\big |}{(N_n+1)!}\qquad \text {for }n \geqslant \max \{1,B_g\}. \end{aligned}$$

Next, for each fixed \(\ell \in {\mathbb {N}}_0\), proceeding from \(\ell =0\) and increasing \(\ell \) by one at each step, let us further refine \((N_n)_{n\in {\mathbb {N}}}\) to a subsequence, renamed \((N_n)_{n\in {\mathbb {N}}}\) again, such that \(N_n\geqslant n(\ell +1)^2|a_\ell ^{(n)}|-1\). Then

$$\begin{aligned} \frac{\big |a_\ell ^{(n)}\big |}{(N_n+1)!} \leqslant \frac{\big |a_\ell ^{(n)}\big |}{N_n+1} \leqslant \frac{1}{n(\ell +1)^2}\qquad \text {for }n\geqslant \max \{1,B_g\}. \end{aligned}$$

The above procedure amounts to a countable number of refinements of the original sequence \((N_n)_{n\in {\mathbb {N}}}\), and thus, via a standard diagonal extraction, produces a final subsequence, still denoted by \((N_n)_{n\in {\mathbb {N}}}\), with the desired property, and for which therefore (for \(n\geqslant \max \{1,B_g\}\))

$$\begin{aligned} \big \Vert A\big (e^{-A^2/(nB_g)}p_n(A)g-x_n\big )\big \Vert _{{\mathcal {H}}}&\leqslant \sum _{\ell =0}^{D_n}\frac{1}{n(\ell +1)^2} \leqslant \frac{1}{n}\sum _{\ell =0}^\infty \frac{1}{\,(\ell +1)^2} \\&= \frac{\,\pi ^2}{6n}\;\xrightarrow []{\;n\rightarrow \infty \;}0. \end{aligned}$$

This establishes (ii) and concludes the proof. \(\square \)
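The interplay of the two scalar quantities driving the proof can also be checked numerically. The sketch below (an illustrative computation under the hypothetical choices \(B_g=2\) and \(N_n=n\)) evaluates on a grid the operator-norm factor \(\sup _{\lambda }(1+\lambda ^2)^{1/2}e^{-\lambda ^2/(nB_g)}\), which grows with n, against the \(\ell =0\) term \(B_g/((N_n+1)!\,n^{N_n+1})\) of the final bound, which vanishes super-exponentially and thus wins out:

```python
import numpy as np
from math import factorial

B = 2.0                                   # hypothetical stand-in for B_g in (7.1)
lam = np.linspace(0.0, 50.0, 100001)      # grid over the spectral variable

for n in (1, 5, 10, 20):
    # operator-norm factor of the proof: finite for each n, divergent as n grows
    w = float(np.max(np.sqrt(1 + lam**2) * np.exp(-lam**2 / (n * B))))
    # l = 0 term of the final bound, B^{l+1}/((N_n+1)! n^{N_n+1}), with N_n = n
    N = n
    rem = B / (factorial(N + 1) * n ** (N + 1))
    print(f"n={n:3d}  weight={w:6.2f}  remainder-bound={rem:.3e}")
```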

The special relevance of the assumption that g be a bounded vector for A is suggested by the fact that under such an assumption not only is the conclusion of Theorem 7.1 valid, but the corresponding Krylov space is also naturally isomorphic to \(L^2({\mathbb {R}},\mathrm{d}\mu _g^{(A)})\).

Theorem 7.2

For a given Hilbert space \({\mathcal {H}}\) let A be a self-adjoint operator on \({\mathcal {H}}\) and let \(g\in {\mathcal {H}}\) be a bounded vector for A. Then \(L^2({\mathbb {R}},\mathrm{d}\mu _g^{(A)})\cong \overline{{\mathcal {K}}(A,g)}\) via the isomorphism \(h\mapsto h(A)g\).

In fact, Theorem 7.2 (that we shall prove in a moment) allows one to re-obtain Krylov solvability for the inverse problem (1.4) when A is self-adjoint and injective.

Corollary 7.3

For a given Hilbert space \({\mathcal {H}}\) let A be an injective and self-adjoint operator on \({\mathcal {H}}\) and let \(g\in \mathrm {ran} A\) be a bounded vector for A. Then the unique solution \(f\in {\mathcal {D}}(A)\) such that \(Af=g\) is a Krylov solution.

Proof

Let h be the measurable function defined by \(h(\lambda ):=\lambda ^{-1}\) for \(\lambda \ne 0\). Since \(\Vert f\Vert _{{\mathcal {H}}}^2=\int _{\mathbb {R}}\lambda ^{-2}\,\mathrm{d}\mu _g^{(A)}(\lambda )<+\infty \), one has \(h\in L^2({\mathbb {R}},\mathrm{d}\mu _g^{(A)})\). Therefore, owing to Theorem 7.2, \(f=h(A)g\in \overline{{\mathcal {K}}(A,g)}\). \(\square \)
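Concretely, for a diagonal surrogate of A with spectrum in an interval away from 0, one can watch \(f=h(A)g\) being captured by the growing Krylov subspaces \({\mathcal {K}}_m(A,g)=\mathrm {span}\{g,Ag,\dots ,A^{m-1}g\}\). The sketch below (hypothetical sample data) measures the distance of f from \({\mathcal {K}}_m(A,g)\) by least squares; the rapid decay reflects the polynomial approximability of \(\lambda ^{-1}\) on the spectrum:

```python
import numpy as np

# hypothetical toy data: self-adjoint, injective A with spectrum in [1, 2]
rng = np.random.default_rng(1)
lam = np.linspace(1.0, 2.0, 50)
g = rng.standard_normal(50); g /= np.linalg.norm(g)
f = g / lam                               # the unique solution: f = h(A)g, h(t) = 1/t

for m in (1, 3, 5, 8, 12):
    K = np.column_stack([lam**k * g for k in range(m)])   # basis of K_m(A, g)
    c, *_ = np.linalg.lstsq(K, f, rcond=None)             # best approximant in K_m
    print(m, float(np.linalg.norm(K @ c - f)))            # dist(f, K_m): decays fast
```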

For the proof of Theorem 7.2 it is convenient to single out the following facts.

Lemma 7.4

Let \(\mu \) be a positive finite measure on \({\mathbb {R}}\). Then the space of \({\mathbb {R}}\rightarrow {\mathbb {C}}\) functions that are restrictions of \({\mathbb {C}}\rightarrow {\mathbb {C}}\) entire functions and are square-integrable with respect to \(\mu \) is dense in \(L^2({\mathbb {R}},\mathrm{d}\mu )\).

Proof

Let \(f\in L^2({\mathbb {R}},\mathrm{d}\mu )\). For every \(\varepsilon >0\) there exists a continuous function \(f_{\mathrm {c}}\in L^2({\mathbb {R}},\mathrm{d}\mu )\) such that \(\Vert f-f_{\mathrm {c}}\Vert _{L^2({\mathbb {R}},\mathrm{d}\mu )}\leqslant \varepsilon \) (see, e.g., [24, Chapt. 3, Theorem 3.14]). In turn, by Carleman's theorem (see, e.g., [13, Chapt. 4, §3, Theorem 1]), there exists an entire function \(f_{\mathrm {e}}\) such that \(|f_{\mathrm {c}}(\lambda )-f_{\mathrm {e}}(\lambda )|\leqslant {\mathcal {E}}(\lambda )\ \forall \lambda \in {\mathbb {R}}\), where \({\mathcal {E}}\) is an arbitrary positive error function; \({\mathcal {E}}\) can therefore be chosen so that \(\Vert f_{\mathrm {c}}-f_{\mathrm {e}}\Vert _{L^2({\mathbb {R}},\mathrm{d}\mu )}\leqslant \varepsilon \). This shows that \(f_{\mathrm {e}}\in L^2({\mathbb {R}},\mathrm{d}\mu )\) and \(\Vert f-f_{\mathrm {e}}\Vert _{L^2({\mathbb {R}},\mathrm{d}\mu )}\leqslant 2\varepsilon \). As \(\varepsilon \) is arbitrary, this completes the proof. \(\square \)

Lemma 7.5

For a given Hilbert space \({\mathcal {H}}\) let A be a self-adjoint operator on \({\mathcal {H}}\) and let \(g\in {\mathcal {H}}\) be a bounded vector for A. If \(f:{\mathbb {C}}\rightarrow {\mathbb {C}}\) is an entire function, then its restriction to the real line belongs to \(L^2({\mathbb {R}},\mathrm{d}\mu _g^{(A)})\) and \(g\in {\mathcal {D}}(f(A))\).

Proof

As f is entire, in particular \(f(\lambda )=\sum _{k=0}^\infty \frac{f^{(k)}(0)}{k!}\lambda ^k\) for every \(\lambda \in {\mathbb {R}}\), where the series converges point-wise for every \(\lambda \) and uniformly on compact subsets of \({\mathbb {R}}\). Thus, by functional calculus (see, e.g., [27, Prop. 4.12(v)]), for every \(N\in {\mathbb {N}}\)

$$\begin{aligned} \mathbf {1}_{[-N,N]}(A)f(A)g = \sum _{k=0}^\infty \frac{f^{(k)}(0)}{k!} (A\mathbf {1}_{[-N,N]}(A))^k g, \end{aligned}$$

where \(\mathbf {1}_\Omega \) denotes the characteristic function of \(\Omega \) and the series in the r.h.s. above converges in the norm of \({\mathcal {H}}\). Again by standard properties of the functional calculus one then has

$$\begin{aligned} \Big (\int _{-N}^N|f(\lambda )|^2\,\mathrm {d}\mu _g^{(A)}(\lambda )\Big )^{1/2}&= \big \Vert \mathbf {1}_{[-N,N]}(A)f(A)g\big \Vert _{{\mathcal {H}}} \\ {}&= \Big \Vert \sum _{k=0}^\infty \frac{f^{(k)}(0)}{k!}\,\mathbf {1}_{[-N,N]} (A)\,A^k g \Big \Vert _{{\mathcal {H}}} \\ {}&\leqslant \sum _{k=0}^\infty \Big |\frac{f^{(k)}(0)}{k!} \Big |\, \big \Vert \mathbf {1}_{[-N,N]}(A)\,A^k g\big \Vert _{{\mathcal {H}}} \\ {}&\leqslant \sum _{k=0}^\infty \Big |\frac{f^{(k)}(0)}{k!} \Big |\,\big \Vert A^k g\big \Vert _{{\mathcal {H}}} \\ {}&\leqslant M_R\sum _{k=0}^\infty \Big (\frac{B_g}{R}\Big )^k = M_R\big (1-B_g/R\big )^{-1}, \end{aligned}$$

where in the last inequality we used the bound \(\Vert A^k g\Vert _{{\mathcal {H}}}\leqslant B_g^k\) of (7.1) and we applied Cauchy's estimate \(|\frac{f^{(k)}(0)}{k!}|\leqslant M_R/R^k\) for \(R>B_g\) and \(M_R:=\max _{|z|=R}|f(z)|\). Taking the limit \(N\rightarrow \infty \) yields the thesis. \(\square \)

Lemma 7.6

For a given Hilbert space \({\mathcal {H}}\) let A be a self-adjoint operator on \({\mathcal {H}}\) and let \(g\in {\mathcal {H}}\) be a bounded vector for A. Let \(f:{\mathbb {C}}\rightarrow {\mathbb {C}}\) be an entire function and let \(f(z)=\sum _{k=0}^\infty \frac{f^{(k)}(0)}{k!} z^k\), \(z\in {\mathbb {C}}\), be its Taylor expansion. Then,

$$\begin{aligned} \lim _{n\rightarrow \infty }\Big \Vert f(A)g-\sum _{k=0}^n \frac{f^{(k)}(0)}{k!}\,A^k g\Big \Vert _{{\mathcal {H}}} = 0 \end{aligned}$$

and therefore \(f(A)g\in \overline{{\mathcal {K}}(A,g)}\).

Proof

Both f and \({\mathbb {C}}\ni z\mapsto \sum _{k=0}^n \frac{f^{(k)}(0)}{k!}\,z^k\) are entire functions for each \(n\in {\mathbb {N}}\), and so is their difference. Then, reasoning as in the proof of Lemma 7.5,

$$\begin{aligned} \Big \Vert f(A)g-\sum _{k=0}^n\frac{f^{(k)}(0)}{k!}\,A^k g\Big \Vert _{{\mathcal {H}}}&= \Big \Vert \sum _{k=n+1}^\infty \frac{f^{(k)}(0)}{k!}\,A^k g \Big \Vert _{{\mathcal {H}}} \\&\leqslant \sum _{k=n+1}^\infty \Big |\frac{f^{(k)}(0)}{k!} \Big |\,\big \Vert A^k g\big \Vert _{{\mathcal {H}}} \\&\leqslant M_R\sum _{k=n+1}^\infty \Big (\frac{B_g}{R}\Big )^k \\&= M_R(B_g/R)^{n+1}\big (1-B_g/R)^{-1}\;\xrightarrow []{n\rightarrow \infty }\;0, \end{aligned}$$

which completes the proof. \(\square \)
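For a diagonal toy model this convergence is immediately visible. In the sketch below (hypothetical data, with \(f=\exp \) and spectral support in \([-3,3]\), so that one may take \(B_g=3\)) the Taylor partial sums of \(f(A)g\), which are Krylov vectors, converge in norm:

```python
import numpy as np
from math import factorial

# hypothetical toy data: diagonal A with bounded spectral support, unit vector g
lam = np.linspace(-3.0, 3.0, 7)
g = np.ones(7) / np.sqrt(7.0)
target = np.exp(lam) * g                  # f(A) g with f = exp

for n in (2, 5, 10, 20):
    partial = sum((lam**k / factorial(k)) * g for k in range(n + 1))
    print(n, float(np.linalg.norm(target - partial)))  # -> 0: f(A)g in closure of K(A,g)
```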

Let us finally prove Theorem 7.2.

Proof of Theorem 7.2

Let us denote by \(\mathscr {E}({\mathbb {R}})\) the space of \({\mathbb {R}}\rightarrow {\mathbb {C}}\) functions that are restrictions of \({\mathbb {C}}\rightarrow {\mathbb {C}}\) entire functions. Owing to Lemma 7.5, \(\mathscr {E}({\mathbb {R}})\subset L^2({\mathbb {R}},\mathrm{d}\mu _g^{(A)})\), and owing to Lemma 7.4, \(\mathscr {E}({\mathbb {R}})\) is actually dense in \(L^2({\mathbb {R}},\mathrm{d}\mu _g^{(A)})\). Moreover, for each \(h\in \mathscr {E}({\mathbb {R}})\), \(h(A)g\in \overline{{\mathcal {K}}(A,g)}\), as found in Lemma 7.6. As \(\Vert h(A)g\Vert _{{\mathcal {H}}}=\Vert h\Vert _{L^2({\mathbb {R}},\mathrm{d}\mu _g^{(A)})}\), the map \(h\mapsto h(A)g\) is an isometry from \(\mathscr {E}({\mathbb {R}})\) to \(\overline{{\mathcal {K}}(A,g)}\), which extends canonically to an \(L^2({\mathbb {R}},\mathrm{d}\mu _g^{(A)})\rightarrow \overline{{\mathcal {K}}(A,g)}\) isometry. In fact, this map is also surjective, and therefore an isomorphism. Indeed, for a generic \(w\in \overline{{\mathcal {K}}(A,g)}\) there exists a sequence \((p_n)_{n\in {\mathbb {N}}}\) of polynomials such that \(p_n(A)g\rightarrow w\) in \({\mathcal {H}}\) as \(n\rightarrow \infty \), and moreover \((p_n)_{n\in {\mathbb {N}}}\) is a Cauchy sequence in \(L^2({\mathbb {R}},\mathrm{d}\mu _g^{(A)})\) because \(\Vert p_n-p_m\Vert _{L^2({\mathbb {R}},\mathrm{d}\mu _g^{(A)})}=\Vert p_n(A)g-p_m(A)g\Vert _{{\mathcal {H}}}\) for any \(n,m\in {\mathbb {N}}\). Therefore \(p_n\rightarrow h\) in \(L^2({\mathbb {R}},\mathrm{d}\mu _g^{(A)})\) for some h and consequently \(w=\lim _{n\rightarrow \infty }p_n(A)g=h(A)g\). \(\square \)
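The isometry at the heart of the proof can be spot-checked on a diagonal surrogate, for which the scalar spectral measure is \(\mu _g^{(A)}=\sum _i|g_i|^2\,\delta _{\lambda _i}\). The sketch below (hypothetical data and a hypothetical choice of h) confirms \(\Vert h(A)g\Vert _{{\mathcal {H}}}=\Vert h\Vert _{L^2({\mathbb {R}},\mathrm{d}\mu _g^{(A)})}\):

```python
import numpy as np

# hypothetical toy data: diagonal A with eigenvalues lam, vector g
rng = np.random.default_rng(2)
lam = rng.uniform(-1.0, 1.0, 6)
g = rng.standard_normal(6)
h = lambda t: np.cos(t) + t**3            # an arbitrary (hypothetical) function h

lhs = np.linalg.norm(h(lam) * g)                   # || h(A) g ||_H
rhs = np.sqrt(np.sum(np.abs(h(lam))**2 * g**2))    # || h ||_{L^2(mu_g)}
print(lhs, rhs)                                    # equal: h -> h(A)g is an isometry
```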

Remark 7.7

The reasoning of the proof of Theorem 7.2 reveals that for generic \(g\in C^\infty (A)\), not necessarily bounded for A, the map \(h\mapsto h(A)g\) is an isomorphism

$$\begin{aligned} \overline{\{\,\text {polynomials }{\mathbb {R}} \rightarrow {\mathbb {C}}\,\}}^{\Vert \,\Vert _{L^2({\mathbb {R}},\mathrm{d}\mu _g^{(A)})}} \cong \;\overline{{\mathcal {K}}(A,g)}. \end{aligned}$$
(7.3)

The assumption of A-boundedness for g was used to show, concerning the l.h.s. above, that polynomials are dense in \(L^2({\mathbb {R}},\mathrm{d}\mu _g^{(A)})\).