1 Introduction

Note: There are four pictures in this part and one picture in Part 2.

In 2017, we are celebrating the 100th anniversary of the birth of Tosio Kato (August 25, 1917–October 2, 1999). While there can be arguments as to which of his work is the deepest or most beautiful, there is no question that the most significant is his discovery, published in 1951, of the self-adjointness of the quantum mechanical Hamiltonian for atoms and molecules [314]. This is the founding document and Kato is the founding father of what has come to be called the theory of Schrödinger operators. So it seems appropriate to commemorate Kato with a comprehensive review of his work on non-relativistic quantum mechanics (NRQM) that includes the context and later impact of this work.

One might wonder why I date this field only from Kato’s 1951 paper. After all, quantum theory was invented in 1925–1926 as matrix mechanics in Göttingen (by Heisenberg, Born and Jordan) and as wave mechanics in Zürich (by Schrödinger) and within a few years, books appeared on the mathematical foundations of quantum mechanics by two of the greatest mathematicians of their generation: Hermann Weyl [681] (not coincidentally, in Zürich; indeed the connection between Weyl and Schrödinger was more than professional—Weyl had a passionate love affair with Schrödinger’s wife) and John von Neumann [664] (von Neumann, whose thesis had been in logic, went to Göttingen to work with Hilbert on that subject, but was swept up in the local enthusiasm for quantum theory, in response to which, he developed the spectral theory of unbounded self-adjoint operators and his foundational work). One should also mention the work of Bargmann and Wigner (prior to Kato, summarized in [579] with references) on quantum dynamics. I think of this earlier work as first level foundations and the theory of Schrödinger operators as second level. Another way of explaining the distinction is that the Weyl–von Neumann work is an analog of setting up a formalism for classical mechanics like the Hamiltonian or Lagrangian while the theory initiated by Kato is the analog of celestial mechanics—the application of the general framework to concrete systems.

figure a

When I began this project I decided to write about all of Kato’s major contributions to the field in a larger context and this turned into a much larger article than I originally planned. As such, it is a review of a significant fraction of the work of the last 65 years on the mathematics of NRQM. Two important areas only touched on or totally missing are N-body systems and the large N limit. Of course, Kato’s self-adjointness work includes N-body systems, and there are papers on bound states in Helium and on properties of many body eigenfunctions. As we’ll see, his theory of smooth perturbations applies to give a complete spectral analysis of certain N-body systems with only one scattering channel and is one tool in the study of general N-body systems. But there is much more to the N-body theory—for reviews, see [101, 116, 197, 212, 264]. Except for the 1972 work of Lieb–Simon on Thomas Fermi almost all the large N limit work is after 1980 when Kato mostly left the field; for recent reviews of different aspects of this subfield, see [51, 424, 425, 428, 429, 529, 551].

While this review will cover a huge array of work, it is important to realize it is only a fraction, albeit a substantial fraction, of Kato’s opus. I’d classify his work into four broad areas, NRQM, non-linear PDE’s, linear semigroup theory and miscellaneous contributions to functional analysis. We will not give references to all this work. The reader can get an (almost) complete bibliography from MathSciNet or, for papers up to 1987, the dedication of the special issue of JMAA on the occasion of Kato’s 70th birthday [122] has a bibliography.

Around 1980, one can detect a clear shift in Kato’s interest. Before 1980, the bulk of his papers are on NRQM with a sprinkling in the other three areas while after 1980, the bulk are on nonlinear equations with a sprinkling in the other areas including NRQM. Kato’s nonlinear work includes looking at the Euler, Navier–Stokes, KdV and nonlinear Schrödinger equations. He was a pioneer in existence results; we note that his famous 1951 paper can be viewed as a result on existence of solutions for the time dependent linear Schrödinger equation! It is almost as if, when NRQM became too crowded with workers drawn by his work, he moved to a new area which took some time to become popular. Terry Tao said of this work: the Kato smoothing effect for Schrödinger equations is fundamental to the modern theory of nonlinear Schrödinger equations, perhaps second only to the Strichartz estimates in importance...Kato developed a beautiful abstract (functional analytic) theory for local well posedness for evolution equations; it is not used directly too much these days because it often requires quite a bit more regularity than we would like, but I think it was influential in inspiring more modern approaches to local existence based on more sophisticated function space estimates.

And here is what Carlos Kenig told me: T. Kato played a pioneering role in the study of nonlinear evolution equations. He not only developed an abstract framework for their study, but also introduced the tools to study many fundamental nonlinear evolutions coming from mathematical physics. Some remarkable examples of this are:

Kato’s introduction of the “local smoothing effect” in his pioneering study of the Korteweg–de Vries equation, which has played a key role in the development of the theory of nonlinear dispersive equations.

Kato’s unified proof of the global well-posedness of the Euler and Navier–Stokes equations in 2d, which led to the development of the Beale–Kato–Majda blow-up criterion for these equations.

Kato’s works with Ponce on strong solutions of the Euler and Navier–Stokes equations, which developed the tools for the systematic application of fractional derivatives in the study of evolutions, which now completely permeates the subject.

These contributions and many others, have left an indelible and enduring impact for the work of Kato on nonlinear evolutions.

The basic results on generators of semigroups on Banach spaces date back to the early 1950s going under the name Feller–Miyadera–Phillips and Hille–Yosida theorems (with a later 1961 paper of Lumer–Phillips). A basic book with references to this work is Pazy [475]. This is a subject that Kato returned to often, especially in the 1960s. Pazy [475] lists 19 papers by Kato on the subject. There is overlap between the NRQM work and the semigroup work. Perhaps the most important of these results are the Trotter–Kato theorems (discussed below briefly after Theorem 3.7) and the definition of fractional powers for generators of (not necessarily self-adjoint) semigroups. There are also connections between quantum statistical mechanics and contraction semigroups on operator algebras. To keep this review within bounds, we will not discuss this work.

The fourth area is a catchall for a variety of results that don’t fit into the other bins. Among these results is an improvement of the celebrated Calderón–Vaillancourt bounds on pseudo-differential operators [346]. In [342], Kato proved the absolute value for operators is not Lipschitz continuous even restricted to the self-adjoint operators but for any pair of bounded, even non-self-adjoint, operators one has that

$$\begin{aligned} ||\,|S|-|T|\,|| \le \frac{2}{\pi } ||S-T||\left( 2+\log \frac{||S||+||T||}{||S-T||}\right) \end{aligned}$$
(1.1)

(I don’t think there is any significance to the fact that the constant is the same as in (10.31)).

The last of these miscellaneous things that we’ll discuss (but far from the last of the miscellaneous results) involves what has come to be called the Heinz–Loewner inequality. In 1951, Heinz [225] proved for positive operators, A, B on a Hilbert space, one has that \(A \le B \Rightarrow \sqrt{A} \le \sqrt{B}\). Heinz was a student of Rellich and Kato was paying attention to the work of Rellich’s group and a year later published a paper [319] with an elegant, simple proof and extended the result to \(A \mapsto A^s\) for \(0< s <1\) replacing the square root. Neither of them knew at the time that Loewner [436] had already proven a much more general result in 1934! Despite the 17 year priority, the monotonicity of the square root is called variously the Heinz inequality, the Heinz–Loewner inequality or even, sometimes, the Heinz–Kato inequality. Heinz and Kato found equivalent results to the monotonicity of the square root (one paper with lots of additional equivalent forms is [180]). In particular, the following equivalent form is almost universally known as the Heinz–Kato inequality.

$$\begin{aligned} ||T\varphi || \le ||A\varphi ||, \quad ||T^*\psi || \le ||B\psi || \Rightarrow |\langle \psi ,T\varphi \rangle | \le ||A^s \varphi || ||B^{1-s}\psi || \end{aligned}$$
(1.2)

Kato returned several times to this subject, most notably [333] finding a version of the Heinz–Loewner inequality (with an extra constant depending on s) for maximal accretive operators on a Hilbert space.

Returning to the timing of Kato’s fundamental 1951 paper [314], I note that he was 34 when it was published (it was submitted a few years earlier as we’ll discuss in Sect. 7). Before it, his most important work was his thesis, awarded in 1951 and published in 1949–1951. One might be surprised at his age when this work was published but not if one understands the impact of the war. Kato got his BS from the University of Tokyo in 1941, a year in which he published two (not mathematical) papers in theoretical physics. But during the war, he was evacuated to the countryside. We were at a conference together one evening and Kato described rather harrowing experiences in the camp he was assigned to, especially an evacuation of the camp down a steep wet hill. He contracted TB in the camp. In his acceptance for the Wiener Prize [1], Kato says that his work on essential self-adjointness and on perturbation theory were essentially complete “by the end of the war.” Recently, several of Kato’s notebooks dated 1945 were discovered that contain most of the results published in Kato [314, 316], sometimes with different proofs from the later publications (these notes have recently been edited for publication in [358]).

figure b

In 1946, Kato returned to the University of Tokyo as an Assistant (a position common for students progressing towards their degrees) in physics, was appointed Assistant Professor of Physics in 1951 and full professor in 1958. I’ve sometimes wondered what his colleagues in physics made of him. He was perhaps influenced by the distinguished Japanese algebraic geometer, Kunihiko Kodaira (1915–1997) 2 years his senior and a 1954 Fields medalist. Kodaira got a BS in physics after his BA in mathematics and was given a joint appointment in 1944, so there was clearly some sympathy towards pure mathematics in the physics department. In 1948, Kato and Kodaira wrote a 2 page note [360] to a physics journal whose point was that every \(L^2\) wave function was acceptable for quantum mechanics, something about which there was confusion in the physics literature.

Beginning in 1954, Kato started visiting the United States. This bland statement masks some drama. In 1954, Kato was invited to visit Berkeley for a year, I presume arranged by F. Wolf. Of course, Kato needed a visa and it is likely it would have been denied due to his history of TB. Fortunately, just at the time (and only for a period of about a year), the scientific attaché at the US embassy in Tokyo was Otto Laporte (1902–1971) on leave from a professorship in Physics at the University of Michigan. Charles Dolph (1919–1994), a mathematician at Michigan, learned of the problem and contacted Laporte who intervened to get Kato a visa. Dolph once told me that he thought his most important contribution to American mathematics was his helping to allow Kato to come to the US. In 1987, in honor of Kato’s 70th birthday, there was a special issue of the Journal of Mathematical Analysis and Applications and the issue was jointly dedicated [122] to Laporte (he passed away in 1971) and Kato and edited by Dolph and Kato’s student Jim Howland.

During the mid 1950s, Kato spent close to 3 years visiting US institutions, mainly Berkeley, but also the Courant Institute, American University, National Bureau of Standards and Caltech. In 1962, he accepted a professorship in Mathematics from Berkeley where he spent the rest of his career and remained after his retirement. One should not underestimate the courage it takes for a 45 year old to move to a very different culture because of a scientific opportunity. That said, I’m told that when he retired and some of his students urged him to live in Japan, he said he liked the weather in Northern California too much to consider it. The reader can consult the Mathematics Genealogy Project (http://www.genealogy.ams.org/id.php?id=32842) for a list of Kato’s students (24 listed there, 3 from Tokyo and 21 from Berkeley; the best known are Ikebe and Kuroda from Tokyo and Balslev and Howland from Berkeley) and [98] for a memorial article with lots of reminiscences of Kato.

One can get a feel for Kato’s impact by considering the number of theorems, theories and inequalities with his name on them. Here are some: Kato’s theorem (which usually refers to his result on self-adjointness of atomic Hamiltonians), the Kato–Rellich theorem (which Rellich had first), the Kato–Rosenblum theorem and the Kato–Birman theory (where Kato had the most significant results although, as we’ll see, Rosenblum should get more credit than he does), the Kato projection lemma and Kato dynamics (used in the adiabatic theorem), the Putnam–Kato theorem, the Trotter–Kato theorem (which is used for three results; see Sect. 3), the Kato cusp condition (see Sect. 19 in Part 2), Kato smoothness theory, the Kato class of potentials and Kato–Kuroda eigenfunction expansions. To me Kato’s inequality refers to the self-adjointness technique discussed in Sect. 9, but the term has also been used for the Hardy like inequality with best constant for \(r^{-1}\) in three dimensions (which we discuss in Sect. 10), for a result on hyponormal operators that follows from Kato smoothness theory (the book [441] has a section called “Kato’s inequality” on it) and for the above mentioned variant of the Heinz–Loewner inequality for maximal accretive operators. There are also Heinz–Kato, Ponce–Kato and Kato–Temple inequalities. In [550], Erhard Seiler and I proved that if \(f,g \in L^p({\mathbb {R}}^\nu ),\, p\ge 2\), then \(f(X)g(-i\nabla )\) is in the trace ideal \({\mathcal {I}}_p\). At the time, Kato and I had correspondence about the issue and about some results for \(p<2\). In [496], Reed and I mentioned that Kato had this result independently. Although Kato never published anything on the subject, in recent times, it has come to be called the Kato–Seiler–Simon inequality.

Of course, when discussing the impact of Kato’s work, one must emphasize the importance of his book Perturbation Theory for Linear Operators [345] which has been a bible for several generations of mathematicians. One of its virtues is its comprehensive nature. Percy Deift told me that Peter Lax told him that Friedrichs remarked on the book: “Oh, it’s easy to write a book when you put everything in it!”

We will not discuss every piece of work that Kato did in NRQM—for example, he wrote several papers on variational bounds on scattering phase shifts whose lasting impact was limited. And we will discuss Kato’s work on the definition of a self-adjoint Dirac Hamiltonian which of course isn’t non-relativistic. It is closely related to the Schrödinger work and so belongs here. Perhaps I should have dropped “non-relativistic” from the title but since almost all of Kato’s work on quantum theory is non-relativistic and even the Dirac stuff is not quantum field theory, I decided to leave it.

Roughly speaking, this article is in five parts. Sections 2–6 discuss eigenvalue perturbation theory in both the analytic (where many of his results were rediscoveries of results of Rellich and Sz-Nagy) and asymptotic (where he was the pioneer) cases. There is a section on situations where either an eigenvalue is initially embedded in continuous spectrum or where, as soon as the perturbation is turned on, the location of the spectrum is swamped by continuous spectrum (i.e. on the theory of QM resonances). There is a pair of sections on two issues that Kato studied in connection with eigenvalue perturbation theory: pairs of projections and the Temple–Kato inequalities.

Next come four sections on self-adjointness. One focuses on the Kato–Rellich theorem and its applications to atomic physics, one on his work with Ikebe and one on what has come to be called Kato’s inequality. Finally his work on quadratic forms is discussed including his work on monotone convergence for forms. That will end Part 1.

Part 2 begins with two pioneering works on aspects of bound states—his result on non-existence of positive energy bound states in certain two body systems and his paper on the infinity of bound states for Helium, at least for infinite nuclear mass.

Next come four sections on scattering and spectral theory, which discuss the Kato–Birman theory (trace class scattering), Kato smoothness, Kato–Kuroda eigenfunction expansions and the Jensen–Kato paper on threshold behavior.

Last is a set of three miscellaneous gems: his work on the adiabatic theorem, on the Trotter product formula and his pioneering look at eigenfunction regularity.

I should warn the reader that I use two conventions that are universal among physicists but often the opposite of many mathematicians. First, my (complex) Hilbert space inner product \(\langle \varphi ,\psi \rangle \) is linear in \(\psi \) and anti-linear in \(\varphi \). Secondly my wave operators are defined by (note ± vs. \({\mp }\))

$$\begin{aligned} \Omega ^{\pm }(A,B) = \text {s}-\lim _{t \rightarrow {\mp } \infty } e^{itA}e^{-itB}P_{ac}(B) \end{aligned}$$

In Sect. 15 in Part 2, I’ll explain the historical reason for this very strange convention. I should also warn the reader that I use two non-standard abbreviations “esa” and “esa-\(\nu \)” (where \(\nu \) can be an explicit integer). They are defined at the start of Sect. 7.

With apologies to those inadvertently left out, I’d like to thank a number of people for useful information: Yosi Avron, Jan Dereziński, Pavel Exner, Rupert Frank, Fritz Gesztesy, Gian Michele Graf, Sandro Graffi, Vincenzo Grecchi, Evans Harrell, Ira Herbst, Bernard Helffer, Arne Jensen, Carlos Kenig, Toshi Kuroda, Peter Lax, Hiroshi Oguri, Sasha Pushnitski, Derek Robinson, Robert Seiringer, Heinz Siedentop, Israel Michael Sigal, Erik Skibsted, Terry Tao, Dimitri Yafaev and Kenji Yajima. The pictures here are all from the estate of Mizue Kato, Tosio’s wife who passed away in 2011. Her will gave control of the pictures to H. Fujita, M. Ishiguro and S. T. Kuroda. I thank them for permission to use the pictures and H. Okamoto for providing digital versions.

2 Eigenvalue perturbation theory, I: regular perturbations

This is the first of five sections on eigenvalue perturbation theory; this section deals with the analytic case. Section 3 begins with examples that delimit some of the possibilities when the analytic theory doesn’t apply and that section and the next discuss two sets of those examples after which there are two sections on related mathematical issues which are connected to the subject and where Kato made important contributions.

Eigenvalue perturbation theory in the case where the eigenvalues are analytic (aka regular perturbation theory or analytic perturbation theory) is central to Kato’s opus—it is both a main topic of his famous book on Perturbation Theory and the main subject of his thesis. We’ll begin this section by sketching the modern theory as presented in Kato’s book [345] or as sketched in Simon [616, Sections 1.4 and 2.3] (other book presentations include Baumgärtel [44], Friedrichs [174], Reed–Simon [497] and Rellich [511]). Then we’ll give a Kato–centric discussion of the history.

As a preliminary, we want to recall the theory of spectral projections for general bounded operators, A, on a Banach space, X. If the spectrum of A, \(\sigma (A)=\sigma _1\cup \sigma _2\) is a decomposition into disjoint closed sets, one can find a chain (finite sum and/or difference of contours), \(\Gamma \), so that if \(w(z,\Gamma )\) is the winding number about \(z \notin \Gamma \), (i.e. \(w(z,\Gamma ) = (2\pi i)^{-1} \oint _{\zeta \in \Gamma } (\zeta -z)^{-1} d\zeta \)), then \(\Gamma \cap \sigma (A) = \emptyset \), \(w(z,\Gamma )=0\) or 1 for all \(z \in {\mathbb {C}}{\setminus }\Gamma \), \(w(z,\Gamma )=1\) for \(z \in \sigma _1\), and \(w(z,\Gamma )=0\) for \(z \in \sigma _2\) (see [613, Section 4.4]).

One defines an operator

$$\begin{aligned} P_{\sigma _1} = \frac{1}{2\pi i}\oint _\Gamma \frac{dz}{z-A} \end{aligned}$$
(2.1)

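Then one can prove [616, Section 2.3] that \(P_{\sigma _1}\) is a projection (i.e. \(P_{\sigma _1}^2=P_{\sigma _1}\)) commuting with A. Thus A maps each of \({\mathrm{ran}}\, P_{\sigma _1}\) and \({\mathrm{ran}}({\varvec{1}}-P_{\sigma _1})\) into itself and one can prove that

$$\begin{aligned} \sigma (A \restriction {\mathrm{ran}}\, P_{\sigma _1}) = \sigma _1, \qquad \sigma (A \restriction {\mathrm{ran}}({\varvec{1}}-P_{\sigma _1})) = \sigma _2 \end{aligned}$$
(2.2)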
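As a numerical sanity check of (2.1), here is a minimal Python sketch that approximates the contour integral by the trapezoid rule on a circle. The \(2\times 2\) matrix, the choice of circle, and the helper names `inv2`, `matmul2` and `riesz_projection` are all assumptions made for this illustration; they do not come from the references.

```python
import cmath

def inv2(m):
    """Inverse of a 2x2 complex matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul2(x, y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def riesz_projection(A, center, radius, n=200):
    """Approximate (1/(2*pi*i)) * integral of (z - A)^{-1} dz over the
    circle |z - center| = radius, using n equally spaced points."""
    P = [[0j, 0j], [0j, 0j]]
    for k in range(n):
        w = cmath.exp(2j * cmath.pi * k / n)      # point on the unit circle
        z = center + radius * w
        R = inv2([[z - A[0][0], -A[0][1]], [-A[1][0], z - A[1][1]]])
        # dz = i*radius*w*dtheta and dtheta = 2*pi/n, so dz/(2*pi*i) = radius*w/n
        for i in range(2):
            for j in range(2):
                P[i][j] += R[i][j] * radius * w / n
    return P

# Illustrative matrix with spectrum {2, 3}; the circle encloses only 2.
A = [[2.0, 1.0], [0.0, 3.0]]
P = riesz_projection(A, center=2.0, radius=0.5)
PP = matmul2(P, P)   # should agree with P  (P is a projection)
AP = matmul2(A, P)   # should agree with PA (P commutes with A)
PA = matmul2(P, A)
print(P)
```

For this matrix the resolvent can be inverted by hand, and its residue at \(z=2\) has rows \((1,-1)\) and \((0,0)\), which is what the quadrature reproduces up to rounding.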

Of particular interest are isolated points, \(\lambda \), of \(\sigma (A)\) in which case one can consider \(\sigma _1 = \{\lambda \}, \, \sigma _2 = \sigma (A){\setminus }\{\lambda \}\). We write \(P_{\sigma _1} = P_\lambda \) and \({\mathcal {H}}_\lambda ={\mathrm{ran}}\, P_\lambda \). If \(\dim {\mathcal {H}}_\lambda < \infty \), we call \(\lambda \) a point of the discrete spectrum. In that case, it is known there is a nilpotent, \(N_\lambda \), with \(P_\lambda N_\lambda = N_\lambda P_\lambda = N_\lambda \) (and so \(N_\lambda \restriction {\mathrm{ran}}({\varvec{1}}-P_\lambda ) = 0\)) so that

$$\begin{aligned} AP_\lambda = \lambda P_\lambda + N_\lambda \end{aligned}$$
(2.3)

In particular, this implies that \(\lambda \) is an eigenvalue. The \(P_\lambda \) are called eigenprojections and the \(N_\lambda \) are called eigennilpotents. Just as the \(P_\lambda \) are first order residues of the poles of \((z-A)^{-1}\) at \(z=\lambda \), the \(N_\lambda \) are second order residues (and \(N_\lambda ^k\) is the \((z-\lambda )^{-k-1}\) residue)—see [616, Section 2.3] for more on the subject.
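As a minimal illustration of these residue formulas (an example chosen here, not taken from the references), consider a single \(2\times 2\) Jordan block with eigenvalue \(\lambda \). Inverting \(z-A\) by hand gives

$$\begin{aligned} A = \begin{pmatrix} \lambda & 1 \\ 0 & \lambda \end{pmatrix}, \qquad (z-A)^{-1} = \frac{1}{z-\lambda } \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + \frac{1}{(z-\lambda )^2} \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \end{aligned}$$

so that \(P_\lambda = {\varvec{1}}\), \(N_\lambda = A-\lambda \) with \(N_\lambda ^2 = 0\), and (2.3) reduces to \(A = \lambda + N_\lambda \).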

Kato’s book [345] is the standard reference for this beautiful complex analysis approach to Jordan normal forms whose roots go back further. In 1913, Riesz [516], in one of the first books on operator theory on infinite dimensional spaces, mentioned residues of poles of \((z-A)^{-1}\) could be studied and, in 1930, he noted [517] in the Hilbert space case that decompositions of the spectrum into disjoint closed sets induced a decomposition of the space. Nagumo [452] used (2.1) for Banach algebras in 1936. Gel’fand’s great 1941 paper [184] discussed functions, f, analytic in a neighborhood of \(\sigma (x)\) where \(x \in {\mathfrak {A}}\), a commutative Banach algebra with unit and defined

$$\begin{aligned} f(x) = \frac{1}{2\pi i}\oint _\Gamma \frac{f(z)}{z-x} dz \end{aligned}$$
(2.4)

where \(\Gamma \) surrounds the whole spectrum.

If \(\sigma _1 \cup \sigma _2\) is a decomposition, f can be taken to be 1 in a neighborhood of \(\sigma _1\) and 0 in a neighborhood of \(\sigma _2\). \(P_\lambda ^2 = P_\lambda \) is then a special case of his functional calculus result \((fg)(x)=f(x)g(x)\). In 1942–1943, this functional calculus was further developed in the United States by Dunford [125, 126], Lorch [437] and Taylor [636]. In his book, Kato calls (2.4) a Dunford–Taylor integral.

With this formalism out of the way, we can turn to sketching the theory of regular perturbations. For details see the book presentations of Kato [345, Chaps. II and VII], Reed–Simon [497, Chap. XII] and Simon [616, Sections 1.4 and 2.3].

Step 1. Finite Dimensional Theory. Let \(A(\beta )\) be an analytic family of \(n \times n\) matrices for \(\beta \in \Omega \), a domain in \({\mathbb {C}}\). The eigenvalues are solutions of

$$\begin{aligned} \det (A(\beta )-\lambda ) = 0 \end{aligned}$$
(2.5)

so they are algebroidal functions. The theory of such functions (see Knopp [378] or Simon [613, Section 3.5]) implies there is a discrete set of points \(S \subset \Omega \) (i.e. with no limit points in \(\Omega \)) so that all solutions of (2.5) are multivalued analytic functions on \(\Omega {\setminus } S\) and so that the number of distinct solutions and their multiplicities are constant on \(\Omega {\setminus } S\). At points of S, the solutions have finite limits and are locally given by all the branches of one or more locally convergent Puiseux series (power series in \((\beta -\beta _0)^{1/p}\) for some \(p \in {\mathbb {Z}}_+\)). From the integral formula (2.1) and its analog for \(N_\lambda \), one sees that the eigenprojections and eigennilpotents are also multivalued analytic functions on \(\Omega {\setminus } S\). They can have polar singularities at points in S, i.e. their Puiseux–Laurent series can have finitely many negative index terms. Indeed, in 1959, Butler [77] proved that if some \(\lambda (\beta )\) has a fractional power at a point \(\beta _0 \in S\), then the Puiseux–Laurent series for \(P(\beta )\) must have non-vanishing negative powers.
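A standard \(2\times 2\) example, added here for illustration, shows both phenomena at once. Take \(\Omega = {\mathbb {C}}\) and

$$\begin{aligned} A(\beta ) = \begin{pmatrix} 0 & 1 \\ \beta & 0 \end{pmatrix}, \qquad \lambda _\pm (\beta ) = \pm \beta ^{1/2}, \qquad P_\pm (\beta ) = \frac{1}{2} \begin{pmatrix} 1 & \pm \beta ^{-1/2} \\ \pm \beta ^{1/2} & 1 \end{pmatrix} \end{aligned}$$

Here \(S=\{0\}\), the two eigenvalues are the two branches of a single Puiseux series with \(p=2\), and the eigenprojections carry a \(\beta ^{-1/2}\) term, in line with Butler’s result. One can check directly that \(P_\pm (\beta )^2 = P_\pm (\beta )\) and \(A(\beta )P_\pm (\beta ) = \pm \beta ^{1/2} P_\pm (\beta )\) for \(\beta \ne 0\), while A(0) is a Jordan block with the single eigenvalue 0.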

The set of early significant results includes two theorems of Rellich [504,505,506,507,508, Part I]. If \(A(\beta )\) is self-adjoint (i.e. \(\Omega \) is invariant under complex conjugations and \(A(\bar{\beta })=A(\beta )^*)\), then \(\lambda (\beta )\) and \(P(\beta )\) are real analytic on \(\Omega \cap {\mathbb {R}}\), i.e. no fractional powers in \(\lambda (\beta )\) at points of \(S \cap {\mathbb {R}}\) and no polar singularities of \(P(\beta )\) there. The first comes from the fact that if a Puiseux series based at \(\beta _0 \in {\mathbb {R}}\) has a non-trivial fractional power term, then some branch must have non-real values for some real values of \(\beta \) near \(\beta _0\) (interestingly enough, in his book, Kato [345] appeals to Butler’s theorem instead of using this simple argument of Rellich). The second relies on the fact that if \(P(\beta )\) has polar terms at \(\beta _0\), since there are only finitely many negative index terms, one has that \(\lim _{|\beta -\beta _0| \downarrow 0} ||P(\beta )|| = \infty \) which is inconsistent with the fact that spectral projections for self-adjoint matrices are self-adjoint, so with norm 1.
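A simple self-adjoint family, given here only as an illustration, shows what these two theorems assert at an eigenvalue crossing:

$$\begin{aligned} A(\beta ) = \begin{pmatrix} 0 & \beta \\ \beta & 0 \end{pmatrix}, \qquad \lambda _\pm (\beta ) = \pm \beta , \qquad P_\pm (\beta ) = \frac{1}{2} \begin{pmatrix} 1 & \pm 1 \\ \pm 1 & 1 \end{pmatrix} \end{aligned}$$

The eigenvalues cross at \(\beta =0\), yet each branch is analytic through the crossing and the eigenprojections (here even constant) have no polar terms. Note that the analytic labeling is \(\pm \beta \), not \(\pm |\beta |\): ordering the eigenvalues by size would destroy analyticity at the crossing.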

For later purposes, we want to note the two leading terms in the perturbation series

$$\begin{aligned} E(\beta ) = E_0 + a_1\beta +a_2\beta ^2+\text {O}(\beta ^3) \end{aligned}$$
(2.6)

of a simple eigenvalue, \(E_0\), of \(A+\beta B\) with A and B Hermitian. Suppose \(\{\varphi _j\}_{j=0}^{n-1}\) are an orthonormal basis of eigenvectors of A with \(A\varphi _j = E_j\varphi _j\). Then

$$\begin{aligned} a_1 = \langle \varphi _0,B\varphi _0 \rangle , \qquad a_2 = \sum _{j\ne 0}\frac{|\langle \varphi _j,B\varphi _0 \rangle |^2}{E_0-E_j} \end{aligned}$$
(2.7)

One of Kato’s contributions is to describe \(a_2\) succinctly in the general infinite dimensional case where \(E_0\) is discrete but A may have continuous spectrum. Let P be the projection onto multiples of \(\varphi _0\). Define the reduced resolvent, S, of A at \(E_0\) by

$$\begin{aligned} S = (A-E_0)^{-1}(1-P) \end{aligned}$$
(2.8)

i.e. \(S\varphi _0=0\) and \(S\psi =\lim _{\epsilon \rightarrow 0; \epsilon \ne 0}(A-E_0-\epsilon )^{-1}\psi \) if \(\psi \perp \varphi _0\). Thus for any \(\eta \):

$$\begin{aligned} (A-E_0)S\eta = (1-P)\eta \end{aligned}$$
(2.9)

In his thesis, Kato [316] realized that \(a_2\) could be written

$$\begin{aligned} a_2 = -\langle \varphi _0,BSB\varphi _0 \rangle \end{aligned}$$
(2.10)
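The formulas (2.7) and (2.10) can be checked on a small example. The following Python sketch uses a hypothetical \(2\times 2\) pair (A diagonal with eigenvalues 0 and 1, B real symmetric), chosen only for illustration; it computes \(a_2\) both from the sum in (2.7) and from the reduced resolvent in (2.10), and compares the second order approximation (2.6) with the exact eigenvalue.

```python
import math

# Hypothetical example: A = diag(E[0], E[1]) with eigenvectors e_0, e_1,
# and a real symmetric perturbation B.
E = [0.0, 1.0]
B = [[1.0, 2.0], [2.0, 3.0]]

def lowest_eigenvalue(beta):
    """Exact lowest eigenvalue of the symmetric 2x2 matrix A + beta*B."""
    m00 = E[0] + beta * B[0][0]
    m11 = E[1] + beta * B[1][1]
    m01 = beta * B[0][1]
    mean = 0.5 * (m00 + m11)
    half_gap = math.hypot(0.5 * (m00 - m11), m01)
    return mean - half_gap

# (2.7): first and second order coefficients for the eigenvalue E[0].
a1 = B[0][0]
a2_sum = sum(B[j][0] ** 2 / (E[0] - E[j]) for j in range(2) if j != 0)

# (2.10): here the reduced resolvent S is diagonal, with S e_0 = 0 and
# S e_j = e_j / (E[j] - E[0]) for j != 0.
S_diag = [0.0, 1.0 / (E[1] - E[0])]
B_phi0 = [B[0][0], B[1][0]]                      # the vector B e_0
a2_resolvent = -sum(B_phi0[j] * S_diag[j] * B_phi0[j] for j in range(2))

beta = 1e-3
exact = lowest_eigenvalue(beta)
second_order = E[0] + a1 * beta + a2_sum * beta ** 2
print(a1, a2_sum, a2_resolvent)
print(exact - second_order)    # remainder of order beta**3
```

In this example both expressions give \(a_2 = -4\), and the remainder is of the size of \(\beta ^3\), as (2.6) predicts. The point of (2.10) is that, unlike the sum in (2.7), it needs no eigenvector expansion of A and so still makes sense when A has continuous spectrum.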

Step 2. Bounded Analytic Operator Valued Functions. For \(A(\beta )\), a function from a domain \(\Omega \subset {\mathbb {C}}\) to the bounded operators on a Banach space, X, we say that A is analytic at \(\beta _0 \in \Omega \) if it is given by a convergent power series near \(\beta _0\). This is equivalent to A having a complex Fréchet derivative or to \(A(\beta )x\) being a Banach space valued analytic function for all \(x \in X\) or to \(\ell (A(\beta )x)\) being a scalar analytic function for all \(\ell \in X^*\) and \(x \in X\) (see [613, Theorem 3.1.12]).

Step 3. Analytic Resolvents and Spectral Projections. Because the set of invertible maps in \({\mathcal {L}}(X)\) is open and on that set, \(A \mapsto A^{-1}\) is analytic (by using geometric series), if \(A(\beta )\) is an analytic operator valued function, then \({\mathcal {R}}\equiv \{(\beta ,z) \,|\, \beta \in \Omega , z \in {\mathbb {C}}, A(\beta )-z{\varvec{1}}\text { is invertible}\}\) is open in \(\Omega \times {\mathbb {C}}\) and the resolvent \((A(\beta )-z)^{-1}\) is analytic there. It follows that if \(\lambda _0\) is an isolated point of the spectrum of \(A(\beta _0)\), then there are \(\epsilon , \delta \) so that for \(|\beta -\beta _0| < \epsilon \) and \(|z-\lambda _0| = \delta \), we have that \((\beta ,z) \in {\mathcal {R}}\) and moreover that \(\sigma (A(\beta _0)) \cap \{z\,|\, |z-\lambda _0| \le \delta \} = \{\lambda _0\}\). We can thus use (2.1) to define projections \(P(\beta )\) for \(|\beta -\beta _0| < \epsilon \) so that \(A(\beta )P(\beta ) = P(\beta )A(\beta )\) and \(\sigma (A(\beta ) \restriction {\mathrm{ran}}\, P(\beta )) = \sigma (A(\beta )) \cap \{z\,|\, |z-\lambda _0| < \delta \}\). \(P(\beta )\) is analytic in \(\beta \), so, by shrinking \(\epsilon \) if need be, we can suppose that

$$\begin{aligned} |\beta -\beta _0|< \epsilon \Rightarrow ||P(\beta )-P(\beta _0)|| < 1 \end{aligned}$$
(2.11)
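In the special case of a linear family \(A(\beta ) = A_0+\beta B\) with B bounded and \(\beta _0=0\) (assumptions made here only to keep the formulas short), the geometric series argument can be written out explicitly. For \(z \notin \sigma (A_0)\) and \(|\beta |\, ||B(z-A_0)^{-1}|| < 1\), one has

$$\begin{aligned} (z-A(\beta ))^{-1} = (z-A_0)^{-1} \sum _{n=0}^{\infty } \beta ^n \left[ B(z-A_0)^{-1}\right] ^n \end{aligned}$$

and inserting this into (2.1) on the circle \(|z-\lambda _0| = \delta \) gives a convergent power series for the projection:

$$\begin{aligned} P(\beta ) = \sum _{n=0}^{\infty } \beta ^n \, \frac{1}{2\pi i} \oint _{|z-\lambda _0|=\delta } (z-A_0)^{-1}\left[ B(z-A_0)^{-1}\right] ^n dz \end{aligned}$$

valid for \(|\beta | \sup _{|z-\lambda _0|=\delta } ||B(z-A_0)^{-1}|| < 1\). The \(n=0\) term is \(P(0)\), so the continuity needed for (2.11) is visible term by term.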

Step 4. Reduction to a finite dimensional problem. A basic fact that we’ll prove in Sect. 5 (see Theorem 5.1) is that when (2.11) holds, we can define an invertible map \(U(\beta )\) for \(|\beta -\beta _0| < \epsilon \) analytic in \(\beta \) so that

$$\begin{aligned} U(\beta )P(\beta )U(\beta )^{-1} = P(\beta _0) \end{aligned}$$
(2.12)

Moreover, if X is a Hilbert space and \(P(\beta )\) is self-adjoint for \(|\beta -\beta _0| < \epsilon \) and \(\text {Im}\,(\beta -\beta _0) = 0\), then \(U(\beta )\) is unitary for such \(\beta \).

Because of (2.12), \(\widetilde{A}(\beta ) \equiv U(\beta )A(\beta )U(\beta )^{-1}\) leaves \({\mathrm{ran}}\, P(\beta _0)\) invariant and

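$$\begin{aligned} \sigma (\widetilde{A}(\beta ) \restriction {\mathrm{ran}}\, P(\beta _0)) = \sigma (A(\beta )) \cap \{z\,|\, |z-\lambda _0| < \delta \} \end{aligned}$$
(2.13)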

If now \(\lambda _0\) is a point of the discrete spectrum of \(A(\beta _0)\), then \(P(\beta _0)\) is finite dimensional, so \(\widetilde{A}(\beta ) \restriction {\mathrm{ran}}\, P(\beta _0)\) is a finite dimensional problem and all the results of Step 1 apply. Moreover, if X is a Hilbert space and \(A(\beta )\) is self-adjoint for \(\beta \) real, then so is \(\widetilde{A}(\beta )\) and Rellich’s Theorems extend. Note that even if \(A(\beta )\) is linear in \(\beta \), \(\widetilde{A}(\beta )\) will in general not even be polynomial in \(\beta \) so it is important that Step 1 be done for general analytic families.

Step 5. Regular Families of Closed Operators. For \(\beta \in \Omega \), a domain, we consider a family, \(A(\beta )\) of closed, densely defined (but not necessarily bounded) operators on a Banach space, X. We say that A is a regular family if, for every \(\beta _0 \in \Omega \), there is a \(z_0 \in {\mathbb {C}}\) and \(\epsilon >0\) so that for \(|\beta -\beta _0| < \epsilon \), we have that \(z_0 \notin \sigma (A(\beta ))\) and \(\beta \mapsto (A(\beta )-z_0)^{-1}\) is a bounded analytic function near \(\beta _0\). Kato [345, Section VII.1.2] has a more general definition that applies even to closed operators between two Banach spaces X and Y but he proves that it is equivalent to the above definition so long as \(X=Y\) and every \(A(\beta )\) has a non-empty resolvent set (which is no restriction if you want to consider isolated eigenvalues).

With this definition, all the eigenvalue perturbation theory for the bounded case carries over since \(\lambda _0\) is a discrete eigenvalue of \(A(\beta _0)\) if and only if \((\lambda _0-z_0)^{-1}\) is a discrete eigenvalue of \((A(\beta _0)-z_0)^{-1}\).
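The correspondence between the two spectra is elementary. As a sketch, for a vector \(\varphi \) in the domain of \(A(\beta _0)\),

$$\begin{aligned} A(\beta _0)\varphi = \lambda _0\varphi \iff (A(\beta _0)-z_0)^{-1}\varphi = (\lambda _0-z_0)^{-1}\varphi \end{aligned}$$

and since \(\mu \mapsto (\mu -z_0)^{-1}\) is a homeomorphism of \({\mathbb {C}}{\setminus }\{z_0\}\) onto \({\mathbb {C}}{\setminus }\{0\}\), isolated points of one spectrum correspond to isolated points of the other.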

Step 6 Criteria for Regular Families. A type (A) family is a function, \(A(\beta )\), for \(\beta \in \Omega \), a region in \({\mathbb {C}}\), so that \(A(\beta )\) is a closed, densely defined operator on a Banach space, X, with domain \(D(A(\beta )) = {\mathcal {D}}\) independent of \(\beta \) and so that for all \(\varphi \in {\mathcal {D}}\) we have that \(\beta \mapsto A(\beta )\varphi \) is an analytic vector valued function. If \(A(\beta _0)\) has non-empty resolvent set, it is easy to see that \(A(\beta )\) is a regular family for \(\beta \) near \(\beta _0\). In particular, if the resolvent set is non-empty for all \(\beta \in \Omega \), then \(A(\beta )\) is a regular family on \(\Omega \).

Of particular interest is the case where \(A(\beta ) = A_0+\beta B\) where \({\mathcal {D}}=D(A_0)\) and B is an operator with \({\mathcal {D}}\subset D(B)\). Then \(A(\beta )\) is closed for all \(\beta \) small if and only if there are \(a, b > 0\) so that for all \(\varphi \in {\mathcal {D}}\), one has that

$$\begin{aligned} ||B\varphi || \le a||A_0\varphi || + b||\varphi || \end{aligned}$$
(2.14)

Thus (2.14) is a necessary and sufficient condition for a linear \(A(\beta )\) to be an analytic family of type (A) near \(\beta =0\).

If a bound like (2.14) holds, we say that B is \(A_0\)-bounded. The relative bound is the \(\inf \) over all a for which (2.14) holds (typically, if \(a_0\) is this \(\inf \), the bound only holds for \(a > a_0\) and the corresponding b’s go to \(\infty \) as \(a \downarrow a_0\)). There exist unbounded B for which the relative bound is 0. There are similar bounds for general analytic families of type (A): \(A(\beta ) = A +\sum _{n=1}^{\infty } \beta ^n B_n\) where each \(B_n\) obeys \(D(B_n) \supset D(A)\) and for some a, b, c and all \(\varphi \in D(A)\) one has that

$$\begin{aligned} ||B_n\varphi || \le c^{n-1} (a||A\varphi ||+b||\varphi ||) \end{aligned}$$
(2.15)

There is also a notion of type (B) families on Hilbert space (due to Kato [345]) where one demands that \(A(\beta )\) be m-accretive with \(\beta \)-independent form domain.

Example 2.1

(1/Z expansion) A simple example of regular perturbation theory of physical interest concerns two electron ions, which in the limit of infinite nuclear mass (ignoring relativistic and spin corrections) are described by

$$\begin{aligned} H(Z) = -\Delta _1-\Delta _2 - \frac{Z}{r_1}-\frac{Z}{r_2} + \frac{1}{|\varvec{r_1}-\varvec{r_2}|} \end{aligned}$$
(2.16)

on \(L^2({\mathbb {R}}^6,d^3\varvec{r_1} d^3\varvec{r_2})\). Under a scale transformation \(Z^{-2}H(Z)\) is unitarily equivalent to

$$\begin{aligned} A(1/Z) = -\Delta _1-\Delta _2 - \frac{1}{r_1}-\frac{1}{r_2} + \frac{1}{Z|\varvec{r_1}-\varvec{r_2}|} \end{aligned}$$
(2.17)

This is an entire family of type (A) in 1/Z. At \(1/Z = 0\), the ground state energy is \(E_0(0) = -\tfrac{1}{2}\). For all Z, the HVZ theorem ([497, Section XIII.5]) implies that the continuous spectrum of A(1/Z) is \([-\tfrac{1}{4},\infty )\).
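The scale transformation relating (2.16) and (2.17) can be written out explicitly. The following is a short sketch; the symbol \(U_Z\) for the dilation is notation introduced here for illustration and does not appear in the text.

```latex
% U_Z is a hypothetical name for the unitary dilation on L^2(R^6)
\begin{aligned}
(U_Z f)(\boldsymbol{r}_1,\boldsymbol{r}_2) &= Z^{3} f(Z\boldsymbol{r}_1, Z\boldsymbol{r}_2),
  \qquad \Vert U_Z f\Vert _2 = \Vert f\Vert _2 \\
U_Z^{-1}(-\Delta _j)U_Z &= Z^{2}(-\Delta _j), \qquad
U_Z^{-1}\,\frac{1}{r_j}\,U_Z = \frac{Z}{r_j}, \qquad
U_Z^{-1}\,\frac{1}{|\boldsymbol{r}_1-\boldsymbol{r}_2|}\,U_Z = \frac{Z}{|\boldsymbol{r}_1-\boldsymbol{r}_2|} \\
U_Z^{-1}H(Z)U_Z &= Z^{2}\left( -\Delta _1-\Delta _2-\frac{1}{r_1}-\frac{1}{r_2}
  +\frac{1}{Z|\boldsymbol{r}_1-\boldsymbol{r}_2|}\right) = Z^{2}A(1/Z)
\end{aligned}
```

The kinetic terms scale like \(Z^2\) and every Coulomb term like Z, so after dividing by \(Z^2\) only the electron repulsion keeps a factor 1/Z.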

Kato was concerned with rigorous estimates on the radius of convergence, \(\rho \), of the power series for \(E_0(1/Z)\). He discussed this in his thesis and, in his book [345, Section VII.4.9], was able to show that \(\rho > 0.24\) and he noted that this didn’t cover the physically important case \(1/Z = 1/2\), i.e., Helium (\(Z=2\)). In fact the case \(1/Z=1\) is also important because it describes the \(H^-\) ion which is known to exist.

There has been considerable physical literature on this example. Stillinger [628] found numerically that the perturbation coefficients (not found numerically using perturbation theory but by fitting variationally calculated eigenvalues) are eventually all positive, so there is a singularity on the positive real axis at \(\rho \). As \(\beta = 1/Z\) increases, \(E(\beta )\) is monotone increasing and known to be real analytic at least until E reaches the bottom of the continuous spectrum, \(-\tfrac{1}{4}\), at \(\beta =\beta _c\). Since \(H^-\) exists, \(\beta _c > 1\). The best current numerical estimate [145] suggests that \(\rho = \beta _c\) and

$$\begin{aligned} \beta _c = 1.09766083373855980(5) \end{aligned}$$

It is known [251] (see [159, 163, 206] for improved results) that at \(\beta =\beta _c\), \(A(\beta )\) has an eigenvalue at \(E(\beta _c) = -\tfrac{1}{4}\). It would be interesting to understand the nature of the singularity at \(\beta =\beta _c\), e.g. is there a convergent Puiseux series?

This completes our discussion of the theory of eigenvalue perturbation theory so we turn to some remarks on its history. Eigenvalue perturbation theory goes back to fundamental work of Lord Rayleigh on sound waves in 1897 [492, pp. 115–118] and [493] and by Schrödinger at the dawn of (new) quantum mechanics [544] and is often called Rayleigh–Schrödinger perturbation theory.

The first substantial rigorous mathematical work on the subject is a five part series of papers by Rellich [504,505,506,507,508] published from 1937 to 1942. It included an exhaustive treatment of the finite dimensional case including what we called Rellich’s Theorems on the lack of singularities in the self-adjoint case. He also noted the simple example:

$$\begin{aligned} A(\beta ,\gamma ) = \left( \begin{array}{cc} \beta &{} \gamma \\ \gamma &{} -\beta \\ \end{array} \right) \end{aligned}$$
(2.18)

with eigenvalues \(\pm \sqrt{\beta ^2+\gamma ^2}\) which shows that his analyticity results for the self-adjoint case do not extend to more than one variable. He also considered the infinite dimensional case where (2.14) holds (A self-adjoint and B symmetric) and (2.15) appeared in his papers. His papers did not use spectral projections but rather some brute force calculations.
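A small numerical sketch of why (2.18) blocks a two-variable version of Rellich's theorem: along every line through the origin the upper eigenvalue has one-sided slope 1, whereas a function differentiable at the origin would have a slope that is linear in the direction. The function names below are illustrative choices, not notation from the text.

```python
import math

def eigenvalues(beta, gamma):
    """Eigenvalues of [[beta, gamma], [gamma, -beta]].

    The trace is 0 and the determinant is -(beta^2 + gamma^2),
    so the eigenvalues are +/- sqrt(beta^2 + gamma^2).
    """
    root = math.sqrt(beta * beta + gamma * gamma)
    return (-root, root)

def directional_slope(theta, t=1e-3):
    """One-sided slope at the origin of the upper eigenvalue in direction theta."""
    upper = eigenvalues(t * math.cos(theta), t * math.sin(theta))[1]
    return upper / t

slope_x = directional_slope(0.0)             # direction (1, 0)
slope_y = directional_slope(math.pi / 2)     # direction (0, 1)
slope_diag = directional_slope(math.pi / 4)  # direction (1, 1)/sqrt(2)

# If the upper eigenvalue were differentiable at (0, 0), the diagonal slope
# would have to equal (slope_x + slope_y)/sqrt(2) = sqrt(2); it equals 1.
print(slope_x, slope_y, slope_diag, (slope_x + slope_y) / math.sqrt(2))
```

Restricted to a single line through the origin, the two branches \(\pm t\) are analytic, which is consistent with the one-variable results.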

Sz-Nagy followed up Rellich’s work in two papers published in 1947 and 1951 [454, 455] in which he treated the self-adjoint Hilbert space case and general closed operators on Banach spaces respectively. The first paper had a 1942 Hungarian language version [453]. He defined type (A) perturbations via (2.15). His main advance is to exploit the definition of spectral projections via (2.1). As a student of F. Riesz, this is not surprising. This was also the first place that it was proven (in the Hilbert space case) that two orthogonal projections, P and Q with \(||P-Q|| < 1\), are related via \(Q=UPU^{-1}\) for a unitary U which is an analytic function of Q, i.e. he implemented Step 4 above.

Wolf [689] also extended the Nagy approach to the Banach space case in 1952. Perhaps the most significant aspect of this work is that it served eventually to introduce Kato to Wolf, for Wolf was a Professor at Berkeley who was essential to recruiting Kato to come to Berkeley both in 1954 and 1962.

František Wolf (1904–1989) was a Czech mathematician who had a junior position at Charles University in Prague. Wolf had spent time in Cambridge and did some significant work on trigonometric series under the influence of Littlewood. When the Germans invaded Czechoslovakia in March 1939, he was able to get an invitation to Mittag–Leffler. He got permission from the Germans for a 3-week visa but stayed in Sweden! He was then able to get an instructorship at Macalester College in Minnesota. He made what turned out to be a fateful decision in terms of later developments. Because travel across the Atlantic was difficult, he took the trans-Siberian railroad across the Soviet Union and then through Japan and across the Pacific to the US. This was mid-1941 before the US entered the war and made travel across the Pacific difficult.

Wolf stopped in Berkeley to talk with G. C. Evans (known for his work on potential theory) who was then department chair. Evans knew of Wolf’s work and offered him a position on the spot!! After the year he promised to Macalester, Wolf returned to Berkeley and worked his way up the ranks. In 1952, Wolf extended Sz-Nagy’s work to the Banach space case. At about the same time Nagy himself did similar work and so did Kato. While Wolf and Kato didn’t know of each other’s work, Wolf learned of Kato’s work and that led to his invitation for Kato to visit Berkeley.

Kato’s thesis dealt with both analytic and asymptotic perturbation theory (we’ll discuss the latter in the next section). It appears that Kato found much of this in about 1944 without knowing about the work of Rellich or Nagy although he did know about Rellich by the time his thesis was written and he learned about the work of Nagy before the publication of the last of his early papers on perturbation theory [320, 322].

Interestingly enough, Kato’s first published work on the perturbation theory of eigenvalues [306] was a brief 1948 note with examples where the theory didn’t apply; these will be discussed in the next section (Examples 3.5, 3.6). His thesis was published in full in a university journal [316] in 1951, with parts published a year earlier in broader journals in both English [308, 309] and Japanese [310]. Two final early papers [320, 322] dealt with the Banach space case and with further results on asymptotic perturbation theory (discussed further in Sect. 6).

Many of the most significant results in Kato’s work on regular eigenvalue perturbation theory had been found (independently but) earlier by Rellich and Nagy. Kato’s work, especially if you include his book [345], was more systematic. His main contribution beyond theirs concerns the use of reduced resolvents. And, as we’ll see, he was the pioneer in the theory of asymptotic perturbation theory.

3 Eigenvalue perturbation theory, II: asymptotic perturbation theory

In this section and the next, we discuss situations where the Kato–Nagy–Rellich theory of regular perturbations does not apply. Lest the reader think this is a strange pathology, we begin with six (!) simple examples, four from the standard physics literature and then two that appeared in Kato’s first paper—a brief note—on perturbation theory [306].

Example 3.1

(Anharmonic oscillator and Zeeman effect) Let

$$\begin{aligned} A_0 = -\frac{d^2}{dx^2}+x^2, \qquad B=x^4, \qquad A(\beta ) = A_0 + \beta B \end{aligned}$$
(3.1)

on \(L^2({\mathbb {R}},dx)\). This is an example much beloved by teachers of quantum mechanics since one can compute \(a_2\) explicitly since the sum in (2.7) is finite (indeed only two terms which can be computed in closed form). It is also regarded as a paradigm of the simplest quantum field theory, i.e. \(\varphi ^4_1\) in one space–time dimension (see [194, 578]). A basic fact is that the perturbation series exists to all orders, in fact all the sums in the books [345, 497] for individual terms are finite or, alternatively, there exists a simple set of recursion relations [49] for the \(a_n\) so that formally, the ground state energy is given by

$$\begin{aligned} E_0(\beta ) = E_0 + \sum _{n=1}^{\infty } a_n \beta ^n \end{aligned}$$
(3.2)

However, the series in (3.2) has zero radius of convergence. One intuition comes from Dyson [128] who argued that the perturbation series in quantum electrodynamics shouldn’t converge because the theory doesn’t make sense if \(e^2 < 0\) when electrons attract and there is collapse. Similarly, \(A_0-\beta x^4\) does not define a self-adjoint operator since it is limit circle at \(\pm \infty \) (see [616, Section 7.4]). While this is not a proof, one can show ([434, 435, 568]) that \(A(\beta )\) is a type (A) family for \(\beta \in {\mathbb {C}}{\setminus } (-\infty ,0]\) (but not at \(\beta = 0\)), that any eigenvalue, \(E_n(\beta )\), of \(A(\beta )\) for \(\beta >0\) can be analytically continued to all of \(\beta \in {\mathbb {C}}{\setminus } (-\infty ,0]\) with limits on \((-\infty ,0)\) from either side with \(\text {Im} E_n(-\beta +i0) >0\) for any \(\beta > 0\) (so the continuation is not analytic at \(\beta =0\)). [568] has much about the analytic structure near \(\beta = 0\).

This doesn’t quite imply that the series is divergent, only that it can’t converge to the right answer. In fact, one knows that the \(a_n\) grow so fast that the series diverges for all \(\beta \ne 0\). Indeed, it is known that

$$\begin{aligned} a_n = 4 \pi ^{-3/2} (-1)^{n+1} \left( \tfrac{3}{2}\right) ^{n+1/2} \Gamma (n+\tfrac{1}{2}) \left( 1+\text {O}\left( \tfrac{1}{n}\right) \right) \end{aligned}$$
(3.3)

This formula with its n! growth is called the Bender–Wu formula. They guessed it [49] from a calculation of the first 75 \(a_n\) in 1969 and found a non-rigorous argument for it in 1973 [50]. It was proven by Harrell–Simon [222] in 1980—we’ll discuss it in the next section.
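The divergence can be seen concretely by generating the coefficients in exact arithmetic. The sketch below assumes the ansatz \(\psi = e^{-x^2/2}\sum _n \beta ^n P_n(x)\) with even polynomials \(P_n\) of degree 4n normalized by \(P_n(0)=0\) for \(n \ge 1\); inserting it into \(A(\beta )\psi = E_0(\beta )\psi \) gives a triangular recursion for the polynomial coefficients. This recursion is derived here for illustration and is not necessarily the form used in [49]; the names `anharmonic_coefficients` and `bender_wu` are hypothetical.

```python
from fractions import Fraction
from math import gamma, pi

def anharmonic_coefficients(order):
    """Return [a_0, ..., a_order] for the ground state of -d^2/dx^2 + x^2 + beta*x^4."""
    a = [Fraction(1)]            # a_0 = E_0 = 1
    c = [{0: Fraction(1)}]       # c[m][j] = coefficient of x^(2j) in P_m; P_0 = 1
    zero = Fraction(0)
    for n in range(1, order + 1):
        cn = {}
        # Match the coefficient of x^(2j), solving downward from the top degree 4n.
        for j in range(2 * n, 0, -1):
            rhs = (2 * j + 2) * (2 * j + 1) * cn.get(j + 1, zero)
            rhs -= c[n - 1].get(j - 2, zero)
            rhs += sum(a[k] * c[n - k].get(j, zero) for k in range(1, n))
            cn[j] = rhs / (4 * j)
        c.append(cn)
        a.append(-2 * cn[1])     # the x^0 coefficient gives a_n = -2 c_{n,1}
    return a

def bender_wu(n):
    """Leading term of (3.3), without the 1 + O(1/n) factor."""
    return 4 * pi ** -1.5 * (-1) ** (n + 1) * 1.5 ** (n + 0.5) * gamma(n + 0.5)

a = anharmonic_coefficients(40)
for n in (1, 2, 3, 10, 20, 40):
    print(n, float(a[n]), float(a[n]) / bender_wu(n))
```

The printed ratio to the leading term of (3.3) approaches 1 only slowly, as the \(1+\text {O}(1/n)\) factor allows, while the ratios \(|a_{n+1}/a_n|\) grow roughly linearly in n, which is what forces a zero radius of convergence.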

There is also literature on the higher order anharmonic oscillator,

$$\begin{aligned} A(\beta ) = -\tfrac{d^2}{dx^2} + x^2 + \beta x^{2m}; \quad m=2,3,\ldots \end{aligned}$$
(3.4)

In this case the analogs of Bender–Wu asymptotics have \(a_n \sim C (-1)^{n+1} A^n n^\gamma \Gamma ((m-1)n)\) for suitable m-dependent \(A, C, \gamma \).

There is a historically important model that has a similar divergence, namely the Zeeman effect for Hydrogen which describes Hydrogen in a constant magnetic field, B, which if \(\varvec{B}\) points in the z direction in \(\varvec{r}=(x,y,z)\) coordinates is given by the Hamiltonian

$$\begin{aligned} A(B) = -\tfrac{1}{2}\Delta - \tfrac{1}{r} +\tfrac{B^2}{8}(x^2+y^2)+BL_z \end{aligned}$$
(3.5)

where \(L_z\) is the z component of the angular momentum. For the ground state (where \(L_z=0\)), one has that

$$\begin{aligned} E_0(B) = \sum _{k=0}^{\infty } E_k B^{2k} \end{aligned}$$
(3.6)

Avron [20] found a Bender–Wu type formula

$$\begin{aligned} E_k = \left( \frac{4}{\pi }\right) ^{5/2} (-1)^{k+1} \pi ^{-2k} \Gamma \left( 2k+\frac{3}{2}\right) \left( 1+\text {O}\left( \frac{1}{k}\right) \right) \end{aligned}$$
(3.7)

with a rigorous proof by Helffer–Sjöstrand [234]. In natural units, the magnetic field in early twentieth century laboratories was very small so lowest order perturbation theory worked very well.

Example 3.2

(Autoionizing States of Two Electron Atoms) We further consider the Hamiltonian A(1/Z) of Example 2.1; see (2.17). For \(1/Z = 0\), A(0) is the Hamiltonian of two uncoupled Hydrogen atoms so its eigenvalues are \(E_{n,m} = -\tfrac{1}{4n^2}-\tfrac{1}{4m^2}, \, m,n=1,2,\ldots \). The continuous spectrum starts at \(-\tfrac{1}{4}\) (for \(n=1, m\rightarrow \infty \)), so, for example, \(E_{2,2}\) at energy \(-\tfrac{1}{8}\) is an eigenvalue but not isolated, rather it is embedded in the continuous spectrum on \([-\tfrac{1}{4},\infty )\). According to the physicist’s expectation, this eigenvalue becomes a decaying state, where in a finite time, one electron drops to the ground state and the other gets kicked out of the atom with the left over energy (i.e. \(-\tfrac{1}{8}-(-\tfrac{1}{4})=\tfrac{1}{8}\)). For obvious reasons, these are called autoionizing states. These states are actually seen as electron scattering resonances (under \(e + He^+ \rightarrow e+He^+\)) or as photoionization resonances (\(\gamma +He \rightarrow He^+ + e\)) called Auger resonances.

The situation has a complication we’ll ignore. The eigenvalue at energy \(-\tfrac{1}{8}\) has multiplicity 16 which one can reduce by using exchange, rotation and parity symmetry. For our purposes, it is useful to look at states with angular momentum 2 and azimuthal angular momentum 2 which are simple. In fact, there are states of unnatural parity (with angular momentum 1 but parity +); the continuous spectrum below \(-\tfrac{1}{16}\) is only of natural parity states so these unnatural parity eigenvalues are not embedded in continuous spectrum and so they don’t disappear. There are actually 15 subspaces with definite symmetry. In one, there is a doubly degenerate embedded eigenvalue, in 3 an isolated eigenvalue and in 11 a simple embedded eigenvalue.

According to what is called the Wigner–Weisskopf theory [676], these scattering resonances are complex poles of the S-matrix so the perturbed energy, \(E(\beta )\) has a non-zero imaginary part

$$\begin{aligned} \text {Im}\, E(\beta ) = \frac{\Gamma (\beta )}{2} \end{aligned}$$
(3.8)

where \(\Gamma \) is the width of the resonance, i.e. \(|(E-E_0)+\tfrac{i}{2}\Gamma |^{-2}\) (the impact of a pure pole to a quantum probability) has a distance \(\Gamma \) between the two points where it takes half its maximum value.
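The half-width statement is a one-line computation; the symbol L(E) for the pole contribution is introduced here only for this check.

```latex
% L(E) is a hypothetical name for the squared modulus of the pole term
\begin{aligned}
L(E) &\equiv \left| (E-E_0)+\tfrac{i}{2}\Gamma \right| ^{-2}
      = \frac{1}{(E-E_0)^2+\tfrac{1}{4}\Gamma ^2},
      \qquad \max _E L(E) = L(E_0) = \frac{4}{\Gamma ^2} \\
L(E) &= \frac{2}{\Gamma ^2}
      \iff (E-E_0)^2 = \tfrac{1}{4}\Gamma ^2
      \iff E = E_0 \pm \tfrac{1}{2}\Gamma
\end{aligned}
```

The two half-maximum points are therefore exactly \(\Gamma \) apart.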

Physicists argue that \(\Gamma = \hbar /\tau \), where \(\tau \) is the lifetime of the excited state. Sometimes Rayleigh–Schrödinger perturbation theory is called time–independent perturbation theory because there is a formal textbook argument for computing lifetimes of embedded eigenvalues coupled to the continuum called time–dependent perturbation theory. In particular, the second order term in this theory is called the Fermi golden rule, discussed, for example, in Landau–Lifshitz [407, pp. 140–153]. Simon [574] has a compact way to write this second order term. If \(A(\beta ) = A_0+\beta B\), \(A_0\varphi _0 = E_0\varphi _0\) and \(\widetilde{P}_0(\lambda )\) is the spectral projection for \(A_0\) with \(\{E_0\}\) removed, i.e. \(\widetilde{P}_0(\lambda ) = f_\lambda (A_0)\) where

$$\begin{aligned} f_\lambda (x) = \left\{ \begin{array}{ll} 1, &{}\quad x<\lambda , x \ne E_0 \\ 0, &{}\quad x \ge \lambda , \text { or } x=E_0 \end{array} \right. \end{aligned}$$

then

$$\begin{aligned} \Gamma (\beta )= & {} \Gamma _2 \beta ^2 + \text {O}(\beta ^3) \end{aligned}$$
(3.9)
$$\begin{aligned} \Gamma _2= & {} \left. \frac{d}{d\lambda }\langle B\varphi _0,\widetilde{P}_0(\lambda )B\varphi _0 \rangle \right| _{\lambda =E_0} \end{aligned}$$
(3.10)

The physics literature arguments for time–dependent perturbation theory are mathematically questionable and there were arguments about what the higher order terms were.

So this example causes lots of problems we’ll look at in Sect. 4: What is a resonance? What does the perturbation series have to do with the resonance energy? Can one mathematically justify the Fermi golden rule? What are the higher terms? Is there a convergent series?

In 1948, Friedrichs [172] considered a model (related to some earlier work of his [170]) with operators acting on \(L^2([a,b],dx)\oplus {\mathbb {C}}\) with \(A_0(f(x),\zeta ) = (xf(x),\zeta )\) where \(a< 1 < b\) so that \(A_0\) has an embedded eigenvalue at \(E_0=1\). \(A(\beta )=A_0+\beta B\) where B is the rank two operator \(B(f(x),\zeta ) = (\zeta h(x), \langle h,f \rangle )\) for some \(h \in L^2([a,b],dx)\). For suitable h and small \(\beta > 0\), Friedrichs proved that \(A(\beta )\) has no eigenvalues in spite of the fact of a first order perturbation term so the eigenvalue indeed dissolves. He did not discuss resonances but this was an early attempt to study a model which in his words “is clearly related to the Auger effect.”

Example 3.3

(Stark Effect) The Stark Hamiltonian describes the Hydrogen atom in an electric field. If F is the strength of the field and \(\varvec{r}=(x,y,z)\), then the operator on \(L^2({\mathbb {R}}^3)\) has the form

$$\begin{aligned} A(F,Z)=-\Delta -\frac{Z}{r}+Fz \end{aligned}$$
(3.11)

We will primarily consider \(Z=1\). Schrödinger developed eigenvalue perturbation theory [544] to apply it to the Stark Hamiltonian. As with the Zeeman effect, laboratory F’s are small so first or second order perturbation theory worked well when compared to experiment and this was regarded as a great success.

Early on, Oppenheimer [470] pointed out that when \(F \ne 0\), \(A(F,Z)\) is not bounded below so that the \(A(F=0,Z=1)\) ground state is, as soon as \(F \ne 0\), swamped by continuous spectrum. Put differently, it becomes a finite lifetime state that decays. He claimed to compute the lifetime but his calculation was wrong. There were arguments about whether his method was correct, but eventually there was universal agreement that the correct asymptotics for the width, when \(Z=1\) and F is small, is that found by Lanczos [404,405,406]:

$$\begin{aligned} \Gamma (F) \sim \frac{1}{2F}\exp \left( -\frac{1}{6F}\right) \end{aligned}$$
(3.12)

which is usually called the Oppenheimer formula.

In fact, one can prove that for any \(F \ne 0\), and any Z including \(Z=0\), \(A(F,Z)\) has purely absolutely continuous spectrum \((-\infty ,\infty )\) of infinite multiplicity. Titchmarsh [654] proved there are no embedded eigenvalues using the separability in parabolic coordinates we’ll use again below, Avron–Herbst [24] proved the existence of wave operators from \(A(F,Z=0)\) to \(A(F,Z)\) (wave operators are discussed in Sect. 13 in Part 2) and Herbst [240] proved that those wave operators were unitaries, U, with \(UA(F,Z=0)U^{-1} = A(F,Z)\).

In this regard, I should mention what I’ve called [588] Howland’s Razor after [258, 259] and Occam’s Razor: “Resonances cannot be intrinsic to an abstract operator on a Hilbert space but must involve additional structure.” For \(\{A(F,1)\}_{F \ne 0}\) are all unitarily equivalent but we believe they have F-dependent resonance energies. We’ll discuss the possible extra structures in the next section.

There is also a Bender–Wu type asymptotics

$$\begin{aligned}&E(F) \sim \sum _{n=0}^{\infty } A_{2n} F^{2n} \end{aligned}$$
(3.13)
$$\begin{aligned}&A_{2n} = -6^{2n+1} (2\pi )^{-1} (2n)! \left( 1+\text {O}\left( \frac{1}{n}\right) \right) \end{aligned}$$
(3.14)

found formally by Herbst–Simon [245] and proven by Harrell–Simon [222]. Interestingly enough, there is a close connection between (3.14) and the original Bender–Wu formula (3.3) or rather its analog for

$$\begin{aligned} -\frac{d^2}{dx^2} + x^2 + \beta x^4 -\frac{1}{4x^2} \end{aligned}$$
(3.15)

whose Bender–Wu formula was found by Banks, Bender and Wu [43]. Jacobi [276] discovered that a Coulomb plus linear potential in classical mechanics separates in elliptic coordinates and then Schwarzschild [546] and Epstein [141] extended this idea to old quantum theory. In particular, Epstein used parabolic coordinates. Schrödinger [544] and Epstein [142] extended this use of parabolic coordinates to the Hamiltonian (3.11). This separation was also used by Titchmarsh [647,648,649,650,651, 654], Harrell–Simon [222] and by Graffi–Grecchi and collaborators [48, 79, 198, 199, 201, 203, 205].
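As a heuristic consistency check between the width asymptotics (3.12) and the coefficient asymptotics (3.14), one can feed (3.12) into a dispersion integral. The relation in the first comment line is an assumption made here for illustration (it is not stated in the text and is not the proof in [222]); the integral itself is elementary.

```latex
% Assumption (not from the text): A_{2n} \approx -\frac{1}{\pi}\int_0^\infty \Gamma(F)\,F^{-2n-1}\,dF for large n
\begin{aligned}
-\frac{1}{\pi }\int _0^\infty \frac{1}{2F}\,e^{-1/(6F)}\,F^{-2n-1}\,dF
 &= -\frac{1}{2\pi }\int _0^\infty (6u)^{2n+2}\,e^{-u}\,\frac{du}{6u^2}
 \qquad \left( u = \frac{1}{6F}\right) \\
 &= -\frac{6^{2n+1}}{2\pi }\int _0^\infty u^{2n}e^{-u}\,du
  = -6^{2n+1}(2\pi )^{-1}(2n)!
\end{aligned}
```

For large n the integrand is concentrated near \(u \approx 2n\), i.e. at small F, which is the regime where (3.12) is valid; the result reproduces the leading term of (3.14).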

Many of the same questions occur as for Example 3.2 which we’ll study in Sect. 4: What is a resonance? What is the meaning of the divergent perturbation series? What is the difference between (3.9) where \(\Gamma (\beta ) = \text {O}(\beta ^2)\) and (3.12) where \(\Gamma (\beta ) = \text {O}(\beta ^k)\) for all k?

Example 3.4

(Double Wells) The standard double well problem is

$$\begin{aligned} A(\beta ) = -\frac{d^2}{dx^2}+x^2-2\beta x^3 + \beta ^2 x^4 \end{aligned}$$
(3.16)

Writing

$$\begin{aligned} V(\beta ,x)&\equiv x^2-2\beta x^3 + \beta ^2 x^4 \\&= x^2(1-x\beta )^2 \\&= \beta ^2 x^2(\beta ^{-1}-x)^2 \end{aligned}$$

we see that if \(U_\beta f(x) = f(\beta ^{-1}-x)\) which is unitary, then \(U_\beta A(\beta ) U_\beta ^{-1} = A(\beta )\). If we let \(\varphi _0(x) = \pi ^{-1/4}\exp (-\tfrac{1}{2} x^2)\), then \(\langle \varphi _0,A(\beta )\varphi _0 \rangle = 1 + \text {O}(\beta ^2)\). But by symmetry, \(\langle U_\beta \varphi _0,A(\beta )U_\beta \varphi _0 \rangle = 1 + \text {O}(\beta ^2)\) while \(\langle \varphi _0,U_\beta \varphi _0 \rangle \) and \(\langle A(\beta )\varphi _0,U_\beta \varphi _0 \rangle \) are O\((\exp (-1/(4\beta ^2)))\), so very small. Thus, we see that while \(A(\beta =0)\) has simple eigenvalues at \(2n+1, \, n=0,1,2,\ldots \), for \(\beta \ne 0\), \(A(\beta )\) has at least two eigenvalues near each \(E_n(\beta =0)\).
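The two Gaussian computations behind this argument can be checked by direct quadrature. The sketch below uses a plain composite Simpson rule; the helper names and the integration window of 12 units beyond each Gaussian center are choices made here, not part of the text. It uses \(-\varphi _0''+x^2\varphi _0=\varphi _0\), so that \(\langle \varphi _0,A(\beta )\varphi _0\rangle = 1+\tfrac{3}{4}\beta ^2\), and the closed form \(\langle \varphi _0,U_\beta \varphi _0\rangle = \exp (-1/(4\beta ^2))\).

```python
import math

def simpson(f, a, b, n=4000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def phi0(x):
    """Normalized harmonic oscillator ground state pi^(-1/4) exp(-x^2/2)."""
    return math.pi ** -0.25 * math.exp(-0.5 * x * x)

def overlap(beta):
    """<phi0, U_beta phi0> with (U_beta f)(x) = f(1/beta - x)."""
    shift = 1.0 / beta
    return simpson(lambda x: phi0(x) * phi0(shift - x), -12.0, shift + 12.0)

def energy(beta):
    """<phi0, A(beta) phi0>, using (-d^2/dx^2 + x^2) phi0 = phi0."""
    extra = lambda x: -2.0 * beta * x ** 3 + beta ** 2 * x ** 4
    return simpson(lambda x: phi0(x) ** 2 * (1.0 + extra(x)), -12.0, 12.0)

for beta in (0.5, 0.3, 0.2):
    print(beta, overlap(beta), math.exp(-1 / (4 * beta ** 2)), energy(beta))
```

Already at \(\beta = 0.2\) the overlap is below \(2\times 10^{-3}\), so the two trial functions are nearly orthogonal and both have energy close to 1.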

So far as I know, Kato never discussed anything like double wells in print, but we’ll see shortly that it illuminates the meaning of stability, a subject that Kato was the first to emphasize.

This model is closely related to the family on \(L^2({\mathbb {R}}^\nu )\):

$$\begin{aligned} H(\lambda ) = -\Delta +\lambda ^2 h(x) + \lambda g(x) \end{aligned}$$
(3.17)

where h, g are \(C^\infty \), g is bounded from below, \(h \ge \epsilon > 0\) near \(\infty \), \(h \ge 0\), \(h(x)=0\) for only finitely many points and so that at those points the Hessian matrix \(\frac{\partial ^2 h}{\partial x_i \partial x_j}\) is strictly positive definite. One is interested in eigenvalues of \(H(\lambda )\) as \(\lambda \rightarrow \infty \). Notice that when \(g=0\), \(\lambda ^{-2} H(\lambda ) = -\lambda ^{-2}\Delta + h\), so this is a quasi-classical (\(\hbar \rightarrow 0\)) limit. One can rephrase the double well as looking at \(-\tfrac{d^2}{dx^2}+\lambda ^2 x^2(1-x)^2\) by scaling of space and energy (see Simon [601]). There is a considerable literature both on leading asymptotics and on the exponential splitting of the two lowest eigenvalues; see, for example, Simon [601, 603] and Helffer–Sjöstrand [228,229,230,231,232,233,234]. We note that Witten [687] has a proof of the Morse inequalities that relies on this leading quasi-classical limit (see also Cycon et al. [101]).

Example 3.5

Our last two examples, unlike the first four, are neither well-known nor heavily studied. They are from Kato’s first paper on perturbation of eigenvalues, a one page letter to the editor of Progress of Theoretical Physics in 1948. Both examples, which also appear in his thesis [316], have \(A(\beta ) = A_0+\beta B\) with

$$\begin{aligned} A_0 = -\langle \psi ,\cdot \rangle \psi \end{aligned}$$
(3.18)

where \(\psi \in L^2({\mathbb {R}},dx)\) has \(||\psi ||_2 = 1\). He focuses on what happens to the simple eigenvalue that \(A_0\) has at \(E_0 = -1\).

In his first example, he takes B to be multiplication by x. This model is the poor man’s Stark effect. He doesn’t mention this connection in the paper but does in the thesis. He states without proof in the Note (but does have a proof in the thesis) that for \(\beta \ne 0\), \(A(\beta )\) has no eigenvalues but has a purely continuous spectrum. He remarked that this example shows that the formal perturbation series may be quite meaningless even if no “divergence” occurs. In his later work, as we’ll see in Sect. 4, he did discuss a possible significance of such series.

Example 3.6

\(A_0\) is given by (3.18) but now B is multiplication by \(x^2\). Kato states and proves in his thesis that for \(\beta \) small and positive, \(A(\beta )\) has a simple eigenvalue near \(E=-1\). Kato proves this by direct calculation rather than the more general strong convergence method in his book which we discuss below. He then discusses two explicit special \(\psi \)’s for which the first order term, \(\int x^2|\psi (x)|^2 dx\), is infinite. For \(\psi = c(1+x^2)^{-1/2}\), he finds (in the thesis; the paper only has the O(\(\beta ^{1/2}\)) term):

$$\begin{aligned} E(\beta ) = -1 + \beta ^{1/2} - \tfrac{1}{2}\beta + \tfrac{1}{8}\beta ^{3/2}+\text {O}(\beta ^2). \end{aligned}$$
(3.19)

For \(\psi = c|x|^{1/2}(1+x^2)^{-1}\) where the first order integral is only logarithmically divergent, he claims that

$$\begin{aligned} E(\beta ) = -1 - \beta \log (\beta ) + \text {O}(\beta ) \end{aligned}$$
(3.20)

The thesis but not the paper also discusses \(\psi = c(1+x^2)^{-1}\), where the first order integral is finite; he claims that

$$\begin{aligned} E(\beta ) = -1 + \beta -2\beta ^{3/2} + \text {O}(\beta ^2) \end{aligned}$$
(3.21)
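The expansions (3.19) and (3.21) can be checked numerically. Since \(A_0\) is rank one, a negative eigenvalue E of \(A_0+\beta x^2\) must satisfy the secular equation \(\int |\psi (x)|^2 (\beta x^2-E)^{-1}\,dx = 1\). The sketch below is an illustration, not Kato's calculation: it evaluates this integral with the substitution \(x=\tan \theta \) and a midpoint rule, and solves for E by bisection. All function names are hypothetical; the closed form used for \(\psi = c(1+x^2)^{-1/2}\) follows from the elementary integral \(\int dx/((x^2+1)(x^2+b^2)) = \pi /(b(1+b))\).

```python
import math

def secular(beta, E, density, n=2000):
    """Midpoint-rule value of  int |psi(x)|^2 / (beta x^2 - E) dx  with x = tan(theta).

    density(theta) is |psi(tan theta)|^2 * dx/dtheta; the factor
    1/(beta tan^2 - E) is rewritten as cos^2/(beta sin^2 - E cos^2).
    """
    h = math.pi / n
    total = 0.0
    for i in range(n):
        theta = -math.pi / 2 + (i + 0.5) * h
        s, c = math.sin(theta), math.cos(theta)
        total += density(theta) * c * c / (beta * s * s - E * c * c)
    return total * h

def eigenvalue(beta, density, iterations=60):
    """Bisection for the root E in (-1, 0) of secular(beta, E) = 1 (secular increases with E)."""
    lo, hi = -1.0, -1e-9
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if secular(beta, mid, density) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def density_319(theta):
    """psi = c (1+x^2)^(-1/2), c^2 = 1/pi:  |psi|^2 dx = dtheta / pi."""
    return 1.0 / math.pi

def density_321(theta):
    """psi = c (1+x^2)^(-1), c^2 = 2/pi:  |psi|^2 dx = (2/pi) cos^2(theta) dtheta."""
    return 2.0 / math.pi * math.cos(theta) ** 2

def closed_form_319(beta):
    """Exact root for psi = c (1+x^2)^(-1/2): sqrt(e)(sqrt(e)+sqrt(beta)) = 1 with e = -E."""
    return -(1.0 + beta / 2 - math.sqrt(beta + beta * beta / 4))

def series_319(beta):
    return -1.0 + beta ** 0.5 - beta / 2 + beta ** 1.5 / 8

def series_321(beta):
    return -1.0 + beta - 2 * beta ** 1.5

print(eigenvalue(0.01, density_319), closed_form_319(0.01), series_319(0.01))
print(eigenvalue(0.0025, density_321), series_321(0.0025))
```

In both cases the numerically determined eigenvalue agrees with the truncated series up to an error consistent with the stated \(\text {O}(\beta ^2)\) remainder.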

Kato is primarily a theorem prover and concept developer but occasionally he produces detailed calculational results, often without details; we’ll discuss this further in Sect. 7.

This example is quite artificial but in his book [345], Kato has an example going back to Rayleigh [492]

$$\begin{aligned} A(\beta ) = -\frac{d^2}{dx^2}+\beta \frac{d^4}{dx^4}, \quad \beta >0 \end{aligned}$$
(3.22)

with

$$\begin{aligned} \varphi (0) = \varphi '(0) = \varphi (1) = \varphi '(1) = 0 \end{aligned}$$
(3.23)

boundary conditions. Clearly A(0) should be \(-\tfrac{d^2}{dx^2}\) but the boundary conditions (3.23) are too strong to get a self-adjoint operator. One can show that the right boundary conditions for a strong limit are \(\varphi (0) = \varphi (1) = 0\) and that

$$\begin{aligned} E_n(\beta ) = n^2 \pi ^2 \left[ 1+4\beta ^{1/2}+\text {O}(\beta )\right] \end{aligned}$$
(3.24)

With these examples in mind, we turn to the general theory of asymptotic series. Recall [614, Section 15.1] that given a function \(\beta \mapsto f(\beta )\) on (0, B) and a sequence \(\{a_n\}_{n=0}^\infty \), we say that \(\sum _{n=0}^{\infty } a_n \beta ^n\) is an asymptotic series to order N if and only if

$$\begin{aligned} f(\beta ) -\sum _{n=0}^{N} a_n \beta ^n = \text {o}(\beta ^N) \end{aligned}$$
(3.25)

Of course, if the series is asymptotic to order \((N+1)\), the right side of (3.25) can be replaced by O(\(\beta ^{N+1}\)). We’ll mainly discuss series asymptotic to infinite order (i.e. to order N for all \(N=1,2,\ldots \)). It is easy to see that if f has an asymptotic series to infinite order, then f determines all the coefficients \(a_n\) uniquely.
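The uniqueness claim can be made explicit: dividing (3.25) by \(\beta ^N\) shows that each coefficient is a limit determined by f and the earlier coefficients. A short sketch of the recursion:

```latex
\begin{aligned}
a_0 &= \lim _{\beta \downarrow 0} f(\beta ), \\
a_N &= \lim _{\beta \downarrow 0} \beta ^{-N}\left[ f(\beta ) - \sum _{n=0}^{N-1} a_n \beta ^n \right] , \qquad N = 1, 2, \ldots
\end{aligned}
```

The converse fails: many different functions share the same coefficients.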

The function \(g(\beta ) = 10^6 \exp (-1/(10^6 \beta ))\) has a zero asymptotic series. \(f(\beta )\) and \(f(\beta )+g(\beta )\) thus have the same asymptotic series so an asymptotic series tells us nothing about the value, \(f(\beta _0)\), for a fixed \(\beta _0\). Typically however, for \(\beta _0\) small, a few terms approximate \(f(\beta _0)\) well but too many terms diverge. A good example is given [614, Table after (15.1.18)] for the error function \(\text {Erfc}(x) = \tfrac{2}{\sqrt{\pi }} \int _{x}^{\infty } \exp (-y^2) dy\) for which \(h(x) \equiv \sqrt{\pi }\, x \exp (x^2) \text {Erfc}(x)\) has an asymptotic series in 1/x about \(x = \infty \). At \(x=10\), \(h(x) = 0.99507\ldots \). The order \(N=2\) asymptotic series is good to 5 decimal places and for \(N=108\) to more than 22 decimal places. But for \(N=1000\), the series is about \(10^{565}\). So it is interesting and important to know that a series is asymptotic but if one knows the series and wants to know f, it is disappointing not to know more.
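The behavior at \(x=10\) is easy to reproduce. The sketch below takes \(h(x) = \sqrt{\pi }\,x e^{x^2}\text {Erfc}(x)\) (the normalization for which \(h(x)\rightarrow 1\)) and the standard expansion \(h(x) \sim \sum _{k} (-1)^k (2k-1)!!/(2x^2)^k\). It assumes that the order N in the quoted table counts powers of \(1/(2x^2)\); that indexing is an assumption made here, and the function names are illustrative.

```python
import math
from fractions import Fraction

def h(x):
    """h(x) = sqrt(pi) * x * exp(x^2) * erfc(x), which tends to 1 as x grows."""
    return math.sqrt(math.pi) * x * math.exp(x * x) * math.erfc(x)

def partial_sum(x, N):
    """Exact sum over k = 0..N of (-1)^k (2k-1)!! / (2 x^2)^k for integer x."""
    total = Fraction(0)
    term = Fraction(1)
    for k in range(N + 1):
        total += term
        # (2k+1)!! = (2k+1) * (2k-1)!!, with one more factor -1/(2 x^2)
        term *= Fraction(-(2 * k + 1), 2 * x * x)
    return total

def log10_abs(q):
    """log10 |q| for a Fraction too large to convert to float."""
    return math.log10(abs(q.numerator)) - math.log10(q.denominator)

exact = h(10)
print(exact)                                # close to 0.99507
print(abs(float(partial_sum(10, 2)) - exact))  # error of the N = 2 sum
print(log10_abs(partial_sum(10, 1000)))     # order of magnitude of the N = 1000 sum
```

The terms shrink until k is near \(x^2 = 100\) and then grow without bound, which is why a moderate number of terms is extremely accurate while the sum to \(N=1000\) is astronomically large.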

One often considers \(f(\beta )\) defined in a truncated sector \(\{\beta \in {\mathbb {C}}\,|\, 0<|\beta |< B, |\arg \beta | <A\}\) and demands (3.25) (with \(\beta ^N\) in the error replaced by \(|\beta |^N\)) in the whole sector.

In his thesis, Kato [316] only considered \(A(\beta ) = A_0+\beta B\) with \(A_0 \ge 0, \,B \ge 0\) where \(A(\beta )\) is self-adjoint (with a suitable interpretation of the sum). He used what are now called Temple–Kato inequalities to obtain asymptotic series to all orders in [316, 322]. We discuss this approach in Sect. 6 below.

About the same time, Titchmarsh started a series of papers [647,648,649,650,651, 654] on eigenvalues of second order differential equations including asymptotic perturbation results for \(A(\beta ) = -\tfrac{d^2}{dx^2}+V(x)+\beta W(x)\) on \(L^2({\mathbb {R}},dx)\) (or \(L^2((0,\infty ),dx)\) with a boundary condition at \(x=0\)). Typically both V(x) and W(x) go to infinity as \(|x| \rightarrow \infty \) (so the spectra are discrete) and W goes to \(\infty \) faster (so analytic perturbation theory fails; think \(V(x) = x^2, W(x) = x^4\)). His work relied heavily on ODE techniques. His results overlap in applicability with Kato’s operator theoretic approach, but Kato’s method is more broadly applicable.

In his book, Kato totally changed his approach to be able to say something about the Banach space case (and also non-self-adjoint operators in Hilbert space) so he couldn’t use the Temple–Kato inequality which relies on the spectral theorem. There is some overlap of this work from his book and work of Huet [260], Kramer [388, 389], Krieger [392] and Simon [568].

Central to Kato’s approach is the notion of strong resolvent convergence and of stability. Kato often discusses this for sequences \(A_n\) converging to A in some sense as \(n \rightarrow \infty \); for our purposes here, it is more natural to consider \(A(\beta )\) depending on a positive real parameter as \(\beta \downarrow 0\). To avoid various technicalities, we’ll also focus initially on the self-adjoint case where there are a priori bounds on \((B-z)^{-1}\) for \(z \in {\mathbb {C}}{\setminus }{\mathbb {R}}\), although we’ll consider some non-self-adjoint operators later.

For (possibly unbounded) self-adjoint \(\{A(\beta )\}_{0< \beta < B}\) and self-adjoint \(A_0\), we say that \(A(\beta )\) converges to \(A_0\) in strong resolvent sense (srs) if and only if for all \(z \in {\mathbb {C}}{\setminus }{\mathbb {R}}\), we have that \((A(\beta )-z)^{-1} \rightarrow (A_0-z)^{-1}\) in the strong (bounded) operator topology. Here is a theorem, going back to Rellich [504,505,506,507,508, Part 2], describing some results critical for asymptotic perturbation theory:

Theorem 3.7

Let \(A_0\) be self-adjoint and \(\{A(\beta )\}_{0< \beta < B}\) a family of self-adjoint operators on a Hilbert space, \({\mathcal {H}}\).

(a) If \({\mathcal {D}}\subset {\mathcal {H}}\) is a dense subspace with \({\mathcal {D}}\subset D(A_0)\) and for all \(\beta \in (0,B), \, {\mathcal {D}}\subset D(A(\beta ))\), and if \({\mathcal {D}}\) is a core for \(A_0\) and for all \(\varphi \in {\mathcal {D}},\) we have that \(A(\beta )\varphi \rightarrow A_0\varphi \) as \(\beta \downarrow 0\), then \(A(\beta ) \rightarrow A_0\) in srs.

(b) If \(a,b \in {\mathbb {R}}\) are not eigenvalues of \(A_0\) and \(A(\beta ) \rightarrow A_0\) in srs, then

$$\begin{aligned} P_{(a,b)}(A(\beta )) \overset{s}{\rightarrow } P_{(a,b)}(A_0) \end{aligned}$$
(3.26)

where \(P_\Omega (B)\) is the spectral projection for B associated to the set \(\Omega \subset {\mathbb {R}}\) [616, Chapter 5 and Section 7.2].

Proof

(a) follows from a simple use of the second resolvent formula; see [616, Theorem 7.2.11]. For (b), one first proves (3.26) when \(P_{(a,b)}\) is replaced by a continuous function [616, Theorem 7.2.10] and then approximates \(P_{(a,b)}\) with continuous functions [616, Problem 7.2.5]. \(\square \)

Remark

Before leaving the subject of abstract srs results, we should mention two results known as the Trotter–Kato theorem (Kato’s ultimate Trotter product formula, the subject of Sect. 18, is also sometimes called the Trotter–Kato theorem). One version says that if \(A_n\) and A are generators of contraction semigroups on a Banach space, X, then \(e^{-tA_n} \overset{s}{\rightarrow } e^{-tA}\) for all \(t>0\) if and only if for one (or for all) \(\lambda \) with \(\text {Re}\,(\lambda ) > 0\), one has \((A_n+\lambda )^{-1} \overset{s}{\rightarrow } (A+\lambda )^{-1}\). Related, sometimes part of the statement of the theorem, is that one doesn’t require A to exist a priori but only that, for some \(\lambda \) in the open right half plane, \((A_n+\lambda )^{-1}\) has a strong limit whose range is dense. The basic theorem is due to Trotter [655] in his thesis (written under the direction of Feller, whose interest in semigroups was motivated by Markov processes). Kato’s name is often on the theorem because he clarified an obscure point in this second version [331]. This theorem has also been called the Trotter–Kato–Neveu or Trotter–Kato–Neveu–Kurtz–Sova theorem after related contributions by these authors [402, 403, 466, 622]. There is another related result of this genre sometimes called the Trotter–Kato theorem. It says that if \(A_n\) is a family of self-adjoint operators, then they converge in srs to some self-adjoint A if and only if \((A_n-z)^{-1}\) has a strong limit with dense range for one z in \({\mathbb {C}}_+\) and one z in \({\mathbb {C}}_-\).

Returning to perturbation theory, Kato introduced and developed the key notion of stability. Let \(\{A(\beta )\}_{0< \beta < B}\) (or \(\beta \) in a sector) be a family of closed operators in a Banach space, X. Let \(A_0\) be a closed operator so that as \(\beta \downarrow 0\), \(A(\beta )\) converges to \(A_0\) in some sense. Let \(E_0\) be an isolated, discrete, eigenvalue of \(A_0\). We say that \(E_0\) is stable if there exists \(\epsilon > 0\) so that \(\sigma (A_0) \cap \{z \,|\, |z-E_0| \le \epsilon \} = \{E_0\}\) and so that

(a) \(|\beta | < B\) and \(|z-E_0| = \epsilon \Rightarrow z \notin \sigma (A(\beta ))\) and for each \(\varphi \in X\)

$$\begin{aligned} \lim _{\beta \downarrow 0} (A(\beta ) - z)^{-1}\varphi = (A_0-z)^{-1}\varphi \end{aligned}$$
(3.27)

uniformly in \(\{z \,|\, |z-E_0| = \epsilon \}\)

(b) If \(P(\beta )\) is given by (2.1) with \(A=A(\beta )\) and with \(\Gamma \) the counterclockwise circle indicated at the end of (a), then, for all \(\beta \) small, we have that

$$\begin{aligned} \dim {\mathrm{ran}}\, P(\beta ) = \dim {\mathrm{ran}}\, P(0) \end{aligned}$$
(3.28)

The uniform strong convergence in (a) implies that

$$\begin{aligned} P(\beta ) \overset{s}{\rightarrow } P(0) \end{aligned}$$
(3.29)

In the self-adjoint case, even without (a), if \(A(\beta ) \rightarrow A_0\) in srs, then

$$\begin{aligned} P_{(E_0-\epsilon ,E_0+\epsilon )}(A(\beta )) \overset{s}{\rightarrow } P_{\{E_0\}}(A_0) \end{aligned}$$
(3.30)

for \(\epsilon \) small if \(E_0\) is in the discrete spectrum of \(A_0\). \(P \mapsto \dim {\mathrm{ran}}\, P\) is continuous in the topology of norm convergence but it is only lower semicontinuous in the topology of strong operator convergence. For example, if \(P_n\) is the rank one projection onto multiples of the nth element of an orthonormal basis, then \(P_n \overset{s}{\rightarrow } 0\). The lower semicontinuity says that

$$\begin{aligned} P_n \overset{s}{\rightarrow } P_\infty \Rightarrow \dim {\mathrm{ran}}\, P_\infty \le \liminf \dim {\mathrm{ran}}\, P_n \end{aligned}$$
(3.31)
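To spell out why the rank one example gives strict inequality in (3.31), here is a short worked computation; the symbols \(e_n\) (the orthonormal basis) and \(\varphi \) (an arbitrary vector) are notation introduced only for this illustration.

$$\begin{aligned} ||P_n\varphi ||^2 = |\langle e_n,\varphi \rangle |^2, \qquad \sum _{n=1}^{\infty } |\langle e_n,\varphi \rangle |^2 = ||\varphi ||^2 < \infty \end{aligned}$$

so \(||P_n\varphi || \rightarrow 0\) for each fixed \(\varphi \), i.e. \(P_n \overset{s}{\rightarrow } 0\), while \(\dim {\mathrm{ran}}\, P_n = 1\) for every n. In this case (3.31) reads \(0 \le 1\), with strict inequality.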

Kato was well aware that equality might not hold on the right side of (3.31) for examples of relevance to physics—a main example that he mentions is the Stark effect where the right side is infinite. Double wells show that even if (a) above holds, (b) may fail. Simon [601] describes an extension of stability for multiple well problems.

There are two main ways that one can prove stability in cases where it is true. One is to note that if \(A(\beta ) \ge A_0\) as happens if

$$\begin{aligned} A(\beta ) = A_0 + \beta B \end{aligned}$$
(3.32)

and \(B \ge 0\), then \(\dim {\mathrm{ran}}\, P_{(-\infty ,a)} (A(\beta )) \le \dim {\mathrm{ran}}\, P_{(-\infty ,a)} (A_0)\). This and (3.31) imply stability for \(E_0\) below the bottom of the essential spectrum for \(A_0\). This is the typical approach that Kato uses in several places.

The second way one can have stability is illustrated by

Example 3.8

(Example 3.1 (revisited)) One might have the impression that regular perturbation theory is associated with norm continuity of resolvents and spectral projections and asymptotic perturbation theory with only strong convergence. While there is some truth to this, Simon [568] found the surprising fact that even in situations where perturbation theory diverges, one can have norm convergence of resolvents in a sector. One starts by noting that with \(p = \tfrac{1}{i}\tfrac{d}{dx}\), one has that

$$\begin{aligned} (p^2+W)^2&= p^4+W^2+p^2W+Wp^2 \\&= p^4+W^2 + 2pWp + [p,[p,W]] \\&= p^4 + W^2 + 2pWp - W'' \\&\ge \tfrac{1}{2}W^2 - c \end{aligned}$$

if \(W'' \le \tfrac{1}{2}W^2+c\) and \(W \ge 0\). In this way, one sees that for positive constants c and d

$$\begin{aligned} ||(p^2+x^2+\beta x^4)\varphi ||^2 + c||\varphi ||^2 \ge d \left[ ||x^2\varphi ||^2 + \beta ^2 ||x^4\varphi ||^2\right] \end{aligned}$$
(3.33)

which is called a quadratic estimate. This, in turn, implies that \(||x^2(p^2+x^2+1)^{-1}||\) and \(||(p^2+x^2+\beta x^4+1)^{-1}x^2||\) are bounded so that

$$\begin{aligned}&||(p^2+x^2+\beta x^4+1)^{-1}-(p^2+x^2+1)^{-1}|| \\&\quad = \beta ||(p^2+x^2+\beta x^4+1)^{-1}x^4(p^2+x^2+1)^{-1}|| \\&\quad \le \beta ||(p^2+x^2+\beta x^4+1)^{-1}x^2||||x^2(p^2+x^2+1)^{-1}|| \end{aligned}$$

is O\((\beta ) \rightarrow 0\) in norm. This implies stability by a simple argument.

A similar argument works for \(p^2+\gamma x^2+\beta x^4\) for any \(\gamma \in \partial {\mathbb {D}}{\setminus } \{-1\}\) so using scaling and the ideas below, one proves that for each n, the nth eigenvalue, \(E_n(\beta )\), of \(p^2+x^2+\beta x^4\) has an asymptotic series in each sector \(\{\beta \,|\, 0<|\beta |<B_A; |\arg \beta | < A\}\) so long as \(A \in (0,\tfrac{3\pi }{2})\) [568].

The above argument doesn’t work for \(\beta x^{2m}; \, m > 2\) but by using that \(||\beta x^{2m}(p^2+x^2+\beta x^{2m}+1)^{-1}||\) is bounded, one sees that the norm of the difference of the resolvents is O\((\beta ^{1/m})\) which also goes to zero.

To state results on asymptotic series, we focus on getting series for all orders. Kato [345] is interested mainly in first and second order, so he needs much weaker hypotheses. Let \(C \ge 1\) be a self-adjoint operator on a Hilbert space, \({\mathcal {H}}\). Then \(D^\infty (C) \equiv \cap _{n \ge 0}D(C^n)\) is a countably normed Fréchet space with the norms \(||\varphi ||_n \equiv ||C^n\varphi ||_{\mathcal {H}}\) (see [612, Section 6.1]). A densely defined operator, X, on \(D^\infty (C)\) is continuous in the Fréchet topology if and only if for all m, there is k(m) and \(c_m\) so that \(D^{k(m)}(C) \subset D(X), \, X\left[ D^{k(m)}(C)\right] \subset D^m(C)\) and \(||X\varphi ||_m \le c_m ||\varphi ||_{k(m)}\). Typically, for some \(\ell \), k(m) can be chosen to be \(m+\ell \).

Theorem 3.9

Let \(C \ge 1\) be a self-adjoint operator on a Hilbert space, \({\mathcal {H}}\). Let \(\{A(\beta )\}_{0 \le \beta < B}\) be a family of closed operators with \(E_0\) a simple isolated eigenvalue of \(A_0 \equiv A(0)\). Suppose that \(D^\infty (C) \cap D(A_0) \subset D(A(\beta ))\) for all \(\beta \). Let V be an operator with \(D^\infty (C) \cap D(A_0) \subset D(V)\) so that for \(\varphi \in D^\infty (C) \cap D(A_0)\), we have that

$$\begin{aligned} A(\beta )\varphi = (A_0+\beta V)\varphi \end{aligned}$$
(3.34)

Suppose that \(E_0\) is stable (in the sense that the spectrum of \(A(\beta )\) for \(\beta \) small is discrete near \(E_0\) and that (3.28) holds) and that V is a continuous map on \(D^\infty (C)\) and that for some \(\delta \) with \(\sigma (A_0) \cap {\{z \,|\, |z-E_0| \le \delta \}} = \{E_0\}\), we have that if \(|z-E_0| = \delta \), then \((A_0-z)^{-1}\) is a continuous map of \(D^\infty (C)\) and continuous in z. Suppose also that if \(\varphi _0 \ne 0\) with \(A_0\varphi _0 = E_0\varphi _0\), then \(\varphi _0 \in D^\infty (C)\). Then, there is a sequence of complex numbers, \(\{a_n\}_{n=0}^\infty \), so that the unique eigenvalue, \(E(\beta )\), of \(A(\beta )\) near \(E_0\) is asymptotic to \(E_0 + \sum _{n=1}^{\infty } a_n \beta ^n\).

Remarks

1. The proof is easy. If \(P(\beta )\) is the spectral projection for \(E(\beta )\), then \(P(\beta )\varphi _0 \rightarrow \varphi _0\) so for \(\beta \) small

$$\begin{aligned} E(\beta ) = \frac{\langle \varphi _0,A(\beta )P(\beta )\varphi _0 \rangle }{\langle \varphi _0,P(\beta )\varphi _0 \rangle } \end{aligned}$$
(3.35)

Thus, it is enough to get asymptotic series for the numerator and denominator. Write \(P(\beta )\) as a contour integral and expand \((A(\beta )-z)^{-1}\varphi _0\) in a geometric series with remainder. Since \(\varphi _0 \in D^\infty (C)\), all terms including the remainder are in \({\mathcal {H}}\). The last factor \(||(A(\beta )-z)^{-1}||\) is uniformly bounded in z and small \(\beta \), so we get an O\((\beta ^{N+1})\) error.

2. The algebraic terms obtained by the above proof are the same for asymptotic and analytic perturbation theory, so the \(a_n\) are given by Rayleigh–Schrödinger perturbation theory.

3. Two useful choices for C are \(C=A_0+1\) and \(C=x^2+1\). For \(A_0=-\tfrac{d^2}{dx^2}+x^2\), there are very good estimates on \(||(A_0+1)^m\varphi _0||_2\) (see [612, Section 6.4]). If \(A_0 = -\Delta +W+1\), for extremely general W’s, it is known that for \(z \notin \sigma (A_0)\), \((A_0-z)^{-1}\) has an integral kernel with exponential decay [600, Theorem B.7.1], which implies that \({||(1+x^2)^m(A_0-z)^{-1}(1+x^2)^{-m}||}\) is bounded on \(L^2({\mathbb {R}})\), so \((A_0-z)^{-1}\) is bounded on \(D^\infty (1+x^2)\).
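As an illustration of Remark 2, here is the first order coefficient for the quartic anharmonic oscillator, worked out under the assumption that \(A_0 = p^2+x^2\) and \(V = x^4\) as in Example 3.8, with \(E_0 = 1\) the lowest eigenvalue and \(\varphi _0(x) = \pi ^{-1/4}e^{-x^2/2}\) the normalized ground state. The formula \(a_1 = \langle \varphi _0,V\varphi _0 \rangle \) is the standard first order Rayleigh–Schrödinger term for self-adjoint \(A_0\); the computation is a sketch and is not taken from Kato’s work.

$$\begin{aligned} a_1 = \langle \varphi _0,x^4\varphi _0 \rangle = \pi ^{-1/2}\int _{-\infty }^{\infty } x^4 e^{-x^2}\, dx = \pi ^{-1/2}\cdot \tfrac{3}{4}\pi ^{1/2} = \tfrac{3}{4} \end{aligned}$$

so that \(E(\beta ) = 1 + \tfrac{3}{4}\beta + \text {O}(\beta ^2)\) as \(\beta \downarrow 0\).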

Asymptotic series have the virtue of uniquely determining the perturbation coefficients from the eigenvalues as functions and they often give good numeric results if \(\beta \) is small and one takes only a few terms. But mathematically, the situation is unsatisfactory—one would like the coefficients to uniquely determine \(E(\beta )\) (as they do in the regular case) or even better, one would like to have an algorithm to compute \(E(\beta )\) from \(\{a_n\}_{n=0}^\infty \). This is not an issue that Kato seems to have written about but it is an important part of the picture, so we will say a little about it.

It is a theorem of Carleman [82] that if \(\epsilon > 0\) and g is analytic in \(R_{\epsilon ,B} = \{z \,|\, |\arg z|< \tfrac{\pi }{2}+ \epsilon ,\, 0< |z| < B\}\), if \(|g(z)| \le b_n|z|^n\) there and \(\sum _{n=1}^{\infty } b_n^{-1/n} = \infty \) (e.g. \(b_n = n!\)), then \(g \equiv 0\) on \(R_{\epsilon ,B}\). This leads to a notion of strong asymptotic condition and an associated result of there being at most one function obeying that condition (and so a strong asymptotic series determines E)—see Simon [571, 572] or Reed–Simon [497, Section XII.4].

Algorithms for recovering a function from a possibly divergent series are called summability methods. Hardy [219] has a famous book on the subject. Many methods, such as Abel summability (i.e. \(\lim _{t \uparrow 1} \sum _{n=0}^{\infty } a_n t^n\)) work only for barely divergent series like \(a_n = (-1)^n\). The series that arise in eigenvalue perturbation theory are usually badly divergent but, fortunately, there are some methods that work even in that case. Two that have been shown to work for suitable eigenvalue problems are Padé and Borel summability.

The ordinary approximates for a power series are by the polynomials obtained by truncating the power series. If instead, one uses rational functions, one gets Padé, aka Hermite–Padé, approximates (they were formally introduced by Padé [473] in his thesis—Hermite, who was Padé’s advisor, introduced them earlier in the special case of the exponential function [246]). Specifically, given a formal power series, \(\sum _{n=0}^{\infty } a_n z^n\), the Padé approximates, \(f^{[N,M]}\), are given by

$$\begin{aligned}&f^{[N,M]}(z)= \frac{P^{[N,M]}(z)}{Q^{[N,M]}(z)}; \quad \deg P^{[N,M]} = M, \quad \deg Q^{[N,M]} = N \end{aligned}$$
(3.36)
$$\begin{aligned}&f^{[N,M]}(z) - \sum _{n=0}^{N+M} a_n z^n = \text {O}\left( z^{N+M+1}\right) \end{aligned}$$
(3.37)

In (3.37), \(f^{[N,M]}\) has \((N+1+M+1)-1\) parameters as does the sum. Thus (3.36)/(3.37) is \((N+M+1)\) equations in the coefficients of P and Q. So long as certain determinants formed from \(\{a_n\}_{n=0}^{N+M}\) are non-zero, there is a unique solution, \(f^{[N,M]}(z)\). For more on Padé approximates, see Baker [35,36,37].
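The linear algebra behind (3.36)/(3.37) can be sketched in a few lines. The following is a minimal illustration in exact rational arithmetic, not a production implementation; the function names `pade` and `evaluate` and the test series `euler` (\(a_n = (-1)^n n!\), a standard badly divergent example) are choices made here and do not come from the text. It normalizes \(Q(0)=1\) and fails if the relevant determinant vanishes.

```python
from fractions import Fraction
from math import factorial


def pade(a, N, M):
    """Return coefficient lists (p, q) with deg P <= M, deg Q <= N, Q(0) = 1,
    such that P/Q matches sum a_n z^n through order N + M, as in (3.36)/(3.37)."""
    if len(a) < N + M + 1:
        raise ValueError("need at least N + M + 1 series coefficients")
    a = [Fraction(x) for x in a[:N + M + 1]]
    # Vanishing of the z^k coefficient of Q(z) * series for k = M+1, ..., M+N:
    #   sum_{j=1}^{N} q_j a_{k-j} = -a_k   (with a_i = 0 for i < 0)
    rows = [
        [a[k - j] if k - j >= 0 else Fraction(0) for j in range(1, N + 1)] + [-a[k]]
        for k in range(M + 1, M + N + 1)
    ]
    # Gauss-Jordan elimination on the augmented matrix, in exact arithmetic.
    for col in range(N):
        piv = next((r for r in range(col, N) if rows[r][col] != 0), None)
        if piv is None:
            raise ValueError("singular system: this Pade approximate is not determined")
        rows[col], rows[piv] = rows[piv], rows[col]
        pivot_value = rows[col][col]
        rows[col] = [x / pivot_value for x in rows[col]]
        for r in range(N):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    q = [Fraction(1)] + [rows[i][N] for i in range(N)]
    # Numerator coefficients: the z^k coefficients of Q(z) * series for k = 0, ..., M.
    p = [sum(q[j] * a[k - j] for j in range(min(k, N) + 1)) for k in range(M + 1)]
    return p, q


def evaluate(p, q, z):
    """Evaluate P(z) / Q(z) from the coefficient lists."""
    num = sum(c * z ** k for k, c in enumerate(p))
    den = sum(c * z ** k for k, c in enumerate(q))
    return num / den


# Illustrative divergent series: a_n = (-1)^n n!.
euler = [(-1) ** n * factorial(n) for n in range(17)]

if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        p, q = pade(euler, n, n)
        print(n, float(evaluate(p, q, 1)))
```

For this series the partial sums at \(z=1\) oscillate with growing amplitude, whereas the values \(f^{[N,N]}(1)\) settle down; the first two are 2/3 and 8/13, and they approach \(\int _0^\infty e^{-x}(1+x)^{-1}dx \approx 0.5963\).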

The other method is called Borel summability, introduced by Borel [65]. The method requires that

$$\begin{aligned} |a_n| \le AB^n n! \end{aligned}$$
(3.38)

for some A, B and all n. If that is so, one forms the Borel transform

$$\begin{aligned} g(w) = \sum _{n=0}^{\infty } \frac{a_n}{n!} w^n \end{aligned}$$
(3.39)

which defines an analytic function in \(\{w \,|\, |w| < B^{-1}\}\). One supposes that g has an analytic continuation to a neighborhood of \([0,\infty )\) and defines for z real and positive

$$\begin{aligned} f(z) = \int _{0}^{\infty } e^{-a} g(az) da \end{aligned}$$
(3.40)

Since \(\int _{0}^{\infty } e^{-a} a^n da = n!\), formally f(z) is \(\sum _{n=0}^{\infty } a_n z^n\). For this method to work, g has to have an analytic continuation so that the integral in (3.40) converges.
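A small numerical sketch of (3.39)/(3.40) may help. It assumes the illustrative series \(a_n = (-1)^n n!\) (not an example from the text), for which the Borel transform is the geometric series \(g(w) = \sum (-w)^n = (1+w)^{-1}\), with an obvious analytic continuation to a neighborhood of \([0,\infty )\). The names `borel_sum` and `g_euler`, the cutoff `upper` and the step count are hypothetical choices; the integral is approximated by composite Simpson's rule and the exponentially small tail is dropped.

```python
import math


def borel_sum(g, z, upper=50.0, steps=20000):
    """Approximate f(z) = int_0^infinity exp(-a) g(a z) da, as in (3.40),
    by composite Simpson's rule on [0, upper]; `steps` must be even."""
    if steps % 2:
        raise ValueError("steps must be even for Simpson's rule")
    h = upper / steps
    total = g(0.0) + math.exp(-upper) * g(upper * z)
    for i in range(1, steps):
        a = i * h
        weight = 4 if i % 2 else 2
        total += weight * math.exp(-a) * g(a * z)
    return total * h / 3


def g_euler(w):
    """Borel transform of a_n = (-1)^n n!, continued from |w| < 1 to w >= 0."""
    return 1.0 / (1.0 + w)


if __name__ == "__main__":
    z = 0.1
    partial = 0.0
    for n in range(25):
        partial += (-1) ** n * math.factorial(n) * z ** n
        print(n, partial)  # partial sums improve at first, then diverge
    print("Borel sum:", borel_sum(g_euler, z))
```

At small z the first few partial sums track the Borel sum closely before the factorial growth takes over, which is the usual behavior of an asymptotic series.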

As far as Padé is concerned, a major result involves sequences, \(\{a_n\}_{n=0}^\infty \), called series of Stieltjes which have the form

$$\begin{aligned} a_n = (-1)^n \int _{0}^{\infty } x^n d\mu (x) \end{aligned}$$
(3.41)

for some positive measure \(d\mu \) on \([0,\infty )\) with all moments finite. The associated Stieltjes transform of \(\mu \)

$$\begin{aligned} f(z) = \int _{0}^{\infty } \frac{d\mu (x)}{1+xz} \end{aligned}$$
(3.42)

is defined and analytic in \(z \in {\mathbb {C}}{\setminus } (-\infty ,0]\). Expanding \((1+xz)^{-1}\) in a geometric series with remainder, one sees that in every sector \(\{z \,|\, |\arg z| < \pi - \epsilon \}\) with \(\epsilon > 0\), \(\sum _{0}^{\infty } a_n z^n\) is an asymptotic series for f. Here is the big theorem for such series:

Theorem 3.10

If \(\{a_n\}_{n=0}^\infty \) is a series of Stieltjes, then for each \(j \in {\mathbb {Z}}\), the diagonal Padé approximates, \(f^{[N,N+j]}(z)\), converge as \(N \rightarrow \infty \) for all \(z \in {\mathbb {C}}{\setminus } (-\infty ,0]\) to a function \(f_j(z)\) given by (3.42) with \(\mu \) replaced by \(\mu _j\) which obeys (3.41) (with \(\mu =\mu _j\)). The \(f_j\) are either all equal or all different depending on whether (3.41) has a unique solution, \(\mu \), or not.

The result is due to Stieltjes [626, 627] who discussed solutions of the moment problem (3.41) but not Padé approximates. Rather, following ideas of Jacobi, Chebyshev and Markov, he discussed continued fractions expansions

$$\begin{aligned} \frac{\alpha _1}{z+\beta _1+\frac{\alpha _2}{z+\beta _2+ \frac{\alpha _3}{\ddots }}} \end{aligned}$$

for the Stieltjes transform. These are the \(f^{[N+1,N]}(z)\) and his convergence results imply the theorem. For details, see Baker [36] or Simon [616, Section 7.7].

It follows from results of Loeffel et al. [434, 435] that if \(E_m(\beta )\) is an eigenvalue of \(p^2+x^2+\beta x^4\) for \(\beta \in [0,\infty )\), then \(E_m(\beta )\) has an analytic continuation to \({\mathbb {C}}{\setminus } (-\infty ,0]\) with a positive imaginary part in the upper half plane. Results of Simon [568] imply that \(|E_m(\beta )| \le C(1+|\beta |)^{1/3}\). A Cauchy integral formula then implies that \((E_m(0)-E_m(\beta ))/\beta \) has a representation of the form (3.42). Thus, by [435], the diagonal Padé approximates converge. Moreover, it is a fact (related to the above mentioned theorem of Carleman) that if \(\{a_n\}_{n=0}^\infty \) is the set of moments of a measure on \([0,\infty )\) with \(|a_n| \le CD^n(kn)!\) with \(k \le 2\), then the solution to the moment problem is unique [612, Problem 5.6.2]. This implies that for the \(x^4\) anharmonic oscillator, the diagonal Padé approximates converge to the eigenvalues. The same is true for the \(x^6\) oscillator but for the \(x^8\) oscillator, it is known (Graffi–Grecchi [200]) that, while the diagonal Padé approximates converge, they have different limits and none is the actual eigenvalue!

The key convergence result for Borel sums is a theorem of Watson [673]; see Hardy [219] for a proof:

Theorem 3.11

Let \(\Theta \in \left( \tfrac{\pi }{2},\tfrac{3\pi }{2}\right) \) and \(B > 0\). Define

$$\begin{aligned} \Omega&= \{z\,|\,0< |z|< B, |\arg z| < \Theta \} \end{aligned}$$
(3.43)
$$\begin{aligned} \widetilde{\Omega }&= \{z\,|\,0< |z|< B, |\arg z| < \Theta - \tfrac{\pi }{2} \} \end{aligned}$$
(3.44)
$$\begin{aligned} \Lambda&= \{w\,|\,w \ne 0, |\arg w| < \Theta - \tfrac{\pi }{2} \} \end{aligned}$$
(3.45)

Suppose that \(\{a_n\}_{n=0}^\infty \) is given and that f is analytic in \(\Omega \) and obeys

$$\begin{aligned} \left| f(z) - \sum _{n=0}^{N} a_n z^n\right| \le A C^{N+1} (N+1)!\, |z|^{N+1} \end{aligned}$$
(3.46)

on \(\Omega \) for all N. Define

$$\begin{aligned} g(w) = \sum _{n=0}^{\infty } \frac{a_n}{n!} w^n; \qquad |w| < C^{-1} \end{aligned}$$
(3.47)

Then g(w) has an analytic continuation to \(\Lambda \) and for all \(z \in \widetilde{\Omega }\), we have that

$$\begin{aligned} f(z) = \int _{0}^{\infty } e^{-a} g(az) da \end{aligned}$$
(3.48)

Graffi–Grecchi–Simon [204] proved that this theorem is applicable to the \(x^4\) anharmonic oscillator. They did numeric calculations making an unjustified use of Padé approximation to analytically continue g to all of \([0,\infty )\) and found more rapid convergence than Padé on the original series. By conformally mapping a subset of the union of \({\mathbb {D}}\) and \(\Lambda \) containing \([0,\infty )\) onto the disk, one can do the analytic continuation by summing a mapped power series and so do numerics without an unjustified Padé; see Hirsbrunner and Loeffel [249].

There is a higher order Borel summation where one picks \(m=2,3,\ldots \), \(\Theta \in \left( \tfrac{m\pi }{2},\tfrac{3m\pi }{2}\right) \) and replaces \(\Theta -\tfrac{\pi }{2}\) in (3.44) by \(\Theta - \tfrac{m\pi }{2}\), \((N+1)!\) in (3.46) is replaced by \([m(N+1)]!\), n! in (3.47) by (mn)! and (3.48) by

$$\begin{aligned} f(z) = \int _{0}^{\infty } e^{-a^{1/m}} g(za) a^{\left( \tfrac{1}{m} - 1\right) } da \end{aligned}$$
(3.49)

Graffi–Grecchi–Simon [204] showed that the \(x^{2(m+1)}\) oscillator is modified m-Borel summable.

Avron–Herbst–Simon [25, 26, 27, 28, Part III] proved that for the Zeeman effect in arbitrary atoms, the perturbation series of the discrete eigenvalues is Borel summable. The Schwinger functions of various quantum field theories have been proven to have Borel summable Feynman perturbation series: \(P(\phi )_2\) [133], \(\phi ^4_3\) [438], \(Y_2\) [513, 514], \(Y_3\) [439].

In general, Padé summability is hard to prove because it requires global information, so it has been proven to work only in very limited situations (for example a higher dimensional quartic anharmonic oscillator is known to be Borel summable but nothing about Padé is known). Clearly, when it can be proven, Borel summability is an important improvement over the mere asymptotic series that concerned Kato.

Before leaving asymptotic perturbation theory, we mention a striking example of Herbst–Simon [243]

$$\begin{aligned} A(\beta ) = -\frac{d^2}{dx^2}+x^2-1+\beta ^2 x^4 + 2\beta x^3 -2 \beta x \end{aligned}$$

If \(E_0(\beta )\) is the lowest eigenvalue, they prove that for all small, non-zero positive \(\beta \)

$$\begin{aligned} 0< E_0(\beta ) < C \exp (-D\beta ^{-2}) \end{aligned}$$

Thus \(E_0(\beta )\) has \(\sum _{n=0}^{\infty } a_n \beta ^n\) as asymptotic series where \(a_n \equiv 0\). The asymptotic series converges but, since \(E_0\) is strictly positive, it converges to the wrong answer!
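The mechanism is that a function bounded by \(C\exp (-D\beta ^{-2})\) is smaller than every power of \(\beta \) as \(\beta \downarrow 0\). The following sketch checks this numerically for the model function \(\exp (-D/\beta ^2)\) with \(D=1\); this is only a stand-in for the upper bound above, not the actual eigenvalue \(E_0(\beta )\), and the names `flat` and `scaled_remainder` are hypothetical.

```python
import math


def flat(beta, D=1.0):
    """Model upper bound exp(-D / beta^2): strictly positive for beta > 0."""
    return math.exp(-D / beta ** 2)


def scaled_remainder(beta, N, D=1.0):
    """flat(beta) / beta^N, i.e. the remainder after subtracting the zero
    series, measured against beta^N. It tends to 0 as beta -> 0 for each N."""
    return flat(beta, D) / beta ** N


if __name__ == "__main__":
    for N in (0, 5, 10):
        for beta in (0.2, 0.1, 0.05):
            print(N, beta, scaled_remainder(beta, N))
```

Every ratio shrinks rapidly as \(\beta \) decreases, consistent with all \(a_n\) vanishing even though the function itself never does.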

4 Eigenvalue perturbation theory, III: spectral concentration

Starting around 1950, Kato [316] and Titchmarsh [647,648,649,650,651, 654] considered what the perturbation series might mean for a problem like the Stark problem where a discrete eigenvalue is swallowed by continuous spectrum as soon as the perturbation is turned on. Titchmarsh looked mainly at ODEs; in particular, he looked at what has come to be called the Titchmarsh problem, (\(g \ge -\tfrac{1}{4}, z > 0\))

$$\begin{aligned} h(g,z,f) = -\frac{d^2}{dx^2}+\frac{g}{x^2}-\frac{z}{x}-fx \end{aligned}$$
(4.1)

(for some values of g, one needs a boundary condition at \(x=0\)). Kato used operator theory techniques and studied Examples 3.3 and 3.5.

Titchmarsh proved that the Green’s kernel for h, originally defined for energies in \({\mathbb {C}}_+\), had a continuation onto the lower half plane with a pole near the discrete eigenvalues of \(h(g,z,f=0)\) and he identified the real part of the pole with perturbation theory up to second order. He conjectured that the imaginary part of the pole was exponentially small in \(1/f\). He then showed in a certain sense that the spectrum of \(h(g,z,f \ne 0)\) as \(f \downarrow 0\) concentrated near the real parts of his poles [647, 648, 649, 650, 651, Part V].

Kato discussed things in terms of what he called pseudo-eigenvalues and pseudo-eigenvectors. He later realized that these notions imply a concentration of spectrum like that used by Titchmarsh. In his book [345], he emphasized what he formally defined as spectral concentration and linked the two approaches. In this section, I’ll begin by defining spectral concentration and then prove, following Kato, that it is implied by the existence of pseudo-eigenvectors. Finally, I’ll discuss the complex scaling theory of resonances and how it extends and illuminates the theory of spectral concentration.

Consider first the case where \(A(\beta )\) converges to \(A_0\) as \(\beta \downarrow 0\) in srs and \(E_0\) is a discrete simple eigenvalue of \(A_0\). Let T be a closed interval with \(\sigma (A_0) \cap T = \{E_0\}\). By Theorem 3.7, for any \(\epsilon > 0\), we have that \(P_{T{\setminus } (E_0-\epsilon ,E_0+\epsilon )} (A(\beta )) \overset{s}{\rightarrow } 0\). Thus, in a sense, the spectrum of \(A(\beta )\) in T is concentrated near \(E_0\). In the above, if we could replace \((E_0-\epsilon ,E_0+\epsilon )\) by \((E_0+a_1\beta -\beta ^{3/2},E_0+a_1\beta + \beta ^{3/2})\), we’d be able to claim that the spectrum was concentrated near \(E_0+a_1\beta \) in a way that would determine \(a_1\).

Taking into account that we may want to also have T shrink in cases like Example 3.2, we make the following definition. Let \(T(\beta ), S(\beta )\) be Borel sets in \({\mathbb {R}}\) given for \(0< \beta < B\) so that if \(0< \beta ' < \beta \), then \(T(\beta ') \subset T(\beta ), S(\beta ') \subset S(\beta )\) and so that for all \(\beta \), \(S(\beta ) \subset T(\beta )\). We say that the spectrum of \(A(\beta )\) in \(T(\beta )\) is asymptotically concentrated in \(S(\beta )\) if and only if \(P_{T(\beta ){\setminus } S(\beta )}(A(\beta )) \overset{s}{\rightarrow } 0\).

If \(E_0\) is a simple eigenvalue of \(A_0\) and \(\{a_j\}_{j=1}^N\) are real numbers, we say the spectrum near \(E_0\) is asymptotically concentrated near \(E_0+\sum _{j=1}^{N} a_j \beta ^j\) if there exist positive functions f and g obeying \(f(\beta ) \rightarrow 0,\, f(\beta )/\beta \rightarrow \infty ,\, g(\beta )/\beta ^N \rightarrow 0\) as \(\beta \downarrow 0\) so that the spectrum of \(A(\beta )\) in \((E_0-f(\beta ),E_0+f(\beta ))\) is asymptotically concentrated in \((E_0+\sum _{j=1}^{N} a_j \beta ^j -g(\beta ), E_0+\sum _{j=1}^{N} a_j \beta ^j + g(\beta ))\). It is easy to see that, if that happens, it determines the \(a_j,\, j=1,\ldots ,N\).

Kato’s thesis [316] introduced the notion of Nth order pseudo-eigenvectors and pseudo-eigenvalues. In later usage, this is a pair of functions, \(\varphi (\beta )\) and \(\lambda (\beta )\), on (0, B) with values in \({\mathcal {H}}\) and \({\mathbb {R}}\) so that

$$\begin{aligned}&\varphi (\beta ) \in D(A(\beta )), \qquad ||\varphi (\beta )||=1, \qquad \lambda (\beta ) \rightarrow E_0 \end{aligned}$$
(4.2)
$$\begin{aligned}&||(A(\beta )-\lambda (\beta ))\varphi (\beta )|| = \text {o}(\beta ^N) \end{aligned}$$
(4.3)

Conley–Rejto [95] and Riddell [515] (Riddell was a student of Kato and this paper was based on his PhD thesis) proved the following:

Theorem 4.1

If \(E_0\) is a simple isolated eigenvalue of \(A_0\) and \((\varphi (\beta ),\lambda (\beta ))\) are an Nth order pseudo-eigenvector and pseudo-eigenvalue so that as \(\beta \downarrow 0\), we have that

$$\begin{aligned} (1-P_{E_0}(A_0)) \varphi (\beta ) \rightarrow 0 \end{aligned}$$
(4.4)

Then there exists \(g(\beta ) = \text {o}(\beta ^N)\) and \(d>0\) so that the spectrum of \(A(\beta )\) in \((E_0-d,E_0+d)\) is concentrated in \((\lambda (\beta )-g(\beta ),\lambda (\beta )+g(\beta ))\).

Remarks

1. Riddell also has a converse.

2. Both papers consider the situation where \(E_0\) has multiplicity \(k < \infty \) and there are k orthonormal pairs obeying (4.3) and they prove spectral concentration on a union of k intervals of size \(\text {o}(\beta ^N)\) about the \(\lambda _j\).

3. The proof isn’t hard. One picks \(g(\beta ) = \text {o}(\beta ^N)\) so that \(||(A(\beta )-\lambda (\beta ))\varphi (\beta )||/g(\beta ) \rightarrow 0\). This implies that if \(Q(\beta ) = P_{(\lambda (\beta )-g(\beta ),\lambda (\beta )+g(\beta ))}(A(\beta ))\), then \(||(1-Q(\beta ))\varphi (\beta )|| \rightarrow 0\). By (4.4), this implies that

$$\begin{aligned} ||Q(\beta )-P_{E_0}(A_0)|| \rightarrow 0 \end{aligned}$$
(4.5)

If \(d < {\mathrm{dist}}(E_0,\sigma (A_0){\setminus }\{E_0\})\), Theorem 3.7 implies that \(P_{(E_0-d,E_0+d)}(A(\beta ))\psi \rightarrow P_{E_0}(A_0)\psi \) for any \(\psi \). Thus by (4.5), \(\left[ P_{(E_0-d,E_0+d)}(A(\beta ))-Q(\beta )\right] \psi \rightarrow 0\) which is the required spectral concentration.
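The step in the proof sketch from the choice of \(g(\beta )\) to \(||(1-Q(\beta ))\varphi (\beta )|| \rightarrow 0\) can be spelled out with the spectral theorem. The following is a sketch assuming \(A(\beta )\) is self-adjoint; \(\mu _\beta \) is notation introduced here for the spectral measure of \(A(\beta )\) associated to \(\varphi (\beta )\).

$$\begin{aligned} ||(A(\beta )-\lambda (\beta ))\varphi (\beta )||^2&= \int (x-\lambda (\beta ))^2 \, d\mu _\beta (x) \\&\ge g(\beta )^2 \, \mu _\beta \left( \{x \,|\, |x-\lambda (\beta )| \ge g(\beta )\}\right) \\&= g(\beta )^2 \, ||(1-Q(\beta ))\varphi (\beta )||^2 \end{aligned}$$

Dividing by \(g(\beta )^2\) shows that \(||(1-Q(\beta ))\varphi (\beta )|| \le ||(A(\beta )-\lambda (\beta ))\varphi (\beta )||/g(\beta ) \rightarrow 0\).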

These ideas were used by Friedrichs and Rejto [175] to prove spectral concentration in Example 3.5 (i.e. \(A_0\) of rank 1 and B multiplication by x). They assumed the function \(\psi (x)\) of (3.18) is strictly positive on \({\mathbb {R}}\) and Hölder continuous and proved that \(A(\beta )\) has no point eigenvalues and has a weak spectral concentration (of order \(\beta ^p\) for some \(0<p<1\)). Riddell [515] proved spectral concentration to all orders for the Stark effect for Hydrogen using pseudo-eigenvectors and Rejto [501, 502] proved the analog for Helium (see below for more on spectral concentration for the Stark effect).

Veselić [661] systematized and simplified the results in Theorem 4.1 and applied it to certain models (not linear in \(\beta \)) where \(A_0\) has a discrete eigenvalue while \(A(\beta )\) has no eigenvalue due to tunnelling through a barrier. An example is \(A(\beta ) = -\tfrac{d^2}{dx^2}+V(x,\beta )\) where

$$\begin{aligned} V(x,\beta ) = V_0(x)-(1-e^{-\beta x}) \end{aligned}$$
(4.6)

\(V_0\) goes to zero at infinity and is such that \(A_0\) has a single negative eigenvalue at \(-\tfrac{1}{2}\). Thus \(A(\beta )\) has essential spectrum \([-1,\infty )\) and instantaneously the discrete eigenvalue is swamped in continuous spectrum. There is a barrier of size \(\beta ^{-1}\) trapping the initial bound state. Veselić proved spectral concentration.

As noted, Titchmarsh related spectral concentration to second sheet poles of Green’s functions for certain differential operators. This theme was developed by James Howland, a student of Kato, in 5 papers [255,256,257,258,259]. Howland discussed two situations. One was where \(A_0\) is finite rank and its non-zero eigenvalues are washed away, much like Example 3.5. The other was where \(A_0\) has eigenvalues embedded in continuous spectrum and B is finite rank, so related to the Friedrichs model mentioned at the end of Example 3.2.

In both cases, there is a finite dimensional space, \({\mathcal {V}}\), where the finite rank operator lives and Howland considered \({\{\langle \varphi ,(A(\beta )-z)^{-1}\psi \rangle \,|\, \varphi ,\psi \in {\mathcal {V}}\}}\) and proved (under suitable conditions) that these functions initially defined on \({\mathbb {C}}_+\) have meromorphic continuations through \({\mathbb {R}}\) into a neighborhood of \(E_0\), a finite multiplicity eigenvalue of \(A_0\). These continuations had second sheet poles at \(E_j(\beta )\) converging as \(\beta \downarrow 0\) to \(E_0\). The number of poles is typically the multiplicity of \(E_0\) as an eigenvalue of \(A_0\).

In the case where \(A_0\) has a discrete eigenvalue, Howland showed that \(\text {Im}\,E(\beta ) = \text {O}(\beta ^\ell )\) for all \(\ell \) and was able to use this to prove spectral concentration to all orders. But in cases where \(A_0\) had an embedded eigenvalue, it was typically true that \(\text {Im}\, E(\beta ) = a_k \beta ^k + \text {o}(\beta ^k)\) for some k and some \(a_k < 0\); indeed Howland often proved a Fermi golden rule with \(a_2 \ne 0\). In that case, he showed there was spectral concentration of order \(k-1\) but not k so spectral concentration couldn’t specify a perturbation series to all orders.

Howland also discovered that even when \(A_0\) and B were self-adjoint, an eigenvalue could turn into a second order pole whose perturbation series could have non-trivial fractional power series in the asymptotic expression, i.e. Rellich’s theorem fails for resonance energies.

Howland also introduced what I’ve called Howland’s razor (see the discussion of Example 3.3) and he gave one possible answer: it often happened that the embedded eigenvalue turned into a resonance, i.e. second sheet pole, for real values of \(\beta \) but for suitable complex \(\beta \), it was a pole in \({\mathbb {C}}_+\) and so a normal discrete eigenvalue of \(A(\beta )\). Thus the resonance energy could be interpreted as the analytic continuation of a perturbed eigenvalue.

Perhaps the most successful approach to the study of resonances, one that handles problems in atomic physics like Examples 3.2 and 3.3, is the method of complex scaling, initially called dilation or dilatation analyticity (the name change to complex scaling was by quantum chemists when they took up the method for numerical calculation of molecular resonances). The idea appeared initially in a technical appendix of a never published note by J. M. Combes who realized the potential of this idea and then published papers with coauthors: Aguilar–Combes [6] on the two body problem and Balslev–Combes [42] on N-body problems (Eric Balslev was Kato’s first Berkeley student); see Simon [573] for extensions and simplifications and [497, Sections XIII.10 and XII.6] for a textbook presentation. Combes and collaborators knew that the formalism, which they used to prove the absence of singular continuous spectrum, provided a possible definition of a resonance. It was Simon [574] who realized that the formalism was ideal for studying eigenvalues embedded in the continuous spectrum like autoionizing states. We will not discuss an extension needed for molecules in the limit of infinite nuclear masses where one uses exterior complex scaling or a close variant, see Simon [594], Hunziker [263] and Gérard [187].

We begin with the two body case. On \(L^2({\mathbb {R}}^\nu ,d^\nu x)\), let \(U(\theta ),\,\theta \in {\mathbb {R}}\) be the set of real scalings:

$$\begin{aligned} (U(\theta )f)(\varvec{r}) = e^{\nu \theta /2}f(e^\theta \varvec{r}) \end{aligned}$$
(4.7)

which defines a unitary group. If \(H = -\Delta +V(\varvec{r})\), then

$$\begin{aligned} H(\theta ) \equiv U(\theta )HU(\theta )^{-1} = -e^{-2\theta }\Delta +V(e^\theta \varvec{r}) \end{aligned}$$
(4.8)
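As a quick numerical sanity check of (4.7), the following sketch verifies in dimension \(\nu =1\) that the real scalings preserve the \(L^2\) norm. The Gaussian test function, the grid and the helper names `scaled` and `l2_norm_sq` are illustrative choices, not taken from the references.

```python
import math

def scaled(f, theta, nu=1):
    """Return U(theta) f as in (4.7): x -> e^{nu*theta/2} f(e^theta x)."""
    return lambda x: math.exp(nu * theta / 2) * f(math.exp(theta) * x)

def l2_norm_sq(g, half_width=40.0, n=100_000):
    """Midpoint-rule approximation of the integral of |g|^2 over [-half_width, half_width]."""
    h = 2 * half_width / n
    return h * sum(abs(g(-half_width + (k + 0.5) * h)) ** 2 for k in range(n))

gauss = lambda x: math.exp(-x * x / 2)       # illustrative test function
base = l2_norm_sq(gauss)                     # approximates sqrt(pi)
for theta in (-1.0, 0.3, 1.5):
    # The norm is unchanged by the real scaling, up to quadrature error.
    assert abs(l2_norm_sq(scaled(gauss, theta)) - base) < 1e-8
```

The check only probes unitarity for real \(\theta \); for complex \(\theta \) the operators are no longer unitary, which is the whole point of the method.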

The first term, \(H_0(\theta )\), can be analytically continued and

$$\begin{aligned} \sigma (H_0(\theta )) = \{z \in {\mathbb {C}}{\setminus }\{0\}\,|\, \arg z = -2\text {Im}\,\theta \} \cup \{0\} \equiv S_\theta \end{aligned}$$
(4.9)

Suppose that \(\theta \mapsto V(e^\theta \varvec{r})\) has an analytic continuation as a compact operator from \(D(-\Delta )\) to \(L^2({\mathbb {R}}^\nu )\) for \(|\text {Im}\,\theta | < \Theta _0\) as happens for \(V(\varvec{r})=r^{-\alpha }\,(0<\alpha <2\); including \(\alpha = 1\), i.e. Coulomb) for all \(\Theta _0\) or for \(V(\varvec{r})=e^{-\gamma r}\) for \(\Theta _0=\tfrac{\pi }{2}\). Such V’s are called dilation analytic. Then \(H(\theta )\) is a type (A) analytic family on the strip of width \(2\Theta _0\) about \({\mathbb {R}}\). For any \(\theta \), the essential spectrum of \(H(\theta )\) is \(S_\theta \).

Discrete eigenvalues are given by analytic functions, \(E_j(\theta )\). Since changing \(\text {Re}\,\theta \) provides unitarily equivalent H’s, \(E_j(\theta )\) is constant under changes of \(\text {Re}\,\theta \), so constant by analyticity. We conclude that so long as discrete eigenvalues avoid \(S_\theta \), they remain discrete eigenvalues of \(H(\theta )\). In particular, negative eigenvalues of H are eigenvalues of \(H(\theta )\) if \(|\text {Im}\,\theta | < \tfrac{\pi }{2}\). An additional argument shows that embedded positive eigenvalues become discrete eigenvalues of \(H(\theta )\) for \(\text {Im}\,\theta \in (0,\tfrac{\pi }{2})\).

For \(\theta \) with \(\text {Im}\,\theta \in (0,\tfrac{\pi }{2})\), this persistence shows that \(H(\theta )\) can’t have any eigenvalues in \(\{z\,|\, \arg z \in (0,2\pi -2\text {Im}\,\theta ){\setminus }\{-\pi \}\}\) (for taking \(\theta \) back to zero would result in non-real eigenvalues of H) but there isn’t any reason there can’t be eigenvalues z with \(\arg z \in (-2\text {Im}\,\theta ,0)\). That is, moving \(\text {Im}\,\theta \) can uncover eigenvalues in \({\mathbb {C}}_-\) which we interpret as resonances (but see the discussion below).
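The geometry of (4.9) and of the uncovered sector can be sketched in a few lines. This is only an illustration: the function names and the sample resonance energy are hypothetical, and nothing here computes an actual spectrum.

```python
import cmath

def free_spectrum_point(k_squared, theta):
    """The point e^{-2 theta} k^2 of sigma(H_0(theta)), cf. (4.8) and (4.9)."""
    return cmath.exp(-2 * theta) * k_squared

def is_uncovered(E, im_theta):
    """True if E lies in the sector -2 Im(theta) < arg E < 0 swept out by the rotated continuum."""
    return -2 * im_theta < cmath.phase(E) < 0

theta = 0.4 + 0.3j                          # illustrative complex scaling parameter
for k2 in (0.5, 1.0, 7.0):
    z = free_spectrum_point(k2, theta)
    # Every point of the rotated half line has argument -2 Im(theta).
    assert abs(cmath.phase(z) + 2 * theta.imag) < 1e-12

E_res = 1.0 - 0.1j                          # hypothetical resonance energy in the lower half plane
print(is_uncovered(E_res, 0.02), is_uncovered(E_res, 0.10))
```

With these sample values the pole is still hidden behind the continuum for \(\text {Im}\,\theta = 0.02\) and is uncovered for \(\text {Im}\,\theta = 0.10\).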

Using techniques from N-body quantum theory (essentially the HVZ theorem to be discussed in Sect. 11; we’ll use notation from that section below), one can similarly analyze N-body Hamiltonians with center of mass removed when all the \(V_{ij}\) are dilation analytic. The spectrum of \(H(\theta )\) with \(\theta \) not real looks like that in Fig. 1.

Fig. 1 The spectrum of \(H(\theta )\): (a) discrete eigenvalue of H; (b) continuum-embedded eigenvalues; (c) thresholds of H; (d) resonance eigenvalues; (e) complex thresholds

If \({\mathcal {C}}\) is a non-trivial cluster decomposition of \(\{1,\ldots ,N\}\), \({\mathcal {C}}=\{C_1,\ldots ,C_k\}\) and \(h(C_j)\) is the internal Hamiltonian of \(C_j\), the set of \(E_1+\cdots +E_k\) where \(E_j\) is an eigenvalue of \(h(C_j)\) is called the set of thresholds (if some \(C_\ell \) has one particle, then \(h(C_\ell )\) is the zero operator on \({\mathbb {C}}\) and \(E_\ell =0\)). It can be shown [42, 573] that the set, \(\Sigma \), of all thresholds (running over all non-trivial cluster decompositions) is a closed countable set and that for \(0< \text {Im}\,\theta<\Theta _0<\tfrac{\pi }{2}\), one has that

$$\begin{aligned} \sigma _\mathrm{{ess}}(H(\theta )) = \bigcup _{\lambda \in \Sigma (\theta )} \lambda +S_\theta \end{aligned}$$
(4.10)

Here \(\Sigma (\theta )\) includes some complex \(\lambda \) where the \(E_j\) are resonance eigenvalues of \(h(C_j,\theta )\).

Example 3.2 revisited (following [574]). The thresholds are \(\left\{ -\tfrac{1}{4n^2}\right\} _{n=1}^\infty \) so the eigenvalue at \(E_{2,2} = -\tfrac{1}{8}\) is not a threshold. Thus it is an isolated eigenvalue of \(A(1/Z,0,\theta )\) if \(-i\theta \in (0,\tfrac{\pi }{2})\). It follows that the Kato–Rellich theory applies so, for 1/Z small, there is an eigenvalue, \(E_{2,2}(1/Z,\theta )\), independent of \(\theta \) (although it is only an eigenvalue if \(-\arg (E_{2,2}(1/Z)+\tfrac{1}{4}) < \text {Im}\,\theta \)). This first implies there is a convergent perturbation series (i.e. time-dependent perturbation theory, suitably defined, converges). One can compute the perturbation coefficients, which are \(\theta \) independent for \(-i\theta \in (0,\tfrac{\pi }{2})\), and then take \(-i\theta \) to 0. One gets a suitable limit of \(-(V\varphi ,SV\varphi )\) where S is a reduced resolvent. Using the fact that the distribution limit of \(1/(x+i\epsilon )\) is \({\mathcal {P}}\left( \tfrac{1}{x}\right) -i\pi \delta (x)\), Simon [574] computed \(\text {Im}\, a_2\) as given by the Fermi golden rule.
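The distributional limit used in the last step can be checked numerically. The sketch below integrates a test function against \(1/(x+i\epsilon )\); the Gaussian test function, the cutoff and the grid size are assumptions made only for this illustration.

```python
import math

def smeared_integral(f, eps, half_width=8.0, n=160_000):
    """Midpoint-rule value of the integral of f(x)/(x + i*eps) over [-half_width, half_width]."""
    h = 2 * half_width / n
    total = 0j
    for k in range(n):
        x = -half_width + (k + 0.5) * h
        total += f(x) / complex(x, eps)
    return total * h

f = lambda x: math.exp(-x * x)              # illustrative even test function with f(0) = 1
for eps in (0.1, 0.01):
    val = smeared_integral(f, eps)
    # The real part is the principal value (zero for even f);
    # the imaginary part approaches -pi * f(0) as eps decreases.
    print(eps, val.real, val.imag)
```

Since f is even, the principal value part vanishes and only the \(-i\pi \delta (x)\) term survives, which is the mechanism that produces a non-zero imaginary part in the golden rule.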

For Stark Hamiltonians, the initial belief among mathematical physicists was that complex scaling couldn’t work. For let

$$\begin{aligned} H_0(\theta ,F)=-e^{-2\theta }\Delta +Fe^\theta z \end{aligned}$$
(4.11)

on \(L^2({\mathbb {R}}^3)\). Since \(H_0(\theta =0,F \ne 0)\) has no threshold (translating z by a constant adds a constant to the energy), there is no place for the spectrum \((-\infty ,\infty )\) to go when \(\theta \) is made imaginary. So it was assumed the theory could not make sense.

In spite of this accepted wisdom, a quantum chemist, Bill Reinhardt, did calculations for the Stark problem using complex scaling [498] and got sensible results. Motivated by this, Herbst [241] was able to define complex scaling for a class of two body Hamiltonians including the Hydrogen Stark problem. He discovered that for \(F \ne 0\), and \(0< \text {Im}\,\theta < \pi /3\), \(H_0(\theta ,F)\) has empty spectrum (!), i.e. \((H_0(\theta ,F) - z)\) is invertible for all z. It is a theorem that elements in Banach algebras and, in particular, bounded operators on any Banach space, have non-empty spectrum but that is only for bounded operators. In some sense, \(H_0(\theta ,F)\) has only \(\infty \) in its spectrum; specifically, \(\sigma [(H_0(\theta ,F)-z)^{-1}] = \{0\}\) for all z.

Example 3.3 revisited With this in hand, Herbst [241] considered (3.11) and defined \(A(F,Z,\theta )\) by

$$\begin{aligned} A(F,Z,\theta ) = -e^{-2\theta }\Delta - e^{-\theta }\frac{Z}{r} + e^\theta Fz \end{aligned}$$
(4.12)

and proved that for \(0< -i\theta <\tfrac{\pi }{3}\), and \(F \ne 0\), \(A(F,Z,\theta )\) has purely discrete spectrum and if \(E_0 \in (-\infty ,0)\) is an eigenvalue of \(A(F=0,Z,\theta =0)\) of multiplicity k, then for F small and \(-i\theta \in (0,\pi /3)\), \(A(F,Z,\theta )\) has at most k eigenvalues near \(E_0\) and their combined multiplicity is k. The Rayleigh–Schrödinger series can be proven to be asymptotic by the method of Theorem 3.9. Since its coefficients are real, Herbst showed that the width, \(\Gamma (F)\), is \(\text {o}(F^\ell )\) for all \(\ell \) and, by Howland’s method, this provided another proof of spectral concentration for all orders for the Stark problem.

Herbst–Simon [245] studied the analytic properties of \(E(F,Z,\theta )\) and proved analyticity for \(-F^2 \in \{z \,|\, |z| < R\} \cap ({\mathbb {C}}{\setminus } (-\infty ,0])\) and used this to prove Borel summability that recovers \(E(F,Z,\theta )\) directly for \(\text {Re}\,(-F^2)>0\) (which doesn’t include any real F). The physical value is then determined by analytic continuation. Graffi–Grecchi [198] had proven Borel summability slightly earlier using very different methods. Graffi–Grecchi [202] and Herbst–Simon [245] also proved Borel summability for discrete eigenvalues of general atoms.

For Hydrogen, Herbst–Simon conjectured (3.14) noting that it was implied by their analyticity results and the then unproven Oppenheimer formula. Shortly thereafter, Harrell–Simon [222] proved the Oppenheimer formula for the complex scaled defined Stark resonance and so also (3.14). They used similar arguments to prove the Bender–Wu formula for the anharmonic oscillator. Later Helffer–Sjöstrand [234] proved Bender–Wu formulae for higher dimensional oscillators.

We have not discussed in detail various subtleties that are dealt with in the quoted papers: among them, Herbst [241] showed that \(A(F,Z,\theta )\) is of type (A) with domain \(D(-\Delta )\cap D(z)\) on \(\{(F,Z,\theta ) \,|\, F>0, \text {Im}\,\theta \in (0,\pi /3)\}\) by proving a quadratic estimate. The proof of stability of the eigenvalues of \(A(F=0,Z,\theta )\) for \(\text {Im}\,\theta \in (0,\pi /3)\) uses ideas from [25, 26, 27, 28, Part I]. While the free Stark problem has scaled Hamiltonians with empty spectrum when there is one positive charge and N particles of equal mass and equal negative charge, there are charges and masses for which the spectrum is not empty.

Sigal [559,560,561,562] and Herbst–Møller–Skibsted [242] have further studied Stark resonances in multi-electron atoms proving that the widths are strictly positive and exponentially small in 1/F.

We end this discussion by noting that I have reason to believe that, at least at one time, Kato had severe doubts about the physical relevance of the complex scaling approach to resonances. [222] was rejected by the first journal it was submitted to. The editor told me that the world’s recognized greatest expert on perturbation theory had recommended rejection so he had no choice. I had some of the report quoted to me. The referee said that the complex scaling definition of resonance was arbitrary and physically unmotivated with limited significance.

There is at least one missing point in a reply to this criticism: however it is defined, a resonance must correspond to a pole of the scattering amplitude. While this is surely true for resonances defined via complex scaling, as of this day, it has not been proven for the models of greatest interest. So far, resonance poles of scattering amplitudes in quantum systems have only been proven for two and three cluster scattering with potentials decaying faster (often much faster) than Coulomb and not for Stark scattering; see Babbitt–Balslev [33], Balslev [39,40,41], Hagedorn [214], Jensen [285] and Sigal [555, 558]. This is a technically difficult problem which hasn’t drawn much attention. That said, following [222] and others, we note the following in support of the notion that eigenvalues of \(H(\theta )\) that lie in \({\mathbb {C}}_-\) are resonances:

(1) Going back to Titchmarsh [647,648,649,650,651, 654], poles of the diagonal (i.e. \(x=y\)) Green’s function (the integral kernel, \(G(x,y;z)\), of \((H-z)^{-1}\)) are viewed as resonances for one dimensional problems. In dimension \(\nu \ge 2\), \(G(x,y;z)\) diverges as \(x \rightarrow y\) so it is natural to consider poles of \(\langle \varphi ,(H-z)^{-1}\varphi \rangle \). Howland’s razor implies that you can’t look at all \(\varphi \in L^2({\mathbb {R}}^\nu , d^\nu x)\) but a special class of functions which are smooth in x and p space would be a reasonable replacement for \(x=y\). One can show (see [497, Section XIII.10]) that if \(\varphi \) is a polynomial times a Gaussian, then \(\langle \varphi ,(H-z)^{-1}\varphi \rangle \) has a meromorphic continuation across \({\mathbb {R}}\) between thresholds with poles exactly at the eigenvalues of \(H(\theta )\).

(2) In the autoionizing case, E is an analytic function of 1/Z and in the Stark case, analytic for \(-F^2\) in a cut disk about 0. For the physically relevant values, 1/Z real or F real, E has \(\text {Im}\,E < 0\) and these resonances are on the second sheet and disappear at \(\theta =0\). But for 1/Z or F pure imaginary, the corresponding E is in \({\mathbb {C}}_+\) and so persists when \(\text {Im}\,\theta \downarrow 0\), i.e. E for these unphysical values of the parameters is an eigenvalue of the corresponding H. Thus resonances can be viewed as analytic continuations of actual eigenvalues from unphysical to physical values of the parameters.

(3) It is connected to the sum or Borel sum of a suitable perturbation series, see [78, 79].

(4) It yields information on asymptotic series and spectral concentration in a particularly clean way and, in particular, a proof of a Bender–Wu type formula for the asymptotics of the perturbation coefficients in the Stark problem.

While we’ve focused on the complex scaling approach to resonances, there are other methods. One called distortion analyticity works sometimes for potentials which are the sum of a dilation analytic potential and a potential with exponential decay (but not necessarily any x-space analyticity). The basic papers include Jensen [285], Sigal [557], Cycon [100], and Nakamura [456, 457]. Some approaches for non-analytic potentials include Cattaneo–Graf–Hunziker [85], Cancelier–Martinez–Ramond [80] and Martinez–Ramond–Sjöstrand [443]. There is an enormous literature on the theory of resonances from many points of view. It would be difficult to attempt a comprehensive discussion of this literature and given that the subject is not central to Kato’s work, I won’t even try. But I should mention a beautiful set of ideas about counting asymptotics of resonances starting with Zworski [714]; see Sjöstrand [619] for unpublished lectures that include lots of references, a recent review of Zworski [715] and forthcoming book of Dyatlov–Zworski [127]. The form of the Fermi Golden Rule at Thresholds is discussed in Jensen–Nenciu [290] (see Sect. 16). A review of the occurrence of resonances in NR Quantum Electrodynamics and of the smooth Feshbach–Schur map is Sigal [563] and a book on techniques relevant to some approaches to resonances is Martinez [442].

5 Eigenvalue perturbation theory, IV: pairs of projections

Recall [616, Section 2.1] that a (bounded) projection on a Banach space, X, is a bounded operator with \(P^2=P\). If \(Y = {\mathrm{ran}}(P)=\ker (1-P)\) and \(Z = {\mathrm{ran}}(1-P)=\ker (P)\), then Y and Z are closed subspaces with \(Y\cap Z=\{0\}\) and \(Y+Z=X\), and \((y,z) \mapsto y+z\) is a Banach space linear homeomorphism of \(Y \oplus Z\) onto X. There is a one-one correspondence between such direct sum decompositions and bounded projections. We saw in Sect. 2 that the following is important in eigenvalue perturbation theory:

Theorem 5.1

Fix a Banach space, X. For any pair of bounded projections, P, Q on X with \(||P-Q|| < 1\), there exists an invertible map, U so that

$$\begin{aligned} UPU^{-1} = Q \end{aligned}$$
(5.1)

Moreover, U can be chosen so that

(a) For P fixed, \(U(P,Q)\) is analytic in Q in that it is a norm limit, uniformly in each ball \(\{Q\,|\,||P-Q|| < 1-\epsilon \}\), of polynomials in Q.

(b) If X is a Hilbert space and P, Q are self-adjoint projections, then U is unitary.

Remarks

1. We don’t require \(U(P,P) = {\varvec{1}}\) which might seem natural because, below, when P and Q are self-adjoint, we’ll find a U for which (5.19) holds and it can be shown that is inconsistent with \(U(P,P) = {\varvec{1}}\). Of course, given any \(U_0(P,Q)\) obeying (5.1), \(U(P,Q)=U_0(P,Q)U_0(P,P)^{-1}\) also obeys (5.1) and has \(U(P,P)={\varvec{1}}\) so it is no great loss. Both the U’s we construct below also obey \(U(Q,P) = U(P,Q)^{-1}\).

2. U is actually jointly analytic in P, Q and the proof easily implies that if P is fixed and \(\beta \mapsto Q(\beta )\) is analytic (resp. continuous, \(C^k\), \(C^\infty \)) in \(\beta \), then so is U.

A first guess for U might be

$$\begin{aligned} W = QP+(1-Q)(1-P) \end{aligned}$$
(5.2)

which obeys

$$\begin{aligned} WP=QP=QW \end{aligned}$$
(5.3)

so if W is invertible, we get (5.1). Of course (5.3) is also true of \(W=QP\) but it is easy to see if \({\mathrm{ran}}\, P \ne X\), then QP can’t be invertible. (5.2) isn’t invertible for an arbitrary pair of projections, for if \(\varphi \in ({\mathrm{ran}}\, P\cap \ker Q)\cup (\ker P\cap {\mathrm{ran}}Q)\), then \(W\varphi =0\). But when \(||P-Q||<1\), this space is trivial, so under the norm condition, W might be (and as we’ll see is) invertible.
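A minimal example of the failure of invertibility, with matrices chosen only for illustration: take P and Q to be the orthogonal projections onto the two coordinate axes of the plane, so that \(||P-Q||=1\). The helper names `matmul` and `add` are hypothetical.

```python
def matmul(X, Y):
    """Product of matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def add(X, Y, sign=1):
    """Entrywise X + sign*Y."""
    return [[x + sign * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

I2 = [[1, 0], [0, 1]]
P = [[1, 0], [0, 0]]          # projection onto e1
Q = [[0, 0], [0, 1]]          # projection onto e2, so P - Q has norm 1

W = add(matmul(Q, P), matmul(add(I2, Q, -1), add(I2, P, -1)))   # (5.2)
assert matmul(W, P) == matmul(Q, P) == matmul(Q, W)             # (5.3)

e1 = [[1], [0]]               # e1 lies in ran P and in ker Q
assert matmul(W, e1) == [[0], [0]]
```

In this extreme case W is the zero matrix, so (5.3) holds trivially and gives no information about conjugating P into Q.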

Define

$$\begin{aligned}&\widetilde{W} = PQ+(1-P)(1-Q) \end{aligned}$$
(5.4)
$$\begin{aligned}&A = P-Q; \qquad B=1-P-Q \end{aligned}$$
(5.5)

The following easy algebraic calculations are basic to the rich structure of pairs of projections

$$\begin{aligned} A^2 + B^2 = {\varvec{1}}; \qquad AB+BA=0 \end{aligned}$$
(5.6)

(which Avron [21] calls the anticommutative Pythagorean Theorem). Moreover

$$\begin{aligned} PA^2 = P-PQP = A^2P \end{aligned}$$
(5.7)

so

$$\begin{aligned}{}[P,A^2]=[Q,A^2]=[P,B^2]=[Q,B^2]=0 \end{aligned}$$
(5.8)

In addition

$$\begin{aligned} (PQ-QP) = BA; \qquad (PQ-QP)^2 = A^4-A^2 \end{aligned}$$
(5.9)

Finally,

$$\begin{aligned} W\widetilde{W} = \widetilde{W}W = 1-A^2 \end{aligned}$$
(5.10)

This means that W is invertible if \(||A|| < 1\), so for (5.1), we could take \(U=W\) but that won’t be unitary when X is a Hilbert space and the two projections are self-adjoint, so, following Kato, we make a slightly different choice.
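Since (5.3) and (5.6)–(5.10) are purely algebraic, they can be confirmed in exact rational arithmetic for any concrete pair of projections. The sketch below does this for two oblique projections on a three dimensional space; the particular matrices and the helper names `mul` and `comb` are assumptions made for the illustration.

```python
from fractions import Fraction as Fr

def mul(X, Y):
    """Product of square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def comb(a, X, b, Y):
    """Entrywise linear combination a*X + b*Y."""
    return [[a * x + b * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

one = [[Fr(int(i == j)) for j in range(3)] for i in range(3)]
zero = comb(0, one, 0, one)
# Two oblique (non-self-adjoint) projections, chosen only for illustration.
P = [[Fr(1), Fr(1, 2), Fr(-1, 3)], [Fr(0)] * 3, [Fr(0)] * 3]
Q = [[Fr(1), Fr(0), Fr(0)], [Fr(2, 5), Fr(0), Fr(0)], [Fr(0), Fr(0), Fr(1)]]
assert mul(P, P) == P and mul(Q, Q) == Q

A = comb(1, P, -1, Q)                       # A = P - Q
B = comb(1, comb(1, one, -1, P), -1, Q)     # B = 1 - P - Q
W = comb(1, mul(Q, P), 1, mul(comb(1, one, -1, Q), comb(1, one, -1, P)))    # (5.2)
Wt = comb(1, mul(P, Q), 1, mul(comb(1, one, -1, P), comb(1, one, -1, Q)))   # (5.4)
A2, B2 = mul(A, A), mul(B, B)
comm = comb(1, mul(P, Q), -1, mul(Q, P))    # PQ - QP

assert comb(1, A2, 1, B2) == one                                       # (5.6), first identity
assert comb(1, mul(A, B), 1, mul(B, A)) == zero                        # (5.6), second identity
assert mul(P, A2) == mul(A2, P) == comb(1, P, -1, mul(mul(P, Q), P))   # (5.7)
assert comm == mul(B, A)                                               # (5.9), first identity
assert mul(comm, comm) == comb(1, mul(A2, A2), -1, A2)                 # (5.9), second identity
assert mul(W, Wt) == mul(Wt, W) == comb(1, one, -1, A2)                # (5.10)
```

Because the arithmetic is exact, the assertions test the identities themselves rather than a floating point approximation; no norm condition on \(P-Q\) is needed for them.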

First Proof of Theorem 5.1

If \(||A|| < 1\), we can define

$$\begin{aligned} (1-A^2)^{-1/2} = \sum _{n=0}^{\infty } (-1)^n \left( {\begin{array}{c}-\tfrac{1}{2}\\ n\end{array}}\right) A^{2n} \end{aligned}$$
(5.11)

where as usual

$$\begin{aligned} \left( {\begin{array}{c}-\tfrac{1}{2}\\ n\end{array}}\right) = \frac{(-\tfrac{1}{2})(-\tfrac{3}{2})\ldots (\tfrac{1}{2}-n)}{n!} \end{aligned}$$
(5.12)

Since \(j^{-1}|\tfrac{1}{2} -j| < 1\) for \(j=1,2,\ldots \), we have that \(\sup _n |\left( {\begin{array}{c}-1/2\\ n\end{array}}\right) | \le 1\), so if \(||A|| < 1\), the series in (5.11) converges and series manipulation proves that

$$\begin{aligned} \left[ (1-A^2)^{-1/2}\right] ^2 = (1-A^2)^{-1} \end{aligned}$$
(5.13)

which in turn implies that if we define

$$\begin{aligned} U = W(1-A^2)^{-1/2} = (1-A^2)^{-1/2}W, \qquad \widetilde{U} = (1-A^2)^{-1/2}\widetilde{W} \end{aligned}$$
(5.14)

then, by (5.3), (5.8), (5.10) and (5.13),

$$\begin{aligned} U\widetilde{U} = \widetilde{U}U = {\varvec{1}}, \qquad UP=QU \end{aligned}$$
(5.15)

so U is invertible and (5.1) holds.

Since \((1-A^2)^{-1/2}\) is a norm limit of polynomials in P and Q, so is U proving (a). If X is a Hilbert space and \(P^*=P, Q^*=Q\), then \(\widetilde{U} = U^*\), so by (5.15) U is unitary, proving (b). \(\square \)
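The construction in the proof is easy to try numerically. The following sketch sums the series (5.11) for two rank one orthogonal projections on a three dimensional real space and checks (5.1) and unitarity. The angle, the truncation length and the helper names are illustrative assumptions.

```python
import math

def mul(X, Y):
    """Product of square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def comb(a, X, b, Y):
    """Entrywise linear combination a*X + b*Y."""
    return [[a * x + b * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def close(X, Y, tol=1e-12):
    """True if two matrices agree entrywise up to tol."""
    return all(abs(x - y) < tol for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

def proj(v):
    """Orthogonal projection onto the line spanned by the real vector v."""
    s = sum(x * x for x in v)
    return [[a * b / s for b in v] for a in v]

def inv_sqrt_one_minus(A2, terms=80):
    """Partial sum of the series (5.11) for (1 - A^2)^{-1/2}, given A2 = A^2."""
    n = len(A2)
    ident = [[float(i == j) for j in range(n)] for i in range(n)]
    total, power, c = ident, ident, 1.0
    for k in range(1, terms):
        power = mul(power, A2)
        c *= (2 * k - 1) / (2 * k)          # c_k = (-1)^k binom(-1/2, k)
        total = comb(1, total, c, power)
    return total

t = 0.5                                     # illustrative angle between the two lines
one = [[float(i == j) for j in range(3)] for i in range(3)]
P = proj([1.0, 0.0, 0.0])
Q = proj([math.cos(t), math.sin(t), 0.0])   # here ||P - Q|| = sin(t) < 1
A = comb(1, P, -1, Q)
W = comb(1, mul(Q, P), 1, mul(comb(1, one, -1, Q), comb(1, one, -1, P)))   # (5.2)
U = mul(W, inv_sqrt_one_minus(mul(A, A)))   # (5.14)
U_T = [list(col) for col in zip(*U)]        # transpose, i.e. the adjoint for real matrices

assert close(mul(U, P), mul(Q, U))          # UP = QU, equivalent to (5.1)
assert close(mul(U, U_T), one)              # U is unitary, as in part (b)
```

In this example U turns out to be the rotation by the angle t in the plane of the two lines, fixing the orthogonal direction.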

Theorem 5.1 for the self-adjoint Hilbert space case goes back to Sz-Nagy [454] who was interested in the result because of its application to the convergent perturbation theory of eigenvalues. His formula for U looks more involved than (5.2)/(5.14). Wolf [689] then extended the result to general Banach spaces but needed \(||P||^2||P-Q|| < 1\) and \(||1-P||^2||P-Q|| < 1\) which is a strictly stronger hypothesis.

In [320], Kato proved that if \(\beta \mapsto P(\beta )\) is a real analytic family of projections on a Banach space for \(\beta \in [0,B]\), then there exists a real analytic family of invertible maps, \(U(\beta )\) so that \(U(\beta )P(\beta )U(\beta )^{-1}=P(0)\). He did this using the same formalism he had developed for his treatment of the adiabatic theorem (Kato [313] and Sect. 17 in Part 2). In 1955, in an unpublished report [324], Kato presented all of the algebra above (except for \(AB+BA=0\)) and used it to prove Theorem 5.1 exactly as we do above.

After Avron et al. [31] found and exploited \(AB+BA=0\) (see below), Kato told me that he had found this relation about 1972 but didn’t have an application. Because [324] isn’t widely available, the standard reference for his approach to pairs of projections is his book [345]. In [324], Kato noted that his expression was equal to the object found by Sz-Nagy [454] but in the Banach space case, one could get better estimates from his formula for the object. In that note, he also remarked that when \(||P-Q|| < 1\), one can find a smooth, one parameter family of projections, \(P(t),\, 0 \le t \le 1\) with \(P(0) = P\) and \(P(1) = Q\) so that the U obtained via his earlier method of solving a differential equation was identical to the U of (5.2)/(5.14).

While this concludes Kato’s contribution to the subject of pairs of projections, I would be remiss if I didn’t say more about the rich structure of this simple setting, especially when \(||P-Q|| \ge 1\) (in the self-adjoint Hilbert space setting one has that \(||P-Q|| \le 1\) but for non-self-adjoint projections and the general case of Banach spaces, one often has \(||P-Q|| > 1\)). There are two approaches. The one we’ll discuss first is due to Avron–Seiler–Simon [31] and uses algebraic relations, especially (5.6). Since \(AB+BA=0\) is the signature of supersymmetry, we’ll call this the supersymmetric approach. Here is a typical use of this method:

Theorem 5.2

(Avron et al. [31]) Let P and Q be self-adjoint projections so that \(P-Q\) is compact. For \(\lambda \in [-1,1]{\setminus }\{0\}\), let \(P_\lambda \) be the projection onto the eigenspace \({\mathcal {H}}_\lambda \equiv \{\varphi \,|\,A\varphi = \lambda \varphi \}\).

(a) If \(\lambda \ne \pm 1\), then

$$\begin{aligned} V = (1-\lambda ^2)^{-1/2}\, B \restriction {\mathcal {H}}_\lambda \end{aligned}$$
(5.16)

is a unitary map of \({\mathcal {H}}_\lambda \) onto \({\mathcal {H}}_{-\lambda }\).

(b) For such \(\lambda \), we have that

$$\begin{aligned} \dim {\mathcal {H}}_{-\lambda } = \dim {\mathcal {H}}_{\lambda } \end{aligned}$$
(5.17)

(c) If \(P-Q\) is trace class, then

$$\begin{aligned} {\mathrm{Tr}}(P-Q) \in {\mathbb {Z}}\end{aligned}$$
(5.18)

(d) If \(||P-Q|| < 1\), then \(U \equiv \mathrm {sgn}(B)\) is a unitary operator obeying (5.1). Indeed,

$$\begin{aligned} UPU^{-1} = Q, \qquad UQU^{-1} = P \end{aligned}$$
(5.19)

Remarks

1. By \(\mathrm {sgn}(B)\), we mean f(B) defined by the functional calculus [616, Section 5.1] where

$$\begin{aligned} f(x) = \left\{ \begin{array}{ll} \,\,\,\, 1, &{} x > 0\\ -1, &{} x<0 \\ \,\,\,\, 0, &{} x=0 \end{array} \right. \end{aligned}$$

This is unitary because \(||A|| < 1\) and \(B^2 = 1-A^2\) implies that \(\ker B = \{0\}\). One can also write

$$\begin{aligned} U = B(1-A^2)^{-1/2} \end{aligned}$$
(5.20)

2. If we use (5.20) to define U in the general Banach space case when \(||P-Q|| < 1\), the same proof shows that we have (5.19). Indeed, since \([A^2,B]=0\), we have that \(U^2={\varvec{1}}\) so (5.1) implies \(UQU^{-1}=P\). So we get another proof of Theorem 5.1 in the general Banach space case. However if \(P=Q\), then \(B=1-2P\) and \(A=0\) so by (5.20)

$$\begin{aligned} U = {\varvec{1}}- 2P \end{aligned}$$
(5.21)

Thus, \(U(P,P) \ne {\varvec{1}}\) but see the remarks after Theorem 5.1.

3. That \({\mathrm{Tr}}(P-Q) \in {\mathbb {Z}}\) was first proven by Effros [134] and can also be proven using the Krein spectral shift [616, Problem 5.9.1]. It is also true if P, Q are not necessarily self-adjoint projections in a Hilbert space and for suitable Banach space cases; see below.

Proof

(a) If \(A\varphi = \lambda \varphi \), then

$$\begin{aligned} AB\varphi = -BA\varphi = -\lambda B\varphi \end{aligned}$$
(5.22)

so B maps \({\mathcal {H}}_\lambda \) to \({\mathcal {H}}_{-\lambda }\). Since

$$\begin{aligned} ||B\varphi ||^2=\langle \varphi ,B^2\varphi \rangle =\langle \varphi ,(1-A^2)\varphi \rangle =(1-\lambda ^2)||\varphi ||^2 \end{aligned}$$

we see that V is norm preserving.

If \(\psi \in {\mathcal {H}}_{-\lambda }\), then, by the above, \(\varphi \equiv (1-\lambda ^2)^{-1}B\psi \in {\mathcal {H}}_\lambda \) and \(B\varphi = \psi \) so \({\mathrm{ran}}\, V\) is all of \({\mathcal {H}}_{-\lambda }\) and thus V is unitary.

(b) is immediate from (a).

(c) Lidskii’s Theorem for self-adjoint operators says that if C is a self-adjoint trace class operator, and for any \(\lambda \ne 0\), we define \({\mathcal {H}}_\lambda = \{\varphi \,|\,C\varphi = \lambda \varphi \}\), then

$$\begin{aligned} {\mathrm{Tr}}(C) = \sum _{\lambda \ne 0} \lambda \dim ({\mathcal {H}}_\lambda ) \end{aligned}$$
(5.23)

For the self-adjoint case this is easy since \({\mathrm{Tr}}(C) = \sum _{n=0}^{\infty } \langle \psi _n,C\psi _n \rangle \) for any trace class operator and any orthonormal basis (see [616, Theorem 3.6.7]) and any self-adjoint compact operator has an orthonormal basis of eigenvectors (see [616, Theorem 3.2.1]). By (b), the terms of (5.23) for \(\lambda \) and \(-\lambda \) when \(C=A\) cancel so long as \(\lambda \ne \pm 1\), so

$$\begin{aligned} {\mathrm{Tr}}(A) = \dim {\mathcal {H}}_1 - \dim {\mathcal {H}}_{-1} \in {\mathbb {Z}}\end{aligned}$$
(5.24)

(d) Since \(||A|| < 1\), \(B^2 = {\varvec{1}}- A^2 \ge \epsilon >0\) for \(\epsilon = 1-||A||^2\). Thus, |B| is invertible and

$$\begin{aligned} U=B|B|^{-1} \end{aligned}$$
(5.25)

is unitary since \(U=U^*\) and \(U^2=B^2|B|^{-2} = {\varvec{1}}\).

Moreover, since |B| commutes with A and B (since \([B^2,P]=[B^2,Q]=0\)) and B anticommutes with A, we see that

$$\begin{aligned} UBU^{-1} = B, \qquad UAU^{-1} = -A \end{aligned}$$
(5.26)

Since

$$\begin{aligned} P = \tfrac{1}{2}(A-B+{\varvec{1}}), \qquad Q = \tfrac{1}{2}(-A-B+{\varvec{1}}) \end{aligned}$$
(5.27)

(5.26) implies (5.19). \(\square \)
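The swap (5.19) produced by (5.20) can also be seen in a small exact example outside the self-adjoint setting, in the spirit of Remark 2. The two oblique projections below are hypothetical and were picked so that \(A^2\) is a scalar, which makes \((1-A^2)^{-1/2}\) rational.

```python
from fractions import Fraction as Fr

def mul(X, Y):
    """Product of square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def comb(a, X, b, Y):
    """Entrywise linear combination a*X + b*Y."""
    return [[a * x + b * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

one = [[Fr(1), Fr(0)], [Fr(0), Fr(1)]]
P = [[Fr(1), Fr(3, 5)], [Fr(0), Fr(0)]]     # oblique projection, illustrative choice
Q = [[Fr(1), Fr(0)], [Fr(-3, 5), Fr(0)]]    # oblique projection, illustrative choice
assert mul(P, P) == P and mul(Q, Q) == Q

A = comb(1, P, -1, Q)                       # A = P - Q, Euclidean operator norm 3/5
B = comb(1, comb(1, one, -1, P), -1, Q)     # B = 1 - P - Q
assert mul(A, A) == comb(Fr(9, 25), one, 0, one)    # A^2 = (9/25) 1 in this example

U = comb(Fr(5, 4), B, 0, B)                 # (5.20): (1 - A^2)^{-1/2} = 5/4 here
assert mul(U, U) == one                     # U^2 = 1
assert mul(U, P) == mul(Q, U)               # UPU^{-1} = Q
assert mul(U, Q) == mul(P, U)               # UQU^{-1} = P
```

Here U is an involution but not unitary, as one expects when the projections are not self-adjoint.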

We can also say something about non-self-adjoint projections on Hilbert spaces and also about the general Banach space case. The spectral theory of general compact operators, A, is more subtle than the self-adjoint case ([616, Section 3.3]). One has that \(\sigma (A){\setminus }\{0\}\) is discrete, a notion explained in Sect. 2. Thus, if we define for \(\lambda \in \sigma (A){\setminus }\{0\}\)

$$\begin{aligned} P_\lambda = \frac{1}{2\pi i}\oint _{|z-\lambda |=\delta } \frac{dz}{z-A} \end{aligned}$$
(5.28)

for \(\delta < {\mathrm{dist}}(\lambda ,\sigma (A){\setminus }\{\lambda \})\) and \({\mathcal {H}}_\lambda = {\mathrm{ran}}\, P_\lambda \), then \(\dim ({\mathcal {H}}_\lambda ) < \infty \) and is called the algebraic multiplicity of \(\lambda \). Also, as explained in Sect. 2,

$$\begin{aligned} AP_\lambda = \lambda P_\lambda + N \end{aligned}$$
(5.29)

where N is nilpotent, indeed \(N^{\dim ({\mathcal {H}}_\lambda )} = 0\) so

$$\begin{aligned} \varphi \in {\mathcal {H}}_\lambda \Rightarrow (A-\lambda )^{\dim ({\mathcal {H}}_\lambda )}\varphi = 0 \end{aligned}$$
(5.30)

Lidskii’s Theorem says that for trace class Hilbert space operators, (5.23) still holds. Its proof [616, Section 3.12] is more subtle. Lidskii’s Theorem doesn’t hold on all Banach spaces (where there is an analog of the trace on a class known as nuclear operators). We say that an operator, C on a Banach space, X, obeys Lidskii’s Theorem if C is nuclear and obeys (5.23)—see [151, 484] for discussions on when this holds.

Theorem 5.3

Let P, Q be two projections on a Banach space, X, so that \(A=P-Q\) is compact. Then

(a) \(\lambda \in \sigma (A){\setminus }\{1,-1\} \Rightarrow -\lambda \in \sigma (A)\)

(b) For such \(\lambda \), we have that

$$\begin{aligned} \dim {\mathcal {H}}_{\lambda } = \dim {\mathcal {H}}_{-\lambda } \end{aligned}$$
(5.31)

(c) If \(\pm 1 \notin \sigma (A)\), then there exists an invertible map U so that (5.19) holds.

(d) If A obeys Lidskii’s theorem, then \({\mathrm{Tr}}(P-Q) \in {\mathbb {Z}}\).

Remark

(d) was proven by Kalton [304] using different methods. The results (a)–(c) and the proof we give of (d) are new in the present paper.

Proof

(a),(b) For any \(z \in {\mathbb {C}}\), we have that \(B(A-z) = -(A+z)B\) so, if \(z,-z \notin \sigma (A)\), we see that

$$\begin{aligned} B(A-z)^{-1} = -(A+z)^{-1}B \end{aligned}$$
(5.32)

Since \(\sigma (A){\setminus }\{0\}\) is a set of isolated points, for any \(\lambda \ne 0\), we can find \(\epsilon _\lambda > 0\) so that \(\sigma (A) \cap \{z\,|\,0<|z-\lambda | \le \epsilon _\lambda \} = \emptyset \). Taking into account that \(z \mapsto -z\) reverses the direction of a contour, by picking \(0<\delta <\min (\epsilon _\lambda ,\epsilon _{-\lambda })\) in (5.28) and using (5.32), we see that

$$\begin{aligned} BP_\lambda = P_{-\lambda }B \end{aligned}$$
(5.33)

where \(P_\lambda \) is defined by (5.28) with \(\delta \) small even if \(\lambda \notin \sigma (A)\) (in which case \(P_\lambda =0\)).

Suppose \(\lambda \ne \pm 1\). Since A leaves \({\mathcal {H}}_\lambda \) invariant and \(\sigma (A\restriction {\mathcal {H}}_\lambda ) \subset \{\lambda \}\), we have that \((1-A^2)=(1-A)(1+A)\) restricted to \({\mathcal {H}}_\lambda \) has an inverse R. Thus RB is a left inverse to B as a map of \({\mathcal {H}}_\lambda \rightarrow {\mathcal {H}}_{-\lambda }\) so B as a map between those spaces is 1–1. This implies that \(\dim {\mathcal {H}}_\lambda \le \dim {\mathcal {H}}_{-\lambda }\). By interchanging \(\lambda \) and \(-\lambda \), we see that (5.31) holds which implies (a) and (b).

(c) Since

$$\begin{aligned} BB=BB, \qquad BA=-AB \end{aligned}$$

(5.27) implies that

$$\begin{aligned} BP=QB,\qquad BQ=PB \end{aligned}$$
(5.34)

We can take \(U=B\) if we show that B is invertible. Since \(\pm 1 \notin \sigma (A)\), we see that \((1-A)^{-1}(1+A)^{-1}B\) is a two sided inverse for B.

(d) From Lidskii’s theorem and (5.31), we see that (5.24) holds.
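In finite dimensions every operator is compact and trace class, so Theorem 5.3 can be illustrated with exact linear algebra. The sketch below computes the characteristic polynomial of \(A=P-Q\) for a rank one and a rank two oblique projection; the matrices are hypothetical and the Faddeev–LeVerrier recursion is used only as a convenient exact method, not because the references use it.

```python
from fractions import Fraction as Fr

def mul(X, Y):
    """Product of square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def char_poly(A):
    """Coefficients [c_0, ..., c_n] of det(z - A), by the Faddeev-LeVerrier recursion."""
    n = len(A)
    c = [Fr(0)] * n + [Fr(1)]
    M = [[Fr(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        AM = mul(A, M)
        M = [[AM[i][j] + (c[n - k + 1] if i == j else 0) for j in range(n)] for i in range(n)]
        AM = mul(A, M)
        c[n - k] = -sum(AM[i][i] for i in range(n)) / k
    return c

P = [[Fr(1), Fr(1, 2), Fr(-1, 3)], [Fr(0)] * 3, [Fr(0)] * 3]                   # rank one
Q = [[Fr(1), Fr(0), Fr(0)], [Fr(2, 5), Fr(0), Fr(0)], [Fr(0), Fr(0), Fr(1)]]   # rank two
A = [[p - q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]

coeffs = char_poly(A)
# det(z - A) = z^3 + z^2 + z/5 + 1/5 = (z + 1)(z^2 + 1/5)
assert coeffs == [Fr(1, 5), Fr(1, 5), Fr(1), Fr(1)]
trace = sum(A[i][i] for i in range(3))
assert trace == -1
```

The eigenvalues other than \(\pm 1\) are the pair \(\pm i/\sqrt{5}\), consistent with (a) and (b), while the unpaired eigenvalue \(-1\) accounts for \({\mathrm{Tr}}(P-Q) = -1\) as in (5.24).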

Our final result from the supersymmetric approach returns to the self-adjoint case. We define for projections P, Q:

$$\begin{aligned} {\mathcal {K}}_{P,Q} = {\mathrm{ran}}\, P \cap \ker Q = \{\varphi \,|\, P\varphi = \varphi , \, Q\varphi = 0\} \end{aligned}$$
(5.35)

Theorem 5.4

Let P, Q be two self-adjoint projections on a Hilbert space, \({\mathcal {H}}\). Then there exists a unitary map, U, obeying (5.19) if and only if

$$\begin{aligned} \dim {\mathcal {K}}_{P,Q} = \dim {\mathcal {K}}_{1-P,1-Q} \end{aligned}$$
(5.36)

Moreover, if such a U exists, one can choose it so that

$$\begin{aligned} U=U^*, \qquad U^2 = {\varvec{1}}\end{aligned}$$
(5.37)

Remarks

1. In (5.36), both sides may be infinite.

2. If \(\pm 1\) are isolated points of the spectrum of A and are discrete eigenvalues, then \(K:{\mathrm{ran}}\, P \rightarrow {\mathrm{ran}}\, Q\) given by \(K\varphi = Q\varphi \) is a Fredholm operator [616, Section 3.15], both sides of (5.36) are finite and their difference is the index of K. So, in this case, the theorem says that U obeying (5.19) exists if and only if index\((K)=0\). This special case is in [31].

3. The general case of this theorem is due to Wang, Du and Dou [694] whose proof used the Halmos representation discussed below. Our proof here is from Simon [617]. Two recent papers [70, 124] classify all solutions of (5.19).

4. Operators obeying (5.37) are called symmetries by Halmos–Kakutani [217].

Proof

If U exists, it is easy to see that U must be a unitary map of \({\mathcal {K}}_{P,Q}\) to \({\mathcal {K}}_{1-P,1-Q}\), so (5.36) must hold.

For the converse, suppose that (5.36) holds. Clearly, P and Q leave both \({\mathcal {K}}_{P,Q}\) and \({\mathcal {K}}_{1-P,1-Q}\) invariant and so also \({\mathcal {H}}_1 = {\mathcal {K}}_{P,Q} \oplus {\mathcal {K}}_{1-P,1-Q}\). Let \({\mathcal {H}}_2 = {\mathcal {H}}_1^\perp \) so \({\mathcal {H}}={\mathcal {H}}_1\oplus {\mathcal {H}}_2\). Since (5.36) is assumed, there exists \(W:{\mathcal {K}}_{P,Q} \rightarrow {\mathcal {K}}_{1-P,1-Q} \) unitary and onto. Define \(U_1\) on \({\mathcal {H}}_1\), written as this direct sum, by

$$\begin{aligned} U_1 = \left( \begin{array}{cc} 0 &{} W^* \\ W &{} 0 \\ \end{array} \right) \end{aligned}$$

Then \(U_1^2={\varvec{1}}\) and \(U_1^* = U_1\) and for the restrictions \(P_1, Q_1\) of P, Q to \({\mathcal {H}}_1\), we have that \(U_1P_1U_1^{-1} = Q_1, \, U_1Q_1U_1^{-1} = P_1\).

So it suffices to prove the result for \({\mathcal {H}}_2\), i.e. in the special case that \({\mathcal {K}}_{P,Q} = {\mathcal {K}}_{1-P,1-Q} = \{0\}\). If that holds, we have that \(\ker (1-A^2) = \{0\}\), so \(\ker (B) = \{0\}\) and \(U_2 \equiv \mathrm {sgn}(B)\) is unitary. Since \(U_2A_2=-A_2U_2,\, U_2B_2=B_2U_2\), we get that \(U_2P_2U_2^{-1} = Q_2,\, U_2Q_2U_2^{-1} = P_2\) by (5.27). Clearly, also \(U_2^2={\varvec{1}},\, U_2^*=U_2\).
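The second half of this proof can be checked by hand in the smallest case. Below is a minimal numerical sketch in two dimensions; the choice of P as the projection onto the first coordinate axis, of Q as the projection onto the line at angle t, the value t = 0.7 and all helper names are illustrative assumptions, not taken from the text. In this example \(B^2 = \cos ^2 t\,{\varvec{1}}\), so \(\mathrm {sgn}(B) = B/|\cos t|\); that shortcut is special to this two-dimensional situation.

```python
import math

def matmul(X, Y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def sub(X, Y):
    """Entrywise difference X - Y of two 2x2 matrices."""
    return [[X[i][j] - Y[i][j] for j in range(2)] for i in range(2)]

def close(X, Y, tol=1e-12):
    """True if two 2x2 matrices agree entrywise up to tol."""
    return all(abs(X[i][j] - Y[i][j]) < tol for i in range(2) for j in range(2))

t = 0.7                                  # illustrative angle with cos t != 0
c, s = math.cos(t), math.sin(t)
I2 = [[1.0, 0.0], [0.0, 1.0]]
P = [[1.0, 0.0], [0.0, 0.0]]             # projection onto e_1
Q = [[c * c, c * s], [c * s, s * s]]     # projection onto (cos t, sin t)

A = sub(P, Q)                            # A = P - Q
B = sub(sub(I2, P), Q)                   # B = 1 - P - Q

# Only in this 2x2 example: B^2 = cos^2(t) * 1, hence sgn(B) = B / |cos t|.
U = [[B[i][j] / abs(c) for j in range(2)] for i in range(2)]
U_transpose = [[U[j][i] for j in range(2)] for i in range(2)]

swaps_P_to_Q = close(matmul(matmul(U, P), U), Q)    # U P U^{-1} = Q, using U^{-1} = U
swaps_Q_to_P = close(matmul(matmul(U, Q), U), P)    # U Q U^{-1} = P
is_symmetry = close(matmul(U, U), I2) and close(U, U_transpose)
anticommutes = close(matmul(A, B), [[-x for x in row] for row in matmul(B, A)])
print(swaps_P_to_Q, swaps_Q_to_P, is_symmetry, anticommutes)
```

Here U turns out to be a reflection, which is the two-dimensional shadow of (5.37).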

Our final big topic in this section concerns the Halmos representation. As a first step, we note that

Proposition 5.5

Let P, Q be two orthogonal projections on a Hilbert space, \({\mathcal {H}}\), and let A, B be given by (5.5). Then:

(a) \({\mathcal {K}}_{P,Q} = \{\varphi \,|\,A\varphi =\varphi \}, \qquad {\mathcal {K}}_{1-P,1-Q} = \{\varphi \,|\,A\varphi =-\varphi \}\)

(b) \({\mathcal {K}}_{P,1-Q} = \{\varphi \,|\,B\varphi =-\varphi \},\qquad {\mathcal {K}}_{1-Q,P} = \{\varphi \,|\,B\varphi =\varphi \}\)

(c) \({\mathcal {K}}_{P,1-Q}\oplus {\mathcal {K}}_{1-Q,P}= \{\varphi \,|\,A\varphi =0\}, \qquad {\mathcal {K}}_{P,Q}\oplus {\mathcal {K}}_{1-P,1-Q}= \{\varphi \,|\,B\varphi =0\}\)

(d) These four spaces are mutually orthogonal.

(e) All four spaces are \(\{0\}\) if and only if \(\ker A=\ker B = \{0\}\).

Proof

(a) \(P \le {\varvec{1}},\,Q \ge 0\) so \(A\varphi =\varphi \Rightarrow ||\varphi ||^2 \ge \langle \varphi ,P\varphi \rangle = ||\varphi ||^2+\langle \varphi ,Q\varphi \rangle \Rightarrow \langle \varphi ,Q\varphi \rangle = 0 \Rightarrow \langle Q\varphi ,Q\varphi \rangle = 0 \Rightarrow Q\varphi = 0 \Rightarrow \) (since \((P-Q)\varphi =\varphi \)) \(P\varphi =\varphi \Rightarrow \varphi \in {\mathcal {K}}_{P,Q}\). Conversely, \( \varphi \in {\mathcal {K}}_{P,Q} \Rightarrow P\varphi =\varphi \& Q\varphi =0 \Rightarrow A\varphi =\varphi \). The proof of the second statement is similar.

(b) Similar to (a) using \(B=(1-P)-Q\).

(c) The two spaces in the first statement are orthogonal by (b) and the mutual orthogonality of eigenspaces. Since \(A^2\varphi =(1-B^2)\varphi \), that direct sum lies in \(\ker A^2=\ker A\). Conversely, if \(A\varphi = 0\), then \((1-B^2)\varphi =A^2\varphi =0\). If \(\varphi _\pm = \tfrac{1}{2}(1 {\mp } B)\varphi \), then \(\varphi _\pm \in \ker (1\pm B)\) and \(\varphi =\varphi _+ + \varphi _-\), so by (b), \(\varphi \in {\mathcal {K}}_{P,1-Q}\oplus {\mathcal {K}}_{1-Q,P}\). The second relation has a similar proof.

(d) Immediate from the orthogonality of different eigenspaces of a self-adjoint operator.

(e) Immediate from (c). \(\square \)

We say that two orthogonal projections are in generic position if \(\ker A=\ker B = \{0\}\), equivalently if \({\mathcal {K}}_{P,Q}, {\mathcal {K}}_{1-P,1-Q}, {\mathcal {K}}_{P,1-Q}, {\mathcal {K}}_{1-Q,P}\) are all \(\{0\}\). The Halmos two projection theorem says

Theorem 5.6

(Halmos Two Projection Theorem) Let P, Q be self-adjoint projections on a Hilbert space, \({\mathcal {H}}\), which are in generic position. Let \({\mathcal {B}}_1={\mathrm{ran}}\, P,\,{\mathcal {B}}_2 = {\mathrm{ran}}(1-P)\). Then there exists a unitary map W from \({\mathcal {B}}_1\) onto \({\mathcal {B}}_2\) and self-adjoint operators \(C>0, \, S>0\) on \({\mathcal {B}}_1\) with

$$\begin{aligned} C^2+S^2 = {\varvec{1}}, \qquad [C,S] = 0 \end{aligned}$$
(5.38)

so that under \({\mathcal {H}}= {\mathcal {B}}_1\oplus {\mathcal {B}}_2\),

$$\begin{aligned} P= & {} \left( \begin{array}{cc} {\varvec{1}}&{} 0 \\ 0 &{} 0 \\ \end{array} \right) \end{aligned}$$
(5.39)
$$\begin{aligned} Q= & {} \left( \begin{array}{cc} C^2 &{} CSW^{-1} \\ WCS &{} WS^2W^{-1} \\ \end{array} \right) \end{aligned}$$
(5.40)

Remarks

1. There are alternate ways that this theorem is often expressed. Rather than state it for pairs in generic position, the theorem says that the space is a direct sum of six spaces, two of the form just given and the other four simultaneous eigenspaces with \(P\varphi = \lambda \varphi ,\, Q\varphi =\kappa \varphi \) with \(\lambda ,\kappa \in \{0,1\}\) (these are the four spaces of Proposition 5.5). Sometimes, (5.40) is written:

$$\begin{aligned} Q = \left( \begin{array}{cc} {\varvec{1}}&{} 0 \\ 0 &{} W \\ \end{array} \right) \left( \begin{array}{cc} C^2 &{} CS \\ CS &{} S^2 \\ \end{array} \right) \left( \begin{array}{cc} {\varvec{1}}&{} 0 \\ 0 &{} W \\ \end{array} \right) ^{-1} \end{aligned}$$

where the first factor maps \({\mathcal {B}}_1\oplus {\mathcal {B}}_1\) to \({\mathcal {B}}_1\oplus {\mathcal {B}}_2\) and the middle factor is an operator on \({\mathcal {B}}_1\oplus {\mathcal {B}}_1\). Some authors even implicitly use the first matrix above to identify \({\mathcal {H}}\) with \({\mathcal {B}}_1\oplus {\mathcal {B}}_1\) and only write the middle factor above.

2. C and S stand, of course, for \(\text {cosine}\) and \(\text {sine}\). One often defines an operator, \(\Theta \), with spectrum in \([0,\pi /2]\) so that \(C=\cos (\Theta ),\,S=\sin (\Theta )\). While 0 and/or \(\pi /2\) may lie in the spectrum of \(\Theta \), they cannot be eigenvalues (since \(C>0\) and \(S>0\)).

3. This result is due to Halmos [216]. There were earlier related results by Krein et al. [391], Dixmier [120] and Davis [106]. The proof we give here is due to Amrein–Sinha [13].

Proof

By the above

$$\begin{aligned} \ker A = \ker B = \{0\} \end{aligned}$$
(5.41)

Write the polar decompositions [616, Section 2.4]

$$\begin{aligned} A = U_A|A|,\qquad B=U_B|B| \end{aligned}$$
(5.42)

By (5.41), \(U_A\) and \(U_B\) are unitary and as functions of A and B respectively, they commute with A and B respectively. It also holds that they each commute with both |A| and |B| (since, for example, |B| commutes with A and so with |A| and so with \(U_A = s-\lim A(|A|+\epsilon )^{-1}\)). Multiplying \(AB+BA=0\) by \((|A|+\epsilon )^{-1}\) and \((|B|+\epsilon )^{-1}\) and taking \(\epsilon \) to zero, we see that

$$\begin{aligned} U_AU_B=-U_BU_A \Rightarrow (U_AU_B)^2 = -{\varvec{1}}\end{aligned}$$
(5.43)

We’ve already seen that \(\cdot \mapsto U_B \cdot U_B^{-1}\) interchanges P and Q. Since A is the B when P is replaced by \(1-P\), we see that \(\cdot \mapsto U_A \cdot U_A^{-1}\) interchanges Q and \(1-P\) and similarly, it interchanges \(1-Q\) and P.

Let \(U=U_AU_B\). Then we have that

$$\begin{aligned} UPU^{-1}=(1-P), \qquad&U(1-P)U^{-1}=P \nonumber \\ UQU^{-1}=(1-Q), \qquad&U(1-Q)U^{-1}=Q \end{aligned}$$
(5.44)

which, in particular, implies that \(U[{\mathcal {B}}_1]\) is all of \({\mathcal {B}}_2\) (so they have the same dimension).

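Define \(W = U\restriction {\mathcal {B}}_1\), which we’ve just seen is a unitary map from \({\mathcal {B}}_1\) onto \({\mathcal {B}}_2\). In the \({\mathcal {B}}_1\oplus {\mathcal {B}}_2\) decomposition, (5.39) is obvious. Moreover the decomposition of Q is

$$\begin{aligned} Q = \left( \begin{array}{cc} PQP &{} PQ(1-P) \\ (1-P)QP &{} (1-P)Q(1-P) \\ \end{array} \right) \end{aligned}$$
(5.45)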

By the formula for B, \(BP=-QP\), so \(P|B|^2P=PB^2P=PQP\). Similarly \((1-P)|A|^2(1-P) = (1-P)Q(1-P)\), \(PBA(1-P)=PQ(1-P)\) and \((1-P)ABP=(1-P)QP\).

\(P|B|^2P\) is already an operator on \({\mathcal {B}}_1\). Using \([U,|A|^2] = 0\), we can write

$$\begin{aligned} (1-P)|A|^2(1-P)=UPU^{-1}|A|^2UPU^{-1}=UP|A|^2PU^{-1} \end{aligned}$$

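Next note that \(|A|^2+|B|^2 = A^2+B^2 = {\varvec{1}}\). If we define

$$\begin{aligned} C = |B|\restriction {\mathcal {B}}_1, \qquad S = |A|\restriction {\mathcal {B}}_1 \end{aligned}$$
(5.46)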

then the above calculation and similar calculations on the off-diagonal piece implies (5.40).
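As a check on the construction, here is the smallest possible example, worked out under our own illustrative assumptions (the space \({\mathbb {C}}^2\), the angle \(\theta \in (0,\pi /2)\) and the choice of basis are not from the proof above). Let P project onto the first basis vector and Q onto \((\cos \theta ,\sin \theta )\). Then, with \(A=P-Q\) and \(B=1-P-Q\),

$$\begin{aligned} A = \sin \theta \left( \begin{array}{cc} \sin \theta &{} -\cos \theta \\ -\cos \theta &{} -\sin \theta \\ \end{array} \right) , \qquad B = \cos \theta \left( \begin{array}{cc} -\cos \theta &{} -\sin \theta \\ -\sin \theta &{} \cos \theta \\ \end{array} \right) \end{aligned}$$

Each matrix in parentheses is self-adjoint with square \({\varvec{1}}\), so these are the polar decompositions (5.42) with \(|A| = \sin \theta \,{\varvec{1}}\) and \(|B| = \cos \theta \,{\varvec{1}}\), and

$$\begin{aligned} U = U_AU_B = \left( \begin{array}{cc} 0 &{} -1 \\ 1 &{} 0 \\ \end{array} \right) , \qquad U^2 = -{\varvec{1}} \end{aligned}$$

Thus U maps \({\mathcal {B}}_1\), the span of the first basis vector, onto \({\mathcal {B}}_2\), the span of the second. Taking W to be the restriction of U to \({\mathcal {B}}_1\), \(C=\cos \theta \) and \(S=\sin \theta \), formula (5.40) reproduces the familiar matrix of Q with entries \(\cos ^2\theta \), \(\cos \theta \sin \theta \) and \(\sin ^2\theta \).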

Böttcher–Spitkovsky [69] is a review article on lots of applications of the Halmos representation. We mention also Lenard [421] who computes the joint numerical range (i.e. \(\{(\langle \varphi ,P\varphi \rangle ,\langle \varphi ,Q\varphi \rangle )\,|\,||\varphi ||=1\}\)) for pairs of projections in terms of the operator \(\Theta \) of remark 2 to Theorem 5.6. This range is a union of certain ellipses.

Finally, we mention one result that Kato proved in 1960 [332] that turns out to be connected to pairs of self-adjoint projections, although Kato didn’t himself mention or exploit this connection.

Theorem 5.7

Let \(\Pi \) be a general (i.e. not necessarily self-adjoint) projection in a Hilbert space, \({\mathcal {H}}\). Suppose that \(\Pi \ne 0,{\varvec{1}}\). Then

$$\begin{aligned} ||\Pi ||=||{\varvec{1}}-\Pi || \end{aligned}$$
(5.47)

Kato has this as a Lemma in a technical appendix to [332], but it is now regarded as a significant enough result that Szyld [632] wrote an article to advertise it and explain myriad proofs ([69] also discusses proofs). Del Pasqua [111] and Ljance [433] found proofs slightly before Kato but the methods are different and independent; indeed, for many years, no user of the result seemed to know of more than one of these three papers.

Ljance’s proof [433] shows a close connection to pairs of projections. Let P be the orthogonal projection on \({\mathrm{ran}}(\Pi )\) and Q the orthogonal projection onto \({\mathrm{ran}}({\varvec{1}}-\Pi )\) (P and Q must obey \(\ker (P)\cap \ker (Q) = \ker (1-P)\cap \ker (1-Q)=\{0\}\) and every such pair of orthogonal projections corresponds to an oblique projection \(\Pi \)). Then one can show Ljance’s formula (see [69])

$$\begin{aligned} ||\Pi || = \frac{1}{(1-||PQ||^2)^{1/2}} \end{aligned}$$
(5.48)

so that (5.47) follows from \(||QP||=||(QP)^*||=||PQ||\).
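Both (5.47) and (5.48) can be tested numerically in the plane. The following is a minimal sketch under our own assumptions: \(\Pi \) is the oblique projection in \({\mathbb {R}}^2\) with range spanned by the first basis vector and kernel spanned by \((\cos t,\sin t)\), the value t = 0.4 is arbitrary, and the helper names are hypothetical.

```python
import math

def matmul(X, Y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(X):
    return [[X[j][i] for j in range(2)] for i in range(2)]

def op_norm(M):
    """Operator norm of a real 2x2 matrix: sqrt of the largest eigenvalue of M^T M."""
    G = matmul(transpose(M), M)
    a, b, d = G[0][0], G[0][1], G[1][1]
    top = (a + d) / 2 + math.sqrt(((a - d) / 2) ** 2 + b * b)
    return math.sqrt(top)

t = 0.4                                   # illustrative angle, sin t != 0
c, s = math.cos(t), math.sin(t)

Pi = [[1.0, -c / s], [0.0, 0.0]]          # Pi e_1 = e_1 and Pi (cos t, sin t) = 0
one_minus_Pi = [[0.0, c / s], [0.0, 1.0]]

P = [[1.0, 0.0], [0.0, 0.0]]              # orthogonal projection onto ran(Pi)
Q = [[c * c, c * s], [c * s, s * s]]      # orthogonal projection onto ran(1 - Pi)

norm_Pi = op_norm(Pi)
norm_complement = op_norm(one_minus_Pi)
ljance = 1.0 / math.sqrt(1.0 - op_norm(matmul(P, Q)) ** 2)   # right side of (5.48)
print(norm_Pi, norm_complement, ljance)
```

In this example all three numbers equal \(1/\sin t\), so the norm blows up as the range and kernel of \(\Pi \) become nearly parallel.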

Del Pasqua [111] noted that (5.47) might fail in general Banach spaces—indeed, it is now known [211] that if (5.47) holds for all projections in a Banach space, X, then its norm comes from an inner product.

6 Eigenvalue perturbation theory, V: Temple–Kato inequalities

While strictly speaking the central material in this section is not so much about perturbation theory as variational methods, the subjects are related as Kato mentioned in several places, so we put it here. In fact, following Kato, we’ll see the inequalities proven here can be used to prove certain irregular perturbations yield asymptotic perturbation series. Kato also had several other papers about variational methods for scattering phase shifts [311, 317, 318] and for an aspect of Thomas–Fermi theory [267] (not the energy variational principle central to TF theory but one concerning a technical issue connected to the density at the nucleus). But none of these other papers had the impact of the work we discuss in this review, so we will not discuss them further.

Let A be a self-adjoint operator bounded from below and \(||\varphi ||=1\) with \(\varphi \in D(A)\). Then Rayleigh’s principle says that

$$\begin{aligned} \lambda \equiv \inf \sigma (A) \le \langle \varphi ,A\varphi \rangle \equiv \eta _\varphi \end{aligned}$$
(6.1)

In 1928, Temple [637, 638] proved a complementary lower bound in case

$$\begin{aligned} \sigma (A) \subset \{\lambda \} \cup [\mu ,\infty ) \end{aligned}$$
(6.2)

with \(\mu > \lambda \) and \(\lambda \) a simple eigenvalue. So long as

$$\begin{aligned} \eta _\varphi < \mu \end{aligned}$$
(6.3)

we have Temple’s inequality

$$\begin{aligned} \lambda \ge \eta _\varphi - \frac{\epsilon _\varphi ^2}{\mu -\eta _\varphi } \end{aligned}$$
(6.4)

where \(\epsilon _\varphi \ge 0\) and

$$\begin{aligned} \epsilon _\varphi ^2 \equiv ||(A-\eta _\varphi )\varphi ||^2 = \langle \varphi ,A^2\varphi \rangle -\langle \varphi ,A\varphi \rangle ^2 \end{aligned}$$
(6.5)

Temple’s inequality had historical importance. Before the advent of modern computers, variational calculations were difficult and estimating their accuracy was important. If \(\mu ^* \le \mu \) (i.e. if one had a possibly crude lower bound on the second eigenvalue), then (6.1)/(6.4) \(\Rightarrow |\lambda -\eta _\varphi | \le \epsilon _\varphi ^2(\mu ^*-\eta _\varphi )^{-1}\) so long as \(\eta _\varphi < \mu ^*\). One of the early successes of perturbation theoretic quantum electrodynamics was the calculation of the Lamb shift in Hydrogen. That was possible because the unshifted Hydrogen ground state was known precisely. To check the Lamb shift in Helium, one needed to know its ground state energy to very high accuracy (the Lamb shift is about one hundred thousandth of that binding energy). The necessary calculations were done by Kinoshita [371, 372] and Pekeris [478,479,480] using variational calculations which in Pekeris’ case involved 1078 parameter trial functions. They used Temple’s inequality to estimate how accurately they had computed this ground state energy. In fact, Kinoshita sketched a proof of Temple’s inequality in his paper using Kato’s method (he quoted Kato’s paper). The result was the verification of the Lamb shift in Helium to within experimental error.

In 1949, Kato [307] (with an announcement in Physical Review [312]) in one of his little gems found a simple proof of Temple’s inequality and also extended the result to any eigenvalue. Here is his theorem:

Theorem 6.1

(Temple–Kato inequality) Let A be any self-adjoint operator and let \(\varphi \in D(A)\) with \(||\varphi ||=1\). Let \((\alpha ,\zeta ) \subset {\mathbb {R}}\) so that

$$\begin{aligned} \alpha< \eta _\varphi < \zeta \end{aligned}$$
(6.6)

and so that

$$\begin{aligned} \epsilon _\varphi ^2 < (\eta _\varphi - \alpha )(\zeta -\eta _\varphi ) \end{aligned}$$
(6.7)

Then:

$$\begin{aligned} \text {(a)} \qquad \qquad \sigma (A) \cap (\alpha ,\zeta ) \ne \emptyset \end{aligned}$$

If \(\sigma (A) \cap (\alpha ,\zeta )\) contains only a single point, \(\lambda \), then

$$\begin{aligned} \text {(b)} \qquad \qquad \eta _\varphi -\frac{\epsilon _\varphi ^2}{\zeta -\eta _\varphi } \le \lambda \le \eta _\varphi + \frac{\epsilon _\varphi ^2}{\eta _\varphi -\alpha } \end{aligned}$$
(6.8)

If, in addition, \(\lambda \) is a simple eigenvalue with associated eigenvector, \(\psi \), with \(||\psi || = 1\) and \(\langle \psi ,\varphi \rangle \ge 0\) and if \(\epsilon _\varphi < \delta \equiv \min (\eta _\varphi -\alpha ,\zeta -\eta _\varphi )\), then

$$\begin{aligned} \text {(c)} \qquad \qquad ||\varphi -\psi || \le \left[ 2-2\left( 1-\frac{\epsilon _\varphi ^2}{\delta ^2}\right) ^{1/2}\right] ^{1/2} \end{aligned}$$
(6.9)

Remarks

1. As we’ll see, a version of (6.8) holds even if we don’t suppose there is only one point in \(\sigma (A) \cap (\alpha ,\zeta )\), namely if

    $$\begin{aligned} \gamma _0 = \eta _\varphi - \frac{\epsilon _\varphi ^2}{\zeta -\eta _\varphi }; \qquad \kappa _0 = \eta _\varphi + \frac{\epsilon _\varphi ^2}{\eta _\varphi -\alpha } \end{aligned}$$
    (6.10)

    then \(\sigma (A) \cap (\alpha ,\kappa _0] \ne \emptyset \) and \(\sigma (A) \cap [\gamma _0,\zeta ) \ne \emptyset \).

2. If we take \(\alpha \rightarrow -\infty \) and \(\zeta = \eta _\varphi +1\), the upper bound in (6.8) is just the Rayleigh bound (6.1) and if we take \(\zeta =\mu \), then the lower bound in (6.8) is just Temple’s inequality (6.4).

3. If \(0<\alpha < 1\), then

    $$\begin{aligned} 2-2(1-\alpha ^2)^{1/2}&= \left[ \frac{4-4(1-\alpha ^2)}{2+2(1-\alpha ^2)^{1/2}}\right] \\&\le \frac{4\alpha ^2}{4(1-\alpha ^2)^{1/2}} = \left[ \frac{\alpha }{(1-\alpha ^2)^{1/4}}\right] ^2 \end{aligned}$$

    so (6.9) implies that

    $$\begin{aligned} ||\varphi -\psi || \le \frac{\epsilon }{\delta }\left( 1-\frac{\epsilon ^2}{\delta ^2}\right) ^{-1/4} \end{aligned}$$
    (6.11)

    which is how Kato writes it in Kato [321] (see Knyazev [382] for refined versions of these types of estimates).

The proof we’ll give follows Kato’s approach (see also Harrell [221]). The key to this proof is what Temple [640] calls Kato’s Lemma:

Lemma 6.2

Let A be a self-adjoint operator and \(\varphi \in D(A)\) with \(||\varphi || = 1\). Then

$$\begin{aligned} \sigma (A) \cap (\alpha ,\zeta ) = \emptyset \Rightarrow \langle \varphi ,(A-\alpha )(A-\zeta )\varphi \rangle \ge 0 \end{aligned}$$
(6.12)

Proof

The spectral theorem (see [616, Chapter V and Section 7.2]) says that A is a direct sum of multiplications by x on \(L^2({\mathbb {R}}{\setminus } (\alpha ,\zeta ), d\mu (x))\). Since \((x-\alpha )(x-\zeta ) \ge 0\) for \(x \in {\mathbb {R}}{\setminus } (\alpha ,\zeta )\), we see that \((A-\alpha )(A-\zeta ) \ge 0\).

Remark

While we use the Spectral Theorem (as Kato did), all we need is a spectral mapping theorem, i.e. if \(f(x) = (x-\alpha )(x-\zeta )\), then \(\sigma (f(A))=f[\sigma (A)]\) and the fact that an operator with spectrum in \([0,\infty )\) is positive. The spectral mapping theorem for polynomials holds for elements of any Banach algebra and the proof in [616, Theorem 2.2.6] extends to unbounded operators. That this lemma follows from considerations of resolvents only was noted by Temple [640].

Taking contrapositives in (6.12), we get the following Corollary (if Lemmas are allowed to have Corollaries):

Corollary 6.3

Let A be a self-adjoint operator and \(\varphi \in D(A)\) with \(||\varphi || = 1\). Then

$$\begin{aligned} \langle \varphi ,(A-\alpha )(A-\zeta )\varphi \rangle < 0 \Rightarrow \sigma (A) \cap (\alpha ,\zeta ) \ne \emptyset \end{aligned}$$
(6.13)
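Since Lemma 6.2 and Corollary 6.3 only involve the quadratic form of \((A-\alpha )(A-\zeta )\), they are easy to illustrate for a diagonal matrix. The sketch below is our own toy example: the eigenvalues 0, 1, 3, the weights \(|\langle e_i,\varphi \rangle |^2\) and the function name are assumptions chosen for illustration.

```python
def quad_form(spectrum, weights, alpha, zeta):
    """<phi,(A-alpha)(A-zeta)phi> for a diagonal A with the given eigenvalues,
    where weights[i] = |<e_i, phi>|^2 and the weights sum to 1."""
    return sum(w * (x - alpha) * (x - zeta) for x, w in zip(spectrum, weights))

spectrum = [0.0, 1.0, 3.0]

# Lemma 6.2: (1.2, 2.8) contains no eigenvalue, so the form is nonnegative.
gap_value = quad_form(spectrum, [0.2, 0.5, 0.3], 1.2, 2.8)

# Corollary 6.3: a negative value certifies spectrum inside (0.5, 1.5).
neg_value = quad_form(spectrum, [0.02, 0.95, 0.03], 0.5, 1.5)
certified = any(0.5 < x < 1.5 for x in spectrum)

# The converse fails: the interval may contain spectrum while the form is positive.
pos_value = quad_form(spectrum, [0.2, 0.5, 0.3], 0.5, 1.5)
print(gap_value, neg_value, certified, pos_value)
```

The last line is a reminder that (6.13) is a sufficient condition only; the trial vector has to be concentrated near the interval for the test to succeed.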

The final preliminary of the proof is

Lemma 6.4

Suppose that A is self-adjoint and that \(\lambda \in {\mathbb {R}}\) is an isolated simple eigenvalue with \(A\psi = \lambda \psi \) and \(||\psi ||=1\). If \(\varphi \in D(A)\) with \(||\varphi || = 1\) and

$$\begin{aligned} \epsilon _\varphi < \delta \equiv {{\mathrm{dist}}}(\eta _\varphi ,\sigma (A){\setminus } \{\lambda \}) \end{aligned}$$
(6.14)

and if the phase of \(\psi \) is changed so that \(\langle \varphi ,\psi \rangle \ge 0\), then

$$\begin{aligned} ||\varphi -\psi ||^2 \le 2 - 2\left( 1-\frac{\epsilon _\varphi ^2}{\delta ^2}\right) ^{1/2} \end{aligned}$$
(6.15)

Proof

Let P be the projection onto multiples of \(\psi \). Since \((A-\eta _\varphi )^2 \ge \delta ^2\) on the A-invariant subspace \({\mathrm{ran}}(1-P)\) (by the spectral theorem as in the proof of Lemma 6.2), we have that

$$\begin{aligned} \epsilon _\varphi ^2 = ||(A-\eta _\varphi )\varphi ||^2 \ge \delta ^2 ||(1-P)\varphi ||^2 \end{aligned}$$
(6.16)

so

$$\begin{aligned} ||(1-P)\varphi ||^2 \le \epsilon _\varphi ^2/\delta ^2 < 1 \end{aligned}$$
(6.17)

by (6.14). Since \(||(1-P)\varphi ||^2+||P\varphi ||^2=1\), we see that (if \(\langle \psi ,\varphi \rangle \ge 0\))

$$\begin{aligned} \langle \psi ,\varphi \rangle = ||P\varphi || \ge \left( 1-\frac{\epsilon _\varphi ^2}{\delta ^2}\right) ^{1/2} \end{aligned}$$
(6.18)

Since \(||\psi -\varphi ||^2=2-2\langle \psi ,\varphi \rangle \), (6.15) is immediate.

Proof of Theorem 6.1

(a) We have that

    $$\begin{aligned} \langle \varphi ,(A-\alpha )(A-\zeta )\varphi \rangle&= \langle \varphi ,(A-\eta _\varphi )^2\varphi \rangle +\langle \varphi ,\left[ \eta _\varphi ^2+\alpha \zeta -(\alpha +\zeta )A\right] \varphi \rangle \nonumber \\&= \epsilon _\varphi ^2-(\eta _\varphi -\alpha )(\zeta -\eta _\varphi ) < 0 \end{aligned}$$
    (6.19)

    by (6.7). By Corollary 6.3, we see that \(\sigma (A) \cap (\alpha ,\zeta ) \ne \emptyset \).

(b) As in the proof of (6.19), for any \(\gamma , \kappa \), we have that

    $$\begin{aligned} \langle \varphi ,(A-\gamma )(A-\kappa )\varphi \rangle = \epsilon _\varphi ^2-(\eta _\varphi -\gamma )(\kappa -\eta _\varphi ) \end{aligned}$$
    (6.20)

    Fix \(\kappa =\zeta \). Then, using \(\zeta > \eta _\varphi \):

    $$\begin{aligned} \text {RHS of } (6.20)< 0 \iff \gamma < \gamma _0 \end{aligned}$$
    (6.21)

    (with \(\gamma _0\) given by (6.10)) so by Corollary 6.3,

    $$\begin{aligned} \gamma < \gamma _0 \Rightarrow \sigma (A) \cap (\gamma ,\zeta ) \ne \emptyset \end{aligned}$$
    (6.22)

    Since \(\sigma (A)\) is closed, this implies that

    $$\begin{aligned} \sigma (A) \cap [\gamma _0,\zeta ) \ne \emptyset \end{aligned}$$
    (6.23)

    Similarly,

    $$\begin{aligned} \sigma (A) \cap (\alpha ,\kappa _0] \ne \emptyset \end{aligned}$$
    (6.24)

    In particular, if there is a single point, \(\lambda \), in \((\alpha ,\zeta )\), we must have that \(\lambda \in (\alpha ,\kappa _0] \cap [\gamma _0,\zeta ) = [\gamma _0,\kappa _0]\) which is (6.8).

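(c) This follows from Lemma 6.4: since \(\sigma (A){\setminus }\{\lambda \}\) is disjoint from \((\alpha ,\zeta )\), the \(\delta \) of (6.14) is at least \(\min (\eta _\varphi -\alpha ,\zeta -\eta _\varphi )\), and the right side of (6.15) only increases when \(\delta \) is decreased.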

\(\square \)
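To see the sizes involved in Theorem 6.1, here is a minimal numerical sketch for a diagonal matrix. Everything concrete in it (the eigenvalues 0, 1, 3, the trial vector, the interval \((-1,1)\) and the variable names) is our own illustrative choice; the formulas are (6.5), (6.7), (6.8) and (6.9).

```python
import math

spectrum = [0.0, 1.0, 3.0]        # eigenvalues of a diagonal A (illustrative)
raw = [1.0, 0.1, 0.05]            # unnormalized trial vector in the eigenbasis
norm = math.sqrt(sum(x * x for x in raw))
phi = [x / norm for x in raw]     # normalized trial vector

eta = sum(a * p * p for a, p in zip(spectrum, phi))              # <phi, A phi>
eps2 = sum(((a - eta) * p) ** 2 for a, p in zip(spectrum, phi))  # ||(A - eta) phi||^2

alpha, zeta = -1.0, 1.0           # (alpha, zeta) contains only the eigenvalue 0
lam = 0.0
hypothesis_67 = eps2 < (eta - alpha) * (zeta - eta)              # condition (6.7)

lower = eta - eps2 / (zeta - eta)     # left side of (6.8)
upper = eta + eps2 / (eta - alpha)    # right side of (6.8)

delta = min(eta - alpha, zeta - eta)
vec_bound = math.sqrt(2 - 2 * math.sqrt(1 - eps2 / delta ** 2))  # right side of (6.9)
psi = [1.0, 0.0, 0.0]                 # eigenvector for lam with <psi, phi> >= 0
vec_error = math.sqrt(sum((p - q) ** 2 for p, q in zip(phi, psi)))
print(hypothesis_67, lower, lam, upper, vec_error, vec_bound)
```

The two-sided bracket has width of order \(\epsilon _\varphi ^2\), while the eigenvector error bound is only of order \(\epsilon _\varphi \); this is the usual gain of one order for energies over vectors.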

Kato exploited what are now called the Temple–Kato inequalities in his thesis to prove results on asymptotic perturbation theory. Below are two typical results whose proofs are very much in the spirit of this work of Kato—see Sect. 3 for what it means for an eigenvalue to be stable.

Theorem 6.5

Let \(A_0\) be a self-adjoint operator on a Hilbert space, \({\mathcal {H}}\). Let B be a symmetric operator with \(D(A_0) \cap D(B) \equiv {\mathcal {D}}\) dense in \({\mathcal {H}}\) and a core for \(A_0\). For each \(\beta >0\) (perhaps only for sufficiently small such \(\beta \)), let \(A(\beta )\) be a self-adjoint extension of \((A_0+\beta B)\restriction {\mathcal {D}}\). Let \(E_0\) be a simple, discrete eigenvalue for \(A_0\) which is stable for \(A(\beta )\). Let \(\varphi \in D(A_0), \, ||\varphi || = 1\) and \(A_0\varphi =E_0\varphi \). Suppose that \(\varphi \in D(B)\). Then the eigenvalue, \(E(\beta )\) of \(A(\beta )\) near \(E_0\) obeys

$$\begin{aligned} E(\beta ) = E_0 + \beta \langle \varphi ,B\varphi \rangle +\text {O}(\beta ^2) \end{aligned}$$
(6.25)

Proof

Since \({\mathcal {D}}\) is a core and for \(\eta \in {\mathcal {D}}, \, z \in {\mathbb {C}}{\setminus }{\mathbb {R}}\), \([(A(\beta )-z)^{-1}-(A_0-z)^{-1}](A_0-z)\eta =-\beta (A(\beta )-z)^{-1}B\eta \) we see that \(A(\beta ) \rightarrow A_0\) in strong resolvent sense as \(\beta \downarrow 0\). By the definition of stability, there is an interval \((\alpha ,\zeta )\) containing \(E_0\), so that for small \(\beta \), \(A(\beta )\) has a unique eigenvalue, \(E(\beta )\), in \((\alpha ,\zeta )\). Showing the operator involved in a superscript, we see that

$$\begin{aligned} \eta _\varphi ^{A(\beta )} = E_0+\beta \langle \varphi ,B\varphi \rangle \rightarrow E_0 \end{aligned}$$

Since \((A(\beta )-\eta _\varphi ^{A(\beta )})\varphi =\beta (B-\langle \varphi ,B\varphi \rangle )\varphi \), we see that

$$\begin{aligned} \left( \epsilon _\varphi ^{A(\beta )}\right) ^2 = \beta ^2(||B\varphi ||^2-\langle \varphi ,B\varphi \rangle ^2)=\text {O}(\beta ^2) \end{aligned}$$

so, by the Temple–Kato inequalities, \(E(\beta )-\eta _\varphi ^{A(\beta )} = \text {O}(\beta ^2)\), which is (6.25).

To go to the next order, we need the reduced resolvent, S, of \(A_0\) at \(E_0\), defined in Sect. 2 (see (2.8)). In his thesis, Kato realized that contour integrals of \(B(A_0-z)^{-1}\ldots B(A_0-z)^{-1}\varphi \) could be expressed in terms of S. In particular, the first order formal eigenvector for \(A(\beta )\) is

$$\begin{aligned} \psi _1(\beta )=\varphi -\beta SB\varphi \end{aligned}$$
(6.26)

Since \({\mathrm{ran}}S \subset {\mathrm{ran}}(1-P)\) is orthogonal to \(\varphi \), we see that

$$\begin{aligned} ||\psi _1(\beta )||^2=1+\beta ^2||SB\varphi ||^2 \end{aligned}$$
(6.27)

For \(\psi _1(\beta )\) to be in D(B), we will need to suppose that

$$\begin{aligned} \varphi \in D(B), \qquad SB\varphi \in D(B) \end{aligned}$$
(6.28)

We can also write down the first three perturbation coefficients for the energy (see for example [497, pg 7]):

$$\begin{aligned} E_1= & {} \langle \varphi ,B\varphi \rangle , \qquad E_2=-\langle B\varphi ,SB\varphi \rangle \end{aligned}$$
(6.29)
$$\begin{aligned} E_3= & {} \langle B\varphi ,SBSB\varphi \rangle - E_1 \langle B\varphi ,S^2B\varphi \rangle \end{aligned}$$
(6.30)

Straightforward calculations show that

$$\begin{aligned} (A_0-E_0)\psi _1(\beta )&= -\beta (1-P)B\varphi \\ (A(\beta )-E_0)\psi _1(\beta )&= \beta E_1\varphi -\beta ^2BSB\varphi \end{aligned}$$

since \(\beta PB\varphi =\beta E_1\varphi \). Thus:

$$\begin{aligned} (A(\beta )-E_0-\beta E_1)\psi _1(\beta ) = \beta ^2 E_1 SB\varphi - \beta ^2BSB\varphi \end{aligned}$$

From this, using (6.27), one sees easily that

$$\begin{aligned}&\langle \psi _1(\beta ),A(\beta )\psi _1(\beta ) \rangle =(E_0+\beta E_1+ \beta ^2 E_2+ \beta ^3 E_3)||\psi _1(\beta )||^2 + \text {O}(\beta ^4) \end{aligned}$$
(6.31)
$$\begin{aligned}&||\left[ A(\beta )-(E_0+\beta E_1+ \beta ^2 E_2+ \beta ^3 E_3)\right] \psi _1(\beta )||^2 = \text {O}(\beta ^4) \end{aligned}$$
(6.32)

Thus, we have, using \(\psi _1(\beta )/||\psi _1(\beta )||\) as a trial vector

Theorem 6.6

Under the hypotheses of Theorem 6.5 if also (6.28) holds, then

$$\begin{aligned}&E(\beta ) = E_0+\beta E_1+ \beta ^2 E_2+ \beta ^3 E_3 + \text {O}(\beta ^4) \end{aligned}$$
(6.33)
$$\begin{aligned}&||\varphi (\beta ) - \psi _1(\beta )|| = \text {O}(\beta ^2) \end{aligned}$$
(6.34)

where \(\varphi (\beta )\) is the normalized eigenvector for \(A(\beta )\) chosen so that for small \(\beta \), \(\langle \varphi (\beta ),\varphi \rangle > 0\).
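The mechanism behind Theorem 6.6, namely using \(\psi _1(\beta )/||\psi _1(\beta )||\) as a trial vector so that \(\epsilon ^2=\text {O}(\beta ^4)\), can be watched in a two-level toy model. The sketch below is a hedged illustration: the matrices \(A_0=\mathrm {diag}(0,2)\) and B, the choice \(\zeta =1\) (half the unperturbed gap), the constant 1.25 (worked out by hand for this B only) and the helper names are all our assumptions.

```python
import math

def matvec(M, v):
    """Apply a 2x2 matrix to a vector."""
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

g = 2.0                               # gap of A_0 = diag(0, g), so E_0 = 0
B = [[1.0, 1.0], [1.0, -1.0]]         # symmetric perturbation (illustrative)
phi = [1.0, 0.0]                      # A_0 phi = E_0 phi
S = [[0.0, 0.0], [0.0, 1.0 / g]]      # reduced resolvent of A_0 at E_0

def check(beta):
    """Return (exact E(beta), Rayleigh quotient, eps^2, Temple lower bound)."""
    A = [[beta * B[0][0], beta * B[0][1]],
         [beta * B[1][0], g + beta * B[1][1]]]
    SBphi = matvec(S, matvec(B, phi))
    psi = [phi[0] - beta * SBphi[0], phi[1] - beta * SBphi[1]]   # (6.26)
    n2 = dot(psi, psi)
    Apsi = matvec(A, psi)
    eta = dot(psi, Apsi) / n2                                    # Rayleigh quotient
    resid = [Apsi[0] - eta * psi[0], Apsi[1] - eta * psi[1]]
    eps2 = dot(resid, resid) / n2                                # eps^2 for psi/||psi||
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    exact = tr / 2 - math.sqrt((tr / 2) ** 2 - det)              # lowest eigenvalue
    zeta = g / 2                                                 # below the upper level
    lower = eta - eps2 / (zeta - eta)                            # Temple bound (6.4)
    return exact, eta, eps2, lower

results = {beta: check(beta) for beta in (0.1, 0.05, 0.025)}
for beta, (exact, eta, eps2, lower) in results.items():
    print(beta, exact, eta, eps2 / beta ** 4, lower)
```

In this model the ratio \(\epsilon ^2/\beta ^4\) stays bounded as \(\beta \) decreases, so the Rayleigh quotient of the first order trial vector pins down \(E(\beta )\) to \(\text {O}(\beta ^4)\), as in (6.33).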

As Kato noted in his thesis, this idea shows if all the terms for the nth order formal series for the eigenvector lie in \({\mathcal {H}}\), then one gets asymptotic series for the energy with errors of order O\((\beta ^{2n})\), i.e. the 2n coefficients \(E_0,\ldots ,E_{2n-1}\) but the method doesn’t handle odd powers. Indeed in [316], he said: “However, there has been a serious gap in the series of these conditions; for all of them had in common the property that they give the expansion of the eigenvalues up to even orders of approximation, and there was no corresponding theorem giving an expansion up to an odd order.” Personally, I think “serious” is a bit strong given that he handles the case of infinite order (for me the most important) and first order results but it shows he was frustrated by a problem he tried to solve without initial success. But in [309], he put in a Note Added in Proof announcing he had solved the problem! The solution appeared in [322]. For example, if \(A_0 \ge 0, B \ge 0\), he proved that if \(\varphi \in Q(B)\), then \(E(\beta ) = E_0+E_1\beta +\text {o}(\beta )\) and if \(B^{1/2}\varphi \in Q(B^{1/2}A_0^{-1}B^{1/2})\), he proved that \(E(\beta ) = E_0+E_1\beta +E_2\beta ^2+\text {o}(\beta ^2)\). Not surprisingly, in addition to estimates of Temple–Kato type, the proofs use a variant of quadratic form methods. I note that Kato did not put any of these results in his book where his discussion of asymptotic series applies to general Banach space settings and not just positive operators and the ideas are closer to what we put in Sect. 3.

Besides the original short paper on Temple–Kato inequalities, Kato returned to the subject several times. In two papers [321, 357], he considered the fact that in some applications of interest, the natural trial vector has \(\varphi \in Q(A)\), not D(A). Trial functions only in Q(A) are fine for the Rayleigh upper bound but if \(\varphi \notin D(A)\), then \(\epsilon _\varphi ^A = \infty \), so \(\varphi \) cannot be used for Temple’s inequality or the Temple–Kato inequality. Of course, one could look at the Temple–Kato inequality for \(\sqrt{A}\) if \(A \ge 0\) but calculation of \(\langle \varphi ,\sqrt{A}\varphi \rangle \) may not be easy for, say, a second order differential operator where \(\sqrt{A}\) is a pseudo-differential operator. But such operators can often be written \(A=T^*T\) where T is a first order differential operator. Variants of the Temple–Kato inequality for operators of this form are the subject of two papers of Kato [321, 357]. Kato et al. [359] studies an application of these ideas.

Interestingly enough, while Kato’s work was 20 years after Temple, Temple was young when he did that work and was still active in 1949 and he reacted to Kato’s paper with two of his own [639, 640]. George Frederick James Temple (1901–1992) was a mathematician with a keen interest in physics; he wrote two early books on quantum mechanics in 1931 and 1934. He spent much of his career at King’s College, London although for the last fifteen years of it, he held the prestigious Sedleian Chair of Natural Philosophy at Oxford, the chair going back to 1621. He was best known in British circles for a way of discussing distributions as equivalence classes of approximating smooth functions, an idea that was popular because the old guard didn’t want to think about the theory of topological vector spaces central to Schwartz’ earlier approach. His other honors include a knighthood (CBE, for War work), a fellowship in and the Sylvester Medal of the Royal Society. At age 82, he became a Benedictine monk and spent the last years of his life in a monastery on the Isle of Wight. The long biographical note of his life written for the Royal Society [370] doesn’t even mention Temple’s inequality!

Davis [105] extended what he calls “the ingenious method of Kato” by replacing the single interval \((\alpha ,\zeta )\) by a finite union of intervals. Thirring [643] has discussed Temple’s inequality as a consequence of the Feshbach [149, 150] projection method (which mathematicians call the method of Schur [545] complements). Turner [658] and Harrell [221] have extensions to the case where A is normal rather than self-adjoint and Kuroda [400] to n commuting self-adjoint operators (and so including the normal case). Cape et al. [81] apply Temple–Kato inequalities to graph Laplacians. Golub–van der Vorst [195] have a long review on eigenvalue bounds, mentioning that by the time of their review in 2000, Temple–Kato inequalities had become a standard part of linear algebra.

7 Self-adjointness, I: Kato’s theorem

This is the first of four sections on self-adjointness issues. We assume the reader knows the basic notions, including what an operator closure and an operator core are and the meaning of essential self-adjointness. A reference for these things is [616, Section 7.1].

This section concerns the Kato–Rellich theorem and its application to prove the essential self-adjointness of atomic and molecular Hamiltonians. The quantum mechanical Hamiltonians typically treated by this method are bounded from below. Section 8 discusses cases where \(V(x) \ge -cx^2-d\) like Stark Hamiltonians. Section 9 discusses Kato’s contribution to the realization that the positive part of V can be more singular than the negative part without destroying essential self-adjointness and Sect. 10 turns to Kato’s contribution to the theory of quadratic forms. To save ink, in this article, I’ll use “esa” as an abbreviation for “essentially self-adjoint” or “essential self-adjointness” and “esa-\(\nu \)” for “essentially self-adjoint on \(C_0^\infty ({\mathbb {R}}^\nu )\)”.

As we’ve mentioned, Kato’s 1951 paper [314] is a pathbreaking contribution of great significance. He considered N-body Hamiltonians on \(L^2({\mathbb {R}}^{\nu N})\) of the formal form

$$\begin{aligned} H = -\sum _{j=1}^{N}\frac{1}{2m_j}\Delta _j + \sum _{i<j} V_{ij}(x_i-x_j) \end{aligned}$$
(7.1)

where \(x \in {\mathbb {R}}^{\nu N}\) is written \(\varvec{x} = (x_1,\ldots ,x_N)\) with \(x_j \in {\mathbb {R}}^\nu \), \(\Delta _j\) is the \(\nu \)-dimensional Laplacian in \(x_j\) and each \(V_{ij}\) is a real valued function on \({\mathbb {R}}^\nu \). In 1951, Kato considered only the physically relevant case \(\nu =3\).

If there are \(N+k\) particles in the limit where the masses of particles \(N+1,\ldots ,N+k\) are infinite, one considers an operator like H but adds terms

$$\begin{aligned} \sum _{j=1}^{N} V_j(x_j), \qquad V_j(x) = \sum _{\ell =N+1}^{N+k} V_{j\ell }(x-x_\ell ) \end{aligned}$$
(7.2)

where \(x_{N+1},\ldots ,x_{N+k}\) are fixed points in \({\mathbb {R}}^\nu \).

More generally, one wants to consider (as Kato did) Hamiltonians with the center of mass removed. We discuss the kinematics of such removal in Sect. 11 in Part 2. We note that the self-adjointness results on the Hamiltonians of the form (7.1) easily imply results on Hamiltonians (on \(L^2({\mathbb {R}}^{(N-1)\nu })\)) with the center of mass motion removed. Of especial interest is the Hamiltonian of the form (7.2) with \(N=1\), i.e.

$$\begin{aligned} H=-\Delta +W(x) \end{aligned}$$
(7.3)

on \(L^2({\mathbb {R}}^\nu )\) which we’ll call reduced two body Hamiltonians (since, except for a factor of \((2\mu )^{-1}\) in front of \(-\Delta \), it is the two body Hamiltonian with the center of mass removed).

Kato’s big 1951 result was

Theorem 7.1

(Kato’s Theorem [314], First Form) Let \(\nu =3\). Let each \(V_{ij}\) in (7.1) lie in \(L^2({\mathbb {R}}^3)+L^\infty ({\mathbb {R}}^3)\). Then the Hamiltonian of (7.1) is self-adjoint on \(D(H) = D(-\Delta )\) and esa-(3N).

Remarks

  1. 1.

    The same result holds with the terms in (7.2) added so long as each \(V_j\) lies in \(L^2({\mathbb {R}}^3)+L^\infty ({\mathbb {R}}^3)\).

  2. 2.

    Kato also notes the exact description of \(D(-\Delta )\) on \(L^2({\mathbb {R}}^\nu )\) in terms of the Fourier transform (see [612, Chapter 6]) \(\hat{\varphi }(k) = (2\pi )^{-\nu /2} \int e^{-ik\cdot x} \varphi (x) d^\nu x\):

    $$\begin{aligned} D(-\Delta ) = \{ \varphi \in L^2({\mathbb {R}}^\nu ) \,|\, \int (1+k^2)^2 |\hat{\varphi }(k)|^2 d^\nu k < \infty \} \end{aligned}$$
    (7.4)
  3. 3.

    The proof shows that the graph norms of H and \(-\Delta \) on \(D(-\Delta )\) are equivalent, so any operator core for \(-\Delta \) is a core for H. Since it is easy to see that \(C_0^\infty ({\mathbb {R}}^{3N})\) is a core for \(-\Delta \), the esa result follows from the self-adjointness claim, so we concentrate on the latter.

  4. 4.

    Kato didn’t assume that \(V \in L^2({\mathbb {R}}^3)+L^\infty ({\mathbb {R}}^3)\) but rather the stronger hypothesis that for some \(R < \infty \), one has that \(\int _{|x|< R} |V(x)|^2 d^3x < \infty \) and \(\sup _{|x| \ge R} |V(x)| < \infty \), but his proof extends to \(L^2({\mathbb {R}}^3)+L^\infty ({\mathbb {R}}^3)\).

  5. 5.

    Kato didn’t state that \(C_0^\infty ({\mathbb {R}}^{3N})\) is a core but rather that the set of \(\psi \)’s of the form \(P(x) e^{-\tfrac{1}{2}x^2}\), with P a polynomial in the coordinates of x, is a core. (He included the \(\tfrac{1}{2}\) so the set was invariant under Fourier transform.) His result is now usually stated in terms of \(C_0^\infty \).

If \(v(x) = 1/|x|\) on \({\mathbb {R}}^3\), then \(v \in L^2({\mathbb {R}}^3)+L^\infty ({\mathbb {R}}^3)\), so Theorem 7.1 has the important Corollary, which includes the Hamiltonians of atoms and molecules:

Theorem 7.2

(Kato’s Theorem [314], Second Form) The Hamiltonian, H, of (7.1) with \(\nu = 3\) and each

$$\begin{aligned} V_{ij}(x) = \frac{z_{ij}}{|x|} \end{aligned}$$
(7.5)

and this Hamiltonian with terms of the form (7.2) where

$$\begin{aligned} V_j(x) = \sum _{\ell = N+1}^{N+k} \frac{z_{j\ell }}{|x-x_\ell |} \end{aligned}$$
(7.6)

are self-adjoint on \(D(-\Delta )\) and esa-3N.
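The hypothesis of Theorem 7.1 can be checked for the Coulomb potential by an explicit splitting. The following is a short sketch; the cutoff radius 1 is an arbitrary choice made for this illustration. Write \(v = v_1+v_2\) with \(v_1 = v\chi _{\{|x|<1\}}\) and \(v_2 = v\chi _{\{|x| \ge 1\}}\). Then, in polar coordinates,

$$\begin{aligned} ||v_1||_2^2 = \int _{|x|<1} \frac{d^3x}{|x|^2} = 4\pi \int _0^1 \frac{r^2}{r^2}\, dr = 4\pi , \qquad ||v_2||_\infty = \sup _{|x| \ge 1} \frac{1}{|x|} = 1 \end{aligned}$$

so \(v_1 \in L^2({\mathbb {R}}^3)\) and \(v_2 \in L^\infty ({\mathbb {R}}^3)\). Since \(L^2+L^\infty \) is a vector space, the finite sums in (7.6) are covered by the same splitting around each \(x_\ell \).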

Remark

This result assures that the time dependent Schrödinger equation \(\dot{\psi }_t = -iH\psi _t\) has solutions (since self-adjointness means that \(e^{-itH}\) exists as a unitary operator). The analogous problem for Coulomb Newton’s equation (i.e. solvability for a.e. initial condition) is open for \(N \ge 5\)!

As Kato remarks in [1], “the proof turned out to be rather easy”. It has three steps:

  1. (1)

    The Kato–Rellich theorem, which reduces the proof to showing that each \(V_{ij}\) is relatively bounded with respect to the Laplacian on \({\mathbb {R}}^3\) with relative bound 0.

  2. (2)

    A proof that any function in \(L^2({\mathbb {R}}^3)+L^\infty ({\mathbb {R}}^3)\), as an operator on \(L^2({\mathbb {R}}^3)\), is \(-\Delta \)-bounded with relative bound 0. This relies on a simple Sobolev estimate.

  3. (3)

    A piece of simple kinematics that says that the two body estimate in step 2 extends to one for \(v_{ij}(x_i-x_j)\) as an operator on \(L^2({\mathbb {R}}^{3N})\).

Step 1. The needed result (recall that A-bounded is defined in (2.14)):

Theorem 7.3

(Kato–Rellich Theorem) Let A be self-adjoint, B symmetric and let B be A-bounded with relative bound \(a<1\), i.e. \(D(A) \subset D(B)\) and for some fixed b and all \(\varphi \in D(A)\)

$$\begin{aligned} ||B\varphi || \le a||A\varphi ||+b||\varphi || \end{aligned}$$
(7.7)

Then \(A+B\) is self-adjoint on D(A) and any operator core for A is one for \(A+B\).

Remarks

  1. 1.

    This result is due to Rellich [504,505,506,507,508, Part III]. Kato found it in 1944, when he was unaware of Rellich’s work, so it is independently his.

  2. 2.

    The proof uses von Neumann’s criterion: a closed symmetric operator, C, on D(C) is self-adjoint if and only if for some \(\kappa \in (0,\infty )\), one has that \({\mathrm{ran}}(C\pm i\kappa )={\mathcal {H}}\). Indeed, C closed and symmetric implies that \({\mathrm{ran}}(C\pm i\kappa )\) are closed subspaces with \({\mathrm{ran}}(C\pm i\kappa )^\perp = \ker (C^*{\mp } i\kappa )\). Thus, if C is self-adjoint, then \(\ker (C^*{\mp } i\kappa ) = \{0\}\), proving one direction. For the other direction, suppose that \({\mathrm{ran}}(C\pm i\kappa ) = {\mathcal {H}}\). Given \(\psi \in D(C^*)\), find \(\varphi \in D(C)\) with \((C+i\kappa )\varphi = (C^*+i\kappa )\psi \) (since \({\mathrm{ran}}(C+i\kappa )={\mathcal {H}}\)). Thus \((C^*+i\kappa )(\varphi -\psi ) = 0\). Since \({\mathrm{ran}}(C-i\kappa ) = {\mathcal {H}}= \ker (C^*+i\kappa )^\perp \), we have that \(\varphi -\psi = 0\). Thus \(D(C^*) = D(C)\) and C is self-adjoint.

  3. 3.

    For the rest of the proof, use \(||(C+i\kappa )\varphi ||^2 = ||C\varphi ||^2+|\kappa |^2||\varphi ||^2\) to see that

    $$\begin{aligned} ||C(C\pm i\kappa )^{-1}|| \le 1, \qquad ||(C\pm i\kappa )^{-1}|| \le |\kappa |^{-1} \end{aligned}$$
    (7.8)

    It follows from this (with \(C=A\)) that when (7.7) holds, one has that

    $$\begin{aligned} ||B(A\pm i\kappa )^{-1}|| \le a + b|\kappa |^{-1} \end{aligned}$$
    (7.9)

    Since \(a < 1\), we can be sure that if \(|\kappa |\) is very large, then \({||B(A\pm i\kappa )^{-1}|| < 1}\) so using a geometric series, we have that \({1 + B(A\pm i\kappa )^{-1}}\) is invertible which implies that it maps \({\mathcal {H}}\) onto \({\mathcal {H}}\). Since \((A\pm i\kappa )\) maps D(A) onto \({\mathcal {H}}\), we see that

    $$\begin{aligned} (A+B\pm i\kappa ) =(1+B(A\pm i\kappa )^{-1})(A\pm i\kappa ) \end{aligned}$$
    (7.10)

    maps D(A) onto \({\mathcal {H}}\). Thus by von Neumann’s criterion, \(A+B\) is self-adjoint on D(A). By a simple argument, \(||A\cdot ||+||\cdot ||\) is an equivalent norm to \(||(A+B)\cdot ||+||\cdot ||\), which proves the statement about operator cores.

  4. 4.

    The case \(B=-A\) shows that one can’t conclude self-adjointness of \(A+B\) on D(A) if (7.7) holds with \(a=1\) but Kato [345] proved that \(A+B\) is esa on D(A) in that case and Wüst [690] proved the stronger result of esa on D(A) if one has for all \(\varphi \in D(A)\)

    $$\begin{aligned} ||B\varphi ||^2 \le ||A\varphi ||^2 + b||\varphi ||^2 \end{aligned}$$
    (7.11)
  5. 5.

    In some of my early papers, I called B Kato small if B was A-bounded with relative bound less than 1 and Kato tiny if the relative bound was 0. I am pleased to say that while many of my names (hypercontractive, almost Mathieu, Berry’s phase, Kato class,...) have stuck, this one has not!
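The two estimates invoked in Remark 3 can be written out as follows. This is a routine sketch of the standard argument, with \(\psi \in {\mathcal {H}}\) and \(\varphi \in D(A)\) arbitrary. Applying (7.7) to \((A\pm i\kappa )^{-1}\psi \in D(A)\) and then using (7.8) with \(C=A\) gives

$$\begin{aligned} ||B(A\pm i\kappa )^{-1}\psi ||&\le a||A(A\pm i\kappa )^{-1}\psi || + b||(A\pm i\kappa )^{-1}\psi || \\&\le \left( a+b|\kappa |^{-1}\right) ||\psi || \end{aligned}$$

which is (7.9); the constant on the right is less than 1 as soon as \(|\kappa | > b/(1-a)\). For the equivalence of the graph norms, (7.7) and the triangle inequality give

$$\begin{aligned} ||(A+B)\varphi ||&\le (1+a)||A\varphi ||+b||\varphi || \\ ||A\varphi ||&\le ||(A+B)\varphi ||+||B\varphi || \le ||(A+B)\varphi || + a||A\varphi ||+b||\varphi || \end{aligned}$$

so that \((1-a)||A\varphi || \le ||(A+B)\varphi ||+b||\varphi ||\). Since \(a<1\), each graph norm is bounded by a multiple of the other.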

Step 2. Kato began by considering \(\varphi \in L^2({\mathbb {R}}^3)\) with \(\varphi \in D(-\Delta )\), i.e. \(\int (1+k^2)^2|\hat{\varphi }(k)|^2 d^3k < \infty \). He noted that this implied that

$$\begin{aligned} \int |\hat{\varphi }(k)| d^3k&= \int (1+k^2)^{-1}(1+k^2)|\hat{\varphi }(k)| d^3k \nonumber \\&\le ||(1+k^2)^{-1}||_2 ||(1-\Delta )\varphi ||_2 \end{aligned}$$
(7.12)

by the Schwarz inequality and Plancherel theorem. Thus

$$\begin{aligned} ||\varphi ||_\infty&\le (2\pi )^{-3/2} \int |\hat{\varphi }(k)| d^3k \end{aligned}$$
(7.13)
$$\begin{aligned}&\le C\left( ||\Delta \varphi ||_2 + ||\varphi ||_2\right) \end{aligned}$$
(7.14)

It follows that if \(V=V_1+V_2\) with \(V_1 \in L^2({\mathbb {R}}^3), V_2 \in L^\infty ({\mathbb {R}}^3)\), then as operators on \(L^2({\mathbb {R}}^3)\)

$$\begin{aligned} ||V\varphi ||_2&\le ||V_1\varphi ||_2+ ||V_2\varphi ||_2 \nonumber \\&\le ||V_1||_2 ||\varphi ||_\infty + ||V_2||_\infty ||\varphi ||_2 \nonumber \\&\le C||V_1||_2 ||\Delta \varphi ||_2 + \left( C||V_1||_2+||V_2||_\infty \right) ||\varphi ||_2 \end{aligned}$$
(7.15)

If \(f \in L^2\) and

$$\begin{aligned} f^{(n)}(x) = \left\{ \begin{array}{ll} f(x), &{}\quad \hbox { if } |f(x)| > n\\ 0, &{}\quad \hbox { if } |f(x)| \le n \end{array} \right. \end{aligned}$$
(7.16)

then \(||f^{(n)}||_2 \rightarrow 0\) as \(n \rightarrow \infty \) by the dominated convergence theorem and for all n, \(||f-f^{(n)}||_\infty < \infty \). It follows from (7.15) that any \(V \in L^2({\mathbb {R}}^3)+L^\infty ({\mathbb {R}}^3)\) is \(-\Delta \)-bounded with relative bound zero as operators on \(L^2({\mathbb {R}}^3)\).
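For concreteness, the constant in (7.14) can be computed explicitly; this is a side calculation, and only the finiteness of C matters for the argument. In polar coordinates, with the substitution \(k=\tan \theta \),

$$\begin{aligned} ||(1+k^2)^{-1}||_2^2 = \int \frac{d^3k}{(1+k^2)^2} = 4\pi \int _0^\infty \frac{k^2}{(1+k^2)^2}\, dk = 4\pi \int _0^{\pi /2} \sin ^2\theta \, d\theta = \pi ^2 \end{aligned}$$

so (7.12) and (7.13) give \(||\varphi ||_\infty \le (2\pi )^{-3/2}\pi \, ||(1-\Delta )\varphi ||_2\), and one may take \(C=(8\pi )^{-1/2}\) in (7.14). To read off relative bound zero from (7.15), fix \(\epsilon > 0\) and replace the decomposition \(V=V_1+V_2\) by \(V = V_1^{(n)} + (V_1-V_1^{(n)}+V_2)\) with n chosen so large that \(C||V_1^{(n)}||_2 < \epsilon \). Since \(||V_1-V_1^{(n)}||_\infty \le n\), (7.15) becomes

$$\begin{aligned} ||V\varphi ||_2 \le \epsilon ||\Delta \varphi ||_2 + \left( \epsilon + n + ||V_2||_\infty \right) ||\varphi ||_2 \end{aligned}$$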

Step 3. In modern language, one shows that if \({\mathcal {H}}= {\mathcal {H}}_1\otimes {\mathcal {H}}_2\) (tensor products are defined, for example, in [612, Section 3.8]) and (7.7) holds, then

$$\begin{aligned} ||(B\otimes {\varvec{1}})\varphi || \le a||(A\otimes {\varvec{1}})\varphi ||+ b||\varphi || \end{aligned}$$
(7.17)

Thus, if V is a function of \(x_1\) alone, \(V(x_1,\ldots ,x_N) = v(x_1),\, v \in L^2({\mathbb {R}}^3)+L^\infty ({\mathbb {R}}^3)\) so that (7.7) holds for v on \(L^2({\mathbb {R}}^3)\), then it also holds for \(B = V(x)\) and \(A=-\Delta _1\) on \(L^2({\mathbb {R}}^{3N})\). Since \(|k_1|^2 \le |k|^2\), we conclude that V is \(-\Delta \)-bounded with relative bound zero on \(L^2({\mathbb {R}}^{3N})\). By a coordinate change, the same is true for \(v(x_i-x_j)\).

Rather than talk about tensor products, Kato used iterated Fourier transforms and stated inequalities like

$$\begin{aligned} \sup _{x_1} \left[ \int |\varphi (x_1,\ldots ,x_N)|^2 d^3x_2\ldots d^3x_N\right] \le C \int (1+k_1^2)^2 |\hat{\varphi }(k)|^2 d^{3N}k \end{aligned}$$
(7.18)

which is equivalent to the tensor product results. This concludes our sketch of Kato’s proof of his great theorem.
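One way to see the tensor product estimate (7.17) is sketched here under the simplifying assumption that \(\varphi \) is a finite sum \(\varphi = \sum _j \psi _j\otimes \eta _j\) with \(\psi _j \in D(A)\) and \(\{\eta _j\}\) orthonormal in \({\mathcal {H}}_2\); the general case needs a limiting argument that is not spelled out here. Orthonormality of the \(\eta _j\) and Minkowski’s inequality in \(\ell ^2\) give

$$\begin{aligned} ||(B\otimes {\varvec{1}})\varphi ||&= \Big ( \sum _j ||B\psi _j||^2 \Big )^{1/2} \le \Big ( \sum _j \left( a||A\psi _j||+b||\psi _j||\right) ^2 \Big )^{1/2} \\&\le a\Big ( \sum _j ||A\psi _j||^2 \Big )^{1/2} + b\Big ( \sum _j ||\psi _j||^2 \Big )^{1/2} = a||(A\otimes {\varvec{1}})\varphi ||+b||\varphi || \end{aligned}$$

so the constants a and b are unchanged when extra variables are added.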

Kato states in the paper that he had found the results by 1944. Kato originally submitted the paper to Physical Review. Physical Review transferred the manuscript to the Transactions of the AMS where it eventually appeared. They had trouble finding a referee and in the process the manuscript was lost (a serious problem in pre-Xerox days!). Eventually, von Neumann got involved and helped get the paper accepted. I’ve always thought that given how important he knew the paper was, von Neumann should have suggested Annals of Mathematics and used his influence to get it published there. The receipt date of October 15, 1948 on the version published in the Transactions shows a long lag compared to the other papers in the same issue of the Transactions which have receipt dates of Dec., 1949 through June, 1950. Recently, after Kato’s widow died and left his papers to some mathematicians (see the end of Sect. 1), some fascinating correspondence of Kato with Kemble and von Neumann came to light. There are plans to publish an edited version [181].

It is a puzzle why it took so long for this theorem to be found. One factor may have been von Neumann’s attitude. Bargmann told me of a conversation several young mathematicians had with von Neumann around 1948 in which von Neumann told them that self-adjointness for atomic Hamiltonians was an impossibly hard problem and that even for the Hydrogen atom, the problem was difficult and open. This is a little strange since, using spherical symmetry, Hydrogen can be reduced to a direct sum of one dimensional problems. For such ODEs, there is a powerful limit point–limit circle method named after Weyl and Titchmarsh (although it was Stone, in his 1932 book, who first made it explicit). Using this, it is easy to see (there is one subtlety for \(\ell =0\) since the operator is limit circle at 0) that the Hydrogen Hamiltonian is self-adjoint and this appears at least as early as Rellich [509]. Of course, this method doesn’t work for multielectron atoms. In any event, it is possible that von Neumann’s attitude may have discouraged some from working on the problem.

Still it is surprising that neither Friedrichs nor Rellich found this result. In exploring this, it is worth noting that there is an alternate to step 2:

Step 2\('\). On \({\mathbb {R}}^3\), there is the well known operator inequality (discussed further in Sect. 10 and in [615, Section 6.2]) known as Hardy’s inequality (\(A \le B\) for positive operators is discussed in Sect. 10 and [616, Section 7.5]; for this case, it means \(\langle \varphi ,A\varphi \rangle \le \langle \varphi ,B\varphi \rangle \) for all \(\varphi \in C_0^\infty ({\mathbb {R}}^3)\)):

$$\begin{aligned} \frac{1}{4r^2} \le -\Delta \end{aligned}$$
(7.19)

Since \(x \le \epsilon x^2 + \tfrac{1}{4} \epsilon ^{-1}\) for \(x \in (0,\infty )\), the spectral theorem implies that for any positive, self-adjoint operator, C, we have that

$$\begin{aligned} C \le \epsilon C^2 + \tfrac{1}{4} \epsilon ^{-1} \end{aligned}$$
(7.20)

so using this for \(C=-\Delta \), (7.19) implies that

$$\begin{aligned} \frac{1}{4r^2} \le \epsilon (-\Delta )^2 + \frac{1}{4} \epsilon ^{-1} \end{aligned}$$
(7.21)

equivalently, for \(\varphi \in C_0^\infty ({\mathbb {R}}^3)\)

$$\begin{aligned} ||r^{-1}\varphi ||^2 \le 4\epsilon ||-\Delta \varphi ||^2 + \epsilon ^{-1} ||\varphi ||^2 \end{aligned}$$
(7.22)

which implies that \(r^{-1}\) is \(-\Delta \)-bounded with relative bound zero.
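The elementary inequality behind (7.20) and the passage from (7.21) to (7.22) can be checked directly; the following is a short verification. For \(x \in (0,\infty )\) and \(\epsilon > 0\),

$$\begin{aligned} \epsilon x^2 - x + \tfrac{1}{4}\epsilon ^{-1} = \epsilon \left( x-\tfrac{1}{2}\epsilon ^{-1}\right) ^2 \ge 0 \end{aligned}$$

Taking expectations in (7.21) for \(\varphi \in C_0^\infty ({\mathbb {R}}^3)\), with \(\langle \varphi , r^{-2}\varphi \rangle = ||r^{-1}\varphi ||^2\) and \(\langle \varphi ,(-\Delta )^2\varphi \rangle = ||\Delta \varphi ||^2\), gives

$$\begin{aligned} \tfrac{1}{4}||r^{-1}\varphi ||^2 \le \epsilon ||\Delta \varphi ||^2+\tfrac{1}{4}\epsilon ^{-1}||\varphi ||^2 \end{aligned}$$

and multiplying by 4 yields (7.22). Taking square roots and using \(\sqrt{s+t} \le \sqrt{s}+\sqrt{t}\), one gets \(||r^{-1}\varphi || \le 2\epsilon ^{1/2}||\Delta \varphi ||+\epsilon ^{-1/2}||\varphi ||\), which is a bound of the form (7.7) with \(a=2\epsilon ^{1/2}\) as small as one likes.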

Rellich used Hardy’s inequality in his perturbation theory papers [504,505,506,507,508] in a closely related context. Namely he used (7.19) and (7.20) for \(C=r^{-1}\) to show that \(r^{-1} \le 4\epsilon (-\Delta ) + \tfrac{1}{4} \epsilon ^{-1}\) to note the semiboundedness of the Hydrogen Hamiltonian. Since Rellich certainly knew the Kato–Rellich theorem, it appears that he knew steps 1 and 2\('\).

In a sense, it is pointless to speculate why Rellich didn’t find Theorem 7.2, but it is difficult to resist. It is possible that he never considered the problem of esa of atomic Hamiltonians, settling for a presumption that using the Friedrichs extension suffices (as Kato suggests in [1]) but I think that unlikely. It is possible that he thought about the problem but dismissed it as too difficult and never thought hard about it. Perhaps the most likely explanation involves Step 3: once you understand it, it is trivial, but until you conceive that it might be true, it might elude you.

Kato’s original paper required that the \(L^2\) piece have compact support (in the relevant variables). While it is easy to accommodate global \(L^2\), it is true that it is enough to be uniformly locally \(L^2\), i.e.

$$\begin{aligned} \sup _x \int _{|x-y| \le 1} |V(y)|^2 d^3y < \infty \end{aligned}$$
(7.23)

denoted \(L^2_{unif}({\mathbb {R}}^3)\). It was Stummel [631] who first realized this. There are general localization techniques, originally developed for form estimates by Ismagilov [273], Morgan [447] and Sigal [554] (and discussed as the IMS localization formula in [101, Section 3.1]) which have operator versions. For a recent paper on these techniques, see Gesztesy et al. [189]. For example, [616, Problem 7.1.9] proves:

Theorem 7.4

For each \(\alpha \in {\mathbb {Z}}^\nu \), let \(\Delta _\alpha \) be the cube of side 3 centered at \(\alpha \) and \(\chi _\alpha \) its characteristic function. Let V be a measurable function on \({\mathbb {R}}^\nu \) so that for some positive a, b and all \(\alpha \) and all \(\varphi \in C_0^\infty ({\mathbb {R}}^\nu )\)

$$\begin{aligned} ||V\chi _\alpha \varphi ||_2 \le a||-\Delta \varphi ||_2+b||\varphi ||_2 \end{aligned}$$
(7.24)

Then for any \(\epsilon > 0\), there is a \(b_\epsilon \) so that for all \(\varphi \in C_0^\infty ({\mathbb {R}}^\nu )\), we have that

$$\begin{aligned} ||V\varphi ||_2 \le (a+\epsilon )||-\Delta \varphi ||_2+b_\epsilon ||\varphi ||_2 \end{aligned}$$
(7.25)

In particular, any \(V \in L^2_{unif}({\mathbb {R}}^3)\) is \(-\Delta \)-bounded on \(L^2({\mathbb {R}}^3)\) with relative bound 0.

In exploring extensions of Theorem 7.1, it is very useful to have simple self-adjointness criteria for \(-\tfrac{d^2}{dx^2}+q(x)\) on \(L^2(0,\infty )\) which then translate to criteria for \(-\Delta +V(x)\) if \(V(x) = q(|x|)\) is a spherically symmetric potential. If \(q \in L^2_{loc}(0,\infty )\), for each \(z \in {\mathbb {C}}\), the set of solutions of \(-u''+qu=zu\) (in the sense that u is \(C^1\), \(u'\) is absolutely continuous and \(u''\) is its \(L^1_{loc}\) derivative) is two dimensional. If all solutions are \(L^2\) at \(\infty \) (resp. 0), we say that \(-\tfrac{d^2}{dx^2}+q(x)\) is limit circle at \(\infty \) (resp. 0). If it is not limit circle, we say it is limit point. It is a theorem that whether one is limit point or limit circle is independent of z. However, in the limit point case, whether the set of \(L^2\) solutions near infinity is 0 or 1 dimensional can be z dependent. One has the basic

Theorem 7.5

(Weyl limit point–limit circle theorem) Let \(q \in L^2_{loc}(0,\infty )\). Then \(-\tfrac{d^2}{dx^2}+q(x)\) is esa on \(C_0^\infty (0,\infty )\) if and only if \(-\tfrac{d^2}{dx^2}+q(x)\) is limit point at both 0 and \(\infty \).

Remarks

  1. 1.

    This result holds for any interval \((a,b) \subset {\mathbb {R}}\) where a can be \(-\infty \) and/or b can be \(\infty \).

  2. 2.

    If it is limit point at only one of 0 and \(\infty \) and limit circle at the other point, the deficiency indices (see [616, Section 7.1] for definitions) are (1, 1) and if it is limit circle at both 0 and \(\infty \), they are (2, 2). In particular if it is limit point at \(\infty \) and \(\int _{0}^{1} |q(x)| dx < \infty \), then the deficiency indices are (1,1) and the extensions are described by boundary conditions \(\cos \theta \, u'(0)+\sin \theta \, u(0) = 0\).

  3. 3.

    The ideas behind much of the theorem go back to Weyl [677, 678, 680] in 1910 and predate the notion of self-adjointness. It was Stone [630] who first realized the implications for self-adjointness and proved Theorem 7.5. [616, Thm 7.4.12] has a succinct proof. Titchmarsh [653] reworked the theory so much that it is sometimes called Weyl–Titchmarsh theory. For additional literature, see [94, 131, 423].

Example 7.6

(\(x^{-2}\) on \((0,\infty )\)) Let \(q(x) = \beta x^{-2}\). Trying \(x^\alpha \) in \(-u''+\beta x^{-2} u = 0\), one finds that \(\alpha (\alpha -1)=\beta \) is solved by \(\alpha _\pm = \tfrac{1}{2}(1 \pm \sqrt{1+4\beta })\). For \(\beta \ne -\tfrac{1}{4}\), this yields two linearly independent solutions, so a basis. The larger solution (and sometimes both) is not \(L^2\) at infinity, so it is always limit point there.

For \(\beta \ge -\tfrac{1}{4}\), there is a positive solution which implies that \(H_\beta \equiv -\tfrac{d^2}{dx^2}+\beta x^{-2} \ge 0\). If \(\beta < -\tfrac{1}{4}\), the solutions oscillate and the real solutions have infinitely many zeros which implies that the operator is not positive (see [616, Section 7.4]). Thus

$$\begin{aligned} -\tfrac{d^2}{dx^2}+\beta x^{-2} \ge 0 \text { on } C_0^\infty (0,\infty ) \iff \beta \ge -\tfrac{1}{4} \end{aligned}$$
(7.26)

This is Hardy’s inequality on \(L^2(0,\infty )\).

\(x^\alpha \notin L^2(0,1) \iff \alpha \le -\tfrac{1}{2}\). At \(\beta = \tfrac{3}{4},\, \alpha _- = -\tfrac{1}{2}\). Thus \(H_\beta \) is always limit point at \(\infty \) and is limit point at 0 if and only if \(\beta \ge \tfrac{3}{4}\), i.e.

$$\begin{aligned} -\tfrac{d^2}{dx^2}+\beta x^{-2} \text { is esa on } C_0^\infty (0,\infty ) \iff \beta \ge \tfrac{3}{4} \end{aligned}$$
(7.27)

A comparison theorem shows that if

$$\begin{aligned} q(x) \ge \tfrac{3}{4} x^{-2} - c \end{aligned}$$
(7.28)

for some real c, then \(-\tfrac{d^2}{dx^2}+q(x)\) is esa on \(C_0^\infty (0,\infty )\).
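The threshold \(\tfrac{3}{4}\) in (7.27) can be traced through the exponents of Example 7.6; the following is a sketch of the arithmetic for \(\beta > -\tfrac{1}{4}\), where \(\alpha _\pm \) are real. Since \(\int _0^1 x^{2\alpha }dx < \infty \) if and only if \(\alpha > -\tfrac{1}{2}\), and \(\alpha _+ > \tfrac{1}{2}\) in this range, both solutions are \(L^2\) near 0 exactly when \(\alpha _- > -\tfrac{1}{2}\):

$$\begin{aligned} \alpha _- = \tfrac{1}{2}\left( 1-\sqrt{1+4\beta }\right) > -\tfrac{1}{2} \iff \sqrt{1+4\beta } < 2 \iff \beta < \tfrac{3}{4} \end{aligned}$$

For instance, \(\beta = 2\) gives \(\alpha _+ = 2,\, \alpha _- = -1\), and \(x^{-1}\) is not \(L^2\) near 0, so \(H_2\) is limit point at 0. By contrast, \(\beta = 0\) gives the solutions 1 and x, both \(L^2\) near 0, so \(-\tfrac{d^2}{dx^2}\) is limit circle at 0, in line with the need for a boundary condition there.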

On \({\mathbb {R}}^\nu \), one defines spherical harmonics (see [615, Section 3.5]), \(\{Y_{\ell m}\}_{m=1;\ell =0,1,\ldots }^{D(\nu ,\ell )}\) on \(S^{\nu - 1}\), the unit sphere in \({\mathbb {R}}^\nu \), to be the restriction to the unit sphere of homogeneous harmonic polynomials of degree \(\ell \). These polynomials are a vector space of dimension \(D(\nu ,\ell ) = \tfrac{2\ell +\nu -2}{\nu -2}\left( {\begin{array}{c}\nu -3+\ell \\ \nu -3\end{array}}\right) \) (for \(\nu \ge 3\); this is \(2\ell +1\) when \(\nu =3\)) and \(Y_{\ell m}\) are a convenient orthonormal basis. Any function \(f \in {\mathcal {S}}({\mathbb {R}}^\nu )\) can be expanded in the form (\(r \in (0,\infty ),\, \omega \in S^{\nu - 1}\))

$$\begin{aligned} f(r\omega ) = \sum _{\ell ,m} r^{-(\nu - 1)/2} f_{\ell m}(r) Y_{\ell m}(\omega ) \end{aligned}$$
(7.29)

(where for \(\nu \ge 2\), \(f_{\ell m}\) vanishes so rapidly at \(r=0\) that \(r^{-(\nu - 1)/2}f_{\ell m}(r)\) has a limit as \(r \downarrow 0\) which must be zero unless \((\ell m) = (0 1)\)). Moreover, if \(\sigma _\nu \) is the area of the unit sphere, then

$$\begin{aligned} ||f||^2_{L^2({\mathbb {R}}^\nu ,d^\nu x)} = \sigma _\nu \sum _{\ell , m} ||f_{\ell m}||^2_{L^2({\mathbb {R}},dr)} \end{aligned}$$
(7.30)

and

$$\begin{aligned} (\Delta f)_{\ell m} = \left[ \frac{d^2}{dr^2}-\frac{(\nu -1)(\nu -3)}{4r^2}-\frac{\ell (\ell +\nu -2)}{r^2}\right] f_{\ell m} \end{aligned}$$
(7.31)

If \(V(\varvec{r}) = q(r)\), then \(-\Delta +V\) is a direct sum of operators of the form

$$\begin{aligned} H_{\ell m}(V)&= -\frac{d^2}{dr^2}+q_\ell (r) \end{aligned}$$
(7.32)
$$\begin{aligned} q_{\ell }(x)&= \frac{(\nu -1)(\nu -3)}{4x^2}+\frac{\ell (\ell +\nu -2)}{x^2}+q(x) \end{aligned}$$
(7.33)

It is easy to see that such direct sums are bounded from below (resp. esa) on \(C_{00}^\infty ({\mathbb {R}}^\nu ) \equiv C_0^\infty ({\mathbb {R}}^\nu {\setminus }\{0\})\) if and only if each \(H_{\ell m}\) is bounded from below (resp. esa) on \(C_0^\infty (0,\infty )\). We conclude that

Proposition 7.7

On \({\mathbb {R}}^\nu \), \(H_\beta ^{(\nu )} \equiv -\Delta +\beta |x|^{-2}\) on \(C_{00}^\infty ({\mathbb {R}}^\nu )\) is

  1. (1)

    Bounded from below

    $$\begin{aligned} H_\beta ^{(\nu )} \ge 0 \iff \beta \ge -\frac{(\nu -2)^2}{4} \end{aligned}$$
    (7.34)
  2. (2)

    \(H_\beta ^{(\nu )}\) is esa on \(C_{00}^\infty ({\mathbb {R}}^\nu )\) if and only if

    $$\begin{aligned} \beta \ge -\frac{\nu (\nu -4)}{4} \end{aligned}$$
    (7.35)

Remarks

  1. 1.

    This uses \(-\tfrac{(\nu -1)(\nu -3)}{4}-\tfrac{1}{4} = -\tfrac{(\nu -2)^2}{4}\) and \(-\tfrac{(\nu -1)(\nu -3)}{4}+\tfrac{3}{4} = -\tfrac{\nu (\nu -4)}{4}\).

  2. 2.

    (7.34) is the \(\nu \)-dimensional Hardy inequality with optimal constant (see Sect. 10 below).

  3. 3.

    By (7.28), if \(\nu \ge 4\) and V is spherically symmetric and obeys \(V(x) \ge -\tfrac{\nu (\nu -4)}{4|x|^2}\), then \(-\Delta +V\) is esa-\(\nu \) (discussed further in Sect. 9).

  4. 4.

    In particular, \(C_{00}^\infty ({\mathbb {R}}^\nu )\) is an operator core for \(-\Delta \) if and only if \(\nu \ge 4\) and a form core for \(-\Delta \) if and only if \(\nu \ge 2\).

  5. 5.

    By (7.28), if \(\gamma > 2\), then \(-\Delta +\lambda |x|^{-\gamma }\) (\(\lambda >0\)) is esa on \(C_{00}^\infty ({\mathbb {R}}^\nu )\). If \(\nu \ge 5\) and \(2< \gamma < \nu /2\), we have that \(|x|^{-\gamma } \in L^2({\mathbb {R}}^\nu )+L^\infty ({\mathbb {R}}^\nu )\), so one can define \(T \equiv -\Delta +\lambda |x|^{-\gamma }\) on \(C_{0}^\infty ({\mathbb {R}}^\nu )\) and it is easy to see that T is symmetric. It follows by general principles [616, Section 7.1] that T is esa-\(\nu \).

  6. 6.

    There is an intuition to explain why one loses self-adjointness of \(-\Delta -|x|^{-\gamma }\) when \(\gamma > 2\). If \(\gamma <2\), in classical mechanics there is an \(\tfrac{\ell ^2}{|x|^2}\) barrier which dominates the \(-|x|^{-\gamma }\), so for almost every initial condition, the classical particle avoids the singularity at the origin. But when \(\gamma > 2\), every negative energy initial condition will fall into the origin in finite time so in classical mechanics, one needs to supplement with a rule about what happens when the particle is captured by the singularity. The quantum analog is the loss of esa. There is of course a difference at \(\gamma = 2\) where classically there is a problem no matter the coupling but not in quantum mechanics. This is associated with the uncertainty principle. In the next section, we’ll see that this intuition is also useful to understand what happens with V’s going to \(-\infty \) at spatial infinity.
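As a numerical illustration of the thresholds in (7.34) and (7.35) (simple arithmetic, listed only for orientation):

$$\begin{aligned} \nu =3:&\quad -\tfrac{(\nu -2)^2}{4} = -\tfrac{1}{4}, \qquad -\tfrac{\nu (\nu -4)}{4} = \tfrac{3}{4} \\ \nu =4:&\quad -\tfrac{(\nu -2)^2}{4} = -1, \qquad -\tfrac{\nu (\nu -4)}{4} = 0 \\ \nu =5:&\quad -\tfrac{(\nu -2)^2}{4} = -\tfrac{9}{4}, \qquad -\tfrac{\nu (\nu -4)}{4} = -\tfrac{5}{4} \end{aligned}$$

Thus for \(\nu =3\), \(H_\beta ^{(\nu )}\) is esa on \(C_{00}^\infty ({\mathbb {R}}^3)\) only for \(\beta \ge \tfrac{3}{4}\), so \(\beta =0\) fails, while \(\nu =4\) is the first dimension in which \(\beta = 0\) is allowed. This matches the operator core statement in Remark 4.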

We summarize in

Example 7.8

(\(|x|^{-2}\) in \({\mathbb {R}}^\nu \); \(\nu \ge 5\)) Rellich’s Inequality [510] (see also [616, Problem 7.4.10], Sect. 10 below, Gesztesy–Littlejohn [188] or Robinson [519] for a proof of Rellich’s inequality via a double commutator estimate like the one before (3.33) and Hardy’s inequality; this proof is a variant of one of Schmincke [539]) says that on \({\mathbb {R}}^\nu , \, \nu \ge 5\), one has

$$\begin{aligned} \frac{\nu (\nu -4)}{4} |||x|^{-2} \varphi || \le ||\Delta \varphi || \end{aligned}$$
(7.36)

for any \(\varphi \in C_{0}^\infty ({\mathbb {R}}^\nu )\). (Of course, this also holds if \(\nu \le 4\) since the left side is negative or 0 (maybe \(-\infty \)) in that case.) This says that \(B=-|x|^{-2}\) is \(-\Delta \)-bounded if \(\nu \ge 5\). When B is A-bounded with A positive, there are three natural values of \(\lambda \), call them \(\lambda _1, \lambda _2, \lambda _3\) with \(0<\lambda _1\le \lambda _2\le \lambda _3\) so that

\(\lambda B\) is A-bounded with relative bound \(<1\) if and only if \(0 \le \lambda < \lambda _1\).

\(A+\lambda B\) is esa on D(A) if \(0 \le \lambda < \lambda _2\) and not if \(\lambda > \lambda _2\).

\(A+\lambda B\) is bounded from below if \(0 \le \lambda < \lambda _3\) and not if \(\lambda > \lambda _3\).

By (7.34), (7.35) and (7.36), we see that

$$\begin{aligned} \lambda _1(\nu ) = \lambda _2(\nu ) = \frac{\nu (\nu -4)}{4}, \qquad \lambda _3(\nu )=\frac{(\nu -2)^2}{4} \end{aligned}$$
(7.37)
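The upper bound on the relative bound that underlies \(\lambda _1\) can be read off from (7.36); this sketch shows only that direction and does not address optimality of the constant. For \(B=-|x|^{-2}\), \(\lambda \ge 0\) and \(\varphi \in C_0^\infty ({\mathbb {R}}^\nu )\) with \(\nu \ge 5\),

$$\begin{aligned} ||\lambda B\varphi || = \lambda \, |||x|^{-2}\varphi || \le \frac{4\lambda }{\nu (\nu -4)} ||\Delta \varphi || \end{aligned}$$

so \(\lambda B\) has relative bound at most \(4\lambda /\nu (\nu -4)\), which is less than 1 when \(\lambda < \tfrac{\nu (\nu -4)}{4}\). Note also that the gap between the last two thresholds in (7.37) is the same in every dimension:

$$\begin{aligned} \lambda _3(\nu )-\lambda _2(\nu ) = \frac{(\nu -2)^2-\nu (\nu -4)}{4} = 1 \end{aligned}$$

For example, \(\nu =5\) gives \(\lambda _1=\lambda _2=\tfrac{5}{4}\) and \(\lambda _3=\tfrac{9}{4}\).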

There is no reason that \(\lambda _1\) has to equal \(\lambda _2\), i.e. esa can persist past the point where the relative bound is 1. For example, if \(A=-\Delta \) on \({\mathbb {R}}^\nu \) with \(\nu \ge 5\) and

$$\begin{aligned} B = -|x|^{-2}+ 2|x-e|^{-2} \end{aligned}$$

for e some fixed, non-zero vector, then one can prove that \(\lambda _1 = \tfrac{\nu (\nu -4)}{8},\,\lambda _2= \tfrac{\nu (\nu -4)}{4}\) and \(\lambda _3=\tfrac{(\nu -2)^2}{4}\).

We turn now to the extensions of Theorem 7.1 to \(\nu \ne 3\). The first results are due to Stummel which we’ll discuss later. In 1959, Brownell [75] proved any \(V \in L^p({\mathbb {R}}^\nu )+L^\infty ({\mathbb {R}}^\nu )\) is \(-\Delta \)-bounded with relative bound zero (see also Nilsson [467]) if

$$\begin{aligned} p=2\, (\nu \le 3), \qquad p > \nu /2 \, (\nu \ge 4) \end{aligned}$$
(7.38)

Since \(|x|^{-2} \in L^p+L^\infty \) for any \(p < \nu /2\), we see that (7.38) is optimal, except perhaps for the borderline case \(p=\nu /2\) (see remark 2 below). Brownell mimicked Kato’s proof, except that (7.14) is replaced by

$$\begin{aligned} ||\varphi ||_r \le C(||\Delta \varphi ||_2 + ||\varphi ||_2) \end{aligned}$$
(7.39)

for any \(2 \le r<r_0\) where \(r_0^{-1} = \tfrac{1}{2} -\tfrac{2}{\nu }\) when \(\nu \ge 4\). In place of (7.13), Brownell used a Hausdorff–Young inequality (see [612, Theorem 6.6.2]).
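The link between (7.39) and the condition (7.38) is Hölder’s inequality; the following is a sketch for the \(L^p\) part \(V_1\) of V, with r determined by p as indicated:

$$\begin{aligned} ||V_1\varphi ||_2 \le ||V_1||_p\, ||\varphi ||_r, \qquad \frac{1}{r} = \frac{1}{2}-\frac{1}{p} \end{aligned}$$

For \(\nu \ge 4\), the condition \(p>\nu /2\) is the same as \(\tfrac{1}{r} > \tfrac{1}{2}-\tfrac{2}{\nu } = r_0^{-1}\), so (7.39) applies and gives \(||V_1\varphi ||_2 \le C||V_1||_p\left( ||\Delta \varphi ||_2+||\varphi ||_2\right) \), the analog of the first term in (7.15). The truncation argument around (7.16) then yields relative bound zero as before.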

(7.39) is what is known as an inhomogeneous Sobolev inequality. There are now (and even then, but not so widely known) sharper inequalities than (7.39). Recall that \(L^p_w({\mathbb {R}}^\nu )\), the weak \(L^p\) space is defined as the measurable functions for which \(||f||_{p,w}^*\) is finite where

$$\begin{aligned} |\{x\,|\,|f(x)|>t\}| \le \frac{(||f||_{p,w}^*)^p}{t^p} \end{aligned}$$
(7.40)

\(||f||_{p,w}^*\) is defined to be the minimal constant so that (7.40) holds. It is not a norm but, for \(p > 1\), it is equivalent to one—see [615, Section 2.2]. One has that \(L^p({\mathbb {R}}^\nu ) \subset L^p_w({\mathbb {R}}^\nu )\) but for \(f \in L^p_w\), one can have \(\int |f(x)|^p d^\nu x\) logarithmically divergent, for example \(f(x) = |x|^{-\nu /p}\) is in \(L^p_w\) but not \(L^p\).
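The example can be checked by hand; here \(\tau _\nu \) denotes the volume of the unit ball in \({\mathbb {R}}^\nu \), a symbol introduced only for this illustration. For \(f(x)=|x|^{-\nu /p}\) and \(t>0\),

$$\begin{aligned} |\{x\,|\,|f(x)|>t\}| = |\{x\,|\,|x|<t^{-p/\nu }\}| = \tau _\nu \, t^{-p} \end{aligned}$$

so (7.40) holds with equality and \(||f||_{p,w}^* = \tau _\nu ^{1/p}\), while, with \(\sigma _\nu \) the area of the unit sphere,

$$\begin{aligned} \int |f(x)|^p d^\nu x = \sigma _\nu \int _0^\infty r^{-\nu }\, r^{\nu -1} dr = \sigma _\nu \int _0^\infty \frac{dr}{r} = \infty \end{aligned}$$

with a logarithmic divergence at both \(r=0\) and \(r=\infty \).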

We call p, \(\nu \)-canonical if \(p=2\) for \(\nu \le 3\), \(p > 2\) if \(\nu =4\) and \(p=\nu /2\) if \(\nu \ge 5\). The optimal \(L^p\) extension of Theorem 7.1 is

Theorem 7.9

Let p be \(\nu \)-canonical. Then \(V \in L^p({\mathbb {R}}^\nu )+L^\infty ({\mathbb {R}}^\nu )\) is \(-\Delta \)-bounded with relative bound zero. If \(\nu \ge 5\), \(V \in L^p_w({\mathbb {R}}^\nu )+L^\infty ({\mathbb {R}}^\nu )\) is \(-\Delta \)-bounded on \(L^2({\mathbb {R}}^\nu )\).

Remarks

  1. 1.

    In the \(L^p_w\) case, the relative bound may not be zero; for example \(V(x) = |x|^{-2}\) as discussed above. Since any \(L^p\) function can be written as the sum of a bounded function and an \(L^p_w\) function of arbitrarily small \(||\cdot ||_{p, w}^*\), the second sentence implies the first.

  2. 2.

    One proof of the \(\nu \ge 5\) result uses a theorem of Stein–Weiss [624] (see [615, Section 6.2]) that if \(f \in L^{\nu /2}_w({\mathbb {R}}^\nu )\) and \(g \in L^{\nu /(\nu -2)}_w({\mathbb {R}}^\nu )\), then \(h \mapsto g*(fh)\) maps \(L^2\) to \(L^2\). Another proof uses Rellich’s inequality and Brascamp–Lieb–Luttinger inequalities (see [71, 520,521,522] or [611]).

  3. 3.

    That one can use \(p = \nu /2\) rather than \(p > \nu /2\) when \(\nu \ge 5\) was noted first by Faris [146].

I’m not sure to whom to attribute the use of sharp Sobolev and Stein–Weiss inequalities. I learned it in about 1968 from a course of lectures of Ed Nelson and it was popularized by Reed–Simon [495].

When Brownell did his work, he was unaware that his results were a consequence of a different approach of Stummel [631] (Brownell thanks the referee for telling him about Stummel’s work). Stummel considered the class, \(S_{\nu ,\alpha }\), of functions, V(x), on \({\mathbb {R}}^\nu \) obeying

$$\begin{aligned} ||V||_{\nu ,\alpha }^2 = \sup _x \int _{|x-y| \le 1} |V(y)|^2\, |x-y|^{-(\nu -4+\alpha )} d^\nu y \end{aligned}$$
(7.41)

is finite. Here \(\alpha > 0\) and \(\alpha \ge (4-\nu )\), so if \(\nu \le 3\), one has that \(S_{\nu ,4-\nu } = L^2_{unif}\). Stummel [631] proves that if \(V \in S_{\nu ,\alpha }\) with \(\alpha \) as above, then V is \(-\Delta \)-bounded with relative bound 0. This has several advantages over the Kato–Brownell approach:

  1. (a)

    Since \(\int _{|w| \le 1} |w|^{-(\beta +\nu )} dw_{\kappa +1}\ldots dw_\nu \sim |(w_1,\ldots ,w_\kappa )|^{-(\beta +\kappa )}\) where the tilde means comparable in terms of upper and lower bounds, extra variables go through directly and there is no need for step 3 in Kato’s proof.

  2. (b)

    As we’ve seen, it is uniformly local, i.e. to be in a Stummel class rather than \(L^p\), one only needs \(L^p_{unif}\).

  3. (c)

    By Young’s inequality [612, Theorem 6.6.3], the Brownell \(L^p\) condition implies Stummel’s condition, so Stummel’s result is stronger.

Stummel’s proof relies on the fact that \(((-\Delta )^2+1)^{-1}\) has an integral kernel diverging as \(|x-y|^{-(\nu -4)}\) for \(|x-y|\) small and decaying exponentially for \(|x-y|\) large. As with Brownell’s paper, Stummel’s \(\alpha > 0\) condition isn’t needed if \(\nu \ge 5\). The issue is that instead of using Young’s inequality, one needs to use the stronger Hardy–Littlewood–Sobolev inequalities [615, Theorem 6.2.1] which were not well known in the 1950s. Motivated by Kato’s introduction of the class \(K_\nu \) (see Sect. 9), in [101], I introduced the class \(S_\nu \) which I defined as those measurable V on \({\mathbb {R}}^\nu \) with

$$\begin{aligned} \left\{ \begin{array}{ll} \sup _x \int _{|x-y| \le 1} |V(y)|^2 d^\nu y < \infty , &{} \hbox { if } \nu \le 3 \\ \lim _{\alpha \downarrow 0} \sup _x \int _{|x-y| \le \alpha } \log (|x-y|^{-1}) |V(y)|^2 d^\nu y = 0 , &{} \hbox { if } \nu =4\\ \lim _{\alpha \downarrow 0} \sup _x \int _{|x-y| \le \alpha } |x-y|^{-(\nu -4)} |V(y)|^2 d^\nu y = 0, &{} \hbox { if } \nu \ge 5 \end{array} \right. \end{aligned}$$
(7.42)

Then one has

Theorem 7.10

([101]; Section 1.2) A multiplication operator, \(V \in S_\nu \) is \(-\Delta \)-bounded with relative bound zero. Conversely, if V is a multiplication operator so that for some \(a,b>0\) and some \(\delta \in (0,1)\) and for all \(\epsilon \in (0,1)\) and \(\varphi \in D(-\Delta )\), one has that

$$\begin{aligned} ||V\varphi ||_2^2 \le \epsilon ||\Delta \varphi ||_2^2 + a \exp (b\epsilon ^{-\delta })||\varphi ||_2^2 \end{aligned}$$
(7.43)

then \(V \in S_\nu \).

One key to the proof is a simple necessary and sufficient condition

Theorem 7.11

([101]; Section 1.2) A multiplication operator, V, is in \(S_\nu \) if and only if \(\lim _{E \rightarrow \infty } ||(-\Delta +E)^{-2}|V|^2||_{\infty ,\infty } = 0\) where \(||\cdot ||_{p,p}\) is the operator norm from \(L^p({\mathbb {R}}^\nu )\) to itself.

For example, to get the boundedness, one uses duality and interpolation to see that

$$\begin{aligned} \lim _{E \rightarrow \infty } ||(-\Delta +E)^{-2}|V|^2||_{\infty ,\infty } = 0&\Rightarrow \lim _{E \rightarrow \infty } |||V|(-\Delta +E)^{-2}|V|||_{2,2}=0 \\&\iff \lim _{E \rightarrow \infty } |||V|(-\Delta +E)^{-1}||_{2,2} = 0 \end{aligned}$$

This concludes what I want to say about uses of the Kato–Rellich theorem to study esa of Schrödinger operators. In [314], Kato also remarks on self-adjointness of Dirac Coulomb Hamiltonians, an issue he returned to several times as we’ll see in Sect. 10.

Let \(\alpha _1, \alpha _2, \alpha _3, \alpha _4=\beta \) be four \(4\times 4\) matrices obeying

$$\begin{aligned} \alpha _i\alpha _j+\alpha _j\alpha _i=2\delta _{ij} {\varvec{1}}; \qquad i,j=1,\ldots ,4 \end{aligned}$$
(7.44)

Our Hilbert space is \({\mathcal {H}}=L^2({\mathbb {R}}^3;{\mathbb {C}}^4,d^3x)\) of \({\mathbb {C}}^4\) valued \(L^2\) functions. The free Dirac operator is

$$\begin{aligned} T_0 = \sum _{j=1}^{3} \alpha _j p_j + m \beta ; \qquad p_j=\frac{1}{i}\frac{\partial }{\partial x_j} \end{aligned}$$
(7.45)

One has, using (7.44), that formally

$$\begin{aligned} T_0^2=-\Delta +m^2 \end{aligned}$$
(7.46)

Using Fourier transform, one can prove that \(T_0\) is esa on \(C^\infty _0({\mathbb {R}}^3;{\mathbb {C}}^4)\) with the domain of the closure being \(\{\varphi \,|\, \int (1+p^2)|\hat{\varphi }(p)|^2\,d^3p< \infty \}\). The Dirac Coulomb operator is

$$\begin{aligned} T=T_0+\frac{\mu }{|x|} \end{aligned}$$
(7.47)

In terms of the nuclear charge, Z, one has that \(\mu =Z\alpha \) where \(\alpha \) is the fine structure constant, \(\alpha ^{-1}=137.035999139\ldots \), so a given \(\mu \) corresponds to \(Z\sim 137\mu \). In [314], Kato notes without proof that his method proves esa of Dirac Coulomb operators on \(C^\infty _0({\mathbb {R}}^3;{\mathbb {C}}^4)\) for \(Z \le 68\). Clearly he had the result for \(\mu < \tfrac{1}{2}\) and 68 is the integral part of \(\tfrac{1}{2}\alpha ^{-1}\). Raised as a physicist, Kato thought of integral Z.

In fact

$$\begin{aligned} \left||\frac{\mu }{r}\varphi \right||^2 \le ||T_0\varphi ||^2+c||\varphi ||^2 \iff \frac{\mu ^2}{r^2}\le T_0^2+c \end{aligned}$$

Hardy’s inequality says that on \({\mathbb {R}}^3\), \((4r^2)^{-1} \le p^2\) (with no larger constant). This and (7.46) show that \(r^{-1}\) is \(T_0\)-bounded with precise relative bound 2, so the Kato–Rellich theorem applies if and only if \(\mu < \tfrac{1}{2}\). But (7.47) can be essentially self-adjoint on \(C^\infty _0({\mathbb {R}}^3;{\mathbb {C}}^4)\) even though the Kato–Rellich theorem doesn’t work; in the language of Example 7.8 it can happen that \(\lambda _1\) is strictly smaller than \(\lambda _2\). Indeed, it is known that

Theorem 7.12

(7.47) is esa on \(C^\infty _0({\mathbb {R}}^3;{\mathbb {C}}^4)\) if and only if

$$\begin{aligned} |\mu | \le \tfrac{1}{2}\sqrt{3} \end{aligned}$$
(7.48)

This result is essentially due to Rellich [509] in 1943. He proved it using spherical symmetry and applying the Weyl limit point–limit circle theory (Theorem 7.5). We say “essentially” because at the time he did this, the Weyl theory had not been proven for systems and (7.47) is a system. This theory for systems was established by Kodaira [385] in 1951 (see also Weidmann [675]) so Theorem 7.12 should be regarded as due to Rellich–Kodaira. Interestingly enough, Kato seems to have been unaware of this result when he wrote his book (second edition was 1976).
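The arithmetic that turns the coupling thresholds into nuclear charges can be sketched as follows. This is only an illustration, assuming the truncated value of \(\alpha ^{-1}\) quoted above; the function name `largest_integer_Z` is hypothetical and not taken from any of the cited papers.

```python
import math

# Inverse fine structure constant, as quoted in the text (truncated).
ALPHA_INV = 137.035999139


def largest_integer_Z(mu_max: float, strict: bool) -> int:
    """Largest integer Z with Z/ALPHA_INV < mu_max (strict) or <= mu_max."""
    z_real = mu_max * ALPHA_INV
    z = math.floor(z_real)
    # If the bound is strict and z_real is an exact integer, step down by one.
    if strict and z == z_real:
        z -= 1
    return z


# Kato-Rellich range mu < 1/2 versus the Rellich-Kodaira range mu <= sqrt(3)/2.
kato_Z = largest_integer_Z(0.5, strict=True)
rellich_Z = largest_integer_Z(math.sqrt(3) / 2, strict=False)
print(kato_Z, rellich_Z)
```

With the quoted constant, the first value reproduces Kato’s \(Z \le 68\), and the second gives \(Z \le 118\) for the range of Theorem 7.12.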

One can also consider \(T_0+V\) where V is not necessarily spherically symmetric and V obeys

$$\begin{aligned} |V(x)| \le \frac{\mu }{|x|} \end{aligned}$$
(7.49)

By Kato’s argument, one can use the Kato–Rellich theorem to get esa when \(|\mu | < \tfrac{1}{2}\). Schmincke [540] proved

Theorem 7.13

Let V obey (7.49) where \(\mu <\tfrac{1}{2}\sqrt{3}\). Then \(T_0+V\) is esa on \(C^\infty _0({\mathbb {R}}^3;{\mathbb {C}}^4)\).

We’ll return to Dirac operators in Sect. 8 and at the end of Sect. 10. Having mentioned a result of Schmincke, I should mention that in the 1970s and early 1980s there was a lively school founded by Günter Hellwig that produced a cornucopia of results on esa questions for Schrödinger and Dirac operators. Among the group were H. Cycon, H. Kalf, U.-W. Schmincke, R. Wüst and J. Walter.

This said, there is a sense in which Kato’s critical value \(\mu =\tfrac{1}{2}\) is connected to loss of esa. Arai [15, 16] has shown that for any \(\mu > \tfrac{1}{2}\) there is a symmetric matrix valued potential Q(x) with \(||Q(x)||=\mu |x|^{-1}\) for all x so that \(T_0+Q\) is not esa on \(C^\infty _0({\mathbb {R}}^3;{\mathbb {C}}^4)\), so Theorems 7.12 and 7.13 depend on scalar potentials.

8 Self-adjointness, II: the Kato–Ikebe paper

Kato was clearly aware that his great 1951 paper didn’t include the Stark Hamiltonian where H isn’t bounded from below, and in fact \(\eta (x) \equiv \int _{|x-y| \le 1} |\min (V(y),0)|^2 dy \rightarrow \infty \) if one takes \(x \rightarrow \infty \) in a suitable direction. For esa, one needs restrictions on the growth of \(\eta \) at infinity (whereas, we’ll see in Sect. 9, if \(|\min (V(y),0)|\) is replaced by \(\max (V(y),0)\), no restriction is needed). To understand this, it is useful to first consider one dimension. Suppose that \(V(x) \rightarrow -\infty \) as \(x \rightarrow \infty \). In classical mechanics, if a particle of mass m starts at \(x=c\) with zero speed, \(V(c) = 0\) and \(V'(x) < 0\) on \((c,\infty )\), the particle will move to the right. By conservation of energy, the speed when the particle is at point \(x > c\) will be \(v(x) = \sqrt{-V(x)}\) if \(\tfrac{1}{2}m = 1\). The time to get from c to \(x_0 > c\) is thus \(\int _{c}^{x_0} \tfrac{dx}{\sqrt{-V(x)}}\). Thus the key issue is whether \(\int _{c}^{\infty } \tfrac{dx}{\sqrt{-V(x)}}\) is finite or not. If it is finite, the particle gets to infinity in finite time and the motion is incomplete. One expects that the quantum mechanical equivalent is that \(-\tfrac{d^2}{dx^2}+V(x)\) is esa if and only if \(\int _{c}^{\infty } \tfrac{dx}{\sqrt{-V(x)}} =\infty \). In particular, if \(V(x) = -\lambda |x|^\alpha \), this suggests esa if and only if \(\alpha \le 2\).
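The dichotomy at \(\alpha = 2\) can be seen by evaluating the integral in the criterion for the model potential \(V(x) = -x^\alpha \). The following is a minimal sketch under that assumption; the function names are illustrative only, and the midpoint rule is included merely as a cross-check of the closed form.

```python
import math


def travel_time(alpha: float, c: float, x0: float) -> float:
    """Closed form of the integral of dx / sqrt(-V) over [c, x0] for V(x) = -x**alpha."""
    s = 1.0 - alpha / 2.0  # exponent obtained after integrating x**(-alpha/2)
    if abs(s) < 1e-12:
        return math.log(x0 / c)
    return (x0 ** s - c ** s) / s


def travel_time_numeric(alpha: float, c: float, x0: float, n: int = 200_000) -> float:
    """Midpoint-rule approximation of the same integral."""
    h = (x0 - c) / n
    return sum(h * (c + (i + 0.5) * h) ** (-alpha / 2.0) for i in range(n))


# Watch how the integral behaves as the upper limit grows.
for alpha in (1.0, 2.0, 4.0):
    times = [travel_time(alpha, 1.0, 10.0 ** k) for k in (2, 4, 8)]
    print(alpha, times)
```

For \(\alpha \le 2\) the values keep growing as the upper limit increases, while for \(\alpha = 4\) they approach the finite limit \(1/c\), which is the classical incompleteness described above.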

The classical/quantum intuition can fail if V(x) has severe oscillations or interspersed high bumps (see Rauch–Reed [491] or Sears [547] for examples). These esa results for ODEs were studied in the late 1940s using limit point–limit circle methods. Under the non-oscillation assumption (and \(V(x) < 0\))

$$\begin{aligned} \int _{c}^{\infty } \left( \frac{[\sqrt{-V}]'}{(-V)^{3/2}}\right) ' (-V)^{-1/4} \, dx < \infty \end{aligned}$$

(if \(V(x) = -x^\alpha \), the integrand is \(x^{-(5\alpha +8)/4}\), so there have to be severe oscillations for this to fail), Wintner [686] proved in 1947 that \(-\tfrac{d^2}{dx^2}+V(x)\) is limit point at \(\infty \) if and only if \(\int _{c}^{\infty } \tfrac{dx}{\sqrt{-V(x)}} = \infty \). Slightly later, in 1949, Levinson [422] proved that it is limit point at infinity if there is a positive comparison function, M(x), so that \(V(x) > -M(x)\) near infinity, \(M'(x)/M(x)^{3/2}\) bounded and \(\int _{c}^{\infty } \tfrac{dx}{\sqrt{M(x)}} = \infty \). For proofs, see [495].
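The exponent \(-(5\alpha +8)/4\) quoted for \(V(x) = -x^\alpha \) follows from simple power counting. Here is a small sketch of that bookkeeping, assuming \(\alpha > 0\) so that no prefactor vanishes; the function name is hypothetical.

```python
from fractions import Fraction


def wintner_integrand_exponent(alpha: Fraction) -> Fraction:
    """Power of x in the integrand of the non-oscillation condition for V(x) = -x**alpha.

    Only the exponent of x is tracked; constant prefactors are ignored
    (they are nonzero when alpha > 0).
    """
    e = alpha / 2                     # sqrt(-V) = x**(alpha/2)
    e = e - 1                         # first derivative lowers the power by one
    e = e - Fraction(3, 2) * alpha    # divide by (-V)**(3/2)
    e = e - 1                         # outer derivative
    e = e - alpha / 4                 # multiply by (-V)**(-1/4)
    return e


for a in (Fraction(1), Fraction(2), Fraction(4)):
    assert wintner_integrand_exponent(a) == -(5 * a + 8) / 4
    print(a, wintner_integrand_exponent(a))
```

Since the exponent is below \(-2\) for every \(\alpha > 0\), the integral converges at infinity for all pure powers.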

This suggests that a good condition for esa-\(\nu \) of \(-\Delta +V(x)\) should be

$$\begin{aligned} V(x) \ge -c|x|^2-d \end{aligned}$$
(8.1)

Indeed, in 1959, Nilsson [467] and Wienholtz [684] independently proved that

Theorem 8.1

(Nilsson–Wienholtz) If V(x) is a continuous function on \({\mathbb {R}}^\nu \) obeying (8.1), then \(-\Delta +V\) is esa-\(\nu \).

Further developments (all later than the Ikebe–Kato paper discussed below) are due to Hellwig [235,236,237], Rohde [523,524,525] and Walter [669, 670]. In 1962, Kato and his former student Ikebe [268] studied operators of the form

$$\begin{aligned} -\sum _{j,k=1}^{\nu } c_{jk}(x)\left( \frac{\partial }{\partial x_j}-ia_j\right) \left( \frac{\partial }{\partial x_k}-ia_k\right) + V(x) \end{aligned}$$
(8.2)

where \(c_{jk}(x)\) and \(a_j(x)\) are \(C^2\) functions and for each x, \(c_{jk}(x)\) is a strictly positive matrix. For quantum mechanics, one only considers \(c_{ij}(x) = \delta _{ij}\) (at least if one ignores quantum mechanics on curved manifolds) and our discussion will be limited to that case.

Wienholtz had also considered first order terms but didn’t write them in the form (8.2), which is the right form for quantum physics; \(a_j(x)\) is the vector potential, i.e. B=da is the magnetic field. Ikebe–Kato had the important realization that one needs no global hypothesis on a, i.e. any growth at \(\infty \) of a is allowed. While their hypothesis on the local behavior of a was too strong (see Sect. 9), their discovery on behavior at \(\infty \) was important.

For V, they supposed that \(V=V_1+V_2\) where \(V_2\) is in a Stummel space, \(S_{\nu ,\alpha },\, \alpha >0\) and \(V_1(x) \ge -q(|x|)\) where q is increasing and obeys \(\int ^{\infty } q(r)^{-1/2} dr = \infty \). Unlike Wienholtz, they could allow local singularities such as atoms in Stark fields.

figure c

Rather than discuss their techniques, I want to sketch two approaches to Wienholtz’s result which allow local singularities and are of especial elegance. For one of them, Kato made an important contribution. The first approach is due to Chernoff [90, 92] as modified by Kato [341] and the second approach is due to Faris–Lavine [148]. Interestingly enough, each utilizes a self-adjointness criterion of Ed Nelson, but the two criteria are different and were developed in different contexts. Here is the criterion for Chernoff’s method (which Nelson developed in his study of the relation between unitary group representations and their infinitesimal generators).

Theorem 8.2

(Chernoff–Nelson Theorem) Let A be a self-adjoint operator and \(U_t=e^{itA},\,t \in {\mathbb {R}}\), the induced unitary group. Suppose that \({\mathcal {D}}\) is a dense subspace of \({\mathcal {H}}\) with \({\mathcal {D}}\subset D(A^\ell )\) for some \(\ell = 1,2,\ldots \) and suppose that for all t, we have that \(U_t[{\mathcal {D}}] \subset {\mathcal {D}}\). Then \({\mathcal {D}}\) is a core for \(A,A^2,\ldots ,A^\ell \).

Remarks

  1. 1.

    Recall that Stone’s theorem [616, Theorem 7.3.1] says there is a one-one correspondence between one-parameter unitary groups and self-adjoint operators, via \(U_t=e^{itA},\,t \in {\mathbb {R}}\).

  2. 2.

    Chernoff considers the case \({\mathcal {D}}\subset D^\infty (A) \equiv \cap _\ell D(A^\ell )\) in which case \({\mathcal {D}}\) is a core for \(A^\ell \) for all \(\ell \).

  3. 3.

    Nelson [458] did the case \(\ell =1\) and Chernoff [90] noted his argument can be used for general \(\ell \).

  4. 4.

    The argument is simple. Let \(B = A^k \restriction {\mathcal {D}}\) for some \(k=1,\ldots ,\ell \). Suppose that \(B^*\psi = i\psi \). Let \(\varphi \in {\mathcal {D}}\) and let \(f(t) = \langle \psi ,U_t\varphi \rangle \). Then since \(U_t\varphi \in D(A^k)\), we have that f is a \(C^k\) function and

    $$\begin{aligned} f^{(k)}(t)&= \langle \psi ,(iA)^kU_t\varphi \rangle = i^k \langle \psi ,BU_t\varphi \rangle \nonumber \\&= i^k \langle B^*\psi ,U_t\varphi \rangle = -i^{k+1} f(t) \end{aligned}$$
    (8.3)

    If \(g(t) = e^{i\alpha t}\), then g solves (8.3) if and only if \((i\alpha )^k = -i^{k+1}\), i.e. \(\alpha ^k = -i\). No solution of this is real, so f, like any solution of (8.3), is a linear combination of exponentials which grow at different rates at either \(+\infty \) or \(-\infty \), so the only bounded solution is 0. Since \(|f(t)| \le ||\psi ||||\varphi ||\), we conclude that \(f(0)=0\) so \(\psi \perp {\mathcal {D}}\). Since \({\mathcal {D}}\) is dense, \(\psi = 0\), i.e. \(\ker (B^*-i)=\{0\}\). Similarly, \(\ker (B^*+i)=\{0\}\), so B is esa.
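The key algebraic fact in Remark 4, that \(\alpha ^k = -i\) has no real solution, can be checked numerically for small k. The sketch below is only an illustration; the helper names are hypothetical.

```python
import cmath
import math


def roots_of_minus_i(k: int) -> list:
    """All complex solutions alpha of alpha**k = -i."""
    # -i = exp(-i*pi/2), so the roots are exp(i*(-pi/2 + 2*pi*m)/k), m = 0..k-1.
    return [cmath.exp(1j * (-math.pi / 2 + 2 * math.pi * m) / k) for m in range(k)]


def min_abs_imag(k: int) -> float:
    """Smallest |Im alpha| over the roots; a positive value means no root is real."""
    return min(abs(r.imag) for r in roots_of_minus_i(k))


for k in range(1, 7):
    print(k, min_abs_imag(k))
```

Since \(|e^{i\alpha t}| = e^{-(\mathrm {Im}\,\alpha ) t}\), a nonzero imaginary part means the exponential is unbounded as \(t \rightarrow +\infty \) or as \(t \rightarrow -\infty \), which is what the boundedness argument uses.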

Kato proved his famous self-adjointness result to be able to solve the time dependent Schrödinger equation, \(\dot{\psi }_t = -iH\psi _t\). Chernoff turned this argument around! If one can solve the equation \(\dot{\psi }_t = -iA\psi _t\) for a dense set \({\mathcal {D}}\) in \(D^\infty (A)\) and prove that \(\psi _{t=0} \in {\mathcal {D}}\Rightarrow \psi _t \in {\mathcal {D}}\), then by Theorem 8.2, all powers of A are esa on \({\mathcal {D}}\). He combined this with existence and smoothness results of Friedrichs [173] and Lax [418] for hyperbolic equations plus finite propagation speed to show that if A is a hyperbolic equation, then the solution map takes \(C_0^\infty \) to itself.

In particular, since the Dirac equation is hyperbolic, Chernoff proved

Theorem 8.3

(Chernoff [90]) If \(T_0\) is the free Dirac operator, (7.45), and V is a \(C^\infty ({\mathbb {R}}^3)\) function, then \(T = T_0+V\) and all its powers are esa on \(C_0^\infty ({\mathbb {R}}^3;{\mathbb {C}}^4)\).

figure d

Notice that there are no restrictions on the growth of V at \(\infty \). This is an expression of the fact that for the Dirac equation, no boundary condition is needed at infinity—intuitively, this is because the particle cannot get to infinity in finite time because speeds are bounded by the speed of light! Several years after his initial paper, Chernoff [92] used results on solutions of singular hyperbolic equations and proved the following version of the fact that Dirac equations have no boundary condition at infinity:

Theorem 8.4

Let \(T_0\) be the free Dirac operator and \(V \in L^2_{loc}({\mathbb {R}}^3)\) (so \(T_0+V\) is defined on \(C_0^\infty ({\mathbb {R}}^3;{\mathbb {C}}^4)\)). Suppose for each \(x_0 \in {\mathbb {R}}^3\), there is a \(V^{(x_0)}\) equal to V in a neighborhood of \(x_0\) and so that \(T_0+V^{(x_0)}\) is esa on \(C_0^\infty ({\mathbb {R}}^3;{\mathbb {C}}^4)\). Then \(T_0+V\) is esa on \(C_0^\infty ({\mathbb {R}}^3;{\mathbb {C}}^4)\).

Combining this with Schmincke’s result (Theorem 7.13) one gets

Corollary 8.5

Let \(T_0\) be a free Dirac operator. Let V be a measurable function so that for some sequence \(\{x_j\}_{j=1}^N\) (with N finite or infinite) with no finite limit point, we have that

  1. (a)

    There are constants \(\mu _j < \sqrt{3}/2\) and \(C_j\) so that for x near \(x_j\), say x obeys \(|x-x_j| \le \tfrac{1}{2} \min _{k\ne j} |x_j-x_k|\), one has that

    $$\begin{aligned} |V(x)| \le \mu _j |x-x_j|^{-1} + C_j \end{aligned}$$
    (8.4)
  2. (b)

    V is locally bounded near any \(x \notin \{x_j\}_{j=1}^N\).

    Then \(T_0+V\) is esa on \(C_0^\infty ({\mathbb {R}}^3;{\mathbb {C}}^4)\).

Other results on esa for Dirac operators which are finite sums of Coulomb potentials include [293, 305, 374, 408,409,410, 463].

At first sight, this lovely idea seems to have nothing to do with Schrödinger operators since that equation isn’t hyperbolic; after all it has infinite propagation speed and even for the free case, the dynamical unitary group doesn’t leave the \(C_0^\infty \) functions fixed. But the wave equation

$$\begin{aligned} \frac{\partial ^2 u}{\partial t^2}=(\Delta - V) u \end{aligned}$$
(8.5)

is hyperbolic (and has finite propagation speed, namely 1). It is second order in t but can be written as a first order equation:

$$\begin{aligned} v=\frac{\partial u}{\partial t}, \qquad \frac{\partial v}{\partial t} = -Bu, \qquad B=-\Delta +V \end{aligned}$$
(8.6)

or equivalently

$$\begin{aligned} \frac{\partial }{\partial t}\left( \begin{array}{c} u \\ v \\ \end{array} \right) =-iA \left( \begin{array}{c} u \\ v \\ \end{array} \right) ; \qquad -iA = \left( \begin{array}{cc} 0 &{} {\varvec{1}}\\ -B &{} 0 \\ \end{array} \right) \end{aligned}$$
(8.7)

If V is in \(C^\infty ({\mathbb {R}}^\nu )\), one can use hyperbolic theory to prove solutions exist for \((u(0),v(0)) \in C_0^\infty ({\mathbb {R}}^\nu ) \times C_0^\infty ({\mathbb {R}}^\nu )\) and the solution remains in this space. To apply Theorem 8.2, we need this dynamics to be unitary. The energy

$$\begin{aligned} E(u,v) = \langle v,v \rangle +\langle u,Bu \rangle \end{aligned}$$
(8.8)

is formally conserved, so it is natural to use E as the square of a Hilbert space norm. For this to work, one needs that \(B \ge c{\varvec{1}}\) with \(c > 0\). Actually, so long as B is bounded from below we can add a constant to B so that \(B \ge {\varvec{1}}\) which we’ll assume. When this is so, one can prove that on the Hilbert space \(L^2({\mathbb {R}}^\nu )\oplus Q(-\Delta +V)\) (where Q is the quadratic form of the Friedrichs extension as discussed in Sect. 10), \(e^{-itA}\) with A given by (8.7) is a unitary group which leaves \({\mathcal {D}}= C_0^\infty ({\mathbb {R}}^\nu )\oplus C_0^\infty ({\mathbb {R}}^\nu )\) invariant and with \({\mathcal {D}}\subset D^\infty (A)\). We note that

$$\begin{aligned} A^2 = -(iA)^2 = \left( \begin{array}{cc} B &{} 0 \\ 0 &{} B \\ \end{array} \right) \end{aligned}$$
(8.9)

on \({\mathcal {D}}\). We have thus related the Schrödinger equation to the square of a hyperbolic equation so we can use Chernoff’s idea to conclude that

Theorem 8.6

If V is \(C^\infty ({\mathbb {R}}^\nu )\), so that \(-\Delta +V\) is bounded from below on \(C_0^\infty ({\mathbb {R}}^\nu )\), i.e. for some c and all \(u \in C_0^\infty ({\mathbb {R}}^\nu )\)

$$\begin{aligned} \langle u,(-\Delta +V)u \rangle \ge c\langle u,u \rangle \end{aligned}$$
(8.10)

then \(-\Delta +V\) is esa-\(\nu \).
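The algebra behind (8.7)–(8.9) and the conservation of the energy (8.8) can be checked on a single mode, where the operator B is replaced by a number \(b \ge 1\). This is a toy sketch under that simplifying assumption (it is not the operator-theoretic argument), and all function names are illustrative.

```python
import math


def matmul2(X, Y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def generator(b: float):
    """The matrix -iA of (8.7) with the operator B replaced by a number b."""
    return [[0.0, 1.0], [-b, 0.0]]


def a_squared(b: float):
    """A^2 = -(iA)^2, using (iA)^2 = (-iA)^2."""
    G = generator(b)
    G2 = matmul2(G, G)
    return [[-G2[i][j] for j in range(2)] for i in range(2)]


def energy(u: float, v: float, b: float) -> float:
    """Scalar version of (8.8): E(u, v) = v^2 + b u^2."""
    return v * v + b * u * u


def mode_solution(b: float, t: float):
    """Solution (u, v) of u'' = -b u with u(0) = 1 and v(0) = u'(0) = 0."""
    w = math.sqrt(b)
    return math.cos(w * t), -w * math.sin(w * t)


print(a_squared(3.0))
print([energy(*mode_solution(3.0, t), 3.0) for t in (0.0, 0.5, 2.0)])
```

In this scalar picture, \(A^2\) is b times the identity, matching (8.9), and the energy stays constant along the solution.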

Remarks

  1. 1.

    This proof of the result appeared in Chernoff [90], but the result itself appeared earlier in Povzner [485] and Wienholtz [683].

  2. 2.

    In his second paper, Chernoff [92] handled singular V’s and also used the idea of Kato we’ll describe shortly and also Kato’s inequality ideas (see Sect. 9). He proved that \(-\Delta +V\) is esa-\(\nu \) if \(V=U-W\) with \(U, W \ge 0\), \(U \in L^2_{loc}({\mathbb {R}}^\nu ), W \in L^p_{loc}({\mathbb {R}}^\nu )\) (with p \(\nu \)-canonical) and \(-\Delta +V+cx^2\) bounded from below for some \(c>0\).

In [341], Kato showed how to modify Chernoff’s argument to extend Theorem 8.6 to replace the condition that \(-\Delta +V\) is bounded from below by the condition that for some \(c>0\), one has that \(-\Delta +V+cx^2\) is bounded from below (and thereby gets a Wienholtz–Ikebe–Kato type of result). Kato’s idea (when \(c=1\)) was to solve \(\tfrac{\partial ^2 u}{\partial t^2} = (\Delta -V)u-4t^2u\). He was able to prove that \(||u(t)||_2\) (which is bounded in the case \(-\Delta +V\) is bounded below) doesn’t grow worse than \(|t|^3\) and then push through a variant of the Chernoff–Nelson argument (since a \(|t|^3\) bound can eliminate exponential growth).

This completes our discussion of the Chernoff approach. The underlying self-adjointness criterion of Nelson needed for the Faris–Lavine approach is

Theorem 8.7

(Nelson’s Commutator Theorem [462]) Let A, N be two symmetric operators so that N is self-adjoint with \(N \ge 1\). Suppose that \(D(N) \subset D(A)\) and there are constants \(c_1\) and \(c_2\) so that for all \(\varphi ,\psi \in D(N)\) we have that

$$\begin{aligned}&|\langle \varphi ,A\varphi \rangle | \le c_1 \langle \varphi ,N\varphi \rangle \end{aligned}$$
(8.11)
$$\begin{aligned}&|\langle A\varphi ,N\varphi \rangle - \langle N\varphi ,A\varphi \rangle | \le c_2\langle \varphi ,N\varphi \rangle \end{aligned}$$
(8.12)

Then A is esa on any core for N.

Remarks

  1. 1.

    The name comes from the fact that \(\langle A\psi ,N\varphi \rangle - \langle N\psi ,A\varphi \rangle = \langle \psi , [N,A] \varphi \rangle \) if \(N\varphi \in D(A)\) and \(A\varphi \in D(N)\).

  2. 2.

    Nelson [462] was motivated by Glimm–Jaffe [193] which also required bounds on [N, [N, A]] which would not apply to the Faris–Lavine choices without extra conditions on V.

  3. 3.

    For a proof, see Nelson [462] or Reed–Simon [495, Theorem X.36].

To illustrate the use of this theorem, here is a special case of the Faris–Lavine theorem (see Faris–Lavine [148] or Reed–Simon [495, Theorem X.38] for the full theorem) that gives a \(V(x) \ge -x^2\) type of result:

Theorem 8.8

(Faris–Lavine [148]) Let \(V(x)\in L^2_{loc}({\mathbb {R}}^\nu )\) and obey:

$$\begin{aligned} V(x) \ge -cx^2-d \end{aligned}$$
(8.13)

Then \(-\Delta +V\) is esa-\(\nu \).

Proof

By a simple argument, we can assume \(c=1, d=0\). Let \(N=-\Delta +V+2x^2\) by which we mean the closure of that sum on \(C_0^\infty ({\mathbb {R}}^\nu )\). Let A be the operator closure of \((-\Delta +V)\restriction C_0^\infty ({\mathbb {R}}^\nu )\). By Theorem 9.1 below, N is self-adjoint. \(N-A = 2x^2 \ge 0\) while \(N+A = -2\Delta + (2V(x)+2x^2) \ge 0\) so \(\pm A \le N\) which is (8.11).

The same method that proved (3.33) implies an estimate \(||x^2\varphi || \le a||N\varphi ||\) on \(C_0^\infty ({\mathbb {R}}^\nu )\) so \(\varphi \in D(N) \Rightarrow x^2\varphi \in L^2 \Rightarrow \varphi \in D(A)\). Thus \(D(N) \subset D(A)\).

By (8.13) \(N \ge -\Delta + x^2 \ge \pm (x\cdot p+p\cdot x)\) (by completing the square). Note that

$$\begin{aligned} i[N,-\Delta +V]&= i[2x^2,-\Delta +V] \\&= 2i[x^2,p^2] \\&= -4(x\cdot p+p\cdot x) \end{aligned}$$

so \(|\langle N\varphi ,A\varphi \rangle -\langle A\varphi ,N\varphi \rangle | \le c\langle \varphi ,N\varphi \rangle \). We can apply Theorem 8.7 to see that \(-\Delta +V\) is esa-\(\nu \). \(\square \)
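The commutator identity used in the proof, \(2i[x^2,p^2] = -4(x\cdot p+p\cdot x)\), can be sanity-checked in one dimension by letting both sides act on polynomials, with \(p = -i\,d/dx\). This is a purely algebraic sketch (polynomials are not in \(L^2\)), and the small polynomial helpers below are hypothetical.

```python
def trim(p):
    """Drop trailing zero coefficients so equal polynomials compare equal."""
    p = list(p)
    while p and abs(p[-1]) < 1e-12:
        p.pop()
    return p


def x_op(p):
    """Multiplication by x on a coefficient list [c0, c1, ...]."""
    return [0] + list(p)


def p_op(p):
    """Momentum p = -i d/dx on a coefficient list."""
    return [-1j * k * c for k, c in enumerate(p)][1:]


def combine(p, q, sign=1):
    """Return p + sign*q, padding the shorter list with zeros."""
    n = max(len(p), len(q))
    p = list(p) + [0] * (n - len(p))
    q = list(q) + [0] * (n - len(q))
    return [a + sign * b for a, b in zip(p, q)]


def commutator_x2_p2(f):
    """[x^2, p^2] f = x^2 p^2 f - p^2 x^2 f."""
    return trim(combine(x_op(x_op(p_op(p_op(f)))), p_op(p_op(x_op(x_op(f)))), -1))


def anticommutator_xp(f):
    """(x p + p x) f."""
    return trim(combine(x_op(p_op(f)), p_op(x_op(f))))


f = [1, 2, 0, -3]  # 1 + 2x - 3x^3, an arbitrary test polynomial
lhs = [2j * c for c in commutator_x2_p2(f)]      # 2i [x^2, p^2] f
rhs = [-4 * c for c in anticommutator_xp(f)]     # -4 (x p + p x) f
print(lhs)
print(rhs)
```

The two printed coefficient lists agree, which is the one-dimensional case of the last line of the display; the \(\nu \)-dimensional identity is the sum over coordinates.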

9 Self-adjointness, III: Kato’s inequality

This section will discuss a self-adjointness method that appeared in Kato [340] based on a remarkable distributional inequality. Its consequences are a subject to which Kato returned often with at least seven additional papers [73, 343, 348, 349, 351, 355, 356]. It is also his work that most intersected my own: I motivated his initial paper and it, in turn, motivated several of my later papers. Throughout this section, we’ll use quadratic form ideas that we’ll only formally discuss in Sect. 10 (see [616, Section 7.5]).

To explain the background, recall that in Sect. 7, we defined p to be \(\nu \)-canonical (\(\nu \) is dimension) if \(p=2\) for \(\nu \le 3\), \(p > 2\) for \(\nu = 4\) and \(p = \nu /2\) for \(\nu \ge 5\). For now, we focus on \(\nu \ge 5\) so that \(p=\nu /2\). As we saw, if \(V \in L^p({\mathbb {R}}^\nu )+L^\infty ({\mathbb {R}}^\nu )\), then \(-\Delta +V\) is esa-\(\nu \). The example \(V(x) = - \lambda |x|^{-2}\) for \(\lambda \) sufficiently large shows that \(p=\nu /2\) is sharp. That is, for any \(2 \le q < \nu /2\), there is a \(V \in L^q({\mathbb {R}}^\nu )+L^\infty ({\mathbb {R}}^\nu )\), so that \(-\Delta +V\) is defined on but not esa on \(C_0^\infty ({\mathbb {R}}^\nu )\).
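The role of the exponent \(\nu /2\) in this example comes from a radial integral: \(|x|^{-\alpha }\) is locally in \(L^q\) near the origin exactly when \(\alpha q < \nu \). A minimal sketch of that computation, with hypothetical function names, is:

```python
import math


def radial_integral(alpha: float, q: float, nu: int, eps: float) -> float:
    """Integral of r**(nu - 1 - alpha*q) over [eps, 1].

    Up to the area of the unit sphere, this is the integral of |x|**(-alpha*q)
    over the shell eps <= |x| <= 1 in R^nu.
    """
    s = nu - alpha * q  # exponent obtained after integrating r**(nu - 1 - alpha*q)
    if abs(s) < 1e-12:
        return -math.log(eps)
    return (1.0 - eps ** s) / s


def in_Lq_loc(alpha: float, q: float, nu: int) -> bool:
    """|x|**(-alpha) is locally L^q near the origin exactly when alpha*q < nu."""
    return alpha * q < nu


nu = 5
for q in (2.0, 2.4, 2.5):
    print(q, in_Lq_loc(2.0, q, nu), [radial_integral(2.0, q, nu, 10.0 ** -k) for k in (2, 4, 8)])
```

For \(\nu = 5\) and \(\alpha = 2\), the integrals stay bounded as the inner radius shrinks when \(q < 5/2\) and diverge logarithmically at \(q = 5/2\), so \(|x|^{-2}\) lies in \(L^q_{loc}\) for \(q < \nu /2\) but not for \(q = \nu /2\).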

In these counterexamples, though, V is negative. It was known since the late 1950s (see Sect. 8) that while the negative part of V requires some global hypothesis for esa-\(\nu \), the positive part does not (e.g. \(-\Delta -x^4\) is not esa-\(\nu \) while \(-\Delta +x^4\) is esa-\(\nu \)). But when I started looking at these issues around 1970, there was a presumption that for local singularities, there was no difference between the positive and negative parts. In retrospect, this shouldn’t have been the belief! After all, as we’ve seen (see the Remarks after Proposition 7.7), limit point–limit circle methods show that if \(V(x) = |x|^{-\alpha }\) with \(\alpha < \nu /2\) (to make \(V \in L^2_{loc}\) so that \(-\Delta +V\) is defined on \(C_0^\infty ({\mathbb {R}}^\nu )\)) then \(-\Delta +V\) is esa-\(\nu \) although \(-\Delta -V\) is not. (Limit point–limit circle methods apply for \(-\Delta +V\) for any \(\alpha \) if we look at \(C_{00}^\infty ({\mathbb {R}}^\nu )\) but then only when \(\alpha < \nu /2\), we can extend the conclusion to \(C_0^\infty ({\mathbb {R}}^\nu )\).) This example shows that the conventional wisdom was faulty but people didn’t think about separate local conditions on

$$\begin{aligned} V_+(x) \equiv \max (V(x),0); \qquad V_-(x) = \max (-V(x),0) \end{aligned}$$
(9.1)

Kato’s result shattered the then conventional wisdom:

Theorem 9.1

(Kato [340]) If \(V \ge 0\) and \(V \in L^2_{loc}({\mathbb {R}}^\nu )\), then \(-\Delta +V\) is esa-\(\nu \).

Remark

As we’ll see later, this extends, for example, to \(V_+\in L^2_{loc}, V_-\in L^p_{unif}\) with p \(\nu \)-canonical.

Kato’s result was actually a conjecture that I made on the basis of a slightly weaker result that I had proven:

Theorem 9.2

(Simon [575]) If \(V \ge 0\) and \(V \in L^2({\mathbb {R}}^\nu ,e^{-cx^2}\,d^\nu x)\) for some \(c > 0\), then \(-\Delta +V\) is esa-\(\nu \).

Of course this covers pretty wild growth at infinity but Theorem 9.1 is the definitive result since one needs that \(V \in L^2_{loc}({\mathbb {R}}^\nu )\) for \(-\Delta +V\) to be defined on all functions in \(C_0^\infty ({\mathbb {R}}^\nu )\).

I found Theorem 9.2 because I was also working at the time in constructive quantum field theory which was then studying the simplest interacting field models \(\varphi ^4_2\) and \(P(\varphi )_2\) (the subscript 2 means two space–time dimensions). To start with, one wanted to define \(H_0+V\) where \(H_0\) was a positive mass free quantum field Hamiltonian and V a spatially cutoff interaction. Nelson [461] realized that one could view \(H_0\) as an infinite sum of independent harmonic oscillators (shifted to have ground state energy 0) which he analyzed as follows: For a single variable oscillator on \(L^2({\mathbb {R}},dx)\), there is a unit vector \(\Omega _0\) with \(H_0\Omega _0=0\). The map \(U: f \mapsto f\Omega _0^{-1}\) maps \(L^2({\mathbb {R}},dx)\) unitarily to \(L^2({\mathbb {R}},\Omega _0^2 \, dx)\) and Nelson analyzed \(A_0=UH_0U^{-1}\) on \(L^2({\mathbb {R}},\Omega _0^2\,dx)\) and found (with \(d\mu =\Omega _0^2\,dx\) a probability measure on \(X={\mathbb {R}}\)) that

$$\begin{aligned} ||e^{-tA_0}\varphi ||_p&\le ||\varphi ||_p \qquad \text {all } \varphi \in L^p(X,d\mu ), \text { all } t>0 \end{aligned}$$
(9.2)
$$\begin{aligned} ||e^{-TA_0}\varphi ||_4&\le B||\varphi ||_2 \qquad T \text { large enough} \end{aligned}$$
(9.3)

By taking products, he got similar bounds on the infinite dimensional spaces of the field theory (he was restricted to a field theory with a periodic boundary condition but Glimm [191] did the full theory). Eventually, semigroups, \(e^{-tA_0}\), obeying (9.2)/(9.3) were called hypercontractive semigroups. [615, Section 6.6] has a lot on the general theory and the history.

Nelson also proved that the V of the cutoff field theory wasn’t bounded below but it did obey

$$\begin{aligned} V \in L^p(X,d\mu ),\, p<\infty \text { and } e^{-sV} \in L^1(X,d\mu ), \text { all } s>0 \end{aligned}$$
(9.4)

He also showed that (9.2), (9.3), (9.4) \(\Rightarrow A_0+V\) is bounded from below on \(D(A_0)\cap D(V)\).

Segal [548, 549] then proved that these same hypotheses imply that \(A_0+V\) is esa on \(D(A_0)\cap D(V)\) (for the field theory case Glimm–Jaffe [192] and Rosen [527] using Nelson’s estimates but additional properties had earlier proven esa for this specific situation).

Simon–Høegh Krohn [618] systematized these results and showed that if \(V \ge 0\), one can replace \(V \in L^p\) for some \(p>2\) by \(V \in L^2(X,d\mu )\). The Simon–Høegh Krohn paper was written in 1970. In 1972, I realized that by looking at \(-\Delta +x^2\) on \(L^2({\mathbb {R}}^\nu )\), one could prove that if \(V \ge 0\) and \(V \in L^2({\mathbb {R}}^\nu ,e^{-x^2}\,dx)\), then \(-\Delta +V+x^2\) is esa-\(\nu \). Arguments like those that proved (3.33), using that \([x_i,[x_i,-\Delta +V+x^2]]\) is a constant, show that one has that

$$\begin{aligned} ||x^2\varphi ||^2 \le ||(-\Delta +V+x^2)\varphi ||^2 + b ||\varphi ||^2 \end{aligned}$$
(9.5)

so by Wüst’s theorem (see the discussion around (7.14)), one sees that \(-\Delta +V=-\Delta +V+x^2 - x^2\) is esa-\(\nu \). This idea of adding an operator C to \(A+B\) so that C is \(A+C+B\) bounded with relative bound one, so that one can use Wüst’s theorem, is called Konrady’s trick after Konrady [386].

Within a few weeks of my sending out a preprint with Theorem 9.2 and the conjecture of Theorem 9.1, I received a letter from Kato proving the conjecture by what appeared to be a totally different method. Over the next few years, I spent some effort understanding the connection between Kato’s work and semigroups. I will begin the discussion here by sketching a semigroup proof of Theorem 9.1, then give Kato’s proof of this theorem, then discuss semigroup aspects of Kato’s inequality and finally discuss some other aspects of Kato’s paper [340].

After the smoke cleared, it was apparent that my failure to get the full Theorem 9.1 in 1972 was due to my focusing on \(L^p\) properties of semigroups on probability measure spaces rather than on \(L^p({\mathbb {R}}^\nu ,d^\nu x)\). As a warmup to the semigroup proof of Theorem 9.1, we prove (we use quadratic form ideas only discussed in Sect. 10)

Theorem 9.3

(Simon [595]) Let \(V \ge 0\) be in \(L^1_{loc}({\mathbb {R}}^\nu ,d^\nu x)\) and let \(a \in L^2_{loc}({\mathbb {R}}^\nu ,d^\nu x)\) be an \({\mathbb {R}}^\nu \) valued function. Let \(Q(D_j^2) = \{\varphi \in L^2({\mathbb {R}}^\nu ,d^\nu x)\,|\, (\nabla _j-ia_j)\varphi \in L^2({\mathbb {R}}^\nu ,d^\nu x)\}\) with quadratic form \(\langle \varphi ,-D_j^2\varphi \rangle = ||(\nabla _j-ia_j)\varphi ||^2\). Let h be the closed form sum \(\sum _{j=1}^{\nu } -D_j^2+V\). Then \(C_0^\infty ({\mathbb {R}}^\nu )\) is a form core for h.

Remarks

  1. 1.

    For \(a=0\), this result was first proven by Kato [343], although [616] mistakenly attributes it to Simon.

  2. 2.

    Kato [348] proved this result if \(a \in L^2_{loc}\) is replaced by \(a \in L^\nu _{loc}\) and he conjectured this theorem.

  3. 3.

    Since \(a_j \in L^2_{loc}\), we have that \(a_j\varphi \in L^1_{loc}\) so \((\nabla _j-ia_j)\varphi \) is a well defined distribution and it makes sense to say that it is in \(L^2\).

  4. 4.

    Just as \(V \in L^2_{\mathrm{loc}}\) is necessary for \(H\varphi \) to lie in \(L^2\) for all \(\varphi \in C_0^\infty ({\mathbb {R}}^\nu )\), \(V \in L^1_{loc}\) and \(a \in L^2_{loc}\) are necessary for \(C_0^\infty \subset V_h\).

  5. 5.

    There is an analog of Theorem 9.1 with magnetic field. If \(V \ge 0\), one needs to have \(V \in L^2_{loc}, \, a \in L^4_{loc}\) and \(\nabla \cdot \overrightarrow{a} \in L^2_{loc}\) for H to be defined as an operator on \(C_0^\infty \). It is a theorem of Leinfelder–Simader [420] that this is also sufficient for esa-\(\nu \) (see [101, Section 1.4] for a proof along the lines discussed below for the current theorem).

  6. 6.

    Kato [343] has a lovely way of interpreting that \(C_0^\infty \) is a form core. A natural maximal operator domain for the operator associated with h is \(H_{max}\) defined on (here \(V_h=Q(V)\cap \bigcap _{j=1}^\nu Q(D_j^2)\))

    $$\begin{aligned} D(H_{max}) = V_h \cap \{\varphi \,|\, \sum _{j=1}^{\nu } -D_j^2\varphi +V\varphi \in L^2({\mathbb {R}}^\nu )\} \end{aligned}$$
    (9.6)

    Since \(\varphi \in V_h\), we have that \(D_j\varphi \in L^2\) which implies that \(a_jD_j\varphi \in L^1_{loc}\) and \(\nabla _j D_j\varphi \) makes sense as a distribution. Also \(\varphi \in V_h \Rightarrow V^{1/2}\varphi \in L^2 \Rightarrow V\varphi = V^{1/2}(V^{1/2}\varphi ) \in L^1_{loc}\) so \(-D_j^2\varphi +V\varphi \) is a well defined distribution. What Kato shows is that if H is the operator associated to the closed form, h, then \(H_{max}\) symmetric \(\iff H_{max} = H \iff C_0^\infty \) is a form core for h.

Here is a sketch of a proof of Theorem 9.3 following [595].

Step 1. Use Kato’s ultimate Trotter product formula of Sect. 18 in Part 2 (for \(\nu + 1\) rather than 2 operators, so one needs the result of Kato–Masuda [364]; we note these results weren’t available in 1972 but they are only needed for the case \(a \ne 0\)) to see that

$$\begin{aligned} |(e^{-tH}\varphi )(x)| \le \left( e^{t\Delta }|\varphi |\right) (x) \end{aligned}$$
(9.7)

which is implied by

$$\begin{aligned} |(e^{-tV}\varphi )(x)|&\le |\varphi |(x) \end{aligned}$$
(9.8)
$$\begin{aligned} |(e^{tD_j^2}\varphi )(x)|&\le \left( e^{t\partial ^2_j}|\varphi |\right) (x) \end{aligned}$$
(9.9)

(We note that (9.7) is called a diamagnetic inequality; we’ll say more about its history below.)

Step 2. This step proves (9.9). Since \(V \ge 0\), (9.8) is trivial. Define

$$\begin{aligned} \lambda _j(x) = \int _{0}^{x_j} a_j(x_1,\ldots ,x_{j-1},s,x_{j+1},\ldots ,x_\nu ) \, ds \end{aligned}$$

so \(\partial _j \lambda _j = a_j\) in the distributional sense. One proves that \(D_j=e^{i\lambda _j}\partial _j e^{-i\lambda _j}\) in the sense that \(\varphi \mapsto e^{-i\lambda _j}\varphi \) maps \(D(D_j)\) to \(D(\partial _j)\) and the unitary map \(U:\varphi \mapsto e^{-i\lambda _j}\varphi \) obeys \(e^{tD_j^2} = U^{-1} e^{t\partial _j^2} U\). From this and the fact that \(e^{t\partial _j^2}\) is positivity preserving, (9.9) follows. From the point of view of physics, we exploit the fact that 1D magnetic fields can be “gauged away”.

Step 3. Let \(g \in C_0^\infty ({\mathbb {R}}^\nu )\). Then \(\varphi \mapsto g\varphi \) maps Q(H) to itself. Moreover, if \(g(x) = 1\) for \(|x| \le 1\) and \(g_n(x)=g(x/n)\), then for any \(\varphi \in Q(H)\) we have that \(g_n\varphi \rightarrow \varphi \) in the form norm of H. Since \(V^{1/2}\varphi \in L^2 \Rightarrow gV^{1/2}\varphi \in L^2\) and \(||(g_n-1)V^{1/2}\varphi ||_2 \rightarrow 0\), we see that the V pieces behave as claimed. Moreover, \(D_j(g\varphi ) = gD_j\varphi +(\partial _jg)\varphi \) as distributions, so \(D_j\varphi ,\varphi \in L^2 \Rightarrow D_j(g\varphi ), g\varphi \in L^2\) and since \(||\partial _j g_n||_\infty \le Cn^{-1}\), we get the required convergence.

Step 4. Since \(e^{t\Delta }\) maps \(L^2\) to \(L^\infty \), by (9.7), we have that \(e^{-H}[L^2]\), which is a form core for H, lies in \(L^\infty \). We conclude by Step 3 that \(\{\varphi \in Q(H)\,|\, \varphi \in L^\infty \) and \(\varphi \) has compact support\(\}\) is a form core for H.

Step 5. We haven’t yet used \(V \in L^1_{loc}\): the above arguments work, for example, if \(V(x) = |x|^{-\beta }\) for any \(\beta > 0\). We now want to look at \(k*\varphi \) for \(k \in C_0^\infty ({\mathbb {R}}^\nu )\). For \(\beta > \nu \), it is easy to see that \(\varphi \mapsto k*\varphi \) does not leave \(Q(|x|^{-\beta })\) invariant (since functions in this form domain must vanish at \(x=0\)).

If \(\varphi \) is bounded with compact support and \(V \in L^1_{loc}\) it is easy to see that for \(k \in C_0^\infty ({\mathbb {R}}^\nu )\), we have that \(V^{1/2}(k*\varphi ) \in L^2\) and if \(k_n\) is an approximate identity, that \(||V^{1/2}(k_n*\varphi )-V^{1/2}\varphi ||\rightarrow 0\). Similarly, if \((\partial _j-ia_j)\varphi \in L^2\) and \(\varphi \) bounded with compact support, then \(\partial _j\varphi \in L^2\) so \(D_j(k*\varphi ) \in L^2\) and if \(k_n\) is an approximate identity, then \({||D_j(k_n*\varphi )-D_j\varphi || \rightarrow 0}\). It follows that \(C_0^\infty ({\mathbb {R}}^\nu )\) is a form core concluding this sketch of the proof of Theorem 9.3.

Next, we provide our first proof of Theorem 9.1 following [595]. So we have \(V \ge 0\), \(V \in L^2_{loc}\) and \(a=0\). By the just proven Theorem 9.3 and Remark 6 after the statement of the theorem:

$$\begin{aligned} D(H)=\{\varphi \in L^2\,|\, \nabla \varphi \in L^2, V^{1/2}\varphi \in L^2, -\Delta \varphi +V\varphi \in L^2\} \end{aligned}$$
(9.10)

where \(-\Delta \varphi +V\varphi \) is viewed as a sum of distributions. If \(g \in C_0^\infty ({\mathbb {R}}^\nu )\) and \(\varphi \in D(H)\), then

$$\begin{aligned} H(g\varphi ) = g(H\varphi ) - 2\nabla g \cdot \nabla \varphi - (\Delta g) \varphi \end{aligned}$$

so \(\varphi \mapsto g\varphi \) maps D(H) to itself with \(g_n\varphi \rightarrow \varphi \) (\(g_n(x) = g(x/n); g(x) \equiv 1\) for x near 0) in graph norm for any \(\varphi \in D(H)\). Moreover, as above, \(e^{-tH}[L^2] \subset L^\infty \) and is an operator core for H. It follows that the set of bounded, compact support functions in D(H) is a core. For any such function, it is easy to see that if \(h_n\) is an approximate identity, then \(h_n*\varphi \rightarrow \varphi \) in graph norm so we conclude esa-\(\nu \) completing the first proof of Theorem 9.1.

We next turn to Kato’s original approach to proving his theorem, Theorem 9.1. He proved

Theorem 9.4

(Kato’s inequality) Let \(u \in L^1_{loc}({\mathbb {R}}^\nu )\) be such that its distributional Laplacian, \(\Delta u\) is also in \(L^1_{loc}({\mathbb {R}}^\nu )\). Define

$$\begin{aligned} \mathrm {sgn}(u)(x) = \left\{ \begin{array}{ll} \overline{u(x)}/|u(x)|, &{} \hbox { if } u(x) \ne 0 \\ 0, &{} \hbox { if } u(x) = 0 \end{array} \right. \end{aligned}$$
(9.11)

(so \(u\,\mathrm {sgn}(u) = |u|\)). Then as distributions

$$\begin{aligned} \Delta |u| \ge {{\mathrm{Re}}}\left[ \mathrm {sgn}(u) \Delta u\right] \end{aligned}$$
(9.12)

Remarks

  1.

    What we call \(\mathrm {sgn}(u)\), Kato calls \(\mathrm {sgn}(\bar{u})\).

  2.

    We should pause to emphasize what a surprise this was. Kato was a long established master of operator theory. He was 55 years old. Seemingly from left field, he pulled a distributional inequality out of his hat. It is true, like other analysts, that he’d been introduced to distributional ideas in the study of PDEs, but no one had ever used them in this way. Truly a remarkable discovery.

The proof is not hard. By replacing u by \(u*h_n\) with \(h_n\) a smooth approximate identity and taking limits (using \(\mathrm {sgn}(u*h_n)(x) \rightarrow \mathrm {sgn}(u)(x)\) for a.e. x and using a suitable dominated convergence theorem), we can suppose that u is a \(C^\infty \) function. In that case, for \(\epsilon > 0\), let \(u_\epsilon = (\bar{u}u + \epsilon ^2)^{1/2}\). From \(u_\epsilon ^2=\bar{u}u+\epsilon ^2\), we get that

$$\begin{aligned} 2u_\epsilon \overrightarrow{\nabla } u_\epsilon = 2{{\mathrm{Re}}}(\bar{u}\overrightarrow{\nabla } u) \end{aligned}$$
(9.13)

which implies (since \(|\bar{u}| \le u_\epsilon \)) that

$$\begin{aligned} |\overrightarrow{\nabla }u_\epsilon | \le |\overrightarrow{\nabla }u| \end{aligned}$$
(9.14)

Applying \(\tfrac{1}{2}\overrightarrow{\nabla }\cdot \) to (9.13), we get that

$$\begin{aligned} u_\epsilon \Delta u_\epsilon + |\overrightarrow{\nabla }u_\epsilon |^2 = {{\mathrm{Re}}}(\bar{u}\Delta (u)) + |\overrightarrow{\nabla }u|^2 \end{aligned}$$
(9.15)

Using (9.14) and letting \(\mathrm {sgn}_\epsilon (u) = \bar{u}/u_\epsilon \), we get that

$$\begin{aligned} \Delta u_\epsilon \ge {{\mathrm{Re}}}(\mathrm {sgn}_\epsilon (u)\Delta u) \end{aligned}$$
(9.16)

Taking \(\epsilon \downarrow 0\) yields (9.12).
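The pointwise mechanism behind (9.12) can be seen in a lattice analog. The following is a minimal sketch, not Kato's distributional statement: it assumes a periodic one-dimensional lattice with the second difference standing in for \(\Delta \), and the function names are hypothetical. It checks numerically that \(\Delta |u| - {{\mathrm{Re}}}[\mathrm {sgn}(u)\Delta u] \ge 0\) at every site for a random complex u, including a site where u vanishes.

```python
import random


def sgn(z: complex) -> complex:
    """sgn(u) in the convention of (9.11): conj(u)/|u|, or 0 where u vanishes."""
    return z.conjugate() / abs(z) if z != 0 else 0j


def lattice_laplacian(u: list) -> list:
    """Second difference on a periodic 1D lattice: u[i+1] + u[i-1] - 2 u[i]."""
    n = len(u)
    return [u[(i + 1) % n] + u[(i - 1) % n] - 2 * u[i] for i in range(n)]


def kato_gap(u: list) -> list:
    """Pointwise values of Delta|u| - Re[sgn(u) Delta u]; each should be >= 0."""
    lap_abs = lattice_laplacian([abs(z) for z in u])
    lap_u = lattice_laplacian(u)
    return [la - (sgn(z) * lu).real for la, z, lu in zip(lap_abs, u, lap_u)]


random.seed(0)
u = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(50)]
u[7] = 0j  # force a zero of u, where sgn(u) = 0 by definition
gap = kato_gap(u)
print(min(gap) >= -1e-12)  # small tolerance for floating point rounding
```

On the lattice the inequality reduces to \({{\mathrm{Re}}}(\mathrm {sgn}(u)(x)\,u(y)) \le |u(y)|\) for each neighbor y of x, which is the discrete counterpart of the step from (9.15) to (9.16).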

Once we have (9.12), here is Kato’s proof of Theorem 9.1 (the second proof that we sketch). Consider T, the operator closure of \(-\Delta +V\) on \(C_0^\infty ({\mathbb {R}}^\nu )\). \(T \ge 0\), so, by a simple argument ([495, Corollary to Theorem X.1]), it suffices to show that \({\mathrm{ran}}(T+{\varvec{1}})={\mathcal {H}}\) or equivalently, that \(T^*u=-u \Rightarrow u=0\). So suppose that \(u \in L^2({\mathbb {R}}^\nu )\) and that

$$\begin{aligned} T^*u = -u \end{aligned}$$
(9.17)

Since \(T^*\) is defined via distributions, (9.17) implies that

$$\begin{aligned} \Delta u = (V+1)u \end{aligned}$$
(9.18)

Since u and \(V+1\) are both in \(L^2_{loc}\), we conclude that \(\Delta u \in L^1_{loc}\) so by Kato’s inequality

$$\begin{aligned} \Delta |u| \ge (\mathrm {sgn}(u))(V+1)u = |u|(V+1) \ge |u| \end{aligned}$$
(9.19)

Convolution with non-negative functions preserves positivity of distributions, so for any non-negative \(h \in C_0^\infty ({\mathbb {R}}^\nu )\), we have that

$$\begin{aligned} \Delta (h*|u|) = h*\Delta |u| \ge h*|u| \end{aligned}$$
(9.20)

Since \(|u| \in L^2\), \(h*|u|\) is a \(C^\infty \) function with classical Laplacian in \(L^2\), so \(h*|u| \in D(-\Delta )\). \((-\Delta +1)^{-1}\) has a positive integral kernel, so (9.20)\(\Rightarrow (-\Delta +1)(h*|u|) \le 0 \Rightarrow h*|u| \le 0 \Rightarrow h*|u| = 0\). Taking \(h_n\) to be an approximate identity, we have that \(h_n*|u| \rightarrow |u|\) in \(L^2\), so \(u=0\) completing the proof.

At first sight, Kato’s proof seems to have nothing to do with the semigroup ideas used in the proof of Theorem 9.2 and our first proof of Theorem 9.1. But in trying to understand Kato’s work, I found the following abstract result:

Theorem 9.5

(Simon [582]) Let A be a positive self-adjoint operator on \(L^2(M,d\mu )\) for a \(\sigma \)-finite, separable measure space \((M,\Sigma ,d\mu )\). Then the following are equivalent:

  1. (a)

    (\(e^{-tA}\) is positivity preserving)

    $$\begin{aligned} \forall u \in L^2,\, u\ge 0, t \ge 0 \Rightarrow e^{-tA}u \ge 0 \end{aligned}$$
  2. (b)

    (Beurling–Deny criterion) \(u \in Q(A) \Rightarrow |u| \in Q(A)\) and

    $$\begin{aligned} q_A(|u|) \le q_A(u) \end{aligned}$$
    (9.21)
  3. (c)

    (Abstract Kato Inequality) \(u \in D(A) \Rightarrow |u| \in Q(A)\) and for all \(\varphi \in Q(A)\) with \(\varphi \ge 0\), one has that

    $$\begin{aligned} \langle A^{1/2}\varphi ,A^{1/2}|u| \rangle \le {{\mathrm{Re}}}\langle \varphi ,\mathrm {sgn}(u) Au \rangle \end{aligned}$$
    (9.22)

The equivalence of (a) and (b) for M a finite set (so A is a matrix) is due to Beurling–Deny [54]. For a proof of the full theorem (which is not hard), see Simon [582] or [616, Theorem 7.6.4].
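For the matrix case mentioned above, conditions (a) and (b) of Theorem 9.5 can be checked directly. The sketch below is an illustration under stated assumptions: A is taken to be the tridiagonal matrix with 2 on the diagonal and \(-1\) beside it (a discrete Dirichlet Laplacian), the matrix exponential is a plain Taylor series, and all function names are hypothetical.

```python
import random


def mat_mul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_exp(m, terms=60):
    """exp(m) via its Taylor series; adequate for the small matrices used here."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, m)]
        result = [[r + s for r, s in zip(rr, tr)] for rr, tr in zip(result, term)]
    return result


def dirichlet_laplacian(n):
    """Assumed example: 2 on the diagonal, -1 on the two neighboring diagonals."""
    return [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
            for i in range(n)]


def form(a, u):
    """q_A(u) = <u, A u> for a real symmetric A and a complex vector u."""
    n = len(u)
    return sum((u[i].conjugate() * a[i][j] * u[j]).real for i in range(n) for j in range(n))


n, t = 6, 0.7
A = dirichlet_laplacian(n)
semigroup = mat_exp([[-t * x for x in row] for row in A])

# (a): exp(-tA) maps non-negative vectors to non-negative vectors
# exactly when all of its matrix entries are non-negative.
positivity_preserving = all(x >= 0 for row in semigroup for x in row)

# (b): q_A(|u|) <= q_A(u) for a random complex vector u.
random.seed(1)
u = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]
abs_u = [complex(abs(z)) for z in u]
beurling_deny = form(A, abs_u) <= form(A, u) + 1e-12

print(positivity_preserving, beurling_deny)
```

A single random vector is of course only a spot check of (b), not a proof; it is the off-diagonal entries of A being non-positive that makes both conditions hold here.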

In his original paper, Kato [340] proved more than (9.12). He showed that

$$\begin{aligned} \Delta |u| \ge {{\mathrm{Re}}}\left[ \mathrm {sgn}(u)(\overrightarrow{\nabla }-i\overrightarrow{a})^2u\right] \end{aligned}$$
(9.23)

In [340], he required \(\overrightarrow{a}\) to be \(C^1({\mathbb {R}}^\nu )\) but he implicitly considered less regular \(\overrightarrow{a}\)’s in [348]. For smooth a’s, one gets (9.23) as we got (9.12). Since \({{\mathrm{Re}}}(\bar{u}(-ia)u) = 0\), (9.13), with \(D=\nabla -ia\), implies that

$$\begin{aligned} u_\epsilon \nabla u_\epsilon = {{\mathrm{Re}}}(\bar{u}Du) \end{aligned}$$
(9.24)

which implies that

$$\begin{aligned} |\nabla u_\epsilon | \le |Du| \end{aligned}$$
(9.25)

Note next that

$$\begin{aligned} \nabla _j(\bar{u}D_j u) = \left[ (\nabla _j+ia_j)\bar{u}\right] D_ju+\bar{u}D_j^2u \end{aligned}$$

since \(ia_j\bar{u}D_ju+\bar{u}(-ia_j)D_ju=0\). Thus applying \(\overrightarrow{\nabla }\cdot \) to (9.24) yields

$$\begin{aligned} u_\epsilon \Delta u_\epsilon + |\nabla u_\epsilon |^2 = |D u|^2+{{\mathrm{Re}}}(\bar{u}D^2u) \end{aligned}$$
(9.26)

By (9.25), dividing by \(u_\epsilon \) and taking \(\epsilon \downarrow 0\) as before, we get (9.23).

In [340], Kato followed his arguments to get Theorem 9.1 with \(-\Delta +V\) replaced by \(-(\nabla -ia)^2+V\) when \(a \in C^1({\mathbb {R}}^\nu ), V \in L^2_{loc}({\mathbb {R}}^\nu ), V \ge 0\). But there was a more important consequence of (9.23) than a self-adjointness result. In [580], I noted that (9.23) implies, by approximating |u| by positive \(\varphi \in C_0^\infty ({\mathbb {R}}^\nu )\), that

$$\begin{aligned} \langle |u|,\Delta |u| \rangle \ge \langle u,D^2u \rangle \end{aligned}$$

which implies that

$$\begin{aligned} \langle u,(-D^2+V)u \rangle \ge \langle |u|,(-\Delta +V)|u| \rangle \end{aligned}$$
(9.27)

This in turn implies that turning on a magnetic field always increases the ground state energy (for spinless bosons), something I called universal diamagnetism.

If one thinks of this as a zero temperature result, it is natural to expect a finite temperature result (that is, for, say, finite matrices, one has that \(\lim _{\beta \rightarrow \infty } -\beta ^{-1}\log {\mathrm{Tr}}(e^{-\beta A}) = \inf \sigma (A)\) which in statistical mechanical terms is saying that as the temperature goes to zero, the free energy approaches a ground state energy):

$$\begin{aligned} {\mathrm{Tr}}(e^{-tH(a,V)}) \le {\mathrm{Tr}}(e^{-tH(a=0,V)}) \end{aligned}$$
(9.28)

where

$$\begin{aligned} H(a,V) = -(\nabla -ia)^2+V \end{aligned}$$
(9.29)

This suggested to me the inequality

$$\begin{aligned} |e^{-tH(a,V)}\varphi | \le e^{-tH(a=0,V)}|\varphi | \end{aligned}$$
(9.30)

I mentioned this conjecture at a brown bag lunch seminar when I was in Princeton. Ed Nelson remarked that formally, it followed from the Feynman–Kac–Ito formula for semigroups in magnetic fields which says that adding a magnetic field with gauge, \(\overrightarrow{a}\), adds a factor \(\exp (i\int \overrightarrow{a}(\omega (s))\cdot d\omega )\) to the Feynman–Kac formula (the integral is an Ito stochastic integral). (9.30) is immediate from \({|\exp (i\int \overrightarrow{a}(\omega (s))\cdot d\omega )| = 1}\) and the positivity of the rest of the Feynman–Kac integrand. Some have called (9.30) the Nelson–Simon inequality but the name I gave it, namely diamagnetic inequality, has stuck.

The issue with Nelson’s proof is that at the time, the Feynman–Kac–Ito formula was only known for smooth a’s. One can obtain the Feynman–Kac–Ito formula for more general a’s by independently proving a suitable core result. Simon [582] and then Kato [348] obtained results for more and more singular a’s until Simon [595] proved

Theorem 9.6

(Simon [595]) (9.30) holds for \(V \ge 0\), \(V \in L^1_{loc}({\mathbb {R}}^\nu )\) and \(\overrightarrow{a} \in L^2_{loc}\).

Indeed, our proof of (9.7) above implies this if we don’t use (9.8) but keep \(e^{-tV}\) (equivalently, if we just use (9.9)).
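A finite-dimensional analog of (9.30) can be tested numerically. The sketch below is an illustration only, not the setting of Theorem 9.6: it assumes a ring of n sites where the magnetic field enters through hypothetical phases \(\theta _i\) on the bonds, so that the quadratic form is \(\sum _i |u_{i+1}-e^{i\theta _i}u_i|^2+\sum _i V_i|u_i|^2\), and it uses a Taylor-series matrix exponential. All names are invented for the example.

```python
import cmath
import random


def mat_mul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_exp(m, terms=60):
    """exp(m) via its Taylor series; adequate for the small matrices used here."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, m)]
        result = [[r + s for r, s in zip(rr, tr)] for rr, tr in zip(result, term)]
    return result


def ring_hamiltonian(phases, potential):
    """Matrix of sum_i |u[i+1] - exp(i theta_i) u[i]|^2 + sum_i V_i |u[i]|^2 on a ring."""
    n = len(phases)
    h = [[0j] * n for _ in range(n)]
    for i in range(n):
        j = (i + 1) % n
        h[i][i] += 1 + potential[i]  # bond term |u_i|^2 plus the potential V_i
        h[j][j] += 1                 # bond term |u_{i+1}|^2
        h[j][i] -= cmath.exp(1j * phases[i])
        h[i][j] -= cmath.exp(-1j * phases[i])
    return h


def semigroup(h, t):
    """exp(-t h) for a matrix h."""
    return mat_exp([[-t * x for x in row] for row in h])


n, t = 6, 0.5
random.seed(2)
theta = [random.uniform(-3.0, 3.0) for _ in range(n)]
V = [random.uniform(0.0, 2.0) for _ in range(n)]
phi = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]

with_field = semigroup(ring_hamiltonian(theta, V), t)
no_field = semigroup(ring_hamiltonian([0.0] * n, V), t)

lhs = [abs(sum(with_field[i][j] * phi[j] for j in range(n))) for i in range(n)]
rhs = [sum(no_field[i][j] * abs(phi[j]) for j in range(n)).real for i in range(n)]
diamagnetic = all(l <= r + 1e-12 for l, r in zip(lhs, rhs))
print(diamagnetic)
```

In this lattice setting the inequality holds because the off-diagonal entries of the matrix with phases have the same absolute values as those of the matrix without phases, which is the discrete shadow of the unit modulus of the Ito phase factor.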

As with Theorem 9.5, there is an abstract two operator Kato inequality result (originally conjectured in Simon [582]):

Theorem 9.7

(Hess–Schrader–Uhlenbrock [247], Simon [596]) Let A and B be two positive self-adjoint operators on \(L^2(M,d\mu )\) where \((M,\Sigma ,d\mu )\) is a \(\sigma \)-finite, separable measure space. Suppose that \(\varphi \ge 0 \Rightarrow e^{-tA}\varphi \ge 0\). Then the following are equivalent:

  1. (a)

    For all \(\varphi \in L^2\) and all \(t \ge 0\), we have that

    $$\begin{aligned} |e^{-tB}\varphi | \le e^{-tA}|\varphi | \end{aligned}$$
  2. (b)

    \(\psi \in D(B) \Rightarrow |\psi | \in Q(A)\) and for all \(\varphi \in Q(A)\) with \(\varphi \ge 0\) and all \(\psi \in D(B)\) we have that

    $$\begin{aligned} \langle A^{1/2}\varphi ,A^{1/2}|\psi | \rangle \le {{\mathrm{Re}}}\langle \varphi ,\mathrm {sgn}(\psi ) B\psi \rangle \end{aligned}$$
    (9.31)

For a proof, see the original papers or [616, Theorem 7.6.7].

As one might expect, the ideas in Kato [340] have generated an enormous literature. Going back to the original paper are two kinds of extensions: replacing \(\Delta \) by \(\sum _{i,j=1}^{\nu } \partial _i a_{ij}(x) \partial _j\) and allowing the potential \(q(x) \rightarrow -\infty \) as \(|x| \rightarrow \infty \) with lower bounds of the Wienholtz–Ikebe–Kato type as discussed in Sect. 8. Some papers on these ideas include Devinatz [117], Eastham et al. [130], Evans [132], Frehse [166], Güneysu–Post [210], Kalf [299], Knowles [379,380,381], Milatovic [445] and Shubin [553]. There is a review by Kato [351]. For applications to higher order elliptic operators, see Davies–Hinz [104], Deng et al. [114] and Zheng–Yao [711]. There are papers on V’s obeying \(V(x) \ge -\tfrac{1}{4}\nu (\nu -4)|x|^{-2}; \, \nu \ge 5\), some using Kato’s inequality, by Kalf–Walter [302], Schmincke [539], Kalf [297], Simon [576], Kalf–Walter [303] and Kalf et al. [300].

Kato himself applied these ideas to complex valued potentials in three papers [73, 349, 355]. In particular, Brézis–Kato [355] has been used extensively in the nonlinear equation literature as part of a proof of \(L^p\) regularity of eigenfunctions.

There is one final aspect of [340] which should be mentioned. In it, Kato introduced a condition on the negative part of the potential that I dubbed Kato’s class and denoted \(K_\nu \) and which has since been used extensively. Earlier, Schechter [535] had introduced a family of spaces with several parameters which agrees with \(K_\nu \) for one choice of parameters but he didn’t single it out. A function, V on \({\mathbb {R}}^\nu \) is said to lie in \(K_\nu \) if and only if

$$\begin{aligned} \left\{ \begin{array}{ll} \lim _{\alpha \downarrow 0} \left[ \sup _x \int _{|x-y| \le \alpha } |x-y|^{2-\nu } |V(y)|\, d^\nu y\right] =0, &{} \hbox { if } \nu >2\\ \lim _{\alpha \downarrow 0} \left[ \sup _x \int _{|x-y| \le \alpha } \log (|x-y|^{-1}) |V(y)|\, d^\nu y\right] =0, &{} \hbox { if } \nu =2\\ \sup _x \int _{|x-y| \le 1} |V(y)| \, dy < \infty , &{} \hbox { if } \nu =1 \end{array} \right. \end{aligned}$$
(9.32)

\(K_\nu ^{loc}\) is the set of those V for which we demand (9.32) not for \(\sup _x\) but rather, for each \(x_0\), for \(\sup _{|x-x_0| \le 1}\). Note that the class \(S_\nu \) of Section 7 is an operator analog of this and was motivated by Kato’s definition. There are analogs of Theorems 7.10 and 7.11 for \(K_\nu \), see [101, Section 1.2].

Kato used \(K_\nu \) to discuss local (and global) singularities of the negative part of V. Ironically, \(K_\nu \) is not maximal for such considerations. If \(\nu \ge 3\) and \(V(x) = |x|^{-2} \log (|x|^{-1})^{-\delta }\) (for \(|x| < \tfrac{1}{2})\), then \(V \in K_\nu \iff \delta > 1\) but V is form bounded if and only if \(\delta > 0\). However, Aizenman–Simon [7] have proven the following showing the naturalness of Kato’s class for semigroup considerations:

Theorem 9.8

(Aizenman–Simon [7]) Let \(V \le 0\) have compact support. Then \(V \in K_\nu \) if and only if \(e^{-tH},\, (H=-\Delta +V)\) maps \(L^\infty ({\mathbb {R}}^\nu )\) to itself for all \(t > 0\) and

$$\begin{aligned} \lim _{t \downarrow 0}||e^{-tH}||_{\infty ,\infty } = 1 \end{aligned}$$
(9.33)

For more on this theme, see [7, 600].

10 Self-adjointness, IV: quadratic forms

Hilbert, around 1905, originally discussed operators on inner product spaces in terms of (bounded) quadratic forms, not surprising given Hilbert’s background in number theory. F. Riesz emphasized the operator theory point of view starting in 1913 and von Neumann’s approach to unbounded operators in 1929 also emphasized the operator point of view which has dominated most of the discussion since. In the 1930s and 1940s, there was work in which the quadratic form point of view was implicit but it was only in the 1950s that forms became explicitly discussed objects and Kato was a major player in this development. In this section, we’ll first describe the basic theory and give a Kato–centric history and then discuss two special aspects in which Kato had seminal contributions: first, the theory of monotone convergence for forms and secondly, the theory of pseudo-Friedrichs extensions and its application to the Dirac Coulomb problem, as well as some other work of Kato on the Dirac Coulomb problem.

In his delightful reminiscences of Kato, Cordes [97] quotes Kato as saying “there is no decent Banach space, except Hilbert space.” While this is ironic given Kato’s development of eigenvalue perturbation theory and semigroup theory in general Banach spaces, it is likely he had in mind the spectral theorem and the subject of this section.

Let \({\mathcal {H}}\) be a (complex, separable) Hilbert space. A quadratic form is a map \(q:{\mathcal {H}}\rightarrow [0,\infty ]\) with \(\infty \) an allowed value that is quadratic and obeys the parallelogram law, i.e.

$$\begin{aligned} q(z\varphi )&= |z|^2 q(\varphi ),\quad \text { all } \varphi \in {\mathcal {H}}, z \in {\mathbb {C}} \end{aligned}$$
(10.1)
$$\begin{aligned} q(\varphi +\psi )+q(\varphi -\psi )&= 2q(\varphi )+2q(\psi ) \end{aligned}$$
(10.2)

where \(a\infty =\infty \) (for \(a>0\)), \(=0\) for \(a=0\) and \(\infty +a=a+\infty =\infty \) for any \(a \in [0,\infty ]\). The form domain of q is

$$\begin{aligned} V_q = \{\varphi \,|\, q(\varphi ) < \infty \} \end{aligned}$$
(10.3)

A sesquilinear form is a pair (V, Q) of a subspace \(V \subset {\mathcal {H}}\) (V is not necessarily a closed and/or dense subspace. Typically V is dense in \({\mathcal {H}}\), but as we’ll see in Sect. 18 in Part 2, there are very interesting cases where V is not dense.) and a map \(Q:V \times V \rightarrow {\mathbb {C}}\) obeying

$$\begin{aligned} \forall \psi \in V,\quad&\varphi \mapsto Q(\psi ,\varphi ) \text { is linear} \end{aligned}$$
(10.4)
$$\begin{aligned} \forall \varphi ,\psi \in V, \quad&Q(\psi ,\varphi ) = \overline{Q(\varphi ,\psi )} \end{aligned}$$
(10.5)

which imply that \(\forall \psi \in V, \varphi \in V \mapsto Q(\varphi ,\psi )\) is antilinear. Q is called positive if and only if \(\forall \varphi \in V\) one has that \(Q(\varphi ,\varphi ) \ge 0\).

An elementary fact is:

Theorem 10.1

There is a one-one correspondence between quadratic forms and positive sesquilinear forms given by

  1. (a)

    If (V, Q) is a sesquilinear form, define a quadratic form, q, by

    $$\begin{aligned} q(\varphi ) = \left\{ \begin{array}{ll} Q(\varphi ,\varphi ) &{} \hbox { if } \varphi \in V \\ \infty , &{} \hbox { if } \varphi \notin V \end{array} \right. \end{aligned}$$
    (10.6)

    (so \(V_q=V\)).

  2. (b)

    If q is a quadratic form, take \(V = V_q\) and define a map, Q on \(V \times V\) by

    $$\begin{aligned} Q(\varphi ,\psi ) = \tfrac{1}{4}[q(\varphi +\psi )-q(\varphi -\psi )+i q(\varphi -i\psi ) -i q(\varphi +i\psi )] \end{aligned}$$
    (10.7)
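The polarization formula (10.7) can be sanity-checked in finite dimensions. The sketch below assumes \(q(\varphi )=\langle \varphi ,M\varphi \rangle \) for a randomly generated positive Hermitian matrix M (a hypothetical test case, as are all the function names) and confirms that (10.7) returns \(\langle \varphi ,M\psi \rangle \), which is linear in the second argument in agreement with (10.4).

```python
import random


def inner(x, y):
    """<x, y>, antilinear in x and linear in y."""
    return sum(a.conjugate() * b for a, b in zip(x, y))


def apply(m, x):
    """Matrix-vector product m x."""
    return [sum(m[i][j] * x[j] for j in range(len(x))) for i in range(len(m))]


def q(m, x):
    """Quadratic form q(x) = <x, M x> of a positive Hermitian matrix M."""
    return inner(x, apply(m, x)).real


def polarized(m, phi, psi):
    """Right-hand side of (10.7), built from values of q only."""
    plus = [a + b for a, b in zip(phi, psi)]
    minus = [a - b for a, b in zip(phi, psi)]
    minus_i = [a - 1j * b for a, b in zip(phi, psi)]
    plus_i = [a + 1j * b for a, b in zip(phi, psi)]
    return 0.25 * (q(m, plus) - q(m, minus) + 1j * q(m, minus_i) - 1j * q(m, plus_i))


def random_vector(n):
    return [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]


random.seed(3)
n = 4
B = [random_vector(n) for _ in range(n)]
# M = B* B is Hermitian and positive semidefinite.
M = [[sum(B[k][i].conjugate() * B[k][j] for k in range(n)) for j in range(n)]
     for i in range(n)]
phi, psi = random_vector(n), random_vector(n)

direct = inner(phi, apply(M, psi))
print(abs(polarized(M, phi, psi) - direct) < 1e-10)
```

Scaling \(\psi \) by a complex number z scales the result by z, whereas scaling \(\varphi \) scales it by \(\bar{z}\); this is a quick way to confirm which slot carries the linearity.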

If \(q:{\mathcal {H}}\rightarrow (-\infty ,\infty ]\) is such that there is an \(\alpha \) for which \(\widetilde{q}(\varphi )=q(\varphi )+\alpha ||\varphi ||^2\) is a (positive) quadratic form, we say that q is a semibounded quadratic form. Theorem 10.1 extends and we speak of semibounded sesquilinear forms (where \(Q(\varphi ,\varphi ) \ge 0\) is replaced by \(Q(\varphi ,\varphi ) \ge -\alpha ||\varphi ||^2\)). For any semibounded sesquilinear form, we define \(\beta =\inf _{\varphi \in V,\varphi \ne 0} Q(\varphi ,\varphi )/||\varphi ||^2\) to be the lower bound of Q.

Given two quadratic forms, \(q_1\) and \(q_2\), we write

$$\begin{aligned} q_1 \le q_2 \iff \forall \varphi \in {\mathcal {H}}, \quad q_1(\varphi ) \le q_2(\varphi ) \end{aligned}$$
(10.8)

If in addition

$$\begin{aligned} q_2(\varphi ) < \infty \Rightarrow q_1(\varphi ) = q_2(\varphi ) \end{aligned}$$
(10.9)

we say that \(q_1\) is an extension of \(q_2\). The name comes from the fact that (10.8)/(10.9) is equivalent to \(V_{q_2} \subset V_{q_1}\) and \(q_1 \restriction V_{q_2} = q_2\).

Given a (positive) quadratic form, q, one defines a norm, \(||\cdot ||_{+1}\) on \(V_q\) by

$$\begin{aligned} ||\varphi ||_{+1}^2 = q(\varphi )+||\varphi ||^2 \end{aligned}$$
(10.10)

\(||\cdot ||_{+1}\) is a norm (because of the \(||\varphi ||^2\), we have that \(||\varphi ||_{+1} \ne 0\) if \(\varphi \ne 0\) even if \(q(\varphi )=0\)) which also obeys the parallelogram law so \(||\cdot || _{+1}\) comes from an inner product [612, Theorem 3.1.6]. We say that q is a closed quadratic form if and only if \(V_q\) is complete in \(||\cdot ||_{+1}\) (see Theorem 10.14 below for an important characterization of closed forms). A subspace \(W \subset V_q\) is called a form core for q if W is dense in \(V_q\) in \(||\cdot || _{+1}\).

We say that a quadratic form, q, is closable if and only if q has a closed extension. One can show that there is then a smallest closed extension, \(\bar{q}\) (in that if t is another closed extension of q, it is also an extension of \(\bar{q}\)).

Example 10.2

Let \({\mathcal {H}}=L^2({\mathbb {R}},dx)\). Define q with \(V_q = C_0^\infty ({\mathbb {R}})\) and for \(\varphi \in V_q\)

$$\begin{aligned} q(\varphi ) = |\varphi (0)|^2 \end{aligned}$$
(10.11)

For obvious reasons, we write \(q=\delta (x)\), the Dirac delta function. One can show [616, Example 7.5.17] that this form is not closable (see also the Remark after Theorem 10.14 below).

Example 10.3

Let \({\mathcal {K}}\subset {\mathcal {H}}\) be a closed subspace, so \({\mathcal {K}}\) is a Hilbert space. Let A be a self-adjoint operator on \({\mathcal {K}}\). We recall that the spectral theorem [616, Chapter 5 and Section 7.2] lets one define f(A) as an operator on \({\mathcal {K}}\) for any real valued measurable function, f, from the spectrum of A to \([0,\infty )\). f(A) is self-adjoint with domain \(\{\varphi \,|\, \int |f(x)|^2 d\mu ^A_\varphi (x) < \infty \}\) where \(d\mu ^A_\varphi \) is the spectral measure, defined, for example by \(\langle \varphi ,(A-z)^{-1}\varphi \rangle = \int (x-z)^{-1} d\mu ^A_\varphi (x)\) for all \(z \in {\mathbb {C}}{\setminus }{\mathbb {R}}\). In particular, if A is a positive self-adjoint operator on \({\mathcal {K}}\), we can define a positive, self-adjoint operator, \(A^{1/2}\) on \({\mathcal {K}}\). We define the quadratic form \(q_A\) on \({\mathcal {H}}\) by

$$\begin{aligned} q_A(\varphi ) = \left\{ \begin{array}{ll} ||A^{1/2}\varphi ||^2, &{} \hbox { if } \varphi \in {\mathcal {K}}\text { and } \varphi \in D(A^{1/2}) \\ \infty , &{} \hbox { otherwise} \end{array} \right. \end{aligned}$$
(10.12)

This definition is basic even when \({\mathcal {K}}={\mathcal {H}}\). It is not hard to prove that this quadratic form is closed. We call \(V_{q_A}\) the form domain of A and denote it by Q(A).

Example 10.4

Given A as in the last example and \(g:\sigma (A) \rightarrow [0,\infty )\) which is continuous and bounded and obeys \(\lim _{t \rightarrow \infty } g(t)=0\), we define g(A) on \({\mathcal {H}}\) by setting it to the spectral theorem g(A) on \({\mathcal {K}}\) and to 0 on \({\mathcal {K}}^\perp \). If \(A=0\) on \({\mathcal {K}}\) (and in some sense \(\infty \) on \({\mathcal {K}}^\perp \)), then for any \(t>0\), we have that \(e^{-tA}\) is the orthogonal projection onto \({\mathcal {K}}\).

What makes quadratic forms so powerful is that, in a sense, Example 10.3 has a converse. Here are two versions of this result:

Theorem 10.5

Let q be a closed quadratic form. Let \({\mathcal {K}}=\overline{V_q}\). Then there is a unique positive self-adjoint operator, A, on \({\mathcal {K}}\) so that \(q=q_A\).

Remark

The closure in \(\overline{V_q}\) means closure in the Hilbert space topology (which in many cases is the entire Hilbert space).

Theorem 10.6

Let q be a closed quadratic form with \(V_q\) dense in \({\mathcal {H}}\). Then, there is a unique self-adjoint operator, A, on \({\mathcal {H}}\) so that:

  1. (a)

    \(D(A) \subset V_q\)

  2. (b)

    If \(\varphi \in D(A), \psi \in V_q\), then

    $$\begin{aligned} Q_q(\psi ,\varphi ) = \langle \psi ,A\varphi \rangle \end{aligned}$$
    (10.13)

    Moreover, D(A) is a form core for A.

Remarks

  1.

    In his book [345], Kato calls Theorem 10.6 the first representation theorem and Theorem 10.5 the second representation theorem. He puts Theorem 10.6 first because it is the version going back to the 1930s (see below). I put Theorem 10.5 first because I think that it is the fundamental result—indeed, it is the only variant in Reed–Simon [494] and Simon [616].

  2.

    For proofs, see Kato [345], Reed–Simon [494, Theorem VIII.15] or Simon [616, Theorem 7.5.5].

Example 10.7

Let B be a densely defined symmetric operator on \({\mathcal {H}}\) with \(\langle \varphi ,B\varphi \rangle \ge 0\) for all \(\varphi \in D(B)\). B might not be self-adjoint. Define a quadratic form, \(\widetilde{q_B}\), (which differs from \(q_B\) if B is self-adjoint!) by

$$\begin{aligned} \widetilde{q_B}(\varphi ) = \left\{ \begin{array}{ll} \langle \varphi ,B\varphi \rangle , &{} \hbox { if } \varphi \in D(B) \\ \infty , &{} \hbox { if } \varphi \notin D(B) \end{array} \right. \end{aligned}$$
(10.14)

If B is not bounded, one can show that \(\widetilde{q_B}\) is never closed but one can prove [616, Theorem 7.5.19] that it is always closable. If \(q^\#\) is its closure, there is a self-adjoint A with \(q^\#=q_A\). One can show (it is immediate from Theorem 10.6) that A is an operator extension of B so B has a natural self-adjoint extension. It is called the Friedrichs extension, \(B_F\). Unless B is esa, there are lots of other self-adjoint extensions as we’ll see. It can happen (but usually doesn’t) that B is not esa but has a unique positive self-adjoint extension.

There is a form analog of the Kato–Rellich theorem:

Theorem 10.8

(KLMN theorem) Let q be a closed quadratic form. Let \((V_R,R)\) be a (not necessarily positive or even bounded from below) sesquilinear form with \(V_q \subset V_R\) so that for some \(a \in (0,1)\) and \(b > 0\) and all \(\varphi \in V_q\), we have that

$$\begin{aligned} |R(\varphi ,\varphi )| \le aq(\varphi )+b||\varphi ||^2 \end{aligned}$$
(10.15)

Define a quadratic form, s, with \(V_s=V_q\) so that for \(\varphi \in V_q\), we have that

$$\begin{aligned} s(\varphi ) = q(\varphi ) + R(\varphi ,\varphi )+b||\varphi ||^2 \end{aligned}$$
(10.16)

Then s is a positive, closed quadratic form.

Remarks

  1.

    The name comes from Kato [323], Lax–Milgram [419], Lions [431] and Nelson [459].

  2.

    If formally \(q(\varphi )=\langle \varphi ,A\varphi \rangle , R(\psi ,\varphi )=\langle \psi ,C\varphi \rangle \), then since s is closed, we have that \(s=q_D\). Then \(D-b{\varvec{1}}\) gives a self-adjoint meaning to the formal sum \(A+C\). It is called the form sum.

  3.

    The proof is really simple. If \(||\cdot || _{+1,q}\) and \(||\cdot || _{+1,s}\) are the \(||\cdot || _{+1}\) for q and s, then (10.15) implies that \((1-a)q(\varphi ) \le s(\varphi ) \le (1+a)q(\varphi )+2b||\varphi ||^2\), so the two norms are equivalent and one is complete if and only if the other one is.

Example 10.9

Let q be the quadratic form, \(q_A\), for \(A=-\tfrac{d^2}{dx^2}\) on \(L^2({\mathbb {R}},dx)\). The same argument that we used to prove (7.12) shows that any \(\varphi \in V_q\) is a continuous function and for some C and all \(\epsilon > 0\) and all \(\varphi \in V_q\):

$$\begin{aligned} |\varphi (0)|^2 \le C\left[ \epsilon q(\varphi )+\epsilon ^{-1}||\varphi ||^2\right] \end{aligned}$$
(10.17)

Thus, by the KLMN theorem, we can define \(A=-\tfrac{d^2}{dx^2}+\lambda \delta (x)\) for any \(\lambda \in {\mathbb {R}}\) as the quadratic form \(q_\lambda \) with \(V_{q_\lambda }=V_q\) and, for all \(\varphi \in V_q\):

$$\begin{aligned} q_\lambda (\varphi ) = q(\varphi )+\lambda |\varphi (0)|^2 \end{aligned}$$
(10.18)
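The form (10.18) lends itself to a quick variational illustration. The sketch below restricts \(q_\lambda \) to the one-parameter trial family \(\varphi _\kappa (x)=e^{-\kappa |x|}\), \(\kappa >0\) (an assumption made for this example, not something used in the text), for which \(q(\varphi _\kappa )=\kappa \), \(|\varphi _\kappa (0)|^2=1\) and \(||\varphi _\kappa ||^2=1/\kappa \). It minimizes the Rayleigh quotient over a grid of \(\kappa \) and spot-checks an inequality of the shape (10.17) with the constant \(C=\tfrac{1}{2}\), which is also an assumption. Function names are hypothetical.

```python
def form_data(kappa):
    """(q(phi), |phi(0)|^2, ||phi||^2) for the trial function phi(x) = exp(-kappa |x|)."""
    return kappa, 1.0, 1.0 / kappa


def rayleigh(kappa, lam):
    """q_lambda(phi) / ||phi||^2 for the trial function, with q_lambda as in (10.18)."""
    q, at_zero, norm_sq = form_data(kappa)
    return (q + lam * at_zero) / norm_sq


def best_trial_energy(lam, grid=20000, kappa_max=10.0):
    """Smallest Rayleigh quotient over a uniform grid of kappa in (0, kappa_max]."""
    kappas = [kappa_max * (k + 1) / grid for k in range(grid)]
    return min(rayleigh(kappa, lam) for kappa in kappas)


def bound_holds(kappa, eps, c=0.5):
    """Check |phi(0)|^2 <= c [eps q(phi) + ||phi||^2 / eps] for the trial function."""
    q, at_zero, norm_sq = form_data(kappa)
    return at_zero <= c * (eps * q + norm_sq / eps) + 1e-12


lam = -2.0
energy = best_trial_energy(lam)
print(energy, -lam ** 2 / 4)  # trial minimum next to the value -lambda^2/4

checks = all(bound_holds(kappa, eps)
             for kappa in (0.1, 1.0, 7.5)
             for eps in (0.01, 1.0, 50.0))
print(checks)
```

For \(\lambda <0\) the Rayleigh quotient on this family is \(\kappa ^2+\lambda \kappa \), minimized at \(\kappa =-\lambda /2\) with value \(-\lambda ^2/4\). This agrees with the standard bound state energy of the attractive delta potential because the exact eigenfunction happens to lie in the trial family.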

The following is elementary to prove but useful:

Theorem 10.10

The sum of two closed quadratic forms is closed.

Remarks

  1.

    This allows a definition of a self-adjoint sum of any two positive self-adjoint operators.

  2.

    It is obvious that \(V_{q_1+q_2} = V_{q_1} \cap V_{q_2}\).

  3.

    There is a similar result for n arbitrary closed forms.

  4.

    The simplest proof is to use the Davies–Kato characterization (below) that closedness is equivalent to lower semicontinuity.

We end our discussion of the general theory by noting some distinctions between forms and symmetric operators.

\(\textcircled {1}\)

There are closed symmetric operators which are not self-adjoint but every closed quadratic form is the form of a self-adjoint operator.

\(\textcircled {2}\)

Every symmetric operator has a smallest closed extension but there exist quadratic forms with no closed extensions.

\(\textcircled {3}\)

If A and B are self-adjoint operators and B is an extension of A (i.e. \(D(A) \subset D(B)\) and \(B \restriction D(A) = A\)), then \(A=B\). But there exist closed quadratic forms \(q_1\) and \(q_2\) where \(q_2\) is an extension of \(q_1\) but \(q_1 \ne q_2\). For example, let \({\mathcal {H}}= L^2([0,1],dx)\) and \(q_0\) given by

$$\begin{aligned} q_0(\varphi ) = \left\{ \begin{array}{ll} \int _{0}^{1} |\varphi '(x)|^2\, dx, &{} \hbox { if } \varphi \in C^\infty ([0,1]) \\ \infty , &{} \hbox { otherwise} \end{array} \right. \end{aligned}$$

Here \(C^\infty ([0,1])\) means the functions infinitely differentiable on [0, 1] with one sided derivatives at the end points. Let \(q_1\) be the closure of the restriction of \(q_0\) to \(C_0^\infty (0,1)\) and \(q_2\) the closure of \(q_0\). Then \(q_1\) is the quadratic form of \(-\tfrac{d^2}{dx^2}\) with Dirichlet boundary conditions and \(q_2\) the quadratic form of \(-\tfrac{d^2}{dx^2}\) with Neumann boundary conditions (see [616, Examples 7.5.25 and 7.5.26]) and \(q_2\) is an extension of \(q_1\).

Having completed our discussion of the general theory, we turn to a brief indication of its history. In his original paper on self-adjoint operators [663], von Neumann noted that if A was a closed symmetric operator with

$$\begin{aligned} \langle \varphi ,A\varphi \rangle \ge \epsilon ||\varphi ||^2 \end{aligned}$$
(10.19)

for some \(\epsilon > 0\) and all \(\varphi \in D(A)\), then there is a self-adjoint extension \(A_{KvN}\) of A. By looking at \((A-\epsilon _1{\varvec{1}})_{KvN}+\epsilon _1{\varvec{1}}\) for any \(\epsilon _1 < \epsilon \), we get self-adjoint extensions, \(B_{\epsilon _1} \ge \epsilon _1{\varvec{1}}\). von Neumann conjectured there were self-adjoint extensions with lower bound exactly \(\epsilon \). Many years later, Krein [390] (see also Ando–Nishio [14]) proved that \(\lim _{\epsilon _1\uparrow \epsilon }B_{\epsilon _1}\) exists (this follows from the monotone convergence theorem below). Put differently, given \(A \ge 0\) symmetric, there is the Krein–von Neumann extension \(A_{KvN} \equiv \lim _{\epsilon _2\downarrow 0}\left[ (A+\epsilon _2{\varvec{1}})_{KvN}-\epsilon _2{\varvec{1}}\right] \) which is a positive self-adjoint extension. (The full theory of positive self-adjoint extensions [616, Theorem 7.5.20] shows the set of such extensions is all positive self-adjoint operators, B, with \(A_{KvN} \le B \le A_F\).)

Friedrichs [168, 169] (long before Krein) provided the first proof of von Neumann’s conjecture (Stone [630] had a proof at about the same time) by a construction related to the method behind Theorem 10.6. A follow-up paper of Freudenthal [167] treated the Friedrichs extension in something close to form language. In the 1950s, work on parabolic PDEs and NRQM by Kato [323], Lax–Milgram [419], Lions [431] and Nelson [459] led to a systematic general theory. In particular, Kato’s lecture notes [323] had considerable impact.

Next, we turn to a discussion of monotone convergence of quadratic forms. Given a closed form, q, with \({\mathcal {K}}\) the closure of \(V_q\), define for \(z \in {\mathbb {C}}{\setminus }{\mathbb {R}}\)

$$\begin{aligned} (\tilde{A}-z)^{-1} \equiv (A-z)^{-1}P_{\mathcal {K}}\end{aligned}$$
(10.20)

i.e. under \({\mathcal {H}}={\mathcal {K}}\oplus {\mathcal {K}}^\perp \), \((\tilde{A}-z)^{-1} = (A-z)^{-1}\oplus 0\), consistent with how we said to define f(A).

We will need the following result of Simon [586] (see also [616, Theorem 7.5.15]):

Theorem 10.11

Any quadratic form q has an associated closed quadratic form, \(q_r\), which is the largest closed form less than q, i.e. \(q_r \le q\) and if t is closed with \(t \le q\), then \(t \le q_r\).

Remarks

  1.

    One defines \(q_s=q-q_r\). More precisely, \(V_{q_s} = V_q\) and for \(\varphi \in V_q\) we have that \(q_s(\varphi )=q(\varphi )-q_r(\varphi )\). “r” is for regular and “s” for singular.

  2.

    Let \(\mu \) and \(\nu \) be two probability measures on a compact space, X, and let \(d\nu = fd\mu +d\nu _s\) be the Lebesgue decomposition, with \(d\nu _s\) singular with respect to \(d\mu \) (see [612, Theorem 4.7.3]). If \({\mathcal {H}}=L^2(X,d\mu )\) and if \(q_\nu \) is defined with \(V_{q_\nu }=C(X)\) and for \(\varphi \in C(X)\)

    $$\begin{aligned} q_\nu (\varphi ) = \int |\varphi (x)|^2 d\nu (x) \end{aligned}$$
    (10.21)

    then [616, Problem 7.5.7] \((q_\nu )_r\) is the closure of the form (on C(X))

    $$\begin{aligned} \varphi \mapsto \int f(x)|\varphi (x)|^2 d\mu \end{aligned}$$
    (10.22)

    whose associated operator is multiplication by f(x) (on the operator domain of those \(\varphi \) with \(\int f(x)^2|\varphi (x)|^2 d\mu < \infty \)). \(V_{q_s} = C(X)\). For \(\varphi \in C(X)\), \(q_s\) is given by (10.21) with \(d\nu \) replaced by \(d\nu _s\). In particular, if q is the form of (10.11), then \(q_r=0\).

The two monotone convergence theorems for (positive) quadratic forms are:

Theorem 10.12

Let \(\{q_n\}_{n=1}^\infty \) be an increasing family of positive closed quadratic forms. Define

$$\begin{aligned} q_\infty (\varphi ) = \lim _{n \rightarrow \infty } q_n(\varphi ) = \sup _n q_n(\varphi ) \end{aligned}$$
(10.23)

Then \(q_\infty \) is a closed form. If \({\mathcal {K}}_n\) (resp. \({\mathcal {K}}_\infty \)) is the closure of \(V_{q_n}\) (resp. \(V_{q_\infty }\)) and \(A_n\) (resp. \(A_\infty \)) the associated self-adjoint operators on \({\mathcal {K}}_n\) (resp. \({\mathcal {K}}_\infty \)), then for any \(z \in {\mathbb {C}}{\setminus }{\mathbb {R}}\), we have that

$$\begin{aligned} (\widetilde{A_n}-z)^{-1} \overset{s}{\rightarrow }(\widetilde{A_\infty }-z)^{-1} \end{aligned}$$
(10.24)

where \(\tilde{B}\) is given by (10.20).

Theorem 10.13

Let \(\{q_n\}_{n=1}^\infty \) be a decreasing family of positive closed quadratic forms. Define

$$\begin{aligned} q_\infty (\varphi ) = \lim _{n \rightarrow \infty } q_n(\varphi ) = \inf _n q_n(\varphi ) \end{aligned}$$
(10.25)

Let \(A_\infty \) be the self-adjoint operator on \({\mathcal {K}}_\infty \), the closure of \(V_{(q_\infty )_r}\) associated to \((q_\infty )_r\). Let \(A_n\) be as in the last theorem. Then (10.24) holds.

Remarks

  1.

    For proofs, see [616, Theorem 7.5.18].

  2.

    Let \(q_n\) be the form of \(-\tfrac{1}{n}\tfrac{d^2}{dx^2}+\delta (x)\) as defined in Example 10.9. Then \(q_n\) is decreasing and \(q_\infty \) is the form \(\delta (x)\) so that \((q_\infty )_r=0\). This shows that in the decreasing case, the limit need not be closed or even closable.
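To spell out Remark 2 as a sketch, assume, in line with (10.18), that the form of \(-\tfrac{1}{n}\tfrac{d^2}{dx^2}+\delta (x)\) is given on the form domain of Example 10.9 by

$$\begin{aligned} q_n(\varphi ) = \frac{1}{n}\int |\varphi '(x)|^2\, dx + |\varphi (0)|^2 \end{aligned}$$

For each fixed \(\varphi \) in that domain, the first term decreases to 0 as \(n \rightarrow \infty \), so \(q_n(\varphi ) \downarrow |\varphi (0)|^2\), which is the form written \(\delta (x)\) in the remark.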

Theorems of this genre appeared first in Kato’s book [345] (already in the first edition). He only considered cases where all \(V_{q_n}\) are dense. In the increasing case, he assumed there was a \(\tilde{q}\) with \(V_{\tilde{q}}\) dense so that for all n, one has that \(q_n \le \tilde{q}\). In both cases, he proved there was a self-adjoint operator, \(A_\infty \), with \(A_n\) converging to \(A_\infty \) in srs. He considered the form \(q_\infty (\varphi ) = \lim _{n} q_n(\varphi )\). In the decreasing case, he proved that if \(q_\infty \) is closable, its closure is the form of \(A_\infty \). In the increasing case, he said it was an open question whether \(q_\infty \) was the form of \(A_\infty \). This material from the 1966 first edition was unchanged in the 1976 second edition.

In 1971, Robinson [518] proved Theorem 10.12. He noted that \(q_\infty \) was closed by writing \(q_n = \sum _{j=1}^{n}s_j\) where \(s_1=q_1, s_j=q_j-q_{j-1}\) if \(j \ge 2\). Then \(q_\infty =\sum _{j=1}^{\infty } s_j\) and he says that the proof that \(q_\infty \) is closed is the same as the proof that an infinite direct sum of Hilbert spaces is complete; see Bratteli–Robinson [72, Lemma 5.2.13] for a detailed exposition of the proof. In 1975, Davies [102] also proved this theorem. His proof relied on lower semicontinuity being equivalent to q being closed (see below). Robinson seems to have been aware of the results in Kato’s book. While Davies quotes Kato’s book for background on quadratic forms, he may have been unaware of the monotone convergence results which are in a later chapter (Chapter VIII) than the basic material on forms (Chapter VI). When Kato published his second edition, he was clearly unaware of their work.

The lower semicontinuity fits in nicely with even then well known work on variational problems that used the weak lower semicontinuity of Banach space norms so it was not surprising. Indeed Davies mentions it in passing in his paper without proof. To add to the historical confusion, in his 1980 book [103], when Davies quoted this result, he seems to have forgotten that it appeared first explicitly in his paper and attributes it to the 1966 first edition of Kato [345] where it doesn’t appear!

Shortly after this second edition, I wrote and published [586] which had the notion of \((q)_r\) and the full versions of Theorems 10.12 and 10.13. I noted that these extended and complemented what was in Kato’s book. At the time I wrote the preprint, I was unaware of the relevant work of Davies and Robinson although I knew each of them personally. In response to my preprint, Kato wrote to me that he had an alternate proof that in the increasing case, \(q_\infty \) was always closed. He stated a lovely result.

Theorem 10.14

A quadratic form is closed if and only if it is lower semicontinuous as a function from \({\mathcal {H}}\) to \([0,\infty ]\).

Remarks

  1.

    For a proof, see [616, Theorem 7.5.2].

  2.

    This theorem provides a quick proof that \(\delta (x)\) is not closable. It is easy to find a \(C_0^\infty ({\mathbb {R}})\) function \(\varphi \) with \(\varphi (0)=1\) and a sequence \(\varphi _n \in C_0^\infty \) with \(\varphi _n(0)=0,\,\varphi _n \le \varphi \) and \(\varphi _n \rightarrow \varphi \) in \(L^2\). Given this convergent sequence with \(\lim \delta (\varphi _n) =0 < \delta (\varphi ) = 1\), there cannot be a lower semicontinuous function that agrees with \(\delta \) on \(C_0^\infty \).
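One possible choice of the sequence in Remark 2, given as a sketch (the cutoff \(\eta \) is introduced here only for illustration): pick \(\eta \in C_0^\infty ({\mathbb {R}})\) with \(0 \le \eta \le 1\), \(\eta (0)=1\) and \(\text{ supp }(\eta ) \subset [-1,1]\), and set \(\varphi = \eta \), \(\varphi _n(x) = \varphi (x)\left( 1-\eta (nx)\right) \). Then \(\varphi _n(0)=0\), \(0 \le \varphi _n \le \varphi \), and

$$\begin{aligned} ||\varphi -\varphi _n||^2 = \int |\varphi (x)|^2\, |\eta (nx)|^2\, dx \le \int _{|x| \le 1/n} dx = \frac{2}{n} \rightarrow 0 \end{aligned}$$

so \(\varphi _n \rightarrow \varphi \) in \(L^2\) while \(\delta (\varphi _n) = 0\) for all n and \(\delta (\varphi ) = 1\).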

Given the theorem, it is immediate that \(q_\infty \) is closed in the increasing case, since an increasing limit of lower semicontinuous functions is lower semicontinuous. I note that in precisely this context, Theorem 10.14 was also found by Davies [102]. Kato told me that he had no plans to publish his remark and approved my writing [587] that explores consequences of Theorem 10.14. However, in 1980, Springer published an “enlarged and corrected” printing of the second edition of Kato’s book and one of the few changes was a completely reworked discussion of monotone convergence theorems! In particular, he had the full Theorem 10.12 using Theorem 10.14. In the Supplemental Notes, he quotes [586] and [587] but neither of the papers of Davies and Robinson, despite the fact that in response to their writing to me after the preprint, I added a Note Added in Proof to [586] referencing their work.
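The lower semicontinuity step at the start of the preceding paragraph can be written out as a short sketch, assuming each \(q_n\) is extended by \(\infty \) off \(V_{q_n}\). If \(\psi _m \rightarrow \varphi \) in \({\mathcal {H}}\), then for every n,

$$\begin{aligned} q_n(\varphi ) \le \liminf _{m} q_n(\psi _m) \le \liminf _{m} q_\infty (\psi _m) \end{aligned}$$

and taking the supremum over n on the left gives \(q_\infty (\varphi ) \le \liminf _{m} q_\infty (\psi _m)\).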

The final topic of this section concerns pseudo-Friedrichs extensions and form definitions of the Dirac Coulomb operator. Recall that in Sect. 7 we discussed the free Dirac operator \(T_0=\alpha \cdot (-i\nabla )+m\beta \) and the formal sum, (7.41):

$$\begin{aligned} T=T_0+\frac{\mu }{|x|} \end{aligned}$$
(10.26)

As we saw in Sect. 7, Kato proved that (10.26) is esa-3 (where for the rest of the section, this means on \(C_0^\infty ({\mathbb {R}}^3;{\mathbb {C}}^4)\)) so long as \(|\mu | < \tfrac{1}{2}\). Moreover, one can prove esa-3 if and only if \(|\mu | \le \tfrac{1}{2}\sqrt{3}\). In his book, [345, Sections V.5 and VII.3], Kato attempted to show that the T of (10.26) had a natural self-adjoint extension for suitable \(\mu \in (\tfrac{1}{2},1)\). He found an extension of the KLMN theorem to cover cases where the unperturbed operator is not semibounded. He proved the following result:

Theorem 10.15

Let A be a self-adjoint operator and B a symmetric operator with \(D(B) \subset D(A)\) and so that D(B) is a core for \(|A|^{1/2}\). Suppose that for some \(a \in (0,1)\) and \(b \ge 0\) and all \(\varphi \in D(B)\) we have that

$$\begin{aligned} |\langle \varphi ,B\varphi \rangle | \le a \langle \varphi ,|A|\varphi \rangle +b||\varphi ||^2 \end{aligned}$$
(10.27)

Then there is a unique self-adjoint operator, C, extending \(A+B\) on D(B) which also obeys

$$\begin{aligned} D(C) \subset D(|A|^{1/2}) \end{aligned}$$
(10.28)

Kato called C the pseudo-Friedrichs extension. Kato remarked that this had little to do with quadratic forms (which for him were positive) but the constructions shared elements of Friedrichs’ construction of his extension. Faris [147] has a presentation that uses sesquilinear forms and makes this closer to the KLMN theorem.

In applying this to Dirac operators, Kato [345] states, without proof, that for each \(\varphi \in C_0^\infty ({\mathbb {R}}^3)\), one has:

$$\begin{aligned} \langle \varphi ,|x|^{-1}\varphi \rangle \le \tfrac{\pi }{2} \langle \varphi ,|p|\varphi \rangle \end{aligned}$$
(10.29)

in the sense that

$$\begin{aligned} \int \frac{|\varphi (x)|^2}{|x|} d^3x \le \frac{\pi }{2} \int |k| |\hat{\varphi }(k)|^2 d^3k \end{aligned}$$
(10.30)

Like Hardy’s and Rellich’s inequality, this is scale invariant. And Kato implies (but doesn’t explicitly state) that \(\tfrac{\pi }{2}\) is the optimal constant. This is often called Kato’s inequality (of course, it has no connection to what we called Kato’s inequality in Sect. 9). In his book, Kato states this inequality with its optimal constant and then says that it is equivalent to \(|p|^{-1/2}|x|^{-1}|p|^{-1/2}\) as an operator on \(L^2\) having norm \(\tfrac{\pi }{2}\). He then notes that since \(|x|^{-1}\) has a Fourier space kernel \((2\pi ^2)^{-1}|k-k'|^{-2}\), one has to compute the norm of the integral operator with kernel \((2\pi ^2)^{-1}(|k|\,|k'|)^{-1/2}|k-k'|^{-2}\) but he doesn’t tell the reader how to actually compute this norm. However, Kato’s proof can be found in the appendix at the end of this paper.
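As an illustrative check of (10.30), consider the Gaussian \(\varphi (x) = e^{-|x|^2/2}\). Two assumptions are made here that are not stated in the text: that the Fourier transform is normalized to be unitary, so that \(\hat{\varphi }(k) = e^{-|k|^2/2}\), and that the inequality extends from \(C_0^\infty \) to such functions by a density argument. In polar coordinates,

$$\begin{aligned} \int \frac{|\varphi (x)|^2}{|x|}\, d^3x = 4\pi \int _0^\infty r e^{-r^2}\, dr = 2\pi , \qquad \int |k|\, |\hat{\varphi }(k)|^2\, d^3k = 4\pi \int _0^\infty k^3 e^{-k^2}\, dk = 2\pi \end{aligned}$$

so (10.30) reads \(2\pi \le \pi ^2\) for this \(\varphi \), which holds with strict inequality.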

So while the book is given as the source for the inequality, the standard place given for the proof is a lovely paper of Herbst [239] who computes the norm of \(|x|^{-\alpha }|p|^{-\alpha }\) as an operator on \(L^p({\mathbb {R}}^\nu )\) when \(1< p < \nu \alpha ^{-1}\) (that the operator is bounded on \(L^p\) is a theorem of Stein–Weiss [624]). This has as special cases the optimal constants for Kato’s, Hardy’s and Rellich’s inequalities. Herbst notes that this operator commutes with scaling, so after applying the Mellin transform, it commutes with translations and so, it is a convolution operator in Mellin transform space. The function it is convolved with is a positive function so the norm is related to the computable integral of this explicit function. Five later publications on the optimal constant are Beckner [45], Yafaev [702], Frank–Lieb–Seiringer [162], Frank–Seiringer [164] and Balinsky–Evans [38, pgs 48–50].

In his book, Kato [345] noted that by combining his definition of the pseudo-Friedrichs extension and his inequality, one can define a natural self-adjoint extension of (10.26) for \(\tfrac{1}{2} \le \mu < \tfrac{2}{\pi }\). But note that \(\tfrac{2}{\pi } = 0.6366\ldots \) while \(\tfrac{1}{2}\sqrt{3}=0.866\ldots \) so

$$\begin{aligned} \frac{2}{\pi } < \frac{\sqrt{3}}{2} \end{aligned}$$
(10.31)

and the regime that Kato was able to treat in his book was a subset of the region where Kato–Rellich fails but one can still prove esa-3 by other means!

That said, Kato’s ideas stimulated later work which picked out a natural extension for all \(\mu \) with \(|\mu | < 1\). Among the papers on the subject are Schmincke [541], Wüst [691,692,693], Nenciu [463], Kalf et al. [300], Esteban–Loss [144] and Esteban–Lewin–Séré [143]. Domain conditions motivated by Kato’s pseudo-Friedrichs extension are common. Typical is the following result of Nenciu [463] (which is a variant of Schmincke [541]):

Theorem 10.16

For any \(\mu \) with \(|\mu |<1\), there exists a unique self-adjoint operator, T, with \(D(T) \subset D(|T_0|^{1/2})\) so that for all \(\varphi \in D(T), \psi \in D(|T_0|^{1/2})\) we have that

$$\begin{aligned} \langle \psi ,T\varphi \rangle = \langle |T_0|^{1/2}\psi ,(T_0|T_0|^{-1/2})\varphi \rangle + \mu \langle r^{-1/2}\psi ,r^{-1/2}\varphi \rangle \end{aligned}$$
(10.32)

(10.32) uses the fact that, by the above mentioned inequality of Kato, if \(\psi \in D(|T_0|^{1/2})\), then \(\psi \in D(r^{-1/2})\).

In 1983, Kato wrote a further paper on the Dirac Coulomb problem [353] (see also [354]) which seems to be little known (I only learned of it while preparing this article). To understand Kato’s idea, return to \(-\Delta -\beta r^{-2}\) on \(L^2({\mathbb {R}}^\nu ), \nu \ge 5\) as discussed in Proposition 7.7 above. If \(0 < \beta \le \tfrac{\nu (\nu -4)}{4}\), then \(H(\beta )\) can be defined as the operator closure of the operator on \(C_0^\infty ({\mathbb {R}}^\nu )\). It is self-adjoint and except at the upper end, we know the domain is that of \(-\Delta \). For \(\tfrac{\nu (\nu -4)}{4} < \beta \le \tfrac{(\nu -2)^2}{4}\), there is a Friedrichs extension since \(-\Delta -\beta r^{-2} \ge 0\) on \(C_0^\infty ({\mathbb {R}}^\nu )\). Kato notes that the Friedrichs extension is natural from the following point of view: \(H(\beta )\) is an analytic family of operators for \(0< \beta < \tfrac{(\nu -2)^2}{4}\) and is the unique analytic family from the esa region—it is type (A) if \(\beta \in (0,\tfrac{\nu (\nu -4)}{4})\) and type (B) if \(\beta \in (0,\tfrac{(\nu -2)^2}{4})\). (In fact, it can be proven that as a holomorphic family, there is a square root singularity at \(\beta = \tfrac{(\nu -2)^2}{4}\) and in the variable \(m = \sqrt{\beta -\tfrac{(\nu -2)^2}{4}}\), one has a holomorphic family in \(\text{ Re }(m) > -1\); see Bruneau–Dereziński–Georgescu [76].)

In the same way, Kato showed that the distinguished self-adjoint extension of the Dirac operator in (10.26) found by others for \(|\mu | < 1\) is an analytic family for \(\mu \in (-1,1)\) and is the unique analytic continuation from the Kato–Rellich region \(\mu \in (-\tfrac{1}{2},\tfrac{1}{2})\).