1 Introduction

An atomistic potential energy landscape (PEL) is a mapping assigning energies \(E(\varvec{r})\), or local energy contributions, to atomic structures \(\varvec{r} = \{\varvec{r}_\ell \}_{\ell \in \Lambda } \in ({\mathbb {R}}^d)^\Lambda \), where \(\Lambda \) is a general (possibly infinite) index set. High-fidelity models are provided by the Born–Oppenheimer PEL associated with ab initio electronic structure models such as tight-binding, Kohn–Sham density functional theory (DFT), Hartree–Fock, or even lower level quantum chemistry models [38, 48, 54, 58, 73, 94]. Even now, however, the high computational cost of electronic structure models severely limits their applicability in material modelling to thousands of atoms for static and hundreds of atoms for long-time dynamic simulations.

There is a long and successful history of using surrogate models for the simulation of materials, devised to remain computationally tractable but capture as much detail of the reference ab initio PEL as possible. Empirical interatomic potentials are purely phenomenological and are able to capture a minimal subset of desired properties of the PEL, severely limiting their transferability [23, 86]. The rapid growth in computational resources, increased both the desire and the possibility to match as much of an ab inito PEL as possible. A continuous increase in the complexity of parameterisations since the 1990s [6, 7, 36] has over time naturally led to a new generation of “machine-learned interatomic potentials” employing universal approximators instead of empirical mechanistic models. Early examples include symmetric polynomials [11, 80], artificial neural networks [8] and kernel methods [5]. A striking case is the Gaussian approximation potential for Silicon [4], capturing the vast majority of the PEL of Silicon of interest for material applications.

The purpose of the present work is, first, to rigorously evaluate some of the implicit or explicit assumptions underlying this latest class of interatomic potential models, as well as more general models for atomic properties. Specifically, we will identify natural modelling parameters as approximation parameters and rigorously establish convergence. Secondly, our results indicate that nonlinearities are an important feature, highlighting some superior theoretical properties. Finally, unlike existing nonlinear models, we will identify explicit low-dimensional nonlinear parameterisations yet prove that they are systematic. In addition to justifying and supporting the development of new models for general atomic properties, our results establish generic properties of ab initio models that have broader consequences, e.g. for the study of the mechanical properties of atomistic materials [15, 17, 32, 93]. The application of our results to the construction and analysis of practical parameterisations (approximation schemes) that exploit our results will be pursued elsewhere.

Our overarching principle is to search for representations of properties of ab initio models in terms of simple components, where “simple” is of course highly context-specific. To illustrate this point, let us focus on modelling the potential energy landscape (PEL), which motivated this work in the first place. Pragmatically, we require that these simple components are easier to analyse and manipulate analytically or to fit than the PEL. For many materials (at least as long as Coulomb interaction does not play a role), the first step is to decompose the PEL into site energy contributions,

$$\begin{aligned} E({\varvec{r}}) = \sum _{\ell \in \Lambda } E_\ell ({\varvec{r}}), \end{aligned}$$
(1.1)

where one assumes that each \(E_\ell \) is local, i.e., it depends only weakly on atoms far away. In previous works we have made this rigorous for the case of tight-binding models of varying complexity [14, 16, 17, 93]. In practise, one may therefore truncate the interaction by admitting only those atoms \({\varvec{r}}_k\) with \(r_{\ell k} :=|\varvec{r}_k - \varvec{r}_\ell | < r_\mathrm{cut}\) as arguments. Typical cutoff radii range from 5Å to 8Å, which means that on the order 30 to 100 atoms still make important contributions. Thus the site energy \(E_\ell \) is still an extremely high-dimensional object and short of identifying low-dimensional features it would be practically impossible to numerically approximate it, due to the curse of dimensionality.

A classical example that illustrates our search for such low-dimensional features is the embedded atom model (EAM) [23], which assigns to each atom \(\ell \in \Lambda \) a site energy

$$\begin{aligned} E^\mathrm{eam}_\ell ({\varvec{r}}) = \sum _{k \ne \ell } \phi (r_{\ell k}) + F\big ( {\textstyle \sum _{k \ne \ell } \rho (r_{\ell k})} \big ). \end{aligned}$$

While the site energy \(E^\mathrm{eam}_\ell \) remains high-dimensional, the representation is in terms of three one-dimensional functions \(\phi , \rho , F\) which are easily represented for example in terms of splines with relatively few parameters. Such a low-dimensional representation significantly simplifies parameter estimation, and vastly improves generalisation of the model outside a training set. Unfortunately, the EAM model and its immediately generalisations [6] have limited ability to capture a complex ab initio PEL. Still, this example inspires our search for representations of the PEL involving parameters that are

  • low-dimensional,

  • short-ranged.

Following our work on locality of interaction [14, 16, 17, 93] we will focus on a class of tight-binding models as the ab initio reference model. These can be seen either as discrete approximations to density functional theory [38] or alternatively as electronic structure toy models sharing many similarities with the more complex Kohn–Sham DFT and Hartree–Fock models.

To control the dimensionality of representations, a natural idea is to to consider a body-order expansion,

$$\begin{aligned} E_\ell (\varvec{r})&\approx V_0 + \sum _{k\not =\ell } V_1(\varvec{r}_{\ell k}) + \sum _{ \genfrac{}{}{0.0pt}{}{k_1,k_2\not =\ell }{k_1< k_2} } V_2(\varvec{r}_{\ell k_1}, \varvec{r}_{\ell k_2}) + \cdots \nonumber \\&\quad + \sum _{ \genfrac{}{}{0.0pt}{}{k_1,\dots , k_N\not =\ell }{k_1< \cdots < k_N} } V_N\big ( \varvec{r}_{\ell k_1}, \dots , \varvec{r}_{\ell k_N} ), \end{aligned}$$
(1.2)

where \(\varvec{r}_{\ell k} :=\varvec{r}_k - \varvec{r}_\ell \) and we say that \(V_n(\varvec{r}_{\ell k_1}, \dots , \varvec{r}_{\ell k_n})\) is an \((n+1)\)-body potential modelling the interaction of a centre atom \(\ell \) and n neighbouring atoms \(\{k_1,\dots ,k_n\}\). This expansion was traditionally truncated at body-order three (\(N = 2\)) due to the exponential increase in computational cost with N. However, it was recently demonstrated by Shapeev’s moment tensor potentials (MTPs) [80] and Drautz’ atomic cluster expansion (ACE) [25] that a careful reformulation leads to models with at most linear N-dependence. Indeed, algorithms proposed in [2, 80] suggest that the computational cost may even be N-independent, but this has not been proven. Even more striking is the fact that the MTP and ACE models which are both linear models based on a body-ordered approximation, currently appear to outperform the most advanced nonlinear models in regression and generalisation tests [66, 106].

These recent successes are in stark contrast with the “folklore” that body-order expansions generally converge slowly, if at all [10, 25, 27, 46, 86]. The fallacy in those observations is typically that they implicitly assume a vacuum cluster expansion (cf. § 2.2). Indeed, our first set of main results in § 2.4 will be to demonstrate that a rapidly convergent body-order approximation can be constructed if one accounts for the chemical environment of the material. We will precisely characterise the convergence of such an approximation as \(N \rightarrow \infty \), in terms of the Fermi-temperature and the band-gap of the material.

In the simplest scheme we consider, we achieve this by considering atomic properties \([O(\mathcal {H})]_{\ell \ell }\), where \(\mathcal {H}\) is a tight-binding Hamiltonian and O an analytic function. Approximating O by a polynomial on the spectrum \(\sigma (\mathcal {H})\) results in an approximation of the atomic property \([p(\mathcal {H})]_{\ell \ell }\), which is naturally “body-ordered”. To obtain quasi-optimal approximation results, naive polynomial approximation schemes (e.g. Chebyshev) are suitable only in the simplest scenarios. For the insulating case we leverage potential theory techniques which in particular yield quasi-optimal approximation rates on unions of disconnected domains. Our main results are obtained by converting these into approximation results on atomic properties, analysing their qualitative features, and taking care to obtain sharp estimates in the zero-Fermi-temperature limit.

These initial results provide strong evidence for the accuracy of a linear body-order approximation in relatively simple scenarios, and would for example be useful in a study of the mechanical response of single crystals with a limited selection of possible defects. However, they come with limitations that we discuss in the main text. In response, we consider a much more general framework, generalizing the theory of bond order potentials [55], that incorporates our linear body-ordered model as well as a range of nonlinear models. We will highlight a specific nonlinear construction with significantly improved theoretical properties over the linear scheme.

For both the linear and nonlinear body-ordered approximation schemes we prove that they inherit regularity, symmetries and locality of the original quantity of interest.

Finally, we consider the case of self-consistent tight-binding models such as DFTB [33, 59, 78]. In this case the highly nonlinear charge-equilibration leads in principle to arbitrarily complex intermixing of the nuclei information, and thus arbitrarily high body-order. However, our results on the body-ordered approximations for linear tight-binding models mean that each iteration of the self-consistent field (SCF) iteration can be expressed in terms of a low body-ordered and local interaction scheme. This leads us to propose a self-similar compositional representation of atomic properties that is highly reminiscent of recurrant neural network architectures. Each “layer” of this representation remains “simple” in the sense that we specified above.

2 Results

2.1 Preliminaries

2.1.1 Tight binding model

We suppose \(\Lambda \) is a finite or countable index set. For \(\ell \in \Lambda \), we denote the state of atom \(\ell \) by \({\varvec{u}}_\ell = (\varvec{r}_\ell , v_\ell , Z_\ell )\) where \(\varvec{r}_\ell \in {\mathbb {R}}^d\) denotes the position, \(v_\ell \) the effective potential, and \(Z_\ell \) the atomic species of \(\ell \). Moreover, we define \(\varvec{r}_{\ell k} :=\varvec{r}_k - \varvec{r}_\ell \), \(r_{\ell k}:=|\varvec{r}_{\ell k}|\), and \(\varvec{u}_{\ell k} :=( \varvec{r}_{\ell k}, v_\ell , v_k, Z_\ell , Z_k)\). For functions f of the relative atomic positions \({\varvec{u}}_{\ell k}\), the gradient denotes the gradient with respect to the spatial variable: \(\nabla f({\varvec{u}}_{\ell k}) :=\nabla \big ( \xi \mapsto f(( \xi , v_\ell , v_k, Z_\ell , Z_k ))\big )\big |_{\xi = \varvec{r}_{\ell k}}\). The whole configuration is denoted by \({\varvec{u}}= (\varvec{r}, v, Z) = (\{\varvec{r}_\ell \}_{\ell \in \Lambda }, \{v_\ell \}_{\ell \in \Lambda }, \{Z_\ell \}_{\ell \in \Lambda })\).

For a given configuration \(\varvec{u}\), the tight binding Hamiltonian takes the following form:

(TB) For \(\ell , k \in \Lambda \) and \(N_\mathrm {b}\) atomic orbitals per atom, we suppose that

$$\begin{aligned} \mathcal {H}({\varvec{u}})_{\ell k} = h\big ( \varvec{u}_{\ell k} \big ) + \sum _{m \not \in \{\ell , k\}} t(\varvec{u}_{\ell m}, \varvec{u}_{km}) + \delta _{\ell k} v_\ell \mathrm {Id}_{N_\mathrm {b}}, \end{aligned}$$
(2.1)

where h and t have values in\({\mathbb {R}}^{N_\mathrm {b}\times N_\mathrm {b}}\), are independent of the effective potential v, and are continuously differentiable with

$$\begin{aligned} \left| h(\varvec{u}_{\ell k}) \right| + \left| \nabla h(\varvec{u}_{\ell k}) \right|&\leqslant h_0 e^{-\gamma _0 \, r_{\ell k}}, \qquad {and} \end{aligned}$$
(2.2)
$$\begin{aligned} \left| t(\varvec{u}_{\ell m},\varvec{u}_{km}) \right| +\left| \nabla t(\varvec{u}_{\ell m},\varvec{u}_{km}) \right|&\leqslant h_0 e^{-\gamma _0 (r_{\ell m} + r_{km})}, \end{aligned}$$
(2.3)

for some \(h_0,\gamma _0>0\).

Moreover, we suppose the Hamiltonian satisfies the following symmetries:

  • \(h({\varvec{u}}_{\ell k}) = h({\varvec{u}}_{k\ell })^\mathrm {T}\) and \(t({\varvec{u}}_{\ell m}, {\varvec{u}}_{km}) = t({\varvec{u}}_{k m}, {\varvec{u}}_{\ell m})^\mathrm {T}\) for all \(\ell , k, m\in \Lambda \),

  • For orthogonal transformations \(Q \in {\mathbb {R}}^{d \times d}\), there exist orthogonal \(D^\ell (Q) \in {\mathbb {R}}^{N_\mathrm {b}\times N_\mathrm {b}}\) such that \(\mathcal {H}( Q {\varvec{u}}) = D(Q) \mathcal {H}({\varvec{u}}) D(Q)^\mathrm {T}\) where \(D(Q) = \mathrm {diag}( \{ D^\ell (Q) \}_{\ell \in \Lambda } )\) and \(Q{\varvec{u}}:=( \{Q \varvec{r}_\ell \}_{\ell \in \Lambda }, v, Z)\).

Remark 1

(i) The constants in (2.2)-(2.3) are independent of the atomic sites \(\ell , k, m \in \Lambda \).

(ii) Pointwise bounds on \(|h(\varvec{u}_{\ell k})|\) and \(|t(\varvec{u}_{\ell m}, \varvec{u}_{km})|\) are normally automatically satisfied since most linear tight binding models impose finite cut-off radii. Moreover, the assumption on the derivatives \(|\nabla h(\varvec{u}_{\ell k})|\) and \(|\nabla t(\varvec{u}_{\ell m}, \varvec{u}_{km})|\) states that there are no long range interactions in the model. In particular, we are assuming that Coulomb interactions have been screened, a typical assumption in many practical tight binding codes [20, 68, 71].

(iii) The Hamiltonian is symmetric and thus the spectrum is real.

(iv) The operators \(\mathcal {H}({\varvec{u}})\) and \(\mathcal {H}( Q{\varvec{u}})\) are similar, and thus have the same spectra.

(v) The symmetry assumptions [84] of (TB) are justified in [16, Appendix A].

(vi) The entries of \(\mathcal {H}({\varvec{u}})_{\ell k} \in {\mathbb {R}}^{N_\mathrm {b} \times N_\mathrm {b}}\) will be denoted \(\mathcal {H}({\varvec{u}})_{\ell k}^{ab}\) for \(1 \leqslant a,b \leqslant N_\mathrm {b}\). When clear from the context, we drop the argument \(({\varvec{u}})\) in the notation.

The assumptions (TB) define a general three-centre tight binding model, whereas, if \(t \equiv 0\), a simplification made in the majority of tight binding codes, we say (TB) is a two-centre model [38].

The choice of potential in (TB) defines a hierarchy of tight binding models. If \(v = \mathrm{const}\), (TB) defines a linear tight binding model, a simple yet common model [14, 16, 17, 70]. In this case, we implicitly assume that the Coulomb interactions have been screened, a typical assumption made in practice for a wide variety of materials [20, 68, 71, 72]. Supposing that v is a function of a self-consistent electronic density, we arrive at a non-linear model such as DFTB [33, 59, 78]. Abstract variants of these nonlinear models have been analysed, for example, in [93, 99]. Through much of this article we will treat \(\varvec{r}, v\) as independent inputs into the Hamiltonian, but will discuss their connection and self-consistency in § 2.7.

For a finite system \({\varvec{u}}\) (that is, with \(\Lambda \) a finite set), we consider analytic observables of the density of states [14, 93]: for functions \(O :{\mathbb {R}} \rightarrow {\mathbb {R}}\) that can be analytically continued into an open neighbourhood of \(\sigma \big ( \mathcal {H}( {\varvec{u}}) \big )\), we consider that

$$\begin{aligned} \mathrm {Tr}\, O \big ( \mathcal {H}({\varvec{u}}) \big ) = \sum _s O(\lambda _s), \end{aligned}$$

where \((\lambda _s, \psi _s)\) are normalised eigenpairs of \(\mathcal {H}({\varvec{u}})\). Many properties of the system, including the particle number functional and Helmholtz free energy, may be written in this form [14, 16, 70, 93]. By distributing these quantities amongst atomic positions, we obtain a well-known spatial decomposition [14, 16, 35, 38],

$$\begin{aligned} \mathrm {Tr}\, O \big ( \mathcal {H}({\varvec{u}}) \big ) = \sum _{\ell \in \Lambda } O_\ell ({\varvec{u}}) \quad \text {where} \quad {O_\ell ({\varvec{u}}) :=\mathrm {tr} \big [O\big (\mathcal {H}({\varvec{u}})\big )_{\ell \ell }\big ] = \sum _s O(\lambda _s) \big |[\psi _s]_\ell \big |^2.} \end{aligned}$$
(2.4)

For infinite systems, we may define \(O_\ell ({\varvec{u}})\) through the thermodynamic limit [14, 16] or via the holomorphic functional calculus; see § 4.1.2 for further details.

When discussing derivatives of the local observables, we will simplify notation and write

$$\begin{aligned} \frac{\partial O_\ell ({\varvec{u}})}{\partial {\varvec{u}}_m} :=\bigg ( \frac{\partial O_\ell ({\varvec{u}})}{\partial \varvec{r}_m}, \frac{\partial O_\ell ({\varvec{u}})}{\partial v_m} \bigg ). \end{aligned}$$
(2.5)

2.1.2 Local observables

Although the results in this paper apply to general analytic observables, our primary interest is in applying them to two special cases. A local observable of particular importance is the electron density; for inverse Fermi-temperature \(\beta \in (0,+\infty ]\) and fixed chemical potential \(\mu \), we use the notation of (2.4) to define

$$\begin{aligned} \rho _\ell = F^\beta _\ell ({\varvec{u}}) \quad \text {where} \quad F^\beta (z) :={\left\{ \begin{array}{ll} \big ( 1 + e^{\beta (z - \mu )} \big )^{-1} &{}\text {if } \beta < \infty \\ \chi _{(-\infty ,\mu )}(z) + \frac{1}{2}\chi _{\{\mu \}}(z) &{}\text {if }\beta = \infty . \end{array}\right. } \end{aligned}$$
(2.6)

Throughout this paper \(F^\beta ({\varvec{u}}) :=\big (F_\ell ^\beta ({\varvec{u}})\big )_{\ell \in \Lambda }\) will denote a vector and so (2.6) reads \(\rho = F^\beta ({\varvec{u}})\).

In § 2.7, we consider the case where the effective potential is a function of the electron density (2.6) (that is, \(v = {w(\rho )}\) for some \(w :{\mathbb {R}}^\Lambda \rightarrow {\mathbb {R}}^\Lambda \)) which leads to the self-consistent local observables

$$\begin{aligned} {\left\{ \begin{array}{ll} O^{\mathrm {sc}}_\ell ({\varvec{u}}) :=O_\ell \big ( {\varvec{u}}(\rho ^\star ) \big ) \\ \rho ^\star = F^\beta \big ( {\varvec{u}}(\rho ^\star ) \big ) \end{array}\right. }, \end{aligned}$$
(2.7)

where \({\varvec{u}}(\rho ) :=\big ( \varvec{r}, w(\rho ), Z\big )\).

Remark 2

All the results of this paper also hold for the off-diagonal entries of the density matrix (\(\rho _{\ell k} :=\mathrm {tr} \, F^\beta \big (\mathcal {H}({\varvec{u}})\big )_{\ell k}\)) without any additional work. This fact will be clear from the proofs. It is likely though that additional properties related to the off-diagonal decay (near-sightedness) and spatial regularity further improve the “sparsity” of the density matrix. A complete analysis would go beyond the scope of this work.

The second observable we are particularly interested in is the site energy, which allows us to decompose the total potential energy landscape into localised “atomic” contributions. In the grand potential model for the electrons, which is appropriate for large or infinite condensed phase systems [14], it is defined as

$$\begin{aligned} G^\beta _\ell ({\varvec{u}}) :=\mathrm {tr} \big [ G^\beta \big ( \mathcal {H}({\varvec{u}}) \big )_{\ell \ell } \big ] \quad \text {where} \quad G^\beta (z) :={\left\{ \begin{array}{ll} \frac{2}{\beta } \log \big ( 1 - F^\beta (z) \big ) &{}\text {if } \beta < \infty \\ 2(z - \mu ) \chi _{(-\infty , \mu )}(z) &{}\text {if } \beta = \infty . \end{array}\right. } \end{aligned}$$
(2.8)

The total grand potential is defined as \(\sum _\ell G^\beta _\ell ({\varvec{u}})\) [14, 70].

For \(\beta < \infty \), the functions \(F^\beta (\,\cdot \,)\) and \(G^\beta (\,\cdot \,)\) are analytic in a strip of width \(\pi \beta ^{-1}\) about the real axis [17, Lemma 5.1]. To define the zero Fermi-temperature observables, we assume that \(\mu \) lies in a spectral gap (\(\mu \not \in \sigma \big ( \mathcal {H}({\varvec{u}}) \big )\); see § 2.1.3). In this case, \(F^\beta (\,\cdot \,)\) and \(G^\beta (\,\cdot \,)\) extend to analytic functions in a neighbourhood of \(\sigma \big ( \mathcal {H}({\varvec{u}}) \big )\) for all \(\beta \in (0,\infty ]\).

In order to describe the relationship between the various constants in our estimates and the inverse Fermi-temperature or spectral gap (in the case of insulators), we will state all of our results for \(O^\beta = F^\beta \) or \(G^\beta \). Other analytic quantities of interest can be treated similarly with constants depending, e.g., on the region of analyticity of the corresponding function \(z\mapsto O(z)\).

2.1.3 Metals, insulators, and defects

As we can see from (2.4), the structure of the spectrum \(\sigma \big ( \mathcal {H}({\varvec{u}}) \big )\) will have a key role in the analysis. Firstly, by (TB), \(\mathcal {H}({\varvec{u}})\) is a bounded self-adjoint operator on \(\ell ^2(\Lambda \times \{1,\dots ,N_\mathrm {b}\})\) and thus the spectrum is real and contained in some bounded interval. In order to keep the mathematical results general, we will not impose any further restrictions on the spectrum. However, to illustrate the main ideas, we briefly describe typical spectra seen in metals and insulating systems.

In the case where \(\varvec{u}\) describes a multi-lattice in \({\mathbb {R}}^d\) formed by taking the union of finitely many shifted Bravais lattices, the spectrum \(\sigma \big ( \mathcal {H}({\varvec{u}}) \big )\) is the union of finitely many continuous energy bands [57]. That is, there exist continuous functions, \(\varepsilon ^\alpha :\overline{\mathrm {BZ}} \rightarrow {\mathbb {R}}\), on the Brillouin zone \(\mathrm {BZ}\), a compact connected subset of \({\mathbb {R}}^d\), such that

$$\begin{aligned} \sigma \big ( \mathcal {H}({\varvec{u}}) \big ) = \bigcup _{\alpha } \varepsilon ^\alpha ( \mathrm {BZ} ). \end{aligned}$$

In particular, in this case, \(\sigma \big ( \mathcal {H}({\varvec{u}}) \big ) = \sigma _{\mathrm {ess}}\big ( \mathcal {H}({\varvec{u}}) \big )\) is the union of finitely many intervals on the real line. The band structure \(\{\varepsilon ^\alpha \}\) relative to the position of the chemical potential, \(\mu \), determines the electronic properties of the system [89]. In metals \(\mu \) lies within a band, whereas for insulators, \(\mu \) lies between two bands in a spectral gap. Schematic plots of these two situations are given in Figure 1.

Fig. 1
figure 1

Schematic plots of the spectrum \(\sigma \big ( \mathcal {H}({\varvec{u}}) \big )\) of a metal (top) and insulator (bottom)

We now consider perturbations of a reference configuration \({\varvec{u}}^\mathrm {ref} = (\varvec{r}^\mathrm {ref}, v^\mathrm {ref}, Z^\mathrm {ref})\) defined on an index set \(\Lambda ^\mathrm {ref}\).

Proposition 2.1

(Perturbation of the Spectrum) For \(\delta , R_\mathrm {def} > 0\), there exists \(\delta _0 > 0\) such that if \({\varvec{u}}= (\varvec{r}, v, Z)\) is a configuration defined on some index set \(\Lambda \) satisfying \(\Lambda \setminus B_{R_\mathrm {def}} = \Lambda ^\mathrm {ref} \setminus B_{R_\mathrm {def}}\), \(\Lambda \cap B_{R_\mathrm {def}}\) is finite, \(Z_k = Z_k^\mathrm {ref}\) for all \(k \in \Lambda \setminus B_{R_\mathrm {ref}}\), and \(\sup _{k \in \Lambda \setminus B_{R_\mathrm {def}}} \big [ |\varvec{r}_k - \varvec{r}_k^\mathrm {ref}| + | v_k - v_k^\mathrm {ref} | \big ] \leqslant \delta _0\), then

$$\begin{aligned} \Big | \sigma \big ( \mathcal {H}({\varvec{u}}) \big ) \setminus B_\delta \big ( \sigma \big (\mathcal {H}({\varvec{u}}^\mathrm {ref})\big ) \big ) \Big | < \infty . \end{aligned}$$

In particular, if \({\varvec{u}}^\mathrm {ref}\) describes a multilattice, then, since local perturbations in the defect core are of finite rank, the essential spectrum is unchanged and we obtain finitely many eigenvalues bounded away from the spectral bands. Moreover, a small global perturbation can only result in a small change in the spectrum. Again, a schematic plot of this situation is given in Figure 2.

For the remainder of this paper, we consider the following notation:

Definition 1

Suppose that \({\varvec{u}}^\mathrm {ref}\) is a general reference configuration defined on \(\Lambda ^\mathrm {ref}\) and \({\varvec{u}}\) is a configuration arising due to Proposition 2.1. Then, we define \(I_-\) and \(I_+\) to be compact intervals and \(\{\lambda _j\}\) to be a finite set such that

$$\begin{aligned} \sigma \big ( \mathcal {H}({\varvec{u}}^\mathrm {ref}) \big ) \subset I_- \cup I_+, \qquad \sigma \big ( \mathcal {H}({\varvec{u}}) \big ) \subset I_- \cup \{\lambda _j\} \cup I_+ \end{aligned}$$
(2.9)

and \(\max I_- \leqslant \mu \leqslant \min I_+\). Moreover, we define

$$\begin{aligned} \mathsf {g}&:=\min I_+ - \max I_- \geqslant 0, \qquad \text {and} \end{aligned}$$
(2.10)
$$\begin{aligned} \mathsf {g}^\mathrm {def}&:=\min I_+ \cup \{\lambda _j :\lambda _j \geqslant \mu \} - \max I_- \cup \{\lambda _j :\lambda _j \leqslant \mu \}. \end{aligned}$$
(2.11)

The constants in Definition 1 are also displayed in Figure 2. The constant \(\mathsf {g}\) in Definition 1 is slightly arbitrary in the sense that as long as \(B_\delta \big ( \sigma \big (\mathcal {H}({\varvec{u}}^\mathrm {ref})\big ) \big ) \subset I_- \cup I_+\) (where \(\delta \) is the constant from Proposition 2.1), then there exists a finite set \(\{\lambda _j\}\) as in (2.9). Choosing smaller \(\mathsf {g}\) reduces the size of the set \(\{\lambda _j\}\).

Fig. 2
figure 2

Top: Schematic plot of the spectrum \(\sigma \big ( \mathcal {H}({\varvec{u}}^\mathrm {ref}) \big )\) for an insulating system, together with two compact intervals \(I_-\) and \(I_+\) as in (2.9) and the constant \(\mathsf {g}\) from (2.10). Bottom: The spectrum \(\sigma \big ( \mathcal {H}({\varvec{u}}) \big )\) after considering perturbations satisfying Proposition 2.1. While the edges of the spectrum may be accumulation points for a sequence of eigenvalues within the band gap, the number of such eigenvalues bounded away from the edges is finite

2.2 Vacuum cluster expansion

For a system of M identical particles \(X_1,\dots ,X_M\), a maximal body-order N, and a permutation invariant energy \(E = E(\{X_1,\dots ,X_M\})\), we may consider the vacuum cluster expansion,

$$\begin{aligned} E(\{X_1,\dots ,X_M\}) \approx \sum _{n=0}^{N} \sum _{1 \leqslant m_1< \dots < m_n \leqslant M} V^{(n)}(X_{m_1},\dots ,X_{m_n}), \end{aligned}$$
(2.12)

where the n-body interaction potentials \(V^{(n)}\) are defined by considering all isolated clusters of \(j \leqslant n\) atoms:

$$\begin{aligned} V^{(n)}(X_{1},\dots ,X_{n}) = \sum _{j = 0}^n (-1)^{n-j} \sum _{1 \leqslant m_1< \dots < m_j \leqslant n} E(\{X_{m_1},\dots ,X_{m_j}\}). \end{aligned}$$

The expansion (2.12) is exact for \(N = M\). The vacuum cluster expansion is the traditional and, arguably, the most natural many-body expansion of a potential energy landscape. However, in many systems, it converges extremely slowly with respect to the body-order N and is thus computationally impractical. An intuitive explanation for this slow convergence is that, when defining the body-order expansion in this way, we are building an interaction law for a condensed or possibly even crystalline phase material from clusters in vacuum where the bonding chemistry is significantly different. Although this observation appears to be “common knowledge” we were unable to find references that provide clear evidence for it. However, some limited discussions and further references can be found in [10, 25, 27, 46, 86].

Our own approach employs an entirely different mechanism, which in particular incorporates environment information and leads to an exponential convergence of an N-body approximation. Technically, our approximation is not an expansion, that is, the n-body terms \(V^{(n)}\) of the classical cluster expansion are replaced by terms that depend also on the highest body-order N. We will provide a more technical discussion contrasting our results with the vacuum cluster expansion in § 2.6.

2.3 A general framework

Before we consider two specific body-ordered approximations, we present a general framework which both incorporates many (linear-scaling) electronic structure methods from the literature (e.g. the kernel polynomial method (KPM) [82], bond-order potentials (BOP) [26, 39, 55, 74], and quadrature-based methods [69, 87, 88]), and illustrates the key features needed for a convergent scheme: To that end, we introduce the local density of states (LDOS) [38] which is the (positive) measure \(D_\ell \) supported on \(\sigma (\mathcal {H})\) such that

$$\begin{aligned} \int x^n \mathrm {d}D_\ell (x) = \mathrm {tr}[\mathcal {H}^n]_{\ell \ell }, \qquad \text {for } n \in {\mathbb {N}}_0. \end{aligned}$$
(2.13)

Existence and uniqueness follows from the spectral theorem for normal operators (e.g. see [1, Theorem 6.3.3] or [92]). In particular, (2.4) may be written as the integral \( O_\ell ({\varvec{u}}) = \int O \,\mathrm {d}D_\ell \).

Then, on constructing a (possibly signed) unit measure \(D_\ell ^N\) with exact first N moments (that is, \(\int x^n \mathrm {d}D_\ell ^N(x) = \mathrm {tr}[\mathcal {H}^n]_{\ell \ell }\) for \(n = 1,\dots , N\)), we may define the approximate local observable \(O_\ell ^N({\varvec{u}}) :=\int O \, \mathrm {d}D_\ell ^N\), and obtain the general error estimates

$$\begin{aligned} \big | O_\ell ({\varvec{u}}) - O_\ell ^N({\varvec{u}}) \big |&= \inf _{P_N \in {\mathcal {P}}_N} \Big | \int \big ( O - P_N \big ) \mathrm {d}\big ( D_\ell - D_\ell ^N \big ) \Big | \nonumber \\&\leqslant \big \Vert D_\ell - D_\ell ^N \big \Vert _{\mathrm {op}} \inf _{P_N \in {\mathcal {P}}_N} \big \Vert O - P_N \big \Vert _{\infty }, \end{aligned}$$
(2.14)

where \({\mathcal {P}}_N\) denotes the set of polynomials of degree at most N, and \(\Vert \,\cdot \,\Vert _{\mathrm {op}}\) is the operator norm on a function space \(({\mathcal {S}}, \Vert \,\cdot \,\Vert _\infty )\). For example, we may take \({\mathcal {S}}\) to be the set of functions analytic on an open set containing \({\mathscr {C}}\), a contour encircling \(\mathrm {supp}\big ( D_\ell - D_\ell ^N \big )\), and consider

$$\begin{aligned} \Vert O \Vert _{\infty } :=\frac{\mathrm {len}({\mathscr {C}})}{2\pi } \Vert O\Vert _{L^\infty ( {\mathscr {C}} )}. \end{aligned}$$

Alternatively, we may consider \({\mathcal {S}} = L^\infty \big ( \mathrm {supp}(D_\ell - D_\ell ^N) \big )\) leading to the total variation operator norm.

Equation (2.14) highlights the key generic features that are crucial ingredients in obtaining convergence results:

  • Analyticity. The potential theory results of § 4.1.5 connect the asymptotic convergence rates for polynomial approximation to the size and shape of the region of analyticity of O.

  • Spectral Pollution. While \(\mathrm{supp} D_\ell \subset \sigma (\mathcal {H})\), this need not be true for \(D_\ell ^N\). Indeed, if \(\mathrm{supp} D_\ell ^N\) introduces additional points within the band gap, this may significantly slow the convergence of the polynomial approximation; cf. § 2.6.

  • Regularity of \(D_\ell ^N\). Roughly speaking, the first term of (2.14) measures how “well-behaved” \(D_\ell ^N\) is. In particular, if \(D_\ell ^N\) is positive, then this term is bounded independently of N, whereas, if \(D_\ell ^N\) is a general signed measure, then this factor contributes to the asymptotic convergence behaviour.

In the sections to follow, we introduce linear (§ 2.4) and nonlinear (§ 2.5) approximation schemes that fit into this general framework. Moreover, in § 2.6, we also write the vacuum cluster expansion as an integral against an approximate LDOS. In order to complement the intuitive explanation for the slow convergence of the vacuum cluster expansion, we investigate which of the requirements listed above fail.

In the appendices, we review other approximation schemes that fit into this general framework such as the quadrature method (Appendix D), numerical bond order potentials (Appendix E), and the kernel polynomial method (Appendix F).

2.4 Linear body-ordered approximation

We will construct two distinct but related many-body approximation models. To construct our first model we exploit the observation that polynomial approximations of an analytic function correspond to body-order expansions of an observable.

An intuitive approach is to write the local observable in terms of its Chebyshev expansion and truncate to some maximal polynomial degree. The corresponding projection operator is a simple example of the kernel polynomial method (KPM) [82] and the basis for analytic bond order potentials (BOP) [74]. We discuss in Appendix F that these schemes put more emphasis on the approximation of the local density of states (LDOS) and, in particular, exploit particular features of the Chebyshev polynomials to obtain a positive approximate LDOS. Since our focus is instead on the approximation of observables, we employ a different approach that is tailored to specific properties of the band structure and leads to superior convergence rates for these quantities.

For a set of \(N+1\) interpolation points \(X_N = \{x_j\}_{j=0}^N\), and a complex-valued function O defined on \(X_N\), we denote by \(I_{X_N}O\) the degree N polynomial interpolant of \(x\mapsto O(x)\) on \(X_N\). This gives rise to the body-ordered approximation

$$\begin{aligned} I_{X_N} O_\ell ({\varvec{u}}) :=\mathrm {tr} \big [ I_{X_N} O\big (\mathcal {H}({\varvec{u}})\big )_{\ell \ell } \big ]. \end{aligned}$$
(2.15)

We may connect (2.15) to the general framework in § 2.3 by defining

$$\begin{aligned} I_{X_N} O_\ell ({\varvec{u}}) = \int O\, \mathrm {d}D_\ell ^{N,\mathrm {lin}} \qquad \text {where} \qquad D_\ell ^{N,\mathrm {lin}} :=\mathrm {tr}\sum _{j} \ell _j(\mathcal {H})_{\ell \ell } \, \delta (\,\cdot \,- x_j), \end{aligned}$$
(2.16)

and \(\ell _j\) are the node polynomials corresponding to \(X_N = \{x_j\}_{j=0}^N\) (that is, \(\ell _j\) are the polynomials of degree N with \(\ell _j(x_i) = \delta _{ij}\)).

Proposition 2.2

\(I_{X_N} O_\ell ({\varvec{u}})\) has body-order at most 2N. More specifically, there exists \((n+1)\)-body potentials \(V_{nN}\) for \(n = 0,\dots ,2N-1\) such that

$$\begin{aligned} I_{X_N} O_\ell ({\varvec{u}}) = \sum _{n = 0}^{2N-1} \sum _{\genfrac{}{}{0.0pt}{}{k_1,\dots ,k_n \not = \ell }{k_1< \dots < k_n} } V_{nN}({\varvec{u}}_\ell ; {\varvec{u}}_{\ell k_1}, \dots , {\varvec{u}}_{\ell k_n}). \end{aligned}$$
(2.17)

Proof

(Sketch of the Proof.) Since (2.15) is a linear combination of the monomials \([\mathcal {H}^n]_{\ell \ell }\), it is enough to show that, for each \(n \in {\mathbb {N}}\),

$$\begin{aligned}{}[\mathcal {H}^{n}]_{\ell \ell } = \sum _{\ell _1,\dots ,\ell _{n-1}} \mathcal {H}_{\ell \ell _1} \mathcal {H}_{\ell _1\ell _2} \cdots \mathcal {H}_{\ell _{n-1}\ell } \end{aligned}$$
(2.18)

has finite body order.

Each term in (2.18) depends on the central atom \(\ell \), the \(n-1\) neighbouring sites \(\ell _1, \dots , \ell _{n-1}\), and the at most n additional sites arising from the three-centre summation in the tight binding Hamiltonian (TB). In particular, (2.15) has body order at most 2N. See § 4.2 for a complete proof including an explicit definition of the \(V_{nN}\). \(\square \)

If one uses Chebyshev points as the basis for the body-ordered approximation (2.15), the rates of convergence depend on the size of the largest Bernstein ellipse (that is, ellipses with foci points \(\pm 1\)) contained in the region of analyticity of \(z \mapsto O(z)\) [95]. This leads to a exponentially convergent body-order expansion in the metallic finite-temperature case (see § 4.1.4 for the details).

However, the resulting estimates deteriorate in the zero-temperature limit. Instead, we apply results of potential theory to construct interpolation sets \(X_N\) that are adapted to the spectral properties of the system (see § 4.1.5 for examples) and (i) do not suffer from spectral pollution, and (ii) (asymptotically) minimise the total variation of \(D_{\ell }^{N,\mathrm {lin}}\) which, in this context, is the Lebesgue constant [95] for the interpolation operator \(I_{X_N}\). This leads to rapid convergence of the body-order approximation based on (2.15). The interpolation sets \(X_N\) depend only on the intervals \(I_-, I_+\) from Definition 1 (see also Figure 2) and can be chosen independently of \({\varvec{u}}^\mathrm {ref}\) as long as \(B_\delta \big ( \sigma \big ( \mathcal {H}({\varvec{u}}^\mathrm {ref}) \big ) \big ) \subset I_- \cup I_+\).

Theorem 2.3

Suppose \(\varvec{u}^\mathrm {ref}\) is given by Definition 1. Fix \(0 < \beta \leqslant \infty \) and suppose that, either \(\beta < \infty \) or \(\mathsf {g}>0\). Then, for all \(N\in {\mathbb {N}}\), there exist constants \(\gamma _N > 0\) and interpolation sets \(X_N = \{x_j\}_{j=0}^N \subset I_- \cup I_+\) satisfying (2.17) such that

$$\begin{aligned} \big |O^\beta _\ell (\varvec{u}^\mathrm {ref}) - I_{X_N} O^\beta _\ell (\varvec{u}^\mathrm {ref})\big |&\leqslant C_1 e^{- \gamma _N N}, \qquad \text {and}\\ \Bigg | \frac{\partial O^\beta _\ell }{\partial \varvec{u}_m} (\varvec{u}^\mathrm {ref}) - \frac{\partial I_{X_N} O^\beta _\ell }{\partial \varvec{u}_m} (\varvec{u}^\mathrm {ref}) \Bigg |&\leqslant C_2 e^{-\frac{1}{2} \gamma _N N} e^{-\eta \, r_{\ell m}}, \end{aligned}$$

where \(O^\beta = F^\beta \) or \(G^\beta \) and \(C_1, C_2, \eta >0\) are independent of N. The asymptotic convergence rate \(\gamma :=\lim _{N\rightarrow \infty } \gamma _N\) is positive and exhibits the asymptotic behaviour

$$\begin{aligned}&C_1 \sim (\mathsf {g} + \beta ^{-1})^{-1}, \quad C_2 \sim (\mathsf {g} + \beta ^{-1})^{-3},\nonumber \\&\quad \text {and} \quad \gamma , \eta \sim \mathsf {g} + \beta ^{-1} \quad \text { as } \mathsf {g} + \beta ^{-1} \rightarrow 0. \end{aligned}$$
(2.19)

In this asymptotic relation, we assume that the limit \(\mathsf {g}\rightarrow 0\) is approached symmetrically about the chemical potential \(\mu \).

Remark 3

Higher derivatives may be treated similarly under the assumption that higher derivatives of the tight binding Hamiltonian (TB) exist and are short ranged.

2.4.1 The role of the point spectrum

We now turn towards the important scenario when a localised defect is embedded within a homogeneous crystalline solid. Recall from § 2.1.3 (see in particular Fig. 2) that this gives rise to a discrete spectrum, which “pollutes” the band gap [70]. Thus, the spectral gap is reduced and a naive application of Theorem 2.3 leads to a reduction in the convergence rate of the body-ordered approximation. We now improve these estimates by showing that, away from the defect, we obtain improved pre-asymptotics, reminiscent of similar results for locality of interaction [17].

In that follow, we fix \({\varvec{u}}\) satisfying Definition 1. While improved estimates may be obtained by choosing \(\{\lambda _j\}\) as interpolation points, leading to asymptotic exponents that are independent of the defect, in practice, this requires full knowledge of the point spectrum. Since the point spectrum within the spectral gap depends on the whole atomic configuration, the approximate quantities of interest corresponding to these interpolation operators would no longer satisfy Proposition 2.2.

Remark 4

This phenomenon has been observed in the context of Krylov subspace methods for solving linear equations \(Ax = b\) where outlying eigenvalues delay the convergence by O(1) steps without affecting the asymptotic rate [30]. Indeed, since the residual after n steps may be written as \(r_n = p_n(A) r_0\) where \(p_n\) is a polynomial of degree n, there is a close link between polynomial approximation and convergence of Krylov methods.

On the other hand, we may use the exponential localisation of the eigenvectors corresponding to isolated eigenvalues to obtain pre-factors that decay exponentially as \(|\varvec{r}_\ell | \rightarrow \infty \).

Theorem 2.4

Suppose \({\varvec{u}}\) satisfies Definition 1 with \(\mathsf {g}>0\). Fix \(0<\beta \leqslant \infty \) and suppose that, if \(\beta = \infty \), then \(\mathsf {g}^\mathrm {def}>0\), and let \(C_1,C_2, \gamma _N, \gamma , \eta \), and \(X_N = \{x_j\}_{j=0}^N \subset I_-\cup I_+\) be given by Theorem 2.3. Then, there exist \(\gamma _\mathrm {CT},\gamma _N^{\mathrm {def}}>0\) such that

$$\begin{aligned} \big |O^\beta _\ell ({\varvec{u}}) - I_{X_N} {O}^\beta _\ell ({\varvec{u}})\big |&\leqslant C_1 e^{-\gamma _N N} + C_3 e^{-\gamma _\mathrm {CT} |\varvec{r}_\ell | } e^{-\frac{1}{2}\gamma _N^{\mathrm {def}} N} \end{aligned}$$
(2.20)
$$\begin{aligned} \Bigg | \frac{\partial O^\beta _\ell }{\partial \varvec{u}_m} (\varvec{u}) - \frac{\partial I_{X_N} O^\beta _\ell }{\partial \varvec{u}_m} (\varvec{u}) \Bigg |&\leqslant \Big ( C_2 e^{-\frac{1}{2}\gamma _N N} + C_4 e^{-\gamma _\mathrm {CT} |\varvec{r}_\ell |} e^{-\frac{1}{2}\gamma _N^{\mathrm {def}} N} \Big ) e^{-\eta \, r_{\ell m}} \end{aligned}$$
(2.21)

where \(O^\beta = F^\beta \) or \(G^\beta \) and \(C_3, C_4>0\) are independent of N. The asymptotic convergence rate \(\gamma ^\mathrm {def} :=\lim _{N\rightarrow \infty } \gamma ^\mathrm {def}_N\) is positive and we have

$$\begin{aligned}&\gamma ^\mathrm {def} \sim \mathsf {g}^\mathrm {def} + \beta ^{-1} \quad \text { as } \mathsf {g}^\mathrm {def} + \beta ^{-1} \rightarrow 0, \nonumber \\&\quad \text {and} \qquad \gamma _\mathrm {CT}, \eta \sim \mathsf {g} + \beta ^{-1} \quad \text { as } \mathsf {g} + \beta ^{-1} \rightarrow 0. \end{aligned}$$
(2.22)

In these asymptotic relations, we assume that the limits \(\mathsf {g}^\mathrm {def},\mathsf {g}\rightarrow 0\) are approached symmetrically about the chemical potential \(\mu \).

In practice, Theorem 2.4 means that, for atomic sites \(\ell \) away from the defect-core, the observed pre-asymptotic error estimates may be significantly better than the asymptotic convergence rates obtained in Theorem 2.3.

Remark 5

(Locality) (i) By Theorem 2.4, and the locality estimates for the exact observables \(O_\ell ^\beta \) [17], we immediately obtain corresponding locality estimates for the approximate quantities:

$$\begin{aligned} \left| \frac{\partial I_{X_N} O^\beta _\ell ({\varvec{u}}) }{\partial {\varvec{u}}_m} \right| \lesssim e^{- \eta \,r_{\ell m}}. \end{aligned}$$
(2.23)

(ii) We investigate another type of locality in Appendix B where we show that various truncation operators result in approximation schemes that only depend on a small atomic neighbourhood of the central site. An exponential rate of convergence as the truncation radius tends to infinity is obtained.

Remark 6

(Connection to the general framework) The fact that the exponents in Theorem 2.4 depend on the discrete eigenvalues of \(\mathcal {H}({\varvec{u}})\) can be seen from the general estimate (2.14) applied to the approximate LDOS \(D_\ell ^{N,\mathrm {lin}}\) from (2.16):

  • Spectral Pollution. We choose the interpolation points so that the support of \(D_\ell ^{N,\mathrm {lin}}\) lies within \(\sigma \big ( \mathcal {H}({\varvec{u}}) \big )\) and so spectral pollution does not play a role,

  • Regularity of \(D_\ell ^{N.\mathrm {lin}}\). The total variation of \(D_\ell ^{N,\mathrm {lin}}\) can be estimated by the Lebesgue constant [95] for the interpolation operator \(I_{X_N}\):

    $$\begin{aligned} \Vert D_\ell ^{N,\mathrm {lin}} \Vert _{\mathrm {TV}}&:=\sup _{\Vert f\Vert _{L^\infty (\sigma (\mathcal {H}))} = 1} | I_{X_N} f(\mathcal {H})_{\ell \ell } | \leqslant \sup _{\Vert f\Vert _{L^\infty (\sigma (\mathcal {H}))} = 1} \sup _{x \in \sigma (\mathcal {H})} | I_{X_N} f(x)| \\&= \sup _{x \in \sigma (\mathcal {H})} \sum _j |\ell _j(x)|. \end{aligned}$$

    This quantity depends on the discrete eigenvalues within the band gap.

2.5 A non-linear representation

The method presented in § 2.4 approximates local quantities of interest by approximating the integrand \(O :{\mathbb {C}} \rightarrow {\mathbb {C}}\) with polynomials. As we have seen, this leads to approximation schemes that are linear functions of the spatial correlations \(\{[\mathcal {H}^n]_{\ell \ell }\}_{n \in {\mathbb {N}}}\). In this section, we construct a non-linear approximation related to bond-order potentials (BOP) [26, 39, 55] and show that the added non-linearity leads to improved asymptotic error estimates that are independent of the discrete spectra lying within the band gap. In this way, the nonlinearity captures “spectral information” from \(\mathcal {H}\) rather than only approximating \(O :{\mathbb {C}} \rightarrow {\mathbb {C}}\) without reference to the Hamiltonian.

Applying the recursion method [49, 50], a reformulation of the Lanczos process [61], we obtain a tri-diagonal (Jacobi) operator T on \(\ell ^2({\mathbb {N}}_0)\) whose spectral measure is the LDOS \(D_\ell \) [91] (see § 4.3.1 for the details). We then truncate T by taking the principal \(\frac{1}{2}(N+1) \times \frac{1}{2}(N+1)\) submatrix \(T_{\frac{1}{2}(N-1)}\) and define

$$\begin{aligned} {\Theta _N\big ( \mathcal {H}_{\ell \ell }, [\mathcal {H}^2]_{\ell \ell }, \dots , [\mathcal {H}^{N}]_{\ell \ell } \big )} :=O^\beta (T_{\frac{1}{2}(N-1)})_{00} = \int O^\beta \mathrm {d}D_\ell ^{N,\mathrm {nonlin}}, \end{aligned}$$
(2.24)

where \(D_\ell ^{N,\mathrm {nonlin}} = \sum _s [\psi _s]_{0}^2 \delta (\,\cdot - \lambda _s)\) is a spectral measure for \(T_{\frac{1}{2}(N-1)}\) (that is, \((\lambda _s,\psi _s)\) are normalised eigenpairs of \(T_{\frac{1}{2}(N-1)}\)). By showing that the first N moments of \(D_\ell ^{N,\mathrm {nonlin}}\) are exact, we are able to apply (2.14) to obtain the following error estimates. The asymptotic behaviour of the exponent in these estimates follows by proving that the spectral pollution of \(D_\ell ^{N,\mathrm {nonlin}}\) in the band gap is sufficiently mild.

Theorem 2.5

Suppose \({\varvec{u}}\) satisfies Definition 1. Fix \(0 < \beta \leqslant \infty \) and suppose that, if \(\beta = \infty \), then \(\mathsf {g}>0\). Then, for N odd, there exists an open set \(U \subset {\mathbb {C}}^{N}\) such that (2.24) extends to an analytic function \(\Theta _N :U \rightarrow {\mathbb {C}}\), such that

$$\begin{aligned} \Big | O^\beta _\ell ({\varvec{u}}) - {\Theta _N}\big ( \mathcal {H}_{\ell \ell }, [\mathcal {H}^2]_{\ell \ell }, \dots , [\mathcal {H}^{N}]_{\ell \ell } \big ) \Big |&\lesssim e^{-\gamma _N N} \end{aligned}$$
(2.25)

where \(O^\beta = F^\beta \) or \(G^\beta \). The asymptotic convergence rate \(\gamma :=\lim _{N\rightarrow \infty } \gamma _N\) is positive and \(\gamma \sim \mathsf {g} + \beta ^{-1}\) as \(\mathsf {g} + \beta ^{-1} \rightarrow 0\).

Remark 7

It is important to note that \({\Theta _N} :U \rightarrow {\mathbb {C}}\) can be constructed without knowledge of \(\mathcal {H}\) because, as we have seen, if the discrete eigenvalues are known a priori, then Theorem 2.5 is immediate from Theorem 2.4 by adding finitely many additional interpolation points on the discrete spectrum.

In particular, the fact that \({\Theta _N}\) is a material-agnostic nonlinearity has potentially far-reaching consequences for material modelling.

Remark 8

(Connection to the general framework) The fact that the exponents in Theorem 2.5 are independent of the discrete eigenvalues of \(\mathcal {H}({\varvec{u}})\) can be seen from the general estimate (2.14) applied to the approximate LDOS \(D_\ell ^{N,\mathrm {nonlin}}\) from (2.24):

  • Spectral Pollution. We show that \(\big |\mathrm {supp}\,D_\ell ^{N,\mathrm {nonlin}} \setminus \mathrm {supp}\,D_\ell \big |\) remains bounded independently of N and so spectral pollution only slows the convergence by at most O(1) steps,

  • Regularity of \(D_\ell ^{N,\mathrm {nonlin}}\). Since \(D_\ell ^{N,\mathrm {nonlin}}\) is a positive unit measure, we have the bound \(\Vert D_\ell - D_\ell ^{N,\mathrm {nonlin}}\Vert _{\mathrm {TV}} \leqslant 2\).

Remark 9

(Quadrature Method) Alternatively, we may use the sequence of orthogonal polynomials [40] corresponding to \(D_\ell \) as the basis for a Gauss quadrature rule to evaluate local observables. This procedure, called the Quadrature Method [51, 69], is a precursor of the bond order potentials. Outlined in Appendix D, we show that it produces an alternative scheme also satisfying Theorem 2.5.

The linear-scaling spectral Gauss quadrature (LSSGQ) method [87] is based upon this idea, albeit in the context of finite difference approximations to the DFT Hamiltonian. However, since the resulting discrete Hamiltonian in [87] is banded, the analysis of the present work may be readily applied. Therefore, Theorem 2.5 provides rigorous justification for the exponential rate of convergence for increasing body-order (number of quadrature points), complementing the intuitive explanations and numerical experiments of [87].

Since the convergence results are independent of system size, we obtain a linear-scaling method, a result that complements the intuitive explanation [87, (56)], and numerical evidence [87, Fig. 5].

Remark 10

(Convergence of Derivatives) In this more complicated nonlinear setting, obtaining results such as (2.21) is more subtle. We require an additional assumption on \(D_\ell \), which we believe maybe be typically satisfied, but we currently cannot justify it and have therefore postponed this discussion to Appendix C. We briefly mention, however, that if \(D_\ell \) is absolutely continuous (e.g., in periodic systems), we obtain

$$\begin{aligned} \bigg | \frac{\partial }{\partial {\varvec{u}}_m} \Big ( O^\beta _\ell ({\varvec{u}}) - {\Theta _N}\big ( \mathcal {H}_{\ell \ell }, [\mathcal {H}^2]_{\ell \ell }, \dots , [\mathcal {H}^{N}]_{\ell \ell } \big ) \Big ) \bigg | \lesssim e^{-\frac{1}{2}\gamma _N N} e^{-\eta \,r_{\ell m}}. \end{aligned}$$

2.6 The vacuum cluster expansion revisited

For \(\ell \in \Lambda \), we denote by \(\mathcal {H}\big |_{\ell ;K}\) the Hamiltonian matrix corresponding to the finite subsystem \(\{\ell \}\cup K \subset \Lambda \): for \(k_1,k_2 \in \{\ell \} \cup K\),

$$\begin{aligned} \big [\mathcal {H}\big |_{\ell ;K}\big ]_{k_1k_2} :=h({\varvec{u}}_{k_1 k_2}) + \sum _{m \in \{ \ell \} \cup K} t({\varvec{u}}_{k_1 m}, {\varvec{u}}_{k_2 m}) + \delta _{k_1k_2} v_{k_1} \mathrm {Id}_{N_\mathrm {b}}. \end{aligned}$$
(2.26)

For an observable O, the vacuum cluster expansion as detailed in § 2.2 is constructed as follows:

$$\begin{aligned} O_\ell ^{N,\mathrm {vac}}({\varvec{u}})&:=\sum _{n=0}^{{2N}-1} \sum _{\genfrac{}{}{0.0pt}{}{k_1,\dots ,k_n \not = \ell }{k_1< \dots < k_n} } V^{(n)}({\varvec{u}}_\ell ; {\varvec{u}}_{\ell k_1},\dots , {\varvec{u}}_{\ell k_n}) \qquad \text {where} \ \end{aligned}$$
(2.27)
$$\begin{aligned} V^{(n)}({\varvec{u}}_\ell ; {\varvec{u}}_{\ell k_1}, \dots , {\varvec{u}}_{\ell k_n})&= \sum _{K \subseteq \{k_1,\dots ,k_n\}} (-1)^{n - |K|} O\big (\mathcal {H}\big |_{\ell ;K}\big )_{\ell \ell }. \end{aligned}$$
(2.28)

Therefore, on defining the spectral measure \( D_{\ell ;K} :=\sum _s \delta \big ( \,\cdot \, - \lambda _s(K) \big ) |[\psi _s(K)]_{\ell }|^2 \) where \(\big (\lambda _s(K), \psi _s(K) \big )\) the are normalised eigenpairs of \(\mathcal {H}\big |_{\ell ;K}\), we may write the vacuum cluster expansion as in § 2.3:

$$\begin{aligned}&O^{N,\mathrm {vac}}_\ell ({\varvec{u}}) = \int O \, \mathrm {d}D_{\ell }^{N,\mathrm {vac}} \qquad \text {where} \qquad \nonumber \\&D_{\ell }^{N,\mathrm {vac}} :=\sum _{n=0}^{{2N}-1} \sum _{\genfrac{}{}{0.0pt}{}{k_1,\dots ,k_n \not = \ell }{k_1< \dots < k_n} } \sum _{K \subseteq \{k_1,\dots ,k_n\}} (-1)^{n-|K|}D_{\ell ;K}. \end{aligned}$$
(2.29)

While \(D_{\ell }^{N,\mathrm {vac}}\) is a generalised signed measure (with values in \({\mathbb {R}} \cup \{\pm \infty \}\)), all moments are finite. More specifically, if we absorb the effective potential and two centre terms into the three centre summation by writing \(\mathcal {H}_{k_1 k_2} = \sum _m \mathcal {H}_{k_1 k_2 m}\), see (4.16), we have

$$\begin{aligned} \int x^j \, \mathrm {d}D_{\ell }^{N,\mathrm {vac}}(x)&= {\sum _{ \genfrac{}{}{0.0pt}{}{\ell _1,\dots ,\ell _{j-1},m_1,\dots ,m_j}{|\{\ell ,\ell _1,\dots ,\ell _{j-1},m_1,\dots ,m_j\}| \leqslant 2N} } \mathcal {H}_{\ell \ell _1 m_1} \mathcal {H}_{\ell _1 \ell _2 m_2} \dots \mathcal {H}_{\ell _{j-1} \ell m_j}}. \end{aligned}$$
(2.30)

Equation (2.30) follows from the proof of Proposition 2.2, see (4.19). In particular, the first N moments of \(D_\ell ^{N,\mathrm {vac}}\) are exact. Therefore, we may apply the general error estimate (2.14) and describe the various features of \(D_\ell ^{N,\mathrm {vac}}\) which provide mathematical intuition for the slow convergence of the vacuum cluster expansion:

  • Spectral Pollution. When splitting the system up into arbitrary subsystems as is the case in the vacuum cluster expansion, one expects significant spectral pollution in the band gaps, leading to a reduction in the convergence rate,

  • Regularity of \(D_\ell ^{N,\mathrm {vac}}\). The approximate LDOS is a linear combination of countably many Dirac deltas and does not have bounded variation. Moreover, \(D_\ell ^{N,\mathrm {vac}}\) has values in \({\mathbb {R}} \cup \{\pm \infty \}\).

2.7 Self-consistency

Throughout this section, we suppose that the effective potential is a function of a self-consistent electron density: that is, (2.6) becomes the following nonlinear equation:

$$\begin{aligned} \rho ^\star = F^\beta \big ( {\varvec{u}}(\rho ^\star ) \big ) \end{aligned}$$
(2.31)

where \({\varvec{u}}(\rho ) :=\big (\varvec{r}, {w(\rho )}, Z\big )\). We shall assume that the effective potential satisfies the following:

(EP) We suppose that \({w} :{\mathbb {R}}^\Lambda \rightarrow {\mathbb {R}}^\Lambda \) is twice continuously differentiable with

$$\begin{aligned} \big | \nabla {w(\rho )}_{\ell k} \big | \leqslant C e^{-\gamma _v \,r_{\ell k}} \end{aligned}$$

for some \(\gamma _v > 0\).

Remark 11

(i) For a smooth function \(\widetilde{w} :{\mathbb {R}} \rightarrow {\mathbb {R}}\), the effective potential \({w(\rho )_\ell :=\widetilde{w}(\rho _\ell )}\) satisfies (EP). This leads to the simplest abstract nonlinear tight binding models discussed in [93, 99].

(ii) The (short-ranged) Yukawa potential defined by \({w(\rho )}_\ell :=\sum _{m \not = \ell } \frac{\rho _m - Z_m}{r_{\ell m}} e^{-\tau \, r_{\ell m}}\) (for some \(\tau > 0\)) also fits into this general framework. This setting already covers many important modelling scenarios and also serves as a crucial stepping stone towards charge equilibration under full Coulomb interaction, which goes beyond the scope of the present work.

The main result of this section is the following: if there exists a self-consistent solution \(\rho ^\star \) to (2.31), then we can approximate \(\rho ^\star \) with self-consistent solutions to the following approximate self-consistency equation:

$$\begin{aligned} \rho _{N} = I_{X_N}{F}^\beta \big ( {\varvec{u}}(\rho _N) \big ), \end{aligned}$$
(2.32)

for sufficiently large N. The operator \(I_{X_N} F^\beta \) is a linear body-ordered approximation of the form we analyzed in detail in § 2.4.

To do this, we require a natural stability assumption on the electronic structure problem, which was employed for example in [93, 99, 100]:

(STAB) The stability operator \({\mathscr {L}}(\rho )\) is the Jacobian of \(\rho \mapsto F^\beta \big ( {\varvec{u}}(\rho ) \big )\). We say electron densities \(\rho ^\star \) solving (2.31) are stable if \(I - {\mathscr {L}}(\rho ^\star )\) is invertible as a bounded linear operator \(\ell ^2 \rightarrow \ell ^2\).

Remark 12

(Stability) (i) The stability condition of Theorem 2.6 is a minimal starting assumption that naturally arises from the analysis [93, 99, 100].

For example, if \(\rho \) is a stable self-consistent electron density, then there exists \(\phi ^{(m)} \in \ell ^2(\Lambda )\) such that [93]:

$$\begin{aligned} \frac{\partial \rho _\ell }{\partial {\varvec{u}}_m} = \Big [ \big ( I - {\mathscr {L}}(\rho ) \big )^{-1} \phi ^{(m)} \Big ]_\ell . \end{aligned}$$

(ii) As noted in [99] (in a slightly simpler setting), the stability condition of Theorem 2.6 is automatically satisfied for multi-lattices with \(\nabla w\) positive semi-definite. In fact, in this case the stability operator is negative semi-definite.

Theorem 2.6

For \({\varvec{u}}\) satisfying Definition 1, suppose that \(\rho ^\star \) is a corresponding stable self-consistent electron density.

Then, for N sufficiently large, there exist self-consistent solutions \(\rho _{N}\) of (2.32) such that

$$\begin{aligned} \big \Vert \rho _{N} - \rho ^\star \big \Vert _{\ell ^\infty } \leqslant C e^{-\gamma _N N}, \end{aligned}$$
(2.33)

where \(\gamma _N\) are the constants from Theorem 2.3 applied to \({\varvec{u}}(\rho ^\star )\).

Corollary 2.7

Suppose that \(\rho ^\star \) and \(\rho _N\) are as in Theorem 2.6 and denote by \(O^\mathrm {sc}_\ell (\varvec{u}) :=O_\ell \big ( {\varvec{u}}(\rho ^\star ) \big )\) a self-consistent local observable as in (2.7). Then,

$$\begin{aligned} \big | O_\ell ^\mathrm {sc}({\varvec{u}}) - I_{X_N} O_\ell \big ( {\varvec{u}}(\rho _N) \big ) \big | \leqslant C e^{-\gamma _N N}, \end{aligned}$$

where \(\gamma _N\) are the constants from Theorem 2.3 applied to \({\varvec{u}}(\rho ^\star )\).

In order for this result to be of any practical use, we need to solve the non-linear equation (2.32) for the electron density via a self-consistent field (SCF) procedure. Supposing we have the electron density \(\rho ^{i}\) and corresponding state \({\varvec{u}}^{i} :={\varvec{u}}(\rho ^{i})\) after i iterations, we diagonalise the Hamiltonian \(\mathcal {H}({\varvec{u}}^{i})\) and hence evaluate the output density \(\rho ^\mathrm {out} = I_{X_N}{F}^\beta ({\varvec{u}}^{i})\). At this point, since the simple iteration \(\rho ^{i+1} = \rho ^{\mathrm {out}}\) does not converge in general, a mixing strategy, possibly combined with Anderson acceleration [19], is used in order to compute the next iterate. The analysis of such mixing schemes is a major topic in electronic structure and numerical analysis in general and so we only present a small step in this direction.

Proposition 2.8

(Stability) The approximate electron densities \(\rho _N\) arising from Theorem 2.6 are stable in the following sense: \(I - {\mathscr {L}}_N(\rho _N) :\ell ^2 \rightarrow \ell ^2\) is an invertible bounded linear operator where \({\mathscr {L}}_N\) is the Jacobian of \(\rho \mapsto I_{X_N} F^\beta \big ( {\varvec{u}}(\rho ) \big )\). Moreover, \(\big ( I - {\mathscr {L}}_N(\rho _N) \big )^{-1}\) is uniformly bounded in N in operator norm.

Theorem 2.9

For \({\varvec{u}}\) satisfying Definition 1, suppose that \(\rho _{N}\) is a corresponding approximate self-consistent electron density stable in the sense of Proposition 2.8. For fixed \(\rho ^0\), we define \(\{\rho ^i\}_{i = 0}^\infty \) via the Newton iteration

$$\begin{aligned} \rho ^{i+1} = \rho ^i - \big ( I - {{\mathscr {L}}}_N(\rho ^i) \big )^{-1} \Big ( \rho ^i - I_{X_N}{F}^\beta \big ({\varvec{u}}(\rho ^i)\big ) \Big ). \end{aligned}$$

Then, for \(\Vert \rho ^0 - \rho _{N}\Vert _{\ell ^\infty }\) sufficiently small, the Newton iteration converges quadratically to \(\rho _{N}\).

A more thorough treatment of these SCF results is beyond the scope of this work. See [12, 53, 63] for recent results in the context of Hartree-Fock and Kohn-Sham density functional theory. For a recent review of SCF in the context density functional theory, see [101].

Remark 13

It is clear from the proofs of Theorems 2.6 and 2.9 that as long as the approximate scheme \(F^{\beta ,N}\) satisfies

$$\begin{aligned} \left| F^\beta _\ell ({\varvec{u}}) - F^{\beta ,N}_\ell ({\varvec{u}})\right| \lesssim e^{-\gamma _N N} \qquad \text {and} \qquad \left| \frac{\partial F^\beta _\ell ({\varvec{u}})}{\partial v_m} - \frac{\partial F^{\beta ,N}_\ell ({\varvec{u}})}{\partial v_m} \right| \lesssim e^{-\frac{1}{2}\gamma _N N}e^{-\eta r_{\ell m}}, \end{aligned}$$

then we may approximate (2.31) with approximate self-consistent solutions \(\rho _N = F^{\beta ,N}\big ({\varvec{u}}(\rho _N)\big )\). In particular, as long as we have the estimate from Remark 10 (see Appendix C for the technical details), then we may use the nonlinear approximation scheme \({\Theta _N}\) from Theorem 2.5 in Theorems 2.6 and 2.9 . In this case, we obtain error estimates that are (asymptotically) independent of the discrete spectrum.

Remark 14

In the linear-scaling spectral Gauss quadrature (LSSGQ) method [87], a self-consistent field iteration analogous to (2.32) is proposed. In particular, with the caveats outlined in Remark 13 taken into consideration, Theorem 2.6 goes some way to rigorously justify the exponential rate of convergence observed numerically in [87, Fig. 4].

3 Conclusions and Discussion

The main result of this work is a sequence of rigorous results about body-ordered approximations of a wide class of properties extracted from tight-binding models for condensed phase systems, the primary example being the potential energy landscape. Our results demonstrate that exponentially fast convergence can be obtained, provided that the chemical environment is taken into account. In the spirit of our previous results on the locality of interaction [16, 17, 93], these provide further theoretical justification—albeit qualitative—for widely assumed properties of atomic interactions. More broadly, our analysis illustrates how to construct general low-dimensional but systematic representations of high-dimensional complex properties of atomistic systems. Our results, as well as potential generalisations, serve as a starting point towards a rigorous end-to-end theory of multi-scale and coarse-grained models, including but not limited to machine-learned potential energy landscapes.

In the following paragraphs we will make further remarks on the potential applications of our results, and on some apparent limitations of our analysis.

3.1 Representation of atomic properties

Our initial motivation for studying the body-order expansion was to explain the (unreasonable?) success of machine-learned interatomic potentials [5, 8, 80], and our remarks will focus on this topic, however in principle they apply more generally.

Briefly, given an ab initio potential energy landscape (PEL) \(E^\mathrm{QM}\) for some material one formulates a parameterised interatomic potential

$$\begin{aligned} E(\{\varvec{r}_\ell \}_\ell ) = \sum _{\ell } \varepsilon ({\varvec{\theta }}, \{ {\varvec{r}}_{\ell k} \}_{k \ne \ell } ) \end{aligned}$$

and then “learn” the parameters \({\varvec{\theta }}\) by fitting them to observations of the reference PEL \(E^\mathrm{QM}\). A great variety of such parameterisations exist, including but not limited to neural networks [8], kernel methods [5] and symmetric polynomials [2, 25, 80]. Symmetric polynomials are linear regression schemes where each basis function has a natural body-order attached to it. It is particularly striking that for very low body-orders of four to six these schemes are able to match and often outperform the more complex nonlinear regression schemes [66, 80, 106]. Our analysis in the previous sections provides a partial explanation for these results, by justifying why one may expect that a reference ab initio PEL intrinsically has a low body-order. Moreover, classical approximation theory can now be applied to the body-ordered components as they are finite-dimensional to obtain new approximation results where the curse of dimensionality is alleviated.

Our results on nonlinear representations are less directly applicable to existing MLIPs, but rather suggest new directions to explore. Still, some connections can be made. The BOP-type construction of § 2.5,

$$\begin{aligned} {\Theta _N}\big ( \mathcal {H}_{\ell \ell }, [\mathcal {H}^2]_{\ell \ell }, \dots , [\mathcal {H}^{N}]_{\ell \ell } \big ) \end{aligned}$$
(3.1)

points towards a blending of machine-learning and BOP techniques that have not been explored to the best of our knowledge. A second interesting connection is to the overlap-matrix based fingerprint descriptors (OMFPs) introduced in [105] where a global spectrum for a small subcluster is used as a descriptor, while (3.1) can be understood as taking the projected spectrum as the descriptor. Thus, Theorem 2.5 suggests (1) an interesting modification of OMFPs which comes with guaranteed completeness to describe atomic properties; and (2) a possible pathway towards proving completeness of the original OMFPs.

Finally, our self-consistent representation of § 2.7 motivates how to construct compositional models, reminiscent of artificial neural networks, but with minimal nonlinearity that is moreover physically interpretable. Although we did not pursue it in the present work, this is a particularly promising starting point to incorporate meaningful electrostatic interaction into the MLIPs framework.

3.2 Linear body-ordered approximation: the preasymptotic regime

Possibly the most significiant limitation of our analysis of the linear body-ordered approximation scheme is that the estimates deteriorate when defects cause a pollution of the point spectrum. Here, we briefly demonstrate that this appears to be an asymptotic effect, while in the pre-asymptotic regime this deterioration is not noticable.

To explore this we choose a union of intervals \(E \supseteq \sigma (\mathcal {H})\) and a polynomial \(P_N\) of degree N and note

$$\begin{aligned} \Big |\big [O(\mathcal {H}) - P_N(\mathcal {H})\big ]_{\ell \ell }\Big |&\leqslant \big \Vert O(\mathcal {H}) - P_N(\mathcal {H}) \big \Vert _{\ell ^2 \rightarrow \ell ^2} = \big \Vert O - P_N \big \Vert _{L^\infty (\sigma (\mathcal {H}))} \nonumber \\&\leqslant \big \Vert O - P_N \big \Vert _{L^\infty (E)}. \end{aligned}$$
(3.2)

We then construct interpolation sets (Fejér sets) such that the corresponding polynomial interpolant gives the optimal asymptotic approximation rates (for details of this construction, see §4.1.54.1.8). We then contrast this with a best \(L^\infty (E)\)-approximation, and with the nonlinear approximation scheme from Theorem 2.5. We will observe that the non-linearity leads to improved asymptotic but comparable pre-asymptotic approximation errors.

As a representative scenario we consider the Fermi-Dirac distribution \(F^\beta (z) = (1 + e^{\beta z})^{-1}\) with \(\beta = 100\) and both the “defect-free” case \(E_1 :=[-1,a]\cup [b,1]\) and \(E_2 :=[-1,a] \cup [c,d] \cup [b,1]\) with the parameters \(a = -0.2, b = 0.2, c = -0.06\), and \(d = -0.03\). Then, for fixed polynomial degree N and \(j \in \{1,2\}\), we construct the \((N+1)\)-point Fejér set for \(E_j\) and the corresponding polynomial interpolant \(I_{j,N}F^\beta \). Moreover, we consider a polynomial \(P^\star _{j,N}\) of degree N minimising the right hand side of (3.2) for \(E = E_j\). Then, in Figure 3, we plot the errors \(\Vert F^\beta - I_{j,N} F^\beta \Vert _{L^\infty (E_j)}\) and \(\Vert F^\beta - P_{j,N}^\star \Vert _{L^\infty (E_j)}\) for both \(j = 1\) (Fig. 3a) and \(j=2\) (Fig. 3b) against the polynomial degree N together with the theoretical asymptotic convergence rates for best \(L^\infty (E_j)\) polynomial approximation (4.15).

What we observe is that, as expected, introducing the interval [cd] into the approximation domain drastically affects the asymptotic convergence rate and the errors in the approximation based on interpolation. While the best approximation errors follow the asymptotic rate for larger polynomial degree, it appears that, pre-asymptotically, the errors are significantly reduced. We also see that the approximation errors are significantly better than the general error estimate \(\Vert F^\beta - \Pi _N F^\beta \Vert _{L^\infty } \lesssim e^{-\pi \beta ^{-1} N}\) where \(\Pi _N\) is the Chebyshev projection operator (see § 4.1.4).

Fig. 3
figure 3

Approximation errors for Chebyshev projection (green), polynomial interpolation in Fejér sets on \(E_j\) (black), best \(L^\infty (E_j)\) polynomial approximation (blue), and, for \(j=2\), errors in the nonlinear approximation scheme (red). We also plot the corresponding predicted asymptotic rates (from (4.5), (4.15), and Theorem 2.3). Here, we only plot data points for \(N \in \{1, 6, 11, 16, \dots \}\) in the linear schemes (which captures the oscillatory behaviour), and \(N \in \{1,7,13,19,\dots \}\) for the nonlinear scheme (since N must be odd)

Moreover, in Figure 3b, we plot the errors when using a nonlinear approximation scheme satisfying Theorem 2.5. In this simple experiment, we consider the Gauss quadrature rule \(\Theta _{N} :=\int I_{X_{\frac{1}{2}(N-1)}} F^\beta \mathrm {d}D_\ell \) where \(X_{\frac{1}{2}(N-1)}\) is the set of zeros of the degree \(\frac{1}{2}(N+1)\) orthogonal polynomial (see Appendix D) with respect to \( \mathrm {d}D_\ell (x) :=\big ( \chi _{E_1}(x) + \sum _{j} \delta (x - \lambda _j) \big ) \mathrm {d}x \) where \(\{\lambda _j\} = \{c,\tfrac{1}{2}(c+d),d\} \subset [c,d]\). While \(D_\ell \) does not correspond to a physically relevant Hamiltonian, the same procedure may be carried out for any measure supported on \(E_1\) with \(\mathrm {supp}\, D_\ell \cap [c,d]\) finite. Then plotting the errors \( | F^\beta _\ell - \Theta _N|\), we observe improved asymptotic convergence rates that agree with that of the “defect-free” case from Figure 3a. However, the improvement is only observed in the asymptotic regime which corresponds to body-orders never reached in practice.

4 Proofs

4.1 Preliminaries

Here, we introduce the concepts needed in the proofs of the main results.

4.1.1 Hermite integral formula

For a finite interpolation set \(X\subset {\mathbb {C}}\), we let \(\ell _X(z) :=\prod _{x\in X}(z - x)\) be the correpsonding node polynomial.

For fixed \(z\in {\mathbb {C}} \setminus X\), we suppose that O is analytic on an open neighbourhood of \(X\cup \{z\}\). Then, for a simple closed positively oriented contour (or system of contours) \({\mathscr {C}}\) contained in the region of analyticity of O, encircling X, and avoiding \(\{z\}\), we have

$$\begin{aligned} I_X O(z) = \frac{1}{2\pi i} \oint _{{\mathscr {C}}} \frac{\ell _X(\xi ) - \ell _X(z)}{\ell _X(\xi )} \frac{O(\xi )}{\xi - z} \mathrm {d}\xi . \end{aligned}$$
(4.1)

If, in addition, \({\mathscr {C}}\) encircles \(\{z\}\), then

$$\begin{aligned} O(z) - I_X O(z) = \frac{1}{2\pi i} \oint _{{\mathscr {C}}} \frac{\ell _X(z)}{\ell _X(\xi )} \frac{O(\xi )}{\xi - z} \mathrm {d}\xi . \end{aligned}$$
(4.2)

The proof of these facts is a simple application of Cauchy’s integral formula, [3, 95].

4.1.2 Resolvent calculus

Given a configuration \({\varvec{u}}\), we consider the Hamiltonian \(\mathcal {H}= \mathcal {H}({\varvec{u}})\) and functions O analytic in some neighbourhood of the spectrum \(\sigma (\mathcal {H})\). We define \(O(\mathcal {H})\) via the holomorphic functional calculus [1]:

$$\begin{aligned} O(\mathcal {H}) :=-\frac{1}{2\pi i}\oint _{{\mathscr {C}}} O(z) (\mathcal {H}- z)^{-1}\mathrm {d}z \end{aligned}$$
(4.3)

where \({\mathscr {C}}\) is a simple closed positively oriented contour (or system of contours) contained in the region of analyticity of O and encircling the spectrum \(\sigma (\mathcal {H})\).

The following Combes–Thomas resolvent estimate [21] will play a key role in the analysis:

Lemma 1

(Combes-Thomas) Suppose that \({\varvec{u}}\) satisfies Definition 1 and \(z \in {\mathbb {C}}\) is contained in a bounded set with \(\mathrm {dist}\left( z, \sigma \big (\mathcal {H}({\varvec{u}})\big )\right) \!>\!0\) and \(\mathfrak {d}\!:=\! \mathrm {dist}\left( z, \sigma \big (\mathcal {H}({\varvec{u}}^\mathrm {ref})\big )\right) > \delta \).

Then, there exists a constant \(C>0\) such that

$$\begin{aligned}&\left| \left[ (\mathcal {H}({\varvec{u}}) - z)^{-1}\right] _{\ell k} \right| \leqslant C_{\ell k} e^{-\gamma _\mathrm {CT} r_{\ell k}}, \qquad \text {where}\\&\quad C_{\ell k} :=2 \mathfrak {d}^{-1} + C e^{-\gamma _\mathrm {CT} (|\varvec{r}_\ell | + |\varvec{r}_k| - |\varvec{r}_{\ell k}| )} \end{aligned}$$

and \(\gamma _\mathrm {CT} :=c \min \{1, \mathfrak {d}\}\) and \(c>0\) depends on \(h_0, \gamma _0, d\) and \(\min _{\ell \not =k} r_{\ell k}\).

Proof

A proof with \(\gamma _{\mathrm {CT}}\) depending instead on \(\mathrm {dist}\left( z, \sigma \big (\mathcal {H}({\varvec{u}})\big )\right) \) can be found in [16]. A low-rank update formula leads to the improved “defect-independent” result [17] where the exponent only depends on the distance between z and the reference spectrum. See [93] for an explicit description of \(\gamma _\mathrm {CT}\) in terms of the constants \(\gamma _0, d\) and the non-interpenetration constant \(\min _{\ell \not =k} r_{\ell k}\). \(\square \)

A key observation for arguments involving forces (or more generally, derivatives of the analytic quantities of interest) is that the Combes-Thomas estimate allows us to bound derivatives of the resolvent operator:

Lemma 2

Suppose that \(z \in {\mathbb {C}}\) with \(\mathfrak {d}:=\mathrm {dist}\left( z, \sigma \big (\mathcal {H}({\varvec{u}})\big )\right) > 0\). Then,

$$\begin{aligned} \left| \frac{\partial \left[ (\mathcal {H}({\varvec{u}}) - z)^{-1}\right] _{\ell k}}{\partial {\varvec{u}}_m} \right| \leqslant 4 h_0 \mathfrak {d}^{-2} e^{-\frac{1}{2}\min \{\gamma _0, \gamma _{\mathrm {CT}}\} (r_{\ell m} + r_{m k})} \end{aligned}$$

where \(\gamma _\mathrm {CT}\) is the Combes-Thomas constant from Lemma 1 and \(\gamma _0\) is the constant from (TB).

Proof

This result can be found in the previous works [14, 16, 17], but we give a brief sketch for completeness.

Derivatives of the resolvent have the following form:

$$\begin{aligned} \frac{\partial (\mathcal {H}({\varvec{u}}) - z)^{-1}}{\partial {\varvec{u}}_m} = - (\mathcal {H}({\varvec{u}}) - z)^{-1} \frac{\partial \mathcal {H}({\varvec{u}})}{\partial {\varvec{u}}_m} (\mathcal {H}(\varvec{r},v) - z)^{-1}. \end{aligned}$$
(4.4)

The result follows by applying the Combes-Thomas resolvent estimates together with the fact that the Hamiltonian is short-ranged (TB).

Assuming that the Hamiltonian has higher derivatives that are also short-ranged, higher order derivatives of the resolvent can be treated similarly [16]. \(\square \)

4.1.3 Local observables

Firstly, we note that \(F^\beta (\,\cdot \,)\) is analytic away from the simple poles at \(\pi \beta ^{-1} (2\mathbb {Z} + 1)\). Moreover, \(G^\beta (\,\cdot \,)\) can be analytically continued onto the open set \({\mathbb {C}} \setminus \left\{ \mu + i r :r\in {\mathbb {R}}, |r| \geqslant \pi \beta ^{-1} \right\} \) [17]. Therefore, we may consider (4.3) with \(O = F^\beta \) or \(G^\beta \) and a contour \({\mathscr {C}}_\beta \) encircling \(\sigma \left( \mathcal {H}\right) \) and avoiding \({\mathbb {C}} \setminus \left\{ \mu + i r :r\in {\mathbb {R}}, |r| \geqslant \pi \beta ^{-1} \right\} \). Therefore, we may choose \({\mathscr {C}}_\beta \) so that the constant \(\mathfrak {d}\), from Lemma 1, is proportional to \(\beta ^{-1}\). Moreover, if there is a spectral gap, the constant \(\mathfrak {d}\) is uniformly bounded below by a positive constant multiple of \(\mathsf {g}\) as \(\beta \rightarrow \infty \).

In the case of insulators at zero Fermi-temperature, we take \({\mathscr {C}}_\infty \) encircling \(\sigma \left( \mathcal {H}({\varvec{u}})\right) \cap (-\infty , \mu )\) and avoiding the rest of the spectrum. Therefore, we may choose \({\mathscr {C}}_\infty \) so that the constant \(\mathfrak {d}\), from Lemma 1, is proportional to \(\mathsf {g}\).

Following [16, Lemma 4], we can conclude that \(\sigma (\mathcal {H}) \subset [\underline{\sigma },\overline{\sigma }]\) for some \(\underline{\sigma }, \overline{\sigma }\) depending on \(h_0, \gamma _0, v, d\) and \(\min _{\ell \not =k} r_{\ell k}\). This means that, the contours \({\mathscr {C}}_\beta \) can be chosen to have finite length and, when applying Lemma 1, we have \(\gamma _\mathrm {CT} = c \min \{1, \max \{ \beta ^{-1}, \mathsf {g} \}\}\).

Moreover, for all \(0< \mathsf {b} < \pi \) and bounded sets \(A_\beta \subset A \subset {\mathbb {C}}\) such that

$$\begin{aligned} \mathrm {dist}(z, \{ \mu + i r :r\in {\mathbb {R}}, |r| \geqslant \pi \beta ^{-1}\}) \geqslant \mathsf {b}\beta ^{-1} \qquad \text {for all } z\in A_\beta , \end{aligned}$$

both \(F^\beta (\,\cdot \,)\) and \(G^\beta (\,\cdot \,)\) are uniformly bounded on \(A_\beta \) independently of \(\beta \) [17, Lemma 5.2].

4.1.4 Chebyshev Projection and Interpolation in Chebyshev Points

We denote by \(\{T_n\}\) the Chebyshev polynomials (of the first kind) satisfying \(T_n(\cos \theta ) = \cos n\theta \) on \([-1,1]\) and, equivalently, the recurrence \(T_{0} = 1, T_1 = x\), and \(T_{n+1}(x) = 2x T_n(x) - T_{n-1}(x)\).

For O Lipshitz continuous on \([-1,1]\), there exists an absolutely convergent Chebyshev series expansion: there exists \(c_n\) such that \(O(z) = \sum _{n=0}^\infty c_n T_n(z)\). For maximal polynomial degree N, the corresponding projection operator is denoted \(\Pi _N O(z) :=\sum _{n=0}^N c_n T_n(z)\). This approach is a special case of the Kernel Polynomial Method (KPM) which we briefly review in Appendix F.

On the other hand, supposing that the interpolation set is given by the Chebyshev points \(X = \{ \cos \frac{j\pi }{N} \}_{0 \leqslant j \leqslant N}\), we may expand the polynomial interpolant \(I_N O :=I_{X} O\) in terms of the Chebyshev polynomials: there exists \(c_n^\prime \) such that \(I_N O(z) = \sum _{n=0}^N c_n^\prime T_n(z)\).

For functions O that can be analytically continued the Bernstein ellipse \(E_\rho :=\{ \frac{1}{2}( z + z^{-1} ) :|z| = \rho \}\) for \(\rho > 1\), the corresponding coefficients \(\{c_n\}\), \(\{c^\prime _n\}\) decay exponentially with rate \(\rho \). This leads to the following error estimates

$$\begin{aligned} \Vert O - \Pi _N O \Vert _{L^\infty ([-1,1])} + \Vert O - I_N O \Vert _{L^\infty ([-1,1])} \leqslant 6 \Vert O\Vert _{L^\infty (E_\rho )} \frac{\rho ^{-N}}{\rho - 1}. \end{aligned}$$
(4.5)

For \(O^\beta = F^\beta \) or \(G^\beta \), these estimates give an exponential rate of convergence with exponent depending on \(\sim \beta ^{-1}\). Indeed, after scaling \(\mathcal {H}\) so that the spectrum is contained in \([-1,1]\), we obtain

$$\begin{aligned} \Big | O_\ell ^\beta ({\varvec{u}}) - \Pi _N O_\ell ^\beta ({\varvec{u}}) \Big |&\leqslant \Big \Vert O^\beta (\mathcal {H}) - \Pi _N O^\beta (\mathcal {H}) \Big \Vert _{\ell ^2 \rightarrow \ell ^2}\nonumber \\&\leqslant \Vert O^\beta - \Pi _N O^\beta \Vert _{L^\infty ([-1,1])}, \end{aligned}$$
(4.6)

and we conclude by directly applying (4.5). The same estimate also holds for \(I_N\) (or any polynomial).

For full details of all the statements made in this subsection, see [95].

4.1.5 Classical logarithmic potential theory

In this section, we give a very brief introduction to classical potential theory in order to lay out the key notation. For a more thorough treatment, see [75] or [37, 62, 76, 95].

It can be seen from the Hermite integral formula (4.2) that the approximation error for polynomial interpolation may be determined by taking the ratio of the size of the node polynomial \(\ell _X\) at the approximation points to the size of \(\ell _X\) along an appropriately chosen contour. Logarithmic potential theory provides an elegant mechanism for choosing the interpolation points so that the asymptotic behaviour of \(\ell _X\) can be described.

We suppose that \(E \subset {\mathbb {C}}\) is a compact set. We will see that choosing the interpolation nodes as to maximise the geometric mean of pairwise distances provides a particularly good approximation scheme:

$$\begin{aligned} \delta _n(E) :=\max _{z_1, \dots , z_n \in E} \Big ( \prod _{1 \leqslant i < j \leqslant n} |z_i - z_j| \Big )^{\frac{2}{n(n-1)}}. \end{aligned}$$
(4.7)

Any set \({\mathcal {F}}_n \subset E\) attaining this maximum is known as a Fekete set. It can be shown that the quantities \(\delta _n(E)\) form a decreasing sequence and thus converges to what is known as the transfinite diameter: \(\tau (E) :=\lim \limits _{n\rightarrow \infty } \delta _n(E)\).

We let \(\ell _n(z)\) denote the node polynomial corresponding to a Fekete set and note that

$$\begin{aligned} |\ell _n(z)| \delta _n(E)^{\frac{n(n-1)}{2}} = \max _{z_0,\dots ,z_{n} \in E :z_0 = z} \prod _{0\leqslant i < j \leqslant n} |z_i - z_j| \leqslant \delta _{n+1}(E)^{\frac{n(n+1)}{2}}. \end{aligned}$$
(4.8)

Therefore, rearranging (4.8), we obtain \(\lim _{n\rightarrow \infty } \Vert \ell _n\Vert _{L^\infty (E)}^{1/n} \leqslant \tau (E)\). In fact, this inequality can be replaced with equality, showing that Fekete sets allow us to describe the asymptotic behaviour of the node polynomials on the domain of approximation.

To extend these results, it is useful to recast the maximisation problem (4.7) into the following minimisation problem, describing the minimal logarithmic energy attained by n particles lying in E with the repelling force \(1/|z_i - z_j|\) between particles i and j lying at positions \(z_i\) and \(z_j\), respectively:

$$\begin{aligned} {\mathcal {E}}_n(E) :=\min _{z_1,\dots ,z_n \in E} \sum _{1 \leqslant i < j \leqslant n} \log \frac{1}{|z_i - z_j|} = \frac{n(n-1)}{2} \log \frac{1}{\delta _n(E)}. \end{aligned}$$
(4.9)

Fekete sets can therefore be seen as minimal energy configurations and described by the normalised counting measure \(\nu _n :=\frac{1}{n} \sum _{j = 1}^n \delta _{z_j}\) where \({\mathcal {F}}_n = \{z_j\}_{j = 1}^n\).

The minimisation problem (4.9) may be extended for general unit Borel measures \(\mu \) supported on E by defining the logarithmic potential and corresponding total energy by

$$\begin{aligned} U^\mu (z) :=\int \log \frac{1}{|z - \xi |} \mathrm {d}\mu (\xi ) \qquad \text {and} \qquad I(\mu ):=\iint \log \frac{1}{|z - \xi |} \mathrm {d}\mu (\xi )\mathrm {d}\mu (z). \end{aligned}$$

The infimum of the energy over the space of unit Borel measures supported on E, known as the Robin constant for E, will be denoted \(-\infty < V_E \leqslant +\infty \). The capacity of E is defined as \(\mathrm {cap}(E):=e^{-V_E}\) and is equal to the transfinite diameter [34]. Using a compactness argument, it can be shown that there exists an equilibrium measure \(\omega _E\) with \(I(\omega _E) = V_E\) and, in the case \(V_E <\infty \), by the strict convexity of the integral, \(\omega _E\) is unique [77]. Moreover, if \(V_E < \infty \) (equivalently, if \(\mathrm {cap}(E) > 0\)), then \(U^{\omega _E}(z) \leqslant V_E\) for all \(z \in {\mathbb {C}}\), with equality holding on E except on a set of capacity zero (we say this property holds quasi-everywhere).

Moreover, if \(\mathrm {cap}\,E > 0\), then it can be shown that the normalised counting measures, \(\nu _n\), corresponding to a sequence of Fekete sets weak-\(\star \) converges to \(\omega _E\). Since \(U^{\nu _n}(z) = \frac{1}{n}\log \frac{1}{|\ell _n(z)|}\), the weak-\(\star \) convergence allows one to conclude that

$$\begin{aligned}&\lim _{n\rightarrow \infty } \Vert \ell _n\Vert _{L^\infty (E)}^{1/n} = \mathrm {cap}(E), \qquad \text {and} \qquad \nonumber \\&\lim _{n\rightarrow \infty } |\ell _n(z)|^{1/n} = e^{-U^{\omega _E}(z)} =:\mathrm {cap}(E) e^{g_E(z)} \end{aligned}$$
(4.10)

uniformly on compact subsets of \({\mathbb {C}} \setminus E\). Here, we have defined the Green’s function \(g_E(z) :=V_E - U^{\omega _E}(z)\), which describes the asymptotic behaviour of the node polynomials corresponding to Fekete sets. We therefore wish to understand the Green’s function \(g_E\).

4.1.6 Construction of the Green’s function

Now we restrict our attention to the particular case where \(E \subset {\mathbb {R}}\) is a union of finitely many compact intervals of non-zero length.

It can be shown that the Green’s function \(g_E\) satisfies the following Dirichlet problem on \({\mathbb {C}} \setminus E\) [75]:

$$\begin{aligned}&\Delta g_E(z) = 0 \quad \text { on } {\mathbb {C}} \setminus E, \end{aligned}$$
(4.11a)
$$\begin{aligned}&g_E(z) \sim \log |z| \quad \text { as } |z| \rightarrow \infty , \end{aligned}$$
(4.11b)
$$\begin{aligned}&g_E(z) = 0 \quad \text { on } E. \end{aligned}$$
(4.11c)

In fact, it can be shown that (4.11) admits a unique solution [75] and thus (4.11) is an alternative definition of the Green’s function. Using this characterisation, it is possible to explicitly construct the Green’s function \(g_E\) as follows. In the upper half plane, \(g_E(z) = \mathrm {Re}(G_E(z))\) where \(G_E:\{ z \in {\mathbb {C}} :\mathrm {Im}(z) \geqslant 0 \} \rightarrow \{ z\in {\mathbb {C}} :\mathrm {Re}(z) \geqslant 0, \mathrm {Im}(z) \in [0,\pi ]\}\) is a conformal mapping on \(\{z :\mathrm {Im}(z) > 0\}\) such that \(G_E(E) = i[0,\pi ]\), \(G_E( \min E ) = i \pi \), and \(G_E( \max E ) = 0\). Using the symmetry of E with respect to the real axis, we may extend \(\mathrm {Re}(G_E(z))\) to the whole complex plane via the Schwarz reflection principle. Then, one can easily verify that this analytic continuation satisfies (4.11). Since the image of \(G_E\) is a (generalised) polygon, \(z \mapsto G_E(z)\) is an example of a Schwarz–Christoffel mapping [29]. See Figure 4 for the case \(E = [-1,-\varepsilon ]\cup [\varepsilon ,1]\).

Fig. 4
figure 4

The Schwarz–Christoffel mapping \(G_E\) with \(E = [z_1,z_2] \cup [z_4, z_5]\) which maps the upper half plane (left) onto the infinite slit strip \(\{\omega \in {\mathbb {C}} :\mathrm {Re}\, \omega > 0, \, \mathrm {Im}\, \omega \in (0,\pi )\}\) (right), is continuous on \(\{ z \in {\mathbb {C}} :\mathrm {Re} z \geqslant 0\}\) and maps the intervals \([z_1,z_2]\), \([z_4,z_5]\) to \([\omega _1,\omega _2], [\omega _4,\omega _5] \subset i [0,\pi ]\), respectively. We also plot the image of an \(10 \times 10\) equi-spaced grid. A parameter problem is solved in order to obtain \(z_3\) and thus \(\omega _3\) and \(\omega _2 = \omega _4\) whereas the other constants are fixed. Here, we take \(z_1 = -1, z_2 = -\varepsilon , z_4 = \varepsilon , z_5 = 1, \omega _1 = i\pi , \omega _5 = 0\) with \(\varepsilon = 0.3\)

We shall briefly discuss the construction of the Schwarz–Christoffel mapping \(G_E\) for \(E = [-1,\varepsilon _-]\cup [\varepsilon _+,1]\). We define the pre-vertices \(z_1 = -1, z_2 = \varepsilon _-, z_4 = \varepsilon _+, z_5 = 1\) and wish to construct a conformal map \(G_E\) with \(G_E(z_k) = \omega _k\) as in Figure 4. For simplicity, we also define \(z_0 :=-\infty \) and \(z_6 :=\infty \) and observe that because the image is a polygon, \(\mathrm {arg} \, G_E^\prime (z)\) must be constant on each interval \((z_{k-1}, z_k)\) and

$$\begin{aligned} \mathrm {arg} \, G_E^\prime (z_k^+) - \mathrm {arg} \, G_E^\prime (z_k^-) = (1 - \alpha _k) \pi , \end{aligned}$$
(4.12)

where \(z_k^- \in (z_{k-1}, z_k)\), \(z_k^+ \in (z_{k}, z_{k+1})\), and \(\alpha _k \pi \) is the interior angle of the infinite slit strip at vertex \(\omega _k\) (that is, \(\alpha _1 = \alpha _2 = \alpha _4 = \alpha _5 = \frac{1}{2}\) and \(\alpha _3 = 2\)). After defining \(z^\alpha :=|z|^\alpha e^{i \alpha \,\mathrm {arg}\,z}\) where \(\mathrm {arg}\,z \in (-\pi ,\pi ]\), we can see that for \(z \in (z_{k-1}, z_k)\), we have \(\mathrm {arg} \prod _{j = k}^5 (z - z_j)^{\alpha _j - 1} = \sum _{j = k}^{5} (\alpha _j - 1)\pi \) and so the jump in the argument of \(z \mapsto \prod _{j = 1}^5 (z - z_j)^{\alpha _j - 1}\) is \((1 - \alpha _k)\pi \) at \(z_k\) as in (4.12). Therefore, integrating this expression, we obtain

$$\begin{aligned} G_E(z) = A + B \int ^z_1 \frac{\zeta - z_3}{\sqrt{\zeta + 1}\sqrt{\zeta - \varepsilon _-}\sqrt{\zeta - \varepsilon _+}\sqrt{\zeta - 1}} \mathrm {d}\zeta . \end{aligned}$$
(4.13)

Since \(G_E(1) = A\), we take \(A = 0\) (to ensure (4.11c) holds). Moreover, since the real part of the integral is \(\sim \log |z|\) as \(|z| \rightarrow \infty \), we apply (4.11b) to conclude \(B = 1\). Finally, we can choose \(z_3\) such that \(\mathrm {Re}\,G_E(z) = 0\) for all \(z \in E\); that is,

$$\begin{aligned} z_3 \in (\varepsilon _-, \varepsilon _+) \, :\qquad \int _{\varepsilon _-}^{\varepsilon _+} \frac{\zeta - z_3}{\sqrt{\zeta + 1}\sqrt{\zeta - \varepsilon _-}\sqrt{\varepsilon _+-\zeta }\sqrt{1 - \zeta }} \mathrm {d}\zeta = 0. \end{aligned}$$
(4.14)

For more details, see [37]. We use the Schwarz–Christoffel toolbox [29] in matlab to evaluate (4.13) and plot Figure 5.

For the simple case \(E :=[-1,1]\), by the same analysis, we can disregard \(z_2,z_3,z_4\) and \(\omega _2,\omega _3,\omega _4\) and integrate the corresponding expression to obtain the closed form \(G_{[-1,1]}(z) = \log ( z + \sqrt{z - 1}\sqrt{z + 1})\).

A similar analysis allows one to construct conformal maps from the upper half plane to the interior of any polygon. For further details, rigorous proofs and numerical considerations, see [31].

Fig. 5
figure 5

Equi-potential curves \({\mathscr {C}}_{r_k} :=\{z\in {\mathbb {C}} :e^{g_E(z)} = r_k \}\) for both metals (a) and insulators (b) where \(\frac{1}{2}(r_k - r_k^{-1}) = \frac{k \pi }{\beta }\) for \(k \in \{ 1,2,3,4,5 \}\) and \(\beta = 10\). In the case of metals (a), the equi-potential curves agree with Bernstein ellipses. We also plot the poles of \(F^\beta (\,\cdot \,)\) which determine the maximal admissible integration contours: for (a), we can take contours \({\mathscr {C}}_r\) for all \(r < r_1\) and, for (b), the contour \({\mathscr {C}}_{r_2}\) can be used for all positive Fermi-temperatures (we have chosen the gap carefully so that \({\mathscr {C}}_{r_2}\) self-intersects at \(\mu \)). Shown in black crosses are 30 Fejér points in each case. To create these plots we consider an integral formula for the Green’s function \(z\mapsto g_{E}(z)\) [37] and use the Schwarz–Christoffel matlab toolbox [28, 29] to approximate these integrals

4.1.7 Interpolation nodes

The only difficulty in obtaining (4.10) in practice is the fact that Fekete sets are difficult to compute. An alternative, based on the Schwarz–Christoffel mapping \(G_E\), are Fejér points. For equally spaced points \(\{\zeta _j\}_{j=1}^{n}\) on the interval \(i [0,\pi ]\), the \(n^\text {th}\) Fejér set is defined by \(\{ G_E^{-1}(\zeta _j)\}_{j=1}^n\). Fejér sets are also asymptotically optimal in the sense that (4.10) is satisfied where \(\ell _n\) is now the node polynomial corresponding to n-point Fejér set.

Another approach is to use Leja points which are generated by the following algorithm: for fixed \(z_1,\dots ,z_n\), the next interpolation node \(z_{n+1}\) is constructed by maximising \(\prod _{j = 1}^n |z_j - z|\) over all \(z \in E\). Sets of this form are also asymptotically optimal [90] for any choice of \(z_1 \in E\). Since we have fixed the previous nodes \(z_1,\dots ,z_{n}\), the maximisation problem for constructing \(z_{n+1}\) is much simpler than that of (4.7).

More generally, if the normalised counting measure corresponding to a sequence of sets \(\{z_j\}_{j=1}^n \subset E\) weak-\(\star \) converges to the equilibrium measure \(\omega _E\), then the corresponding node polynomials satisfy (4.10).

For the simple case where \(E=[-1,1]\), many systems of zeros or maxima of sequences of orthogonal polynomials are asymptotically optimal in the sense of (4.10). In fact, since the equilibrium measure for \([-1,1]\) is the arcsine measure [76]

$$\begin{aligned} \mathrm {d}\mu _{[-1,1]}(x) = \frac{1}{\pi } \frac{1}{\sqrt{1-x^2}} \mathrm {d}x, \end{aligned}$$

any sequence of sets with this limiting distribution is asymptotically optimal. An example of particular interest are the Chebyshev points \(\{ \cos \frac{j\pi }{n} \}_{0 \leqslant j \leqslant n}\) given by the \(n+1\) extreme points of the Chebyshev polynomials defined by \(T_n( \cos \theta ) = \cos n \theta \).

4.1.8 Asymptotically optimal polynomial approximations

Suppose that E is the union of finitely many compact intervals of non-zero length and \(O:E \rightarrow {\mathbb {C}}\) extents to an analytic function in an open neighbourhood of E. On defining \({\mathscr {C}}_\gamma :=\{ z \in {\mathbb {C}} :g_E(z) = \gamma \}\), we denote by \(\gamma ^\star \) the maximal constant for which O is analytic on the interior of \({\mathscr {C}}_{\gamma ^\star }\). We let \(P_N^\star \) be the best \(L^\infty (E)\)-approximation to O in the space of polynomials of degree at most N and suppose that \(I_N\) is a polynomial interpolation operator in \(N+1\) points satisfying (4.11). Then, the Green’s function \(g_E\) determines the asymptotic rate of approximation for not only polynomial interpolation, but also for best approximation:

$$\begin{aligned} \lim _{N\rightarrow \infty } \Vert O - P_N^\star \Vert _{L^\infty (E)}^{1/N} = \lim _{N\rightarrow \infty } \Vert O - I_NO \Vert _{L^\infty (E)}^{1/N} = e^{-\gamma ^\star }. \end{aligned}$$
(4.15)

For a proof that the asymptotic rate of best approximation is given by the Green’s function see [76]. The result for polynomial interpolation uses the Hermite integral formula and (4.10), see (4.20) and (4.22), below.

4.2 Linear body-order approximation

In this section, we use the classical logarithmic potential theory from § 4.1.5 to prove the approximation error bounds for interpolation. However, we first show that polynomial approximations lead to body-order approximations:

Proof of Proposition 2.2

We first simplify the notation by absorbing the effective potential and two-centre terms into the three-centre summation:

$$\begin{aligned}&\mathcal {H}({\varvec{u}})_{k_1 k_2} = \sum _m \mathcal {H}_{k_1 k_2 m}, \quad \text {where} \quad \nonumber \\&\mathcal {H}_{k_1k_2 m} :={\left\{ \begin{array}{ll} \frac{1}{2} h( {\varvec{u}}_{k_1k_2} ) + \delta _{k_1k_2} v_{k_1} \mathrm {Id}_{N_\mathrm {b}}, &{}\text {if } m \in \{k_1,k_2\}, \\ t({\varvec{u}}_{k_1 m}, {\varvec{u}}_{k_2 m}), &{}\text {if } m \not \in \{k_1,k_2\}. \end{array}\right. } \end{aligned}$$
(4.16)

Now, supposing that \(I_XO(z) = \sum _{j = 0}^{|X|-1} c_j z^j\), we obtain

$$\begin{aligned} I_XO_\ell ({\varvec{u}})&= \mathrm {tr} \sum _{j = 0}^{|X|-1} c_{j} \sum _{\ell _1,\dots ,\ell _{j-1}} \mathcal {H}_{\ell \ell _1} \mathcal {H}_{\ell _1 \ell _2} \dots \mathcal {H}_{\ell _{j-1} \ell } \nonumber \\&= \mathrm {tr} \sum _{j = 0}^{|X|-1} c_{j} \sum _{ \genfrac{}{}{0.0pt}{}{\ell _1,\dots ,\ell _{j-1}}{m_1,\dots ,m_{j}} } \mathcal {H}_{\ell \ell _1 m_1} \mathcal {H}_{\ell _1 \ell _2 m_2} \dots \mathcal {H}_{\ell _{j-1} \ell m_j}. \end{aligned}$$
(4.17)

there the first two terms in the outer summation are \(c_0\) and \(c_1 \mathcal {H}_{\ell \ell }\). Now, for a fixed body-order \((n+1)\), and \(k_1< \dots < k_{n}\) with \(k_l \not =\ell \), we construct \(V_{nN}({\varvec{u}}_\ell ; {\varvec{u}}_{\ell k_1}, \dots , {\varvec{u}}_{\ell k_n})\) by collecting all terms in (4.17) with \(0 \leqslant j \leqslant |X|-1\) and \(\{ \ell , \ell _1, \dots , \ell _{j-1}, m_1, \dots , m_j\} = \{\ell , k_1, \dots , k_n\}\). In particular, the maximal body-order in this expression is \(2(|X|-1)\) for three-centre models and \(|X|-1\) in the two-centre case.

More explicitly, using the notation (2.26), we have that

$$\begin{aligned}&V_{nN}({\varvec{u}}_\ell ; {\varvec{u}}_{\ell k_1}, \dots , {\varvec{u}}_{\ell k_n})\nonumber \\&\quad = \mathrm {tr} \sum _{j = 0}^{|X|-1} c_{j} \sum _{ \genfrac{}{}{0.0pt}{}{\ell _1,\dots ,\ell _{j-1},m_1,\dots ,m_j}{\{\ell ,\ell _1,\dots ,\ell _{j-1},m_1,\dots ,m_j\} = \{\ell ,k_1,\dots ,k_n\}} } \mathcal {H}_{\ell \ell _1 m_1} \mathcal {H}_{\ell _1 \ell _2 m_2} \dots \mathcal {H}_{\ell _{j-1} \ell m_j} \\&\quad = \mathrm {tr} \sum _{K \subseteq \{k_1,\dots ,k_n\}} (-1)^{n-|K|} I_X O\big (\mathcal {H}\big |_{\ell ;K}\big )_{\ell \ell }. \end{aligned}$$
(4.18)

Here, we have applied an inclusion-exclusion principle to ensure that we are not only summing over sites in \(\{k_1,\dots ,k_n\}\) but we select at least one of each site in this set. Indeed, if we choose \(\ell _1, \dots , \ell _{j-1}, m_1, \dots , m_j\) such that \(\{\ell ,\ell _1, \dots , \ell _{j-1}, m_1, \dots , m_j\} = \{\ell \}\cup K_0\), then the expression \(\mathcal {H}_{\ell \ell _1 m_1} \mathcal {H}_{\ell _1 \ell _2 m_2} \cdots \mathcal {H}_{\ell _{j-1} \ell m_j}\) appears in each term of (4.19) with \(K \supseteq K_0\) exactly once (with a ± sign). Therefore, the number of times \(\mathcal {H}_{\ell \ell _1 m_1} \mathcal {H}_{\ell _1 \ell _2 m_2} \dots \mathcal {H}_{\ell _{j-1} \ell m_j}\) appears is exactly

$$\begin{aligned} \sum _{l = 0}^{ n - |K_0| } (-1)^{n-|K_0|-l} \genfrac(){0.0pt}0{ n - |K_0| }{l} = {\left\{ \begin{array}{ll} 1 &{} \text {if } |K_0| = n,\\ 0 &{}\text {otherwise}. \end{array}\right. } \end{aligned}$$

That is, (4.19) only contains the terms in the summation (4.18). \(\square \)

Proof of Theorem 2.3

We let \(\ell _N(x) :=\prod _{j} (x - x_j^N)\) be the node polynomial for \(X_N :=\{x_j^N\}_{j=0}^N\). Again, we fix the configuration \({\varvec{u}}\) and consider \(\mathcal {H}:=\mathcal {H}({\varvec{u}})\).

Supposing that \({\mathscr {C}}\) is a simple closed positively oriented contour encircling \(\sigma (\mathcal {H})\), we apply the Hermite integral formula (4.2) to obtain that

$$\begin{aligned} \big | O^\beta _\ell ({\varvec{u}}) - I_{X_N} O^\beta _\ell ({\varvec{u}}) \big |&\leqslant \Vert O^\beta ( \mathcal {H}) - I_{X_N} O^\beta ( \mathcal {H}) \Vert _{\ell ^2 \rightarrow \ell ^2} = \sup _{z\in \sigma (\mathcal {H})} \big | O^\beta (z) - I_{X_N} O^\beta ( z ) \big | \nonumber \\&\leqslant \sup _{z\in \sigma (\mathcal {H})} \left| \frac{1}{2\pi i} \oint _{{\mathscr {C}}} \frac{\ell _N(z)}{\ell _N(\xi )} \frac{O^\beta (\xi )}{\xi - z} \mathrm {d}\xi \right| \leqslant C \sup _{z \in \sigma (\mathcal {H}), \, \xi \in {\mathscr {C}}} \left| \frac{\ell _N(z)}{\ell _N(\xi )} \right| , \end{aligned}$$
(4.19)

where

$$\begin{aligned} C :=\frac{\mathrm {len}({\mathscr {C}})}{2\pi } \frac{\max _{\xi \in {\mathscr {C}}} |O^\beta (\xi )|}{ \mathrm {dist}\big ({\mathscr {C}}, \sigma (\mathcal {H})\big ) }. \end{aligned}$$
(4.20)

At this point we apply standard results of classical logarithmic potential theory (see, § 4.1.5 or [62]) and conclude by noting that if the interpolation points are asymptotically distributed according to the equilibrium distribution corresponding to \(E:=I_- \cup I_+\), then after applying (4.10), we have that

$$\begin{aligned} \lim _{N\rightarrow \infty }\left| \frac{\ell _N(z)}{\ell _N(\xi )} \right| ^{\frac{1}{N}} = e^{g_E(z) - g_E(\xi )}. \end{aligned}$$
(4.21)

Here, the equilibrium distribution and the Green’s function \(g_E(z)\) are concepts introduced in § 4.1.5 and § 4.1.6.

Therefore, by choosing the contour \({\mathscr {C}} :=\{ \xi \in {\mathbb {C}} :g_E(\xi ) = \gamma \}\) for \(0< \gamma < g_E(\mu + i\pi \beta ^{-1})\), the asymptotic exponents in the approximation error is \(\gamma \).

The maximal asymptotic convergence rate is given by \(g_E(\mu + i\pi \beta ^{-1})\) since \({\mathscr {C}}\) must be contained in the region of analyticity of \(O^\beta \) and the first singularity of \(O^\beta \) is at \(\mu + i\pi \beta ^{-1}\) (for \(O^\beta = F^\beta \) or \(G^\beta \)).

Examples of the equi-potential level sets \({\mathscr {C}}\) are given in Figure 5.

Using the Green’s function results of § 4.1.6, \(g_E( \mu + i \pi \beta ^{-1} ) = \mathrm {Re}\, G_E( \mu + i \pi \beta ^{-1} )\) where \(G_E\) is the integral (4.13). The asymptotic behaviour of this maximal asymptotic convergence rate for the separate \(\beta \rightarrow \infty \) and \(\mathsf {g}\rightarrow 0\) limits can be found in [37, 81]. Here, we consider the \(\beta ^{-1} + \mathsf {g} \rightarrow 0\) limit where the gap remains symmetric about the chemical potential \(\mu \).

To simplify the notation we consider \(I_- \cup I_+ = [-1,\varepsilon _-] \cup [\varepsilon _+,1]\) where \(\varepsilon _\pm = \mu \pm \frac{1}{2}\mathsf {g}\). By choosing to integrate (4.13) along the contour composed of the intervals \([1,\mu ]\) and \([\mu , \mu + i\pi \beta ^{-1}]\), we obtain

$$\begin{aligned} G_E(\mu + i\pi \beta ^{-1}) = G_E(\mu ) + \int _\mu ^{\mu + i\pi \beta ^{-1}} \frac{\zeta - z_3}{\sqrt{\zeta +1} \sqrt{\zeta - \varepsilon _-} \sqrt{\zeta - \varepsilon _+} \sqrt{\zeta - 1}} \mathrm {d}\zeta . \end{aligned}$$
(4.22)

Since \(g_E(\mu ) \sim \mathsf {g}\) as \(\mathsf {g} \rightarrow 0\) [37], we only consider the remaining term in (4.23).

For \(\zeta \in \mu + i [0,\pi \beta ^{-1} ]\), we have \(c^{-1} \leqslant |\sqrt{\zeta \pm 1}| \leqslant c\), and so the integral in (4.23) has the same asymptotic behaviour as

$$\begin{aligned}&\int _{\mu }^{\mu + i \pi \beta ^{-1} } \frac{\zeta - z_3}{\sqrt{\zeta - \varepsilon _-} \sqrt{\zeta - \varepsilon _+}} \mathrm {d}\zeta = \mathsf {g} \int _{\frac{1}{2}}^{\frac{1}{2}+\frac{i\pi \beta ^{-1}}{\mathsf {g}}} \frac{\sqrt{\zeta }}{\sqrt{\zeta - 1}} \mathrm {d}\zeta \nonumber \\&\quad + (\varepsilon _{-} -z_3) \int _{\frac{1}{2}}^{\frac{1}{2}+\frac{i\pi \beta ^{-1}}{\mathsf {g}}} \frac{1}{\sqrt{\zeta }\sqrt{\zeta -1}} \mathrm {d}\zeta , \end{aligned}$$
(4.23)

where we have used the change of variables \(\widetilde{\zeta } = \frac{\zeta - \varepsilon _-}{\varepsilon _+-\varepsilon _-}\).

Since the integrands are uniformly bounded along the domain of integration, (4.24) is \(\sim \beta ^{-1}\) as \(\beta \rightarrow \infty \).

The constant pre-factor in (4.21) is inversely proportional to the distance \(\mathrm {dist}\big ( {\mathscr {C}}, \sigma (\mathcal {H}) \big )\) between the contour \({\mathscr {C}} = \{ g_E = \gamma \}\) and the spectrum \(\sigma (\mathcal {H})\). In particular, since \(g_E\) is uniformly Lipschitz with constant \(L>0\) on the compact region bounded by \({\mathscr {C}}\), we have: there exists \(\lambda \in \sigma (\mathcal {H})\) and \(\xi \in {\mathscr {C}}\) such that

$$\begin{aligned} \mathrm {dist}\big ( {\mathscr {C}}, \sigma (\mathcal {H}) \big ) = |\xi - \lambda | \geqslant \frac{1}{L} |g_E(\xi ) - g_E(\lambda ) | = \frac{1}{L} \gamma . \end{aligned}$$

Therefore, choosing \(\gamma \) to be a constant multiple of \(g_E(\mu + i\pi \beta ^{-1})\), we conclude that the constant pre-factor C satisfies \(C \sim (\mathsf {g} + \beta ^{-1})^{-1}\) as \(\mathsf {g} + \beta ^{-1} \rightarrow 0\).

To extend the body-order expansion results to derivatives (in particular, to forces), we write the quantities of interest using resolvent calculus, apply Lemma 2 to bound the derivatives of the resolvent, and use the Hermite integral formula (4.20) to conclude: for \({\mathscr {C}}_1\), \({\mathscr {C}}_2\) simple closed positively oriented contours encircling the spectrum \(\sigma \big ( \mathcal {H}({\varvec{u}}) \big )\) and \({\mathscr {C}}_1\), respectively, we have

$$\begin{aligned} \bigg | \frac{ \partial O_\ell ({\varvec{u}}) }{\partial {\varvec{u}}_m} - \frac{ \partial I_{X_N} O_\ell ({\varvec{u}}) }{\partial {\varvec{u}}_m} \bigg |&= \frac{1}{2\pi } \bigg | \oint _{\mathscr {C}_1} \big ( O(z) - I_{X_N}O(z) \big ) \frac{\partial \big ( \mathcal {H}({\varvec{u}}) - z \big )^{-1}_{\ell \ell }}{\partial {\varvec{u}}_m} \mathrm {d}z \bigg | \nonumber \\&= \frac{1}{4\pi ^2} \bigg | \oint _{\mathscr {C}_1} \oint _{\mathscr {C}_2} \frac{\ell _N(z)}{\ell _N(\xi )} \frac{O(\xi )}{\xi - z} \frac{\partial \big ( \mathcal {H}({\varvec{u}}) - z \big )^{-1}_{\ell \ell }}{\partial {\varvec{u}}_m} \mathrm {d}\xi \mathrm {d}z \bigg | \nonumber \\&\leqslant C e^{-\eta r_{\ell m}} \sup _{z \in {\mathscr {C}}_1, \xi \in {\mathscr {C}}_2} \bigg | \frac{\ell _N(z)}{\ell _N(\xi )} \bigg |. \end{aligned}$$
(4.24)

We conclude by choosing appropriate contours \({\mathscr {C}}_l = \{g_E = \gamma _l\}\) for \(l=1,2\) and applying (4.22). \(\square \)

4.2.1 The role of the point spectrum

To begin this section, we sketch the proof of Proposition 2.1.

Proof of Proposition 2.1

(i) Sup-norm perturbations. We suppose that \(\sup _k \big [ |\varvec{r}_k - \varvec{r}^{\mathrm {ref}}_k| + |v_k - v_k^{\mathrm {ref}}| \big ] \leqslant \delta \) for \(\delta >0\) sufficiently small such that

$$\begin{aligned} \big | h( {\varvec{u}}_{\ell k} ) - h( {\varvec{u}}^{\mathrm {ref}}_{\ell k} ) \big |&= \big | \nabla h( \xi _{\ell k} ) \cdot ( \varvec{r}_{\ell k} - \varvec{r}_{\ell k}^{\mathrm {ref}} ) \big | \leqslant C \delta e^{-\frac{1}{2} \gamma _0 r_{\ell k}}, \quad \text {and} \\ \big | t( {\varvec{u}}_{\ell m}, {\varvec{u}}_{km}) - t( {\varvec{u}}_{\ell m}^{\mathrm {ref}}, {\varvec{u}}_{km}^{\mathrm {ref}}) \big |&= \big | \nabla _1 t( \xi ^{(1)}_{\ell m}, \zeta ^{(1)}_{km} ) \cdot ( \varvec{r}_{\ell m} - \varvec{r}_{\ell m}^{\mathrm {ref}} )\\&\quad + \nabla _2 t( \xi ^{(2)}_{\ell m}, \zeta ^{(2)}_{k m} ) \cdot ( \varvec{r}_{k m} - \varvec{r}_{k m}^{\mathrm {ref}} ) \big | \\&\leqslant C \delta e^{-\frac{1}{2} \gamma _0(r_{\ell m} + r_{k m})}, \end{aligned}$$

where, \(\xi _{\ell k} \in [\varvec{r}_{\ell k}, \varvec{r}_{\ell k}^{\mathrm {ref}}]\), \(\xi ^{(l)}_{\ell m} \in [\varvec{r}_{\ell m}, \varvec{r}_{\ell m}^{\mathrm {ref}}]\), and \(\zeta ^{(l)}_{k m} \in [\varvec{r}_{km}, \varvec{r}_{km}^{\mathrm {ref}}]\). Therefore, if \(\psi \in \ell ^2\), we have

$$\begin{aligned} \left\| \big ( \mathcal {H}({\varvec{u}}) - \mathcal {H}({\varvec{u}}^{\mathrm {ref}}) \big ) \psi \right\| _{\ell ^2}^2&\leqslant C \delta ^2 \sum _{\ell k} e^{-\frac{1}{2} \gamma _0 r_{\ell k}} |\psi _k|^2 \leqslant C \delta ^2 \Vert \psi \Vert _{\ell ^2}^2. \end{aligned}$$
(4.25)

Therefore, applying standard results from perturbation theory [56, p. 291], we obtain

$$\begin{aligned} \mathrm {dist}\Big ( \sigma \big ( \mathcal {H}({\varvec{u}}) \big ), \sigma \big ( \mathcal {H}({\varvec{u}}^{\mathrm {ref}}) \big ) \Big ) \leqslant \left\| \mathcal {H}({\varvec{u}}) - \mathcal {H}({\varvec{u}}^{\mathrm {ref}}) \right\| _{\ell ^2 \rightarrow \ell ^2} \leqslant C\delta . \end{aligned}$$

(ii) Finite rank perturbations. The finite rank perturbation result has been presented in [70] in a slightly different setting. We sketch the main idea here for completeness.

Since the essential spectrum is stable under compact (in particular, finite rank) perturbations [56], the set

$$\begin{aligned} \sigma \big ( \mathcal {H}({\varvec{u}}) \big ) \setminus B_\delta \big ( \sigma \big ( \mathcal {H}({\varvec{u}}^{\mathrm {ref}}) \big ) \big ) \subseteq \sigma _\mathrm {disc}\big ( \mathcal {H}({\varvec{u}}) \big ) \setminus B_\delta \big ( \sigma _\mathrm {ess}\big ( \mathcal {H}({\varvec{u}}^{\mathrm {ref}}) \big ) \big ) \end{aligned}$$

is both compact and discrete and therefore finite. \(\square \)

Proof of Theorem 2.4

Suppose that \({\mathscr {C}}\) is a simple closed contour encircling the spectrum \(\sigma \big (\mathcal {H}({\varvec{u}})\big )\) and \((\lambda _s, \psi _s)\) are normalised eigenpairs corresponding to the finitely many eigenvalues outside \(I_- \cup I_+\). Therefore, we have that

$$\begin{aligned} O^\beta _\ell ({\varvec{u}}) - I_{X_N} O^\beta _\ell ({\varvec{u}})&= \frac{1}{2\pi i} \oint _{{\mathscr {C}}} \big ( O^\beta (z) - I_{X_N} O^\beta (z) \big ) \,\mathrm {tr}\big [\big ( z - \mathcal {H}({\varvec{u}}) \big )^{-1}\big ]_{\ell \ell } \mathrm {d}z \nonumber \\&\quad + \sum _s \big ( O^\beta (\lambda _s) - I_{X_N} O^\beta (\lambda _s) \big ) \big | [\psi _s]_{\ell } \big |^2. \end{aligned}$$
(4.26)

The first term of (4.27) may be treated in the same way as in the proof of Theorem 2.3. Moreover, derivatives of this term may be treated in the same way as in (4.25). It is therefore sufficient to bound the remaining term and its derivative.

Firstly, we note that the eigenvectors corresponding to isolated eigenvalues in the spectral gap have the following decay [17]: for \({\mathscr {C}}^\prime \) a simple closed positively oriented contour (or system of contours) encircling the \(\{\lambda _s\}\), we have that

$$\begin{aligned} \sum _s \big | [\psi _s]_{\ell } \big |^2&= \frac{1}{2\pi } \bigg | \oint _{\mathscr {C}^\prime } \big [ \big ( \mathcal {H}({\varvec{u}}) - z \big )^{-1} \big ]_{\ell \ell } \mathrm {d}z \bigg | \nonumber \\&= \frac{1}{2\pi } \bigg | \oint _{\mathscr {C}^\prime } \big [ \big ( \mathcal {H}({\varvec{u}}) - z \big )^{-1} - \big ( \mathcal {H}({\varvec{u}}^\mathrm {ref}) - z \big )^{-1} \big ]_{\ell \ell } \mathrm {d}z \bigg | \nonumber \\&\leqslant C e^{-\gamma _{\mathrm {CT}} [|\varvec{r}_\ell | - R_\mathrm {def}]}, \end{aligned}$$
(4.27)

where \(\gamma _\mathrm {CT}\) is the Combes-Thomas constant from Lemma 1 with \(\mathfrak {d} = \mathrm {dist}\big ({\mathscr {C}}^\prime , \sigma (\mathcal {H}({\varvec{u}}))\big )\). The constant pre-factor in (4.28) depends on the distance between the contour and the defect spectrum \(\sigma \big ( \mathcal {H}({\varvec{u}}) \big )\). Similar estimates hold for the derivatives. For full details on the derivation of (4.28), see [17, (5.18)–(5.21)].

Therefore, combining (4.28) and the Hermite integral formula, we conclude as in the proof of Theorem 2.3. \(\square \)

4.3 Non-linear body-order approximation

In this section, we prove Theorem 2.5 by applying the recursion method to reformulate the problem into a semi-infinite linear chain and replacing the far-field with vacuum.

4.3.1 Recursion method

In that follows, we briefly introduce the recursion method [49, 50], a reformulation of the Lanczos process [61], which generates a tri-diagonal (Jacobi) operator T [91] whose spectral measure is \(D_\ell \) and the corresponding sequence of orthogonal polynomials [40]. This process provides the basis for constructing approximations to the LDOS giving rise to nonlinear approximation schemes satisfying Theorem 2.5.

Recall that \(D_\ell \) is the LDOS satisfying (2.13). We start by defining \(p_0 :=1\), \(a_0 :=\int x \mathrm {d}D_\ell (x)\) and \(b_1 p_1(x) :=x - a_0\) where \(b_1\) is the normalising constant to ensure \(\int p_1(x)^2 \mathrm {d}D_\ell (x) = 1\). Then, supposing we have defined \(a_0, a_1, b_1, \dots , a_n, b_{n}\) and the polynomials \(p_0(x), \dots , p_n(x)\), we set

$$\begin{aligned}&b_{n+1} p_{n+1}(x) :=(x - a_n) p_n(x) - b_n p_{n-1}(x), \qquad \text {with} \\&\quad \int p_{n+1}(x)^2 \mathrm {d}D_\ell (x) = 1, \qquad a_{n+1} :=\int x p_{n+1}(x)^2 \mathrm {d}D_\ell (x). \end{aligned}$$
(4.28)

Then, \(\{p_n\}\) is a sequence of orthogonal polynomials with respect to \(D_\ell \) (i.e. \(\int p_n p_m \mathrm {d}D_\ell = \delta _{nm}\)) and we have that

$$\begin{aligned} T_N :=\Big ( \int x p_n p_m \mathrm {d}D_\ell \Big )_{ 0 \leqslant n,m \leqslant N } = \begin{pmatrix} a_0 &{} b_1 &{} &{} \\ b_1 &{} a_1 &{} \ddots &{} \\ &{} \ddots &{} \ddots &{} b_N \\ &{} &{} b_N &{} a_N \end{pmatrix} \end{aligned}$$
(4.29)

(see Lemma D.1 for a proof). Moreover, we denote by T the infinite symmetric tridiagonal matrix on \({\mathbb {N}}_0\) with diagonal \((a_n)_{n \in {\mathbb {N}}_0}\) and off-diagonal \((b_n)_{n\in {\mathbb {N}}}\).

Remark 15

It will also prove convenient for us to renormalise the orthogonal polynomials by defining \(P_n(x) :=b_n p_n(x)\) and \(b_0 :=1\); that is,

$$\begin{aligned} P_0(x)= & {} 1, \quad P_1(x) = x - a_0, \quad \text {and} \\ P_{n+1}(x)= & {} \frac{x-a_n}{b_n} P_n(x) - \frac{b_n}{b_{n-1}}P_{n-1}(x), \quad \text {for }n\geqslant 1 \end{aligned}$$
(4.30)
$$\begin{aligned} b_{n+1}^2= & {} \int P_{n+1}(x)^2 \mathrm {d}D_\ell (x), \quad \text {and} \quad a_{n+1} = \frac{\int x P_{n+1}(x)^2 \mathrm {d}D_\ell (x),}{b_{n+1}^2}.\nonumber \\ \end{aligned}$$
(4.31)

One advantage of this formulation is that it explicitly defines the coefficients \(\{b_n\}\).

Therefore, if we have the first \(2N+1\) moments \(\mathcal {H}_{\ell \ell }, \dots , (\mathcal {H}^{2N+1})_{\ell \ell }\), it is possible to evaluate \(Q_{2N+1}(\mathcal {H})_{\ell \ell }\) (that is, \(\int Q_{2N+1} \mathrm {d}D_\ell \)) for all polynomials \(Q_{2N+1}\) of degree at most \(2N+1\), and thus compute \(T_N\). In particular, for a fixed observable of interest O, we may write

$$\begin{aligned} \Theta _{2N+1}\big ( \mathcal {H}_{\ell \ell }, \dots , [\mathcal {H}^{2N+1}]_{\ell \ell } \big ) :=O(T_N)_{00}. \end{aligned}$$
(4.32)

Remark 16

In Appendix E we introduce more complex bond order potential (BOP) schemes based on the recursion method and show that they also satisfy Theorem 2.5.

4.3.2 Error estimates

Equation (4.35) states that the nonlinear approximation scheme given by \(\Theta _{2N+1}\) simply approximates the LDOS with the spectral measure of \(T_N\) corresponding to \(\varvec{e}_0 :=(1,0,\dots ,0)^\mathrm {T}\). We now show that \([(T_N)^n]_{00} = [T^n]_{00} = [\mathcal {H}^n]_{\ell \ell }\) for all \(n \leqslant 2N+1\) and thus we may apply (2.14) to conclude.

By the orthogonality, we have \([T^0]_{ij} = \int p_i(x) x^0 p_j(x) \mathrm {d}D_\ell (x) = \delta _{ij}\). Therefore, assuming \([T^n]_{ij} = \int p_i(x) x^n p_j(x) \mathrm {d}D_\ell (x)\), we can conclude that

$$\begin{aligned}{}[T^{n+1}]_{ij}&= \sum _{k} [T^n]_{ik} T_{kj} \nonumber \\&= \int p_i(x) x^n \big [ b_j p_{j-1}(x) + a_j p_j(x) + b_{j+1} p_{j+1}(x) \big ] \mathrm {d}D_\ell (x) \end{aligned}$$
(4.33)
$$\begin{aligned}&= \int p_{i}(x) x^{n+1} p_{j}(x) \mathrm {d}D_\ell (x). \end{aligned}$$
(4.34)

Here, we have applied (4.29) directly. In particular, if \(i=j=0\), we obtain \([T^n]_{00} = [\mathcal {H}^n]_{\ell \ell }\).

4.3.3 Analyticity

To conclude the proof of Theorem 2.5, we show that \(\Theta _{2N+1}\) as in (4.35) extents to an analytic function on some open set \(U \subset {\mathbb {C}}^{2N+1}\). Throughout this section, we use the rescaled orthogonal polynomials \(\{P_n\}\) from Remark 15.

For a polynomial \(P(x) = \sum _{j=0}^m c_j x^j\), we use the notation \(\mathcal {L} P(z_1,\dots ,z_m) :=c_0 + \sum _{j=1}^m c_j z_j\) for the linear function satisfying \(P(x) = \mathcal {L}P(x,x^2,\dots ,x^m)\). To extend the recurrence coefficients from (4.32), we start by defining

$$\begin{aligned}&b_0 = 1, \quad a_0(z_1) :=z_1, \quad P_1(x;z_1):=x - a_0(z_1) = x - z_1, \nonumber \\&\quad b_1^2(z_1,z_2) :=\mathcal {L}\big (x \mapsto P_1(x;z_1)^2\big )(z_1,z_2) = z_2 - z_1^2, \nonumber \\&\quad a_1(z_1,z_2,z_3) :=\frac{\mathcal {L}\big (x\mapsto xP_1(x;z_1)^2\big )(z_1,z_2,z_3)}{b_1^2(z_1,z_2)} = \frac{z_3 - 2 z_1z_2 + z_1^3}{z_2 - z_1^2}.\nonumber \\ \end{aligned}$$
(4.35)

To simplify the notation, we write \(\varvec{z}_{1:m}\) for the m-tuple \((z_1,\dots ,z_m)\). Given \(a_0(z_1), \dots , a_n( \varvec{z}_{1:2n+1})\) and \(b_1(\varvec{z}_{1:2}), \dots , b_n(\varvec{z}_{1:2n})\), we define \(P_{n+1}(x;\varvec{z}_{1:2n+1})\) to be the polynomial in x satisfying the same recursion as (4.32) but as a function of \(\varvec{z}_{1:2n+1}\):

$$\begin{aligned} P_{n+1}(x;\varvec{z}_{1:2n+1})&= \frac{x-a_n(\varvec{z}_{1:2n+1})}{b_n(\varvec{z}_{1:2n})} P_n(x;\varvec{z}_{1:2n-1}) \\&\quad - \frac{b_n(\varvec{z}_{1:2n})}{b_{n-1}(\varvec{z}_{1:2n-2})} P_{n-1}(x;\varvec{z}_{1:2n-3}). \end{aligned}$$

With this notation, we define

$$\begin{aligned} b_{n+1}^2(\varvec{z}_{1:2n+2})&:=\mathcal {L}\big ( x \mapsto P_{n+1}(x;\varvec{z}_{1:2n+1})^2 \big )(\varvec{z}_{1:2n+2}), \end{aligned}$$
(4.36)
$$\begin{aligned} a_{n+1}(\varvec{z}_{1:2n+3})&:=\frac{\mathcal {L}\big ( x\mapsto xP_{n+1}(x;\varvec{z}_{1:2n+1})^2 \big )(\varvec{z}_{1:2n+3})}{b_{n+1}^2(\varvec{z}_{1:2n+2}) } . \end{aligned}$$
(4.37)

Since \(P_{n+1}(x) = P_{n+1}(x;\mathcal {H}_{\ell \ell },\dots ,[\mathcal {H}^{2n+1}]_{\ell \ell })\), we have extended the definition of the recursion coefficients (4.34) to functions of multiple complex variables.

We now show that \(a_n(\varvec{z}_{1:2n+1})\) and \(b_n^2(\varvec{z}_{1:2n})\) are rational functions. As a preliminary step, we show that both \(P_{n+1}^2\) and \(\frac{P_{n+1} P_n}{b_n}\) are polynomials in x with coefficients given by rational functions of \(a_n,b_n^2\) and all previous recursion coefficients. This statement is clearly true for \(n = 0\): \(P_1^2 = (x - a_0)^2\) and \(\frac{P_1 P_0}{b_0} = x - a_0\). Therefore, by induction and noting that

$$\begin{aligned} P_{n+1}^2&= \left( \frac{x-a_n}{b_n} P_n \right) ^2 - 2 (x - a_n) \frac{P_nP_{n-1}}{b_{n-1}} + \frac{b_n^2}{b_{n-1}^2} P_{n-1}^2 \end{aligned}$$
(4.38)
$$\begin{aligned} \frac{P_{n+1} P_n}{b_n}&= \frac{x - a_n}{b_n^2} P_{n}^2 - \frac{P_nP_{n-1}}{b_{n-1}}, \end{aligned}$$
(4.39)

we can conclude. Therefore, by (4.38) and (4.39) and (4.40), we can apply another induction argument to conclude that \(a_{n+1}(\varvec{z}_{1:2n+3})\) and \(b_{n+1}^2(\varvec{z}_{1:2(n+1)})\) are rational functions.

We fix N and define the following complex valued tri-diagonal matrix

$$\begin{aligned} T_N(\varvec{z}):=\begin{pmatrix} a_0(z_1) &{} b_1^2(\varvec{z}_{1:2}) &{} &{} &{} \\ 1 &{} a_1(\varvec{z}_{1:3}) &{} b_2^2(\varvec{z}_{1:4}) &{} &{} \\ &{} 1 &{} \ddots &{} \ddots &{} \\ &{} &{} \ddots &{} \ddots &{} b_N^2(\varvec{z}_{1:2N}) \\ &{} &{} &{} 1 &{} a_N(\varvec{z}_{1:2N+1}) \end{pmatrix}. \end{aligned}$$
(4.40)

If \(z_j = [\mathcal {H}^j]_{\ell \ell }\) for each \(j = 1,\dots ,N\), (4.43) is similar to \(T_N\) from (4.31).

Now, on defining \(U :=\{ \varvec{z} \in {\mathbb {C}}^{2N+1} :b_n^2(\varvec{z}_{1:2n}) \not = 0 \,\, \forall n=1,\dots ,N\}\), the mapping \(U \rightarrow {\mathbb {C}}^{(N+1)\times (N+1)}\) given by \(\varvec{z} \mapsto T_N(\varvec{z})\) is analytic. Therefore, for appropriately chosen contours \({\mathscr {C}}_{\varvec{z}}\) encircling \(\sigma \big (T_N(\varvec{z}) \big )\), we have that

$$\begin{aligned} \Theta _{2N+1}(\varvec{z}) :=O\big (T_N(\varvec{z})\big )_{00} = -\frac{1}{2\pi i} \oint _{\mathscr {C}_{\varvec{z}}} O(\omega ) \Big [ \big ( T_N(\varvec{z}) - \omega \big )^{-1} \Big ]_{00} \mathrm {d}\omega . \end{aligned}$$
(4.41)

In particular, \(\Theta _{2N+1}\) is an analytic function on

$$\begin{aligned} \big \{ \varvec{z} \in U :O \text { analytic in an open neighbourhood of } \sigma \big ( T_N(\varvec{z}) \big ) \big \}. \end{aligned}$$

Remark 17

Since \({\mathbb {C}}^{2N+1} \setminus U\) is the zero set for some (non-zero) polynomial P in \(2N+1\) variables, it has \((2N+1)\)-dimensional Lebesgue measure zero [45].

Remark 18

In Appendix D we show that the eigenvalues of \(T_N(\varvec{z})\) are distinct for \(\varvec{z}\) in some open neighbourhood, \(U_0 \subset U\), of \({\mathbb {R}}^{2N+1}\), which leads to the following alternative proof. On \(U_0\), the eigenvalues and corresponding left and right eigenvectors can be chosen to be analytic: there exist analytic functions \(\varepsilon _j, \psi _{j}, \phi _{j}^\star \) for \(j = 0,\dots ,N\) such that

$$\begin{aligned} T_N(\varvec{z}) \psi _j(\varvec{z}) = \varepsilon _j(\varvec{z}) \psi _j(\varvec{z}), \quad \phi ^\star _j(\varvec{z}) T_N(\varvec{z}) = \varepsilon _j(\varvec{z}) \phi _j^\star (\varvec{z}), \quad \text {and} \quad \phi ^\star _i(\varvec{z}) \psi _j(\varvec{z}) = \delta _{ij}. \end{aligned}$$

(More precisely, we apply [44, Theorem 2] to obtain analytic functions \(\psi _j,\phi ^\star _j\) of each variable \(z_0,\dots ,z_{2N+1}\) separately and then apply Hartog’s theorem [60] to conclude that \(\psi _j,\phi ^\star _j\) are analytic as functions on \(U \subset {\mathbb {C}}^{2N+1}\).) Therefore, the nonlinear method discussed in this section can also be written in the form

$$\begin{aligned} \Theta _{2N+1} = \sum _{j=0}^N [\psi _j]_{0} [\phi ^\star _j]_0 \cdot \big ( O\circ \varepsilon _j \big ), \end{aligned}$$
(4.42)

which is an analytic function on \(\{ \varvec{z} \in U_0 :O \text { analytic at } \varepsilon _j(\varvec{z}) \text { for each } j\}\) (as it is a finite combination of analytic functions only involving products, compositions and sums).

4.4 Self-consistent tight binding models

We start with the following preliminary lemma:

Lemma 3

Suppose that \(T :\ell ^2(\Lambda ) \rightarrow \ell ^2(\Lambda )\) is an invertible bounded linear operator with matrix entries \(T_{\ell k}\) satisfying \( \big | T_{\ell k} \big | \leqslant c_T e^{-\gamma _T r_{\ell k}} \) for some \(c_T, \gamma _T > 0\).

Then, there exists an invertible bounded linear operator \(\overline{T} :\ell ^\infty (\Lambda ) \rightarrow \ell ^\infty (\Lambda )\) extending \(T:\ell ^2(\Lambda ) \rightarrow \ell ^2(\Lambda )\) (that is, \(\overline{T}\big |_{\ell ^2(\Lambda )} = T\)).

Proof

First, we denote the inverse of T and its matrix entries by \(T^{-1}:\ell ^2(\Lambda ) \rightarrow \ell ^2(\Lambda )\) and \(T^{-1}_{\ell k}\), respectively. Then, applying the Combes-Thomas estimate to T yields the off-diagonal decay estimate \(|T^{-1}_{\ell k}| \leqslant C e^{-\gamma _{\mathrm {CT}} r_{\ell k}}\) for some \(C, \gamma _\mathrm {CT} > 0\) [93].

Due to the off-diagonal decay properties of the matrix entries, the operators \(\overline{T}, \overline{T}^{-1} :\ell ^\infty (\Lambda ) \rightarrow \ell ^\infty (\Lambda )\) given by

$$\begin{aligned}{}[\overline{T}\phi ]_\ell :=\sum _{k \in \Lambda } T_{\ell k} \phi _k \qquad \text {and} \qquad [\overline{T}^{-1}\phi ]_\ell :=\sum _{k \in \Lambda } T^{-1}_{\ell k} \phi _k \end{aligned}$$

are well defined bounded linear operators with norms \(\sup _\ell \sum _{k \in \Lambda } |T_{\ell k}|\) and \(\sup _\ell \sum _{k \in \Lambda } |T^{-1}_{\ell k}|\), respectively. To conclude, we note that

$$\begin{aligned}{}[\overline{T}\overline{T}^{-1}\phi ]_{\ell }&= \sum _{k}\sum _m T_{\ell k} T^{-1}_{km} \phi _m = \sum _m [TT^{-1}]_{\ell m} \phi _m = \phi _\ell \end{aligned}$$
(4.43)

and so \(\overline{T}^{-1}\) is the inverse of \(\overline{T}\). Here, we have exchanged the summations over k and m by applying the dominated convergence theorem: \(\big |\sum _k T_{\ell k}T^{-1}_{km} \phi _m\big | \leqslant C e^{-\frac{1}{2}\min \{\gamma _T, \gamma _\mathrm {CT}\} r_{\ell m}} \Vert \phi \Vert _{\ell ^\infty }\) is summable over \(m \in \Lambda \). \(\square \)

Throughout the following proofs, we denote by \(B_r(\rho )\) the open ball of radius r about \(\rho \) with respect to the \(\ell ^\infty \)-norm. Moreover, we briefly note that the stability operator can be written as the product \({\mathscr {L}}(\rho ) :={\mathscr {F}}(\rho ) \nabla {w(\rho )}\), where [93]

$$\begin{aligned} {\mathscr {F}}(\rho )_{\ell k} :=\frac{1}{2\pi i} \oint _{\mathscr {C}} F^\beta (z) \Big [ \big ( \mathcal {H}( {\varvec{u}}(\rho ) ) - z \big )^{-1}_{\ell k} \Big ]^2 \mathrm {d}z, \end{aligned}$$
(4.44)

where \({\mathscr {C}}\) is a simple closed contour encircling the spectrum \(\sigma \big ( \mathcal {H}({\varvec{u}}(\rho )) \big )\).

Proof of Theorem 2.6

Since \(\rho \mapsto F^\beta ({\varvec{u}}(\rho ))\) is \(C^2\), and \(\big (I - {\mathscr {L}}(\rho ^\star )\big )^{-1}\) is a bounded linear operator, we necessarily have that \(\big (I - {\mathscr {L}}(\rho )\big )^{-1}\) is a bounded linear operator for all \(\rho \in B_r(\rho ^\star )\) for some \(r > 0\).

By applying Theorem 2.3, together with the assumption (EP), we obtain

$$\begin{aligned} \big |\big [ {\mathscr {L}}(\rho ) - {{\mathscr {L}}}_N(\rho ) \big ]_{\ell k}\big |&\leqslant \sum _m \bigg |\bigg [ \frac{\partial F^\beta _\ell ({\varvec{u}})}{\partial v_m} - \frac{\partial I_{X_N} F^\beta _\ell ({\varvec{u}})}{\partial v_m} \bigg ] \frac{\partial {w(\rho )}_m}{\partial \rho _k} \bigg |\\&\leqslant C \Big [ \sum _m e^{-\eta \, r_{\ell m}} e^{-\gamma _v\, r_{mk}} \Big ] e^{-\frac{1}{2}\gamma _N N} \\&\leqslant C e^{-\frac{1}{2}\min \{ \eta , \gamma _v \} r_{\ell k}} e^{-\frac{1}{2}\gamma _N N} \end{aligned}$$
(4.45)

for all \(\rho \in B_r(\rho ^\star )\). As a direct consequence, we have \(\Vert {{\mathscr {L}}}(\rho ) - {\mathscr {L}}_N(\rho ) \Vert _{\ell ^2 \rightarrow \ell ^2} \leqslant C e^{-\frac{1}{2}\gamma _N N}\) and we may choose N sufficiently large such that \(\Vert {\mathscr {L}}(\rho ) - {\mathscr {L}}_N(\rho ) \Vert _{\ell ^2 \rightarrow \ell ^2} < \Vert (I - {\mathscr {L}}(\rho ))^{-1} \Vert ^{-1}_{\ell ^2 \rightarrow \ell ^2}\). In particular, for such N, the operator \(I - {{\mathscr {L}}}_N(\rho ) :\ell ^2 \rightarrow \ell ^2\) is invertible with inverse bounded above in operator norm independently of N.

We now show that \(I - {\mathscr {L}}_N(\rho )\) satisfies the assumptions of Lemma 3. Using (4.47) and (EP), together with the Combes-Thomas estimate (Lemma 1), we conclude that

$$\begin{aligned} \big | {\mathscr {L}}_N(\rho )_{\ell k} \big | \leqslant C \sup _{z \in {\mathscr {C}}} |I_{X_N} F^\beta (z)| \sum _{m \in \Lambda } e^{-2\gamma _\mathrm {CT} r_{\ell m}} e^{-\gamma _v r_{mk}} \leqslant C e^{-\frac{1}{2}\min \{ 2\gamma _\mathrm {CT}, \gamma _v \} r_{\ell k} } \end{aligned}$$

for all \(\rho \in B_r(\rho ^\star )\). In particular, \(I - {\mathscr {L}}_N(\rho )\) extends to a invertible bounded linear operator \(\ell ^\infty \rightarrow \ell ^\infty \) and thus its inverse \(\big (I - {\mathscr {L}}_N(\rho )\big )^{-1} :\ell ^\infty \rightarrow \ell ^\infty \) is bounded.

Now, the mapping \(\rho \mapsto \rho - I_{X_N}F^\beta \big ({\varvec{u}}(\rho )\big )\) between \(\ell ^\infty \rightarrow \ell ^\infty \) is continuously differentiable on \(B_r(\rho ^\star )\) and the derivative at \(\rho ^\star \) is invertible (i.e. \(\big (I - {\mathscr {L}}_N(\rho ^\star )\big )^{-1}:\ell ^\infty \rightarrow \ell ^\infty \) is a well defined bounded linear operator). Since the map \(\rho \mapsto I_{X_N}{F}^\beta \big ({\varvec{u}}(\rho )\big )\) is \(C^2\), its derivative \({{\mathscr {L}}}_N\) is locally Lipschitz about \(\rho ^\star \) and so there exists \(L>0\) such that

$$\begin{aligned}&\big \Vert \big ( I - {{\mathscr {L}}}_N(\rho ^\star ) \big )^{-1} \big ( {{\mathscr {L}}}_N(\rho _1) - {{\mathscr {L}}}_N(\rho _2) \big ) \big \Vert _{\ell ^\infty \rightarrow \ell ^\infty }\\&\quad \leqslant L \Vert \rho _1 - \rho _2 \Vert _{\ell ^\infty } \qquad \text {for } \rho _1, \rho _2 \in B_r(\rho ^\star ). \end{aligned}$$

Moreover, by Theorem 2.3, we have that

$$\begin{aligned}&\big \Vert \big ( I - {{\mathscr {L}}}_N(\rho ^\star ) \big )^{-1} \big ( \rho ^\star - I_{X_N} {F}^\beta ({\varvec{u}}(\rho ^\star )) \big ) \big \Vert _{\ell ^\infty } \\&\quad \leqslant C \big \Vert F^\beta ({\varvec{u}}(\rho ^\star )) - I_{X_N} {F}^\beta ({\varvec{u}}(\rho ^\star )) \big \Vert _{\ell ^\infty } =:b_{N} \end{aligned}$$

where \(b_{N} \lesssim e^{-\gamma _N N}\). In particular, we may choose N sufficiently large such that \(2b_{N} L < 1\) and \(t^\star _{N} :=\frac{1}{L}( 1 - \sqrt{1 - 2 b_{N} L} ) < r\).

Thus, the Newton iteration with initial point \(\rho ^0 :=\rho ^\star \), defined by

$$\begin{aligned} \rho ^{i+1} = \rho ^i - \big ( I - {{\mathscr {L}}}_N(\rho ^i) \big )^{-1} \big ( \rho ^i - I_{X_N}{F}^\beta ({\varvec{u}}(\rho ^i)) \big ), \end{aligned}$$

converges to a unique fixed point \(\rho _N = I_{X_N}{F}^\beta ({\varvec{u}}(\rho _N))\) in \(B_{t^\star _{N}}(\rho ^\star )\) [102, 104]. That is, \(\Vert \rho _N - \rho ^\star \Vert _{\ell ^\infty } \leqslant t_N^\star \leqslant 2 b_{N}\). Here, we have used the fact that \(1 - \sqrt{1 - x} \leqslant x\) for all \(0 \leqslant x \leqslant 1\).

Since \(\rho _N \in B_r(\rho ^\star )\), we have \(I - {\mathscr {L}}_N(\rho _N) :\ell ^2 \rightarrow \ell ^2\) is invertible and thus Lemma 2.8 also holds. \(\square \)

Proof of Proposition 2.9

We proceed in the same way as in the proof of Theorem 2.6. In particular, since \(\rho _N\) is stable, if \(\Vert \rho ^0 - \rho _N\Vert _{\ell ^\infty }\) is sufficiently small, \((I - {\mathscr {L}}_N(\rho ^0))^{-1}\) is a bounded linear operator on \(\ell ^2\). Moreover, by the exact same argument as in the proof of Theorem 2.6, \(I - {\mathscr {L}}_N(\rho ^0):\ell ^\infty \rightarrow \ell ^\infty \) defines an invertible bounded linear operator. Also, \(I - {\mathscr {L}}_N(\rho )\) is Lipschitz in a neighbourhood about \(\rho ^0\) and

$$\begin{aligned} \big \Vert \big ( I - {{\mathscr {L}}}_N(\rho ^0) \big )^{-1} \big ( \rho ^0 - I_{X_N} {F}^\beta ({\varvec{u}}(\rho ^0)) \big ) \big \Vert _{\ell ^\infty }&\leqslant C \big \Vert \rho ^0 - \rho _N - \big ( I_{X_N} {F}^\beta ({\varvec{u}}(\rho ^0)) \\&\quad - I_{X_N} {F}^\beta ({\varvec{u}}(\rho _N)) \big ) \big \Vert _{\ell ^\infty } \\&\leqslant C \Vert \rho ^0 - \rho _N \Vert _{\ell ^\infty }. \end{aligned}$$

Here, we have used that

$$\begin{aligned}&\big | I_{X_N} {F}^\beta _\ell ({\varvec{u}}(\rho ^0)) - I_{X_N} {F}^\beta _\ell ({\varvec{u}}(\rho _N)) \big |\nonumber \\&\quad = \frac{1}{2\pi } \Big | \oint _{{\mathscr {C}}} I_{X_N} F^\beta (z) \big [ {\mathscr {R}}_z(\rho ^0) - {\mathscr {R}}_z(\rho _N) \big ]_{\ell \ell } \Big | \nonumber \\&\quad \leqslant C \sum _{k \in \Lambda } e^{-2\gamma _\mathrm {CT} r_{\ell k}} |v(\rho ^0)_k - v(\rho _N)_k| \nonumber \\&\quad \leqslant C \sum _{k \in \Lambda } e^{-2\gamma _\mathrm {CT} r_{\ell k}} \Big | \int _0^1 \sum _{m \in \Lambda } \frac{\partial v(t\rho ^0 + (1 - t) \rho _N)_k}{\partial \rho _m} \big [ \rho ^0 - \rho _N \big ]_m \mathrm {d}t \Big | \nonumber \\&\quad \leqslant C \sum _{m \in \Lambda } e^{-\frac{1}{2}\min \{2\gamma _\mathrm {CT}, \gamma _v\} r_{\ell m} } \big |\big [ \rho ^0 - \rho _N \big ]_m\big | \nonumber \\&\quad \leqslant C \Vert \rho ^0 - \rho _N\Vert _{\ell ^\infty }. \end{aligned}$$
(4.46)

Therefore, as long as \(\Vert \rho ^0 - \rho _N\Vert _{\ell ^\infty }\) is sufficiently small, we may apply the Newton iteration starting from \(\rho ^0\) to conclude. \(\square \)

Proof of Corollary 2.7

As a direct consequence of (4.51), we have that

$$\begin{aligned} \big | O_\ell ^\mathrm {sc}({\varvec{u}}) - I_{X_N} O_\ell \big ( {\varvec{u}}(\rho _N) \big )\big |&\leqslant \big | O_\ell \big ( {\varvec{u}}(\rho ^\star ) \big )- I_{X_N} O_\ell \big ( {\varvec{u}}(\rho ^\star ) \big ) \big | \\&\quad + \big | I_{X_N} O_\ell \big ( {\varvec{u}}(\rho ^\star ) \big ) - I_{X_N} O_\ell \big ( {\varvec{u}}(\rho _N) \big ) \big | \\&\leqslant C \big [ e^{-\gamma _N N} + \Vert \rho _N - \rho ^\star \Vert _{\ell ^\infty } \big ]\\&\leqslant C e^{-\gamma _N N}. \end{aligned}$$

Here, we have applied the standard convergence result (Theorem 2.3) with fixed effective potential. \(\square \)