Body-Ordered Approximations of Atomic Properties

We show that the local density of states (LDOS) of a wide class of tight-binding models has a weak body-order expansion. Specifically, we prove that the resulting body-order expansion for analytic observables such as the electron density or the energy has an exponential rate of convergence both at finite Fermi-temperature as well as for insulators at zero Fermi-temperature. We discuss potential consequences of this observation for modelling the potential energy landscape, as well as for solving the electronic structure problem.


Introduction
An atomistic potential energy landscape (PEL) is a mapping assigning energies E(r), or local energy contributions, to atomic structures r = {r } ∈ ∈ (R d ) , where is a general (possibly infinite) index set. High-fidelity models are provided by the Born-Oppenheimer PEL associated with ab initio electronic structure models such as tight-binding, Kohn-Sham density functional theory (DFT), Hartree-Fock, or even lower level quantum chemistry models [38,48,54,58,73,94]. Even now, however, the high computational cost of electronic structure models severely limits their applicability in material modelling to thousands of atoms for static and hundreds of atoms for long-time dynamic simulations.
There is a long and successful history of using surrogate models for the simulation of materials, devised to remain computationally tractable but capture as much detail of the reference ab initio PEL as possible. Empirical interatomic potentials are purely phenomenological and are able to capture a minimal subset of desired properties of the PEL, severely limiting their transferability [23,86]. The rapid growth in computational resources, increased both the desire and the possibility to match as much of an ab inito PEL as possible. A continuous increase in the complexity of parameterisations since the 1990s [6,7,36] has over time naturally led to a new generation of "machine-learned interatomic potentials" employing universal approximators instead of empirical mechanistic models. Early examples include symmetric polynomials [11,80], artificial neural networks [8] and kernel methods [5]. A striking case is the Gaussian approximation potential for Silicon [4], capturing the vast majority of the PEL of Silicon of interest for material applications.
The purpose of the present work is, first, to rigorously evaluate some of the implicit or explicit assumptions underlying this latest class of interatomic potential models, as well as more general models for atomic properties. Specifically, we will identify natural modelling parameters as approximation parameters and rigorously establish convergence. Secondly, our results indicate that nonlinearities are an important feature, highlighting some superior theoretical properties. Finally, unlike existing nonlinear models, we will identify explicit low-dimensional nonlinear parameterisations yet prove that they are systematic. In addition to justifying and supporting the development of new models for general atomic properties, our results establish generic properties of ab initio models that have broader consequences, e.g. for the study of the mechanical properties of atomistic materials [15,17,32,93]. The application of our results to the construction and analysis of practical parameterisations (approximation schemes) that exploit our results will be pursued elsewhere.
Our overarching principle is to search for representations of properties of ab initio models in terms of simple components, where "simple" is of course highly context-specific. To illustrate this point, let us focus on modelling the potential energy landscape (PEL), which motivated this work in the first place. Pragmatically, we require that these simple components are easier to analyse and manipulate analytically or to fit than the PEL. For many materials (at least as long as Coulomb interaction does not play a role), the first step is to decompose the PEL into site energy contributions, E(r) = ∈ E (r), (1.1) where one assumes that each E is local, i.e., it depends only weakly on atoms far away. In previous works we have made this rigorous for the case of tightbinding models of varying complexity [14,16,17,93]. In practise, one may therefore truncate the interaction by admitting only those atoms r k with r k := |r k −r | < r cut as arguments. Typical cutoff radii range from 5Å to 8Å, which means that on the order 30 to 100 atoms still make important contributions. Thus the site energy E is still an extremely high-dimensional object and short of identifying low-dimensional features it would be practically impossible to numerically approximate it, due to the curse of dimensionality. A classical example that illustrates our search for such low-dimensional features is the embedded atom model (EAM) [23], which assigns to each atom ∈ a site energy While the site energy E eam remains high-dimensional, the representation is in terms of three one-dimensional functions φ, ρ, F which are easily represented for example in terms of splines with relatively few parameters. Such a low-dimensional representation significantly simplifies parameter estimation, and vastly improves generalisation of the model outside a training set. Unfortunately, the EAM model and its immediately generalisations [6] have limited ability to capture a complex ab initio PEL. Still, this example inspires our search for representations of the PEL involving parameters that are • low-dimensional, • short-ranged.
Following our work on locality of interaction [14,16,17,93] we will focus on a class of tight-binding models as the ab initio reference model. These can be seen either as discrete approximations to density functional theory [38] or alternatively as electronic structure toy models sharing many similarities with the more complex Kohn-Sham DFT and Hartree-Fock models.
To control the dimensionality of representations, a natural idea is to to consider a body-order expansion, where r k := r k −r and we say that V n (r k 1 , . . . , r k n ) is an (n+1)-body potential modelling the interaction of a centre atom and n neighbouring atoms {k 1 , . . . , k n }. This expansion was traditionally truncated at body-order three (N = 2) due to the exponential increase in computational cost with N . However, it was recently demonstrated by Shapeev's moment tensor potentials (MTPs) [80] and Drautz' atomic cluster expansion (ACE) [25] that a careful reformulation leads to models with at most linear N -dependence. Indeed, algorithms proposed in [2,80] suggest that the computational cost may even be N -independent, but this has not been proven. Even more striking is the fact that the MTP and ACE models which are both linear models based on a body-ordered approximation, currently appear to outperform the most advanced nonlinear models in regression and generalisation tests [66,106]. These recent successes are in stark contrast with the "folklore" that body-order expansions generally converge slowly, if at all [10,25,27,46,86]. The fallacy in those observations is typically that they implicitly assume a vacuum cluster expansion (cf. § 2.2). Indeed, our first set of main results in § 2.4 will be to demonstrate that a rapidly convergent body-order approximation can be constructed if one accounts for the chemical environment of the material. We will precisely characterise the convergence of such an approximation as N → ∞, in terms of the Fermitemperature and the band-gap of the material.
In the simplest scheme we consider, we achieve this by considering atomic properties [O(H)] , where H is a tight-binding Hamiltonian and O an analytic function. Approximating O by a polynomial on the spectrum σ (H) results in an approximation of the atomic property [ p(H)] , which is naturally "body-ordered". To obtain quasi-optimal approximation results, naive polynomial approximation schemes (e.g. Chebyshev) are suitable only in the simplest scenarios. For the insulating case we leverage potential theory techniques which in particular yield quasi-optimal approximation rates on unions of disconnected domains. Our main results are obtained by converting these into approximation results on atomic properties, analysing their qualitative features, and taking care to obtain sharp estimates in the zero-Fermi-temperature limit.
These initial results provide strong evidence for the accuracy of a linear bodyorder approximation in relatively simple scenarios, and would for example be useful in a study of the mechanical response of single crystals with a limited selection of possible defects. However, they come with limitations that we discuss in the main text. In response, we consider a much more general framework, generalizing the theory of bond order potentials [55], that incorporates our linear body-ordered model as well as a range of nonlinear models. We will highlight a specific nonlinear construction with significantly improved theoretical properties over the linear scheme.
For both the linear and nonlinear body-ordered approximation schemes we prove that they inherit regularity, symmetries and locality of the original quantity of interest.
Finally, we consider the case of self-consistent tight-binding models such as DFTB [33,59,78]. In this case the highly nonlinear charge-equilibration leads in principle to arbitrarily complex intermixing of the nuclei information, and thus arbitrarily high body-order. However, our results on the body-ordered approximations for linear tight-binding models mean that each iteration of the self-consistent field (SCF) iteration can be expressed in terms of a low body-ordered and local interaction scheme. This leads us to propose a self-similar compositional representation of atomic properties that is highly reminiscent of recurrant neural network architectures. Each "layer" of this representation remains "simple" in the sense that we specified above.

Tight binding model
We suppose is a finite or countable index set. For ∈ , we denote the state of atom by u = (r , v , Z ) where r ∈ R d denotes the position, v the effective potential, and Z the atomic species of . Moreover, we define r k := r k − r , r k := |r k |, and u k := (r k , v , v k , Z , Z k ). For functions f of the relative atomic positions u k , the gradient denotes the gradient with respect to the spatial variable: ∇ f (u k ) := ∇ ξ → f ((ξ, v , v k , Z , Z k )) ξ =r k . The whole configuration is denoted by u = (r, v, Z ) = ({r } ∈ , {v } ∈ , {Z } ∈ ).
For a given configuration u, the tight binding Hamiltonian takes the following form: (TB) For , k ∈ and N b atomic orbitals per atom, we suppose that where h and t have values in R N b ×N b , are independent of the effective potential v, and are continuously differentiable with for some h 0 , γ 0 > 0. Moreover, we suppose the Hamiltonian satisfies the following symmetries: 3) are independent of the atomic sites , k, m ∈ .
(ii) Pointwise bounds on |h(u k )| and |t (u m , u km )| are normally automatically satisfied since most linear tight binding models impose finite cut-off radii. Moreover, the assumption on the derivatives |∇h(u k )| and |∇t (u m , u km )| states that there are no long range interactions in the model. In particular, we are assuming that Coulomb interactions have been screened, a typical assumption in many practical tight binding codes [20,68,71].
(iii) The Hamiltonian is symmetric and thus the spectrum is real.
(iv) The operators H(u) and H(Qu) are similar, and thus have the same spectra.
(vi) The entries of H(u) k ∈ R N b ×N b will be denoted H(u) ab k for 1 a, b N b . When clear from the context, we drop the argument (u) in the notation.
The assumptions (TB) define a general three-centre tight binding model, whereas, if t ≡ 0, a simplification made in the majority of tight binding codes, we say (TB) is a two-centre model [38].
The choice of potential in (TB) defines a hierarchy of tight binding models. If v = const, (TB) defines a linear tight binding model, a simple yet common model [14,16,17,70]. In this case, we implicitly assume that the Coulomb interactions have been screened, a typical assumption made in practice for a wide variety of materials [20,68,71,72]. Supposing that v is a function of a self-consistent electronic density, we arrive at a non-linear model such as DFTB [33,59,78]. Abstract variants of these nonlinear models have been analysed, for example, in [93,99]. Through much of this article we will treat r, v as independent inputs into the Hamiltonian, but will discuss their connection and self-consistency in § 2.7. For a finite system u (that is, with a finite set), we consider analytic observables of the density of states [14,93]: for functions O : R → R that can be analytically continued into an open neighbourhood of σ H(u) , we consider that where (λ s , ψ s ) are normalised eigenpairs of H(u). Many properties of the system, including the particle number functional and Helmholtz free energy, may be written in this form [14,16,70,93]. By distributing these quantities amongst atomic positions, we obtain a well-known spatial decomposition [14,16,35,38], (2.4) For infinite systems, we may define O (u) through the thermodynamic limit [14,16] or via the holomorphic functional calculus; see § 4.1.2 for further details. When discussing derivatives of the local observables, we will simplify notation and write

Local observables
Although the results in this paper apply to general analytic observables, our primary interest is in applying them to two special cases. A local observable of particular importance is the electron density; for inverse Fermitemperature β ∈ (0, +∞] and fixed chemical potential μ, we use the notation of (2.4) to define (2.6) Throughout this paper F β (u) := F β (u) ∈ will denote a vector and so (2.6) reads ρ = F β (u). In § 2.7, we consider the case where the effective potential is a function of the electron density (2.6) (that is, v = w(ρ) for some w : R → R ) which leads to the self-consistent local observables where u(ρ) := r, w(ρ), Z .

Remark 2.
All the results of this paper also hold for the off-diagonal entries of the density matrix (ρ k := tr F β H(u) k ) without any additional work. This fact will be clear from the proofs. It is likely though that additional properties related to the off-diagonal decay (near-sightedness) and spatial regularity further improve the "sparsity" of the density matrix. A complete analysis would go beyond the scope of this work.
The second observable we are particularly interested in is the site energy, which allows us to decompose the total potential energy landscape into localised "atomic" contributions. In the grand potential model for the electrons, which is appropriate for large or infinite condensed phase systems [14], it is defined as The total grand potential is defined as G β (u) [14,70].
For β < ∞, the functions F β ( · ) and G β ( · ) are analytic in a strip of width πβ −1 about the real axis [17,Lemma 5.1]. To define the zero Fermi-temperature observables, we assume that μ lies in a spectral gap (μ ∈ σ H(u) ; see § 2.1.3). In this case, F β ( · ) and G β ( · ) extend to analytic functions in a neighbourhood of σ H(u) for all β ∈ (0, ∞]. In order to describe the relationship between the various constants in our estimates and the inverse Fermi-temperature or spectral gap (in the case of insulators), we will state all of our results for O β = F β or G β . Other analytic quantities of interest can be treated similarly with constants depending, e.g., on the region of analyticity of the corresponding function z → O(z).

Metals, insulators, and defects
As we can see from (2.4), the structure of the spectrum σ H(u) will have a key role in the analysis. Firstly, by (TB), H(u) is a bounded self-adjoint operator on 2 ( ×{1, . . . , N b }) and thus the spectrum is real and contained in some bounded interval. In order to keep the mathematical results general, we will not impose any further restrictions on the spectrum. However, to illustrate the main ideas, we briefly describe typical spectra seen in metals and insulating systems.
In the case where u describes a multi-lattice in R d formed by taking the union of finitely many shifted Bravais lattices, the spectrum σ H(u) is the union of finitely many continuous energy bands [57]. That is, there exist continuous functions, ε α : BZ → R, on the Brillouin zone BZ, a compact connected subset of R d , such that In particular, in this case, σ H(u) = σ ess H(u) is the union of finitely many intervals on the real line. The band structure {ε α } relative to the position of the chemical potential, μ, determines the electronic properties of the system [89]. In metals μ lies within a band, whereas for insulators, μ lies between two bands in a spectral gap. Schematic plots of these two situations are given in Figure 1.
In particular, if u ref describes a multilattice, then, since local perturbations in the defect core are of finite rank, the essential spectrum is unchanged and we obtain finitely many eigenvalues bounded away from the spectral bands. Moreover, a small global perturbation can only result in a small change in the spectrum. Again, a schematic plot of this situation is given in Figure 2.
For the remainder of this paper, we consider the following notation: and max I − μ min I + . Moreover, we define g := min I + − max I − 0, and (2.10) The constants in Definition 1 are also displayed in Figure 2. The constant g in Definition 1 is slightly arbitrary in the sense that as long as where δ is the constant from Proposition 2.1), then there exists a finite set {λ j } as in (2.9). Choosing smaller g reduces the size of the set {λ j }.

Vacuum cluster expansion
For a system of M identical particles X 1 , . . . , X M , a maximal body-order N , and a permutation invariant energy E = E({X 1 , . . . , X M }), we may consider the vacuum cluster expansion, where the n-body interaction potentials V (n) are defined by considering all isolated clusters of j n atoms: The expansion (2.12) is exact for N = M. The vacuum cluster expansion is the traditional and, arguably, the most natural many-body expansion of a potential energy landscape. However, in many systems, it converges extremely slowly with respect to the body-order N and is thus computationally impractical. An intuitive explanation for this slow convergence is that, when defining the body-order expansion in this way, we are building an interaction law for a condensed or possibly even crystalline phase material from clusters in vacuum where the bonding chemistry is significantly different. Although this observation appears to be "common knowledge" we were unable to find references that provide clear evidence for it. However, some limited discussions and further references can be found in [10,25,27,46,86]. Our own approach employs an entirely different mechanism, which in particular incorporates environment information and leads to an exponential convergence of an N -body approximation. Technically, our approximation is not an expansion, that is, the n-body terms V (n) of the classical cluster expansion are replaced by terms that depend also on the highest body-order N . We will provide a more technical discussion contrasting our results with the vacuum cluster expansion in § 2.6.

A general framework
Before we consider two specific body-ordered approximations, we present a general framework which both incorporates many (linear-scaling) electronic structure methods from the literature (e.g. the kernel polynomial method (KPM) [82], bond-order potentials (BOP) [26,39,55,74], and quadrature-based methods [69,87,88]), and illustrates the key features needed for a convergent scheme: To that end, we introduce the local density of states (LDOS) [38] which is the (positive) measure D supported on σ (H) such that for n ∈ N 0 . (2.13) Existence and uniqueness follows from the spectral theorem for normal operators (e.g. see [1,Theorem 6.3.3] or [92]). In particular, (2.4) may be written as the integral O (u) = O dD . Then, on constructing a (possibly signed) unit measure D N with exact first N moments (that is, x n dD N (x) = tr[H n ] for n = 1, . . . , N ), we may define the approximate local observable O N (u) := O dD N , and obtain the general error estimates 14) where P N denotes the set of polynomials of degree at most N , and · op is the operator norm on a function space (S, · ∞ ). For example, we may take S to be the set of functions analytic on an open set containing C , a contour encircling supp D − D N , and consider Alternatively, we may consider S = L ∞ supp(D − D N ) leading to the total variation operator norm. Equation (2.14) highlights the key generic features that are crucial ingredients in obtaining convergence results: • Analyticity. The potential theory results of § 4.1.5 connect the asymptotic convergence rates for polynomial approximation to the size and shape of the region of analyticity of O. • Spectral Pollution. While suppD ⊂ σ (H), this need not be true for D N . Indeed, if suppD N introduces additional points within the band gap, this may significantly slow the convergence of the polynomial approximation; cf. § 2.6. • Regularity of D N . Roughly speaking, the first term of (2.14) measures how "well-behaved" D N is. In particular, if D N is positive, then this term is bounded independently of N , whereas, if D N is a general signed measure, then this factor contributes to the asymptotic convergence behaviour.
In the sections to follow, we introduce linear ( § 2.4) and nonlinear ( § 2.5) approximation schemes that fit into this general framework. Moreover, in § 2.6, we also write the vacuum cluster expansion as an integral against an approximate LDOS. In order to complement the intuitive explanation for the slow convergence of the vacuum cluster expansion, we investigate which of the requirements listed above fail.
In the appendices, we review other approximation schemes that fit into this general framework such as the quadrature method (Appendix D), numerical bond order potentials (Appendix E), and the kernel polynomial method (Appendix F).

Linear body-ordered approximation
We will construct two distinct but related many-body approximation models. To construct our first model we exploit the observation that polynomial approximations of an analytic function correspond to body-order expansions of an observable.
An intuitive approach is to write the local observable in terms of its Chebyshev expansion and truncate to some maximal polynomial degree. The corresponding projection operator is a simple example of the kernel polynomial method (KPM) [82] and the basis for analytic bond order potentials (BOP) [74]. We discuss in Appendix F that these schemes put more emphasis on the approximation of the local density of states (LDOS) and, in particular, exploit particular features of the Chebyshev polynomials to obtain a positive approximate LDOS. Since our focus is instead on the approximation of observables, we employ a different approach that is tailored to specific properties of the band structure and leads to superior convergence rates for these quantities.
For a set of N + 1 interpolation points X N = {x j } N j=0 , and a complex-valued function O defined on X N , we denote by I X N O the degree N polynomial interpolant of x → O(x) on X N . This gives rise to the body-ordered approximation .
We may connect (2.15) to the general framework in § 2.3 by defining 16) and j are the node polynomials corresponding to X N = {x j } N j=0 (that is, j are the polynomials of degree N with j (x i ) = δ i j ). (2.17) Proof. ( has finite body order. Each term in (2.18) depends on the central atom , the n − 1 neighbouring sites 1 , . . . , n−1 , and the at most n additional sites arising from the three-centre summation in the tight binding Hamiltonian (TB). In particular, (2.15) has body order at most 2N . See § 4.2 for a complete proof including an explicit definition of the V nN .
If one uses Chebyshev points as the basis for the body-ordered approximation (2.15), the rates of convergence depend on the size of the largest Bernstein ellipse (that is, ellipses with foci points ±1) contained in the region of analyticity of z → O(z) [95]. This leads to a exponentially convergent body-order expansion in the metallic finite-temperature case (see § 4.1.4 for the details).
However, the resulting estimates deteriorate in the zero-temperature limit. Instead, we apply results of potential theory to construct interpolation sets X N that are adapted to the spectral properties of the system (see § 4.1.5 for examples) and (i) do not suffer from spectral pollution, and (ii) (asymptotically) minimise the total variation of D N ,lin which, in this context, is the Lebesgue constant [95] for the interpolation operator I X N . This leads to rapid convergence of the body-order approximation based on (2.15). The interpolation sets X N depend only on the intervals I − , I + from Definition 1 (see also Figure 2 where O β = F β or G β and C 1 , C 2 , η > 0 are independent of N . The asymptotic convergence rate γ := lim N →∞ γ N is positive and exhibits the asymptotic behaviour In this asymptotic relation, we assume that the limit g → 0 is approached symmetrically about the chemical potential μ. Remark 3. Higher derivatives may be treated similarly under the assumption that higher derivatives of the tight binding Hamiltonian (TB) exist and are short ranged.

The role of the point spectrum
We now turn towards the important scenario when a localised defect is embedded within a homogeneous crystalline solid. Recall from § 2.1.3 (see in particular Fig. 2) that this gives rise to a discrete spectrum, which "pollutes" the band gap [70]. Thus, the spectral gap is reduced and a naive application of Theorem 2.3 leads to a reduction in the convergence rate of the body-ordered approximation. We now improve these estimates by showing that, away from the defect, we obtain improved pre-asymptotics, reminiscent of similar results for locality of interaction [17].
In that follow, we fix u satisfying Definition 1. While improved estimates may be obtained by choosing {λ j } as interpolation points, leading to asymptotic exponents that are independent of the defect, in practice, this requires full knowledge of the point spectrum. Since the point spectrum within the spectral gap depends on the whole atomic configuration, the approximate quantities of interest corresponding to these interpolation operators would no longer satisfy Proposition 2.2.
Remark 4. This phenomenon has been observed in the context of Krylov subspace methods for solving linear equations Ax = b where outlying eigenvalues delay the convergence by O(1) steps without affecting the asymptotic rate [30]. Indeed, since the residual after n steps may be written as r n = p n (A)r 0 where p n is a polynomial of degree n, there is a close link between polynomial approximation and convergence of Krylov methods.
On the other hand, we may use the exponential localisation of the eigenvectors corresponding to isolated eigenvalues to obtain pre-factors that decay exponentially as |r | → ∞.
where O β = F β or G β and C 3 , C 4 > 0 are independent of N . The asymptotic convergence rate γ def := lim N →∞ γ def N is positive and we have In these asymptotic relations, we assume that the limits g def , g → 0 are approached symmetrically about the chemical potential μ.
In practice, Theorem 2.4 means that, for atomic sites away from the defectcore, the observed pre-asymptotic error estimates may be significantly better than the asymptotic convergence rates obtained in Theorem 2.3.

Remark 5. (Locality) (i) By Theorem 2.4, and the locality estimates for the exact observables O
β [17], we immediately obtain corresponding locality estimates for the approximate quantities: We investigate another type of locality in Appendix B where we show that various truncation operators result in approximation schemes that only depend on a small atomic neighbourhood of the central site. An exponential rate of convergence as the truncation radius tends to infinity is obtained.

Remark 6. (Connection to the general framework)
The fact that the exponents in Theorem 2.4 depend on the discrete eigenvalues of H(u) can be seen from the general estimate (2.14) applied to the approximate LDOS D N ,lin from (2.16): • Spectral Pollution. We choose the interpolation points so that the support of D N ,lin lies within σ H(u) and so spectral pollution does not play a role, • Regularity of D N .lin . The total variation of D N ,lin can be estimated by the Lebesgue constant [95] for the interpolation operator I X N : This quantity depends on the discrete eigenvalues within the band gap.

A non-linear representation
The method presented in § 2.4 approximates local quantities of interest by approximating the integrand O : C → C with polynomials. As we have seen, this leads to approximation schemes that are linear functions of the spatial correlations {[H n ] } n∈N . In this section, we construct a non-linear approximation related to bond-order potentials (BOP) [26,39,55] and show that the added non-linearity leads to improved asymptotic error estimates that are independent of the discrete spectra lying within the band gap. In this way, the nonlinearity captures "spectral information" from H rather than only approximating O : C → C without reference to the Hamiltonian.
Applying the recursion method [49,50], a reformulation of the Lanczos process [61], we obtain a tri-diagonal (Jacobi) operator T on 2 (N 0 ) whose spectral measure is the LDOS D [91] (see § 4.3.1 for the details). We then truncate T by taking the ). By showing that the first N moments of D N ,nonlin are exact, we are able to apply (2.14) to obtain the following error estimates. The asymptotic behaviour of the exponent in these estimates follows by proving that the spectral pollution of D N ,nonlin in the band gap is sufficiently mild.

Remark 7.
It is important to note that N : U → C can be constructed without knowledge of H because, as we have seen, if the discrete eigenvalues are known a priori, then Theorem 2.5 is immediate from Theorem 2.4 by adding finitely many additional interpolation points on the discrete spectrum.
In particular, the fact that N is a material-agnostic nonlinearity has potentially far-reaching consequences for material modelling. Remark 9. (Quadrature Method) Alternatively, we may use the sequence of orthogonal polynomials [40] corresponding to D as the basis for a Gauss quadrature rule to evaluate local observables. This procedure, called the Quadrature Method [51,69], is a precursor of the bond order potentials. Outlined in Appendix D, we show that it produces an alternative scheme also satisfying Theorem 2.5.
The linear-scaling spectral Gauss quadrature (LSSGQ) method [87] is based upon this idea, albeit in the context of finite difference approximations to the DFT Hamiltonian. However, since the resulting discrete Hamiltonian in [87] is banded, the analysis of the present work may be readily applied. Therefore, Theorem 2.5 provides rigorous justification for the exponential rate of convergence for increasing body-order (number of quadrature points), complementing the intuitive explanations and numerical experiments of [87].
Since the convergence results are independent of system size, we obtain a linearscaling method, a result that complements the intuitive explanation [87, (56)], and numerical evidence [87, Fig. 5].

Remark 10. (Convergence of Derivatives)
In this more complicated nonlinear setting, obtaining results such as (2.21) is more subtle. We require an additional assumption on D , which we believe maybe be typically satisfied, but we currently cannot justify it and have therefore postponed this discussion to Appendix C. We briefly mention, however, that if D is absolutely continuous (e.g., in periodic systems), we obtain

The vacuum cluster expansion revisited
For ∈ , we denote by H ;K the Hamiltonian matrix corresponding to the finite subsystem { } ∪ K ⊂ : for k 1 , k 2 ∈ { } ∪ K , For an observable O, the vacuum cluster expansion as detailed in § 2.2 is constructed as follows: Therefore, on defining the spectral measure D ; the are normalised eigenpairs of H ;K , we may write the vacuum cluster expansion as in § 2.3: While D N ,vac is a generalised signed measure (with values in R ∪ {±∞}), all moments are finite. More specifically, if we absorb the effective potential and two centre terms into the three centre summation by writing H k 1 k 2 = m H k 1 k 2 m , see (4.16), we have Equation (2.30) follows from the proof of Proposition 2.2, see (4.19). In particular, the first N moments of D N ,vac are exact. Therefore, we may apply the general error estimate (2.14) and describe the various features of D N ,vac which provide mathematical intuition for the slow convergence of the vacuum cluster expansion: • Spectral Pollution. When splitting the system up into arbitrary subsystems as is the case in the vacuum cluster expansion, one expects significant spectral pollution in the band gaps, leading to a reduction in the convergence rate, • Regularity of D N ,vac . The approximate LDOS is a linear combination of countably many Dirac deltas and does not have bounded variation. Moreover, D N ,vac has values in R ∪ {±∞}.

Self-consistency
Throughout this section, we suppose that the effective potential is a function of a self-consistent electron density: that is, (2.6) becomes the following nonlinear equation: where u(ρ) := r, w(ρ), Z . We shall assume that the effective potential satisfies the following: Remark 11. (i) For a smooth function w : R → R, the effective potential w(ρ) := w(ρ ) satisfies (EP). This leads to the simplest abstract nonlinear tight binding models discussed in [93,99].
r m e −τ r m (for some τ > 0) also fits into this general framework. This setting already covers many important modelling scenarios and also serves as a crucial stepping stone towards charge equilibration under full Coulomb interaction, which goes beyond the scope of the present work.
The main result of this section is the following: if there exists a self-consistent solution ρ to (2.31), then we can approximate ρ with self-consistent solutions to the following approximate self-consistency equation: for sufficiently large N . The operator I X N F β is a linear body-ordered approximation of the form we analyzed in detail in § 2.4.
To do this, we require a natural stability assumption on the electronic structure problem, which was employed for example in [93,99,100]: Remark 12. (Stability) (i) The stability condition of Theorem 2.6 is a minimal starting assumption that naturally arises from the analysis [93,99,100].
For example, if ρ is a stable self-consistent electron density, then there exists φ (m) ∈ 2 ( ) such that [93]: (ii) As noted in [99] (in a slightly simpler setting), the stability condition of Theorem 2.6 is automatically satisfied for multi-lattices with ∇w positive semi-definite. In fact, in this case the stability operator is negative semi-definite.
where γ N are the constants from Theorem 2.3 applied to u(ρ ).
In order for this result to be of any practical use, we need to solve the non-linear equation (2.32) for the electron density via a self-consistent field (SCF) procedure. Supposing we have the electron density ρ i and corresponding state u i := u(ρ i ) after i iterations, we diagonalise the Hamiltonian H(u i ) and hence evaluate the output density ρ out = I X N F β (u i ). At this point, since the simple iteration ρ i+1 = ρ out does not converge in general, a mixing strategy, possibly combined with Anderson acceleration [19], is used in order to compute the next iterate. The analysis of such mixing schemes is a major topic in electronic structure and numerical analysis in general and so we only present a small step in this direction.
A more thorough treatment of these SCF results is beyond the scope of this work. See [12,53,63] for recent results in the context of Hartree-Fock and Kohn-Sham density functional theory. For a recent review of SCF in the context density functional theory, see [101].
Remark 13. It is clear from the proofs of Theorems 2.6 and 2.9 that as long as the approximate scheme F β,N satisfies then we may approximate (2.31) with approximate self-consistent solutions ρ N = F β,N u(ρ N ) . In particular, as long as we have the estimate from Remark 10 (see Appendix C for the technical details), then we may use the nonlinear approximation scheme N from Theorem 2.5 in Theorems 2.6 and 2.9 . In this case, we obtain error estimates that are (asymptotically) independent of the discrete spectrum.

Remark 14.
In the linear-scaling spectral Gauss quadrature (LSSGQ) method [87], a self-consistent field iteration analogous to (2.32) is proposed. In particular, with the caveats outlined in Remark 13 taken into consideration, Theorem 2.6 goes some way to rigorously justify the exponential rate of convergence observed numerically in [87, Fig. 4].

Conclusions and Discussion
The main result of this work is a sequence of rigorous results about bodyordered approximations of a wide class of properties extracted from tight-binding models for condensed phase systems, the primary example being the potential energy landscape. Our results demonstrate that exponentially fast convergence can be obtained, provided that the chemical environment is taken into account. In the spirit of our previous results on the locality of interaction [16,17,93], these provide further theoretical justification-albeit qualitative-for widely assumed properties of atomic interactions. More broadly, our analysis illustrates how to construct general low-dimensional but systematic representations of high-dimensional complex properties of atomistic systems. Our results, as well as potential generalisations, serve as a starting point towards a rigorous end-to-end theory of multi-scale and coarse-grained models, including but not limited to machine-learned potential energy landscapes.
In the following paragraphs we will make further remarks on the potential applications of our results, and on some apparent limitations of our analysis.

Representation of atomic properties
Our initial motivation for studying the body-order expansion was to explain the (unreasonable?) success of machine-learned interatomic potentials [5,8,80], and our remarks will focus on this topic, however in principle they apply more generally.
Briefly, given an ab initio potential energy landscape (PEL) E QM for some material one formulates a parameterised interatomic potential and then "learn" the parameters θ by fitting them to observations of the reference PEL E QM . A great variety of such parameterisations exist, including but not limited to neural networks [8], kernel methods [5] and symmetric polynomials [2,25,80]. Symmetric polynomials are linear regression schemes where each basis function has a natural body-order attached to it. It is particularly striking that for very low body-orders of four to six these schemes are able to match and often outperform the more complex nonlinear regression schemes [66,80,106]. Our analysis in the previous sections provides a partial explanation for these results, by justifying why one may expect that a reference ab initio PEL intrinsically has a low body-order. Moreover, classical approximation theory can now be applied to the body-ordered components as they are finite-dimensional to obtain new approximation results where the curse of dimensionality is alleviated.
Our results on nonlinear representations are less directly applicable to existing MLIPs, but rather suggest new directions to explore. Still, some connections can be made. The BOP-type construction of § 2.5, points towards a blending of machine-learning and BOP techniques that have not been explored to the best of our knowledge. A second interesting connection is to the overlap-matrix based fingerprint descriptors (OMFPs) introduced in [105] where a global spectrum for a small subcluster is used as a descriptor, while (3.1) can be understood as taking the projected spectrum as the descriptor. Thus, Theorem 2.5 suggests (1) an interesting modification of OMFPs which comes with guaranteed completeness to describe atomic properties; and (2) a possible pathway towards proving completeness of the original OMFPs. Finally, our self-consistent representation of § 2.7 motivates how to construct compositional models, reminiscent of artificial neural networks, but with minimal nonlinearity that is moreover physically interpretable. Although we did not pursue it in the present work, this is a particularly promising starting point to incorporate meaningful electrostatic interaction into the MLIPs framework.

Linear body-ordered approximation: the preasymptotic regime
Possibly the most significiant limitation of our analysis of the linear bodyordered approximation scheme is that the estimates deteriorate when defects cause a pollution of the point spectrum. Here, we briefly demonstrate that this appears to be an asymptotic effect, while in the pre-asymptotic regime this deterioration is not noticable.
To explore this we choose a union of intervals E ⊇ σ (H) and a polynomial P N of degree N and note We then construct interpolation sets (Fejér sets) such that the corresponding polynomial interpolant gives the optimal asymptotic approximation rates (for details of this construction, see §4.1.5- §4.1.8). We then contrast this with a best L ∞ (E)approximation, and with the nonlinear approximation scheme from Theorem 2.5. We will observe that the non-linearity leads to improved asymptotic but comparable pre-asymptotic approximation errors.

Fig. 3. Approximation errors for Chebyshev projection (green), polynomial interpolation in
Fejér sets on E j (black), best L ∞ (E j ) polynomial approximation (blue), and, for j = 2, errors in the nonlinear approximation scheme (red). We also plot the corresponding predicted asymptotic rates (from (4.5), (4 As a representative scenario we consider the Fermi-Dirac distribution F β (z) = (1+e βz ) −1 with β = 100 and both the "defect-free" case Then, for fixed polynomial degree N and j ∈ {1, 2}, we construct the (N + 1)-point Fejér set for E j and the corresponding polynomial interpolant I j,N F β . Moreover, we consider a polynomial P j,N of degree N minimising the right hand side of (3.2) for E = E j . Then, in Figure 3, we plot the errors F β − I j,N F β L ∞ (E j ) and F β − P j,N L ∞ (E j ) for both j = 1 (Fig. 3a) and j = 2 (Fig. 3b) against the polynomial degree N together with the theoretical asymptotic convergence rates for best L ∞ (E j ) polynomial approximation (4.15).
What we observe is that, as expected, introducing the interval [c, d] into the approximation domain drastically affects the asymptotic convergence rate and the errors in the approximation based on interpolation. While the best approximation errors follow the asymptotic rate for larger polynomial degree, it appears that, pre-asymptotically, the errors are significantly reduced. We also see that the approximation errors are significantly better than the general error estimate Moreover, in Figure 3b, we plot the errors when using a nonlinear approximation scheme satisfying Theorem 2.5. In this simple experiment, we consider the Gauss quadrature rule N := I X 1 While D does not correspond to a physically relevant Hamiltonian, the same procedure may be carried out for any measure supported on E 1 with supp D ∩ [c, d] finite. Then plotting the errors |F β − N |, we observe improved asymptotic convergence rates that agree with that of the "defect-free" case from Figure 3a. However, the improvement is only observed in the asymptotic regime which corresponds to body-orders never reached in practice.

Preliminaries
Here, we introduce the concepts needed in the proofs of the main results.

Hermite integral formula
If, in addition, C encircles {z}, then The proof of these facts is a simple application of Cauchy's integral formula, [3,95].

Resolvent calculus
where C is a simple closed positively oriented contour (or system of contours) contained in the region of analyticity of O and encircling the spectrum σ (H).
The following Combes-Thomas resolvent estimate [21] will play a key role in the analysis: Then, there exists a constant C > 0 such that and γ CT := c min{1, d} and c > 0 depends on h 0 , γ 0 , d and min =k r k .
Proof. A proof with γ CT depending instead on dist z, σ H(u) can be found in [16]. A low-rank update formula leads to the improved "defect-independent" result [17] where the exponent only depends on the distance between z and the reference spectrum. See [93] for an explicit description of γ CT in terms of the constants γ 0 , d and the non-interpenetration constant min =k r k .
A key observation for arguments involving forces (or more generally, derivatives of the analytic quantities of interest) is that the Combes-Thomas estimate allows us to bound derivatives of the resolvent operator: where γ CT is the Combes-Thomas constant from Lemma 1 and γ 0 is the constant from (TB).
Proof. This result can be found in the previous works [14,16,17], but we give a brief sketch for completeness. Derivatives of the resolvent have the following form: The result follows by applying the Combes-Thomas resolvent estimates together with the fact that the Hamiltonian is short-ranged (TB).
Assuming that the Hamiltonian has higher derivatives that are also short-ranged, higher order derivatives of the resolvent can be treated similarly [16].

Local observables
Firstly, we note that F β ( · ) is analytic away from the simple poles at πβ −1 (2Z+1). Moreover, G β ( · ) can be analytically continued onto the open set C \ μ + ir : r ∈ R, |r | πβ −1 [17]. Therefore, we may consider (4.3) with O = F β or G β and a contour C β encircling σ (H) and avoiding C \ μ + ir : r ∈ R, |r | πβ −1 . Therefore, we may choose C β so that the constant d, from Lemma 1, is proportional to β −1 . Moreover, if there is a spectral gap, the constant d is uniformly bounded below by a positive constant multiple of g as β → ∞.
In the case of insulators at zero Fermi-temperature, we take C ∞ encircling σ (H(u)) ∩ (−∞, μ) and avoiding the rest of the spectrum. Therefore, we may choose C ∞ so that the constant d, from Lemma 1, is proportional to g.

Chebyshev Projection and Interpolation in Chebyshev Points
For O β = F β or G β , these estimates give an exponential rate of convergence with exponent depending on ∼ β −1 . Indeed, after scaling H so that the spectrum is contained in and we conclude by directly applying (4.5). The same estimate also holds for I N (or any polynomial).
For full details of all the statements made in this subsection, see [95].

Classical logarithmic potential theory
In this section, we give a very brief introduction to classical potential theory in order to lay out the key notation. For a more thorough treatment, see [75] or [37,62,76,95]. It can be seen from the Hermite integral formula (4.2) that the approximation error for polynomial interpolation may be determined by taking the ratio of the size of the node polynomial X at the approximation points to the size of X along an appropriately chosen contour. Logarithmic potential theory provides an elegant mechanism for choosing the interpolation points so that the asymptotic behaviour of X can be described.
We suppose that E ⊂ C is a compact set. We will see that choosing the interpolation nodes as to maximise the geometric mean of pairwise distances provides a particularly good approximation scheme: Any set F n ⊂ E attaining this maximum is known as a Fekete set. It can be shown that the quantities δ n (E) form a decreasing sequence and thus converges to what is known as the transfinite diameter: τ (E) := lim n→∞ δ n (E).
We let n (z) denote the node polynomial corresponding to a Fekete set and note that Therefore, rearranging (4.8), we obtain lim n→∞ n τ (E). In fact, this inequality can be replaced with equality, showing that Fekete sets allow us to describe the asymptotic behaviour of the node polynomials on the domain of approximation.
To extend these results, it is useful to recast the maximisation problem (4.7) into the following minimisation problem, describing the minimal logarithmic energy attained by n particles lying in E with the repelling force 1/|z i − z j | between particles i and j lying at positions z i and z j , respectively: Fekete sets can therefore be seen as minimal energy configurations and described by the normalised counting measure ν n := 1 n n j=1 δ z j where F n = {z j } n j=1 . The minimisation problem (4.9) may be extended for general unit Borel measures μ supported on E by defining the logarithmic potential and corresponding total energy by The infimum of the energy over the space of unit Borel measures supported on E, known as the Robin constant for E, will be denoted −∞ < V E +∞. The capacity of E is defined as cap(E) := e −V E and is equal to the transfinite diameter [34]. Using a compactness argument, it can be shown that there exists an equilibrium measure ω E with I (ω E ) = V E and, in the case V E < ∞, by the strict convexity of the integral, ω E is unique [77].
V E for all z ∈ C, with equality holding on E except on a set of capacity zero (we say this property holds quasi-everywhere).
Moreover, if cap E > 0, then it can be shown that the normalised counting measures, ν n , corresponding to a sequence of Fekete sets weak-converges to ω E . Since U ν n (z) = 1 n log 1 | n (z)| , the weak-convergence allows one to conclude that uniformly on compact subsets of C \ E. Here, we have defined the Green's function g E (z) := V E − U ω E (z), which describes the asymptotic behaviour of the node polynomials corresponding to Fekete sets. We therefore wish to understand the Green's function g E . , respectively. We also plot the image of an 10 × 10 equi-spaced grid. A parameter problem is solved in order to obtain z 3 and thus ω 3 and ω 2 = ω 4 whereas the other constants are fixed. Here, we take z 1 = −1, z 2 = −ε, z 4 = ε, z 5 = 1, ω 1 = iπ, ω 5 = 0 with ε = 0.3

Construction of the Green's function
Now we restrict our attention to the particular case where E ⊂ R is a union of finitely many compact intervals of non-zero length.
It can be shown that the Green's function g E satisfies the following Dirichlet problem on C \ E [75]: In fact, it can be shown that (4.11) admits a unique solution [75] and thus (4.11) is an alternative definition of the Green's function. Using this characterisation, it is possible to explicitly construct the Green's function g E as follows. In the upper half plane, , G E (min E) = iπ , and G E (max E) = 0. Using the symmetry of E with respect to the real axis, we may extend Re(G E (z)) to the whole complex plane via the Schwarz reflection principle. Then, one can easily verify that this analytic continuation satisfies (4.11). Since the image of G E is a (generalised) polygon, z → G E (z) is an example of a Schwarz-Christoffel mapping [29]. See Figure 4 for the case E = [−1, −ε] ∪ [ε, 1]. We shall briefly discuss the construction of the Schwarz-Christoffel mapping G E for E = [−1, ε − ]∪[ε + , 1]. We define the pre-vertices z 1 = −1, z 2 = ε − , z 4 = ε + , z 5 = 1 and wish to construct a conformal map G E with G E (z k ) = ω k as in Figure 4. For simplicity, we also define z 0 := −∞ and z 6 := ∞ and observe that because the image is a polygon, arg G E (z) must be constant on each interval (z k−1 , z k ) and z k+1 ), and α k π is the interior angle of the infinite slit strip at vertex ω k (that is, α 1 = α 2 = α 4 = α 5 = 1 2 and α 3 = 2). After defining z α := |z| α e iα arg z where arg z ∈ (−π, π], we can see that for z ∈ (z k−1 , z k ), we have arg 5 j=k (z − z j ) α j −1 = 5 j=k (α j − 1)π and so the jump in the argument of z → 5 j=1 (z − z j ) α j −1 is (1 − α k )π at z k as in (4.12). Therefore, integrating this expression, we obtain Since G E (1) = A, we take A = 0 (to ensure (4.11c) holds). Moreover, since the real part of the integral is ∼ log |z| as |z| → ∞, we apply (4.11b) to conclude B = 1. Finally, we can choose z 3 such that Re G E (z) = 0 for all z ∈ E; that is, For more details, see [37]. We use the Schwarz-Christoffel toolbox [29] in matlab to evaluate (4.13) and plot Figure 5. For the simple case E := [−1, 1], by the same analysis, we can disregard z 2 , z 3 , z 4 and ω 2 , ω 3 , ω 4 and integrate the corresponding expression to obtain the closed form G [−1,1] (z) = log(z + √ z − 1 √ z + 1). A similar analysis allows one to construct conformal maps from the upper half plane to the interior of any polygon. For further details, rigorous proofs and numerical considerations, see [31].

Interpolation nodes
The only difficulty in obtaining (4.10) in practice is the fact that Fekete sets are difficult to compute. An alternative, based on the Schwarz-Christoffel mapping G E , are Fejér points. For equally spaced points {ζ j } n j=1 on the interval i[0, π], the n th Fejér set is defined by {G −1 E (ζ j )} n j=1 . Fejér sets are also asymptotically optimal in the sense that (4.10) is satisfied where n is now the node polynomial corresponding to n-point Fejér set.
Another approach is to use Leja points which are generated by the following algorithm: for fixed z 1 , . . . , z n , the next interpolation node z n+1 is constructed by maximising n j=1 |z j − z| over all z ∈ E. Sets of this form are also asymptotically optimal [90] for any choice of z 1 ∈ E. Since we have fixed the previous nodes z 1 , . . . , z n , the maximisation problem for constructing z n+1 is much simpler than that of (4.7).
More generally, if the normalised counting measure corresponding to a sequence of sets {z j } n j=1 ⊂ E weak-converges to the equilibrium measure ω E , then the corresponding node polynomials satisfy (4.10).

Fig. 5.
Equi-potential curves C r k := {z ∈ C : e g E (z) = r k } for both metals (a) and insulators (b) where 1 2 (r k −r −1 k ) = kπ β for k ∈ {1, 2, 3, 4, 5} and β = 10. In the case of metals (a), the equi-potential curves agree with Bernstein ellipses. We also plot the poles of F β ( · ) which determine the maximal admissible integration contours: for (a), we can take contours C r for all r < r 1 and, for (b), the contour C r 2 can be used for all positive Fermi-temperatures (we have chosen the gap carefully so that C r 2 self-intersects at μ). Shown in black crosses are 30 Fejér points in each case. To create these plots we consider an integral formula for the Green's function z → g E (z) [37] and use the Schwarz-Christoffel matlab toolbox [28,29] to approximate these integrals For the simple case where E = [−1, 1], many systems of zeros or maxima of sequences of orthogonal polynomials are asymptotically optimal in the sense of (4.10). In fact, since the equilibrium measure for [−1, 1] is the arcsine measure [76] dμ [−1,1] any sequence of sets with this limiting distribution is asymptotically optimal. An example of particular interest are the Chebyshev points {cos jπ n } 0 j n given by the n +1 extreme points of the Chebyshev polynomials defined by T n (cos θ) = cos nθ .

Asymptotically optimal polynomial approximations
Suppose that E is the union of finitely many compact intervals of non-zero length and O : E → C extents to an analytic function in an open neighbourhood of E. On defining C γ := {z ∈ C : g E (z) = γ }, we denote by γ the maximal constant for which O is analytic on the interior of C γ . We let P N be the best L ∞ (E)-approximation to O in the space of polynomials of degree at most N and suppose that I N is a polynomial interpolation operator in N + 1 points satisfying (4.11). Then, the Green's function g E determines the asymptotic rate of approximation for not only polynomial interpolation, but also for best approximation: (4.15) For a proof that the asymptotic rate of best approximation is given by the Green's function see [76]. The result for polynomial interpolation uses the Hermite integral formula and (4.10), see (4.20) and (4.22), below.

Linear body-order approximation
In this section, we use the classical logarithmic potential theory from § 4.1.5 to prove the approximation error bounds for interpolation. However, we first show that polynomial approximations lead to body-order approximations: Proof of Proposition 2.2. We first simplify the notation by absorbing the effective potential and two-centre terms into the three-centre summation: there the first two terms in the outer summation are c 0 and c 1 H . Now, for a fixed body-order (n + 1), and k 1 < · · · < k n with k l = , we construct V nN (u ; u k 1 , . . . , u k n ) by collecting all terms in (4.17) with 0 j |X | − 1 and { , 1 , . . . , j−1 , m 1 , . . . , m j } = { , k 1 , . . . , k n }. In particular, the maximal body-order in this expression is 2(|X | − 1) for three-centre models and |X | − 1 in the two-centre case.
Proof of Theorem 2.3. We let N (x) := j (x − x N j ) be the node polynomial for X N := {x N j } N j=0 . Again, we fix the configuration u and consider H := H(u). Supposing that C is a simple closed positively oriented contour encircling σ (H), we apply the Hermite integral formula (4.2) to obtain that , (4.20) where At this point we apply standard results of classical logarithmic potential theory (see, § 4.1.5 or [62]) and conclude by noting that if the interpolation points are asymptotically distributed according to the equilibrium distribution corresponding to E := I − ∪ I + , then after applying (4.10), we have that Here, the equilibrium distribution and the Green's function g E (z) are concepts introduced in § 4.1.5 and § 4.1.6. Therefore, by choosing the contour C := {ξ ∈ C : g E (ξ ) = γ } for 0 < γ < g E (μ + iπβ −1 ), the asymptotic exponents in the approximation error is γ .
The maximal asymptotic convergence rate is given by g E (μ + iπβ −1 ) since C must be contained in the region of analyticity of O β and the first singularity of O β is at μ Examples of the equi-potential level sets C are given in Figure 5.
Using the Green's function results of § 4.1.
where G E is the integral (4.13). The asymptotic behaviour of this maximal asymptotic convergence rate for the separate β → ∞ and g → 0 limits can be found in [37,81]. Here, we consider the β −1 + g → 0 limit where the gap remains symmetric about the chemical potential μ.
For ζ ∈ μ + i[0, πβ −1 ], we have c −1 | √ ζ ± 1| c, and so the integral in (4.23) has the same asymptotic behaviour as where we have used the change of variables ζ = ζ −ε − ε + −ε − . Since the integrands are uniformly bounded along the domain of integration, The constant pre-factor in (4.21) is inversely proportional to the distance dist C , σ (H) between the contour C = {g E = γ } and the spectrum σ (H). In particular, since g E is uniformly Lipschitz with constant L > 0 on the compact region bounded by C , we have: there exists λ ∈ σ (H) and ξ ∈ C such that Therefore, choosing γ to be a constant multiple of g E (μ+iπβ −1 ), we conclude that the constant pre-factor C satisfies C ∼ (g + β −1 ) −1 as g + β −1 → 0.
To extend the body-order expansion results to derivatives (in particular, to forces), we write the quantities of interest using resolvent calculus, apply Lemma 2 to bound the derivatives of the resolvent, and use the Hermite integral formula (4.20) to conclude: for C 1 , C 2 simple closed positively oriented contours encircling the spectrum σ H(u) and C 1 , respectively, we have . (4.25) We conclude by choosing appropriate contours C l = {g E = γ l } for l = 1, 2 and applying (4.22).

The role of the point spectrum
To begin this section, we sketch the proof of Proposition 2.1.

Proof of Proposition 2.1. (i) Sup-norm perturbations.
We suppose that sup k |r k − r ref Therefore, applying standard results from perturbation theory [56, p. 291], we obtain Cδ.
(ii) Finite rank perturbations. The finite rank perturbation result has been presented in [70] in a slightly different setting. We sketch the main idea here for completeness.
Since the essential spectrum is stable under compact (in particular, finite rank) perturbations [56], the set is both compact and discrete and therefore finite.
Proof of Theorem 2.4. Suppose that C is a simple closed contour encircling the spectrum σ H(u) and (λ s , ψ s ) are normalised eigenpairs corresponding to the finitely many eigenvalues outside I − ∪ I + . Therefore, we have that (4.27) The first term of (4.27) may be treated in the same way as in the proof of Theorem 2.3. Moreover, derivatives of this term may be treated in the same way as in (4.25). It is therefore sufficient to bound the remaining term and its derivative.
Firstly, we note that the eigenvectors corresponding to isolated eigenvalues in the spectral gap have the following decay [17]: for C a simple closed positively oriented contour (or system of contours) encircling the {λ s }, we have that 28) where γ CT is the Combes-Thomas constant from Lemma 1 with d = dist C , σ (H(u)) . The constant pre-factor in (4.28) depends on the distance between the contour and the defect spectrum σ H(u) . Similar estimates hold for the derivatives. For full details on the derivation of (4.28), see [17, (5.18)-(5.21)]. Therefore, combining (4.28) and the Hermite integral formula, we conclude as in the proof of Theorem 2.3.

Non-linear body-order approximation
In this section, we prove Theorem 2.5 by applying the recursion method to reformulate the problem into a semi-infinite linear chain and replacing the far-field with vacuum.

Recursion method
In that follows, we briefly introduce the recursion method [49,50], a reformulation of the Lanczos process [61], which generates a tri-diagonal (Jacobi) operator T [91] whose spectral measure is D and the corresponding sequence of orthogonal polynomials [40]. This process provides the basis for constructing approximations to the LDOS giving rise to nonlinear approximation schemes satisfying Theorem 2.5.
Recall that D is the LDOS satisfying (2.13). We start by defining p 0 := 1, a 0 := xdD (x) and b 1 p 1 (x) := x − a 0 where b 1 is the normalising constant to ensure p 1 (x) 2 dD (x) = 1. Then, supposing we have defined a 0 , a 1 , b 1 , . . . , a n , b n and the polynomials p 0 (x), . . . , p n (x), we set Then, { p n } is a sequence of orthogonal polynomials with respect to D (i.e. p n p m dD = δ nm ) and we have that (see Lemma D.1 for a proof). Moreover, we denote by T the infinite symmetric tridiagonal matrix on N 0 with diagonal (a n ) n∈N 0 and off-diagonal (b n ) n∈N .

Remark 15.
It will also prove convenient for us to renormalise the orthogonal polynomials by defining P n (x) := b n p n (x) and b 0 := 1; that is, (4.34) One advantage of this formulation is that it explicitly defines the coefficients {b n }.
Therefore, if we have the first 2N + 1 moments H , . . . , (H 2N +1 ) , it is possible to evaluate Q 2N +1 (H) (that is, Q 2N +1 dD ) for all polynomials Q 2N +1 of degree at most 2N +1, and thus compute T N . In particular, for a fixed observable of interest O, we may write (4.35) Remark 16. In Appendix E we introduce more complex bond order potential (BOP) schemes based on the recursion method and show that they also satisfy Theorem 2.5.

Remark 18.
In Appendix D we show that the eigenvalues of T N (z) are distinct for z in some open neighbourhood, U 0 ⊂ U , of R 2N +1 , which leads to the following alternative proof. On U 0 , the eigenvalues and corresponding left and right eigenvectors can be chosen to be analytic: there exist analytic functions ε j , ψ j , φ j for j = 0, . . . , N such that (More precisely, we apply [44, Theorem 2] to obtain analytic functions ψ j , φ j of each variable z 0 , . . . , z 2N +1 separately and then apply Hartog's theorem [60] to conclude that ψ j , φ j are analytic as functions on U ⊂ C 2N +1 .) Therefore, the nonlinear method discussed in this section can also be written in the form which is an analytic function on {z ∈ U 0 : O analytic at ε j (z) for each j} (as it is a finite combination of analytic functions only involving products, compositions and sums).

Self-consistent tight binding models
We start with the following preliminary lemma: Proof. First, we denote the inverse of T and its matrix entries by T −1 : 2 ( ) → 2 ( ) and T −1 k , respectively. Then, applying the Combes-Thomas estimate to T yields the off-diagonal decay estimate |T −1 k | Ce −γ CT r k for some C, γ CT > 0 [93].
Due to the off-diagonal decay properties of the matrix entries, the operators T , T −1 : ∞ ( ) → ∞ ( ) given by are well defined bounded linear operators with norms sup k∈ |T k | and sup k∈ |T −1 k |, respectively. To conclude, we note that and so T −1 is the inverse of T . Here, we have exchanged the summations over k and m by applying the dominated convergence theorem: Throughout the following proofs, we denote by B r (ρ) the open ball of radius r about ρ with respect to the ∞ -norm. Moreover, we briefly note that the stability operator can be written as the product L (ρ) := F (ρ)∇w(ρ), where [93] where C is a simple closed contour encircling the spectrum σ H(u(ρ)) .
Proof of Theorem 2.6. Since ρ → F β (u(ρ)) is C 2 , and I − L (ρ ) −1 is a bounded linear operator, we necessarily have that I − L (ρ) −1 is a bounded linear operator for all ρ ∈ B r (ρ ) for some r > 0. By applying Theorem 2.3, together with the assumption (EP), we obtain for all ρ ∈ B r (ρ ). As a direct consequence, we have In particular, for such N , the operator I −L N (ρ) : 2 → 2 is invertible with inverse bounded above in operator norm independently of N .
We now show that I − L N (ρ) satisfies the assumptions of Lemma 3. Using (4.47) and (EP), together with the Combes-Thomas estimate (Lemma 1), we conclude that for all ρ ∈ B r (ρ ). In particular, I − L N (ρ) extends to a invertible bounded linear operator ∞ → ∞ and thus its inverse I − L N (ρ) −1 : ∞ → ∞ is bounded. Now, the mapping ρ → ρ − I X N F β u(ρ) between ∞ → ∞ is continuously differentiable on B r (ρ ) and the derivative at ρ is invertible (i.e. I − L N (ρ ) −1 : ∞ → ∞ is a well defined bounded linear operator). Since the map ρ → I X N F β u(ρ) is C 2 , its derivative L N is locally Lipschitz about ρ and so there exists L > 0 such that Moreover, by Theorem 2.3, we have that In particular, we may choose N sufficiently large such that 2b N L < 1 and t N : Thus, the Newton iteration with initial point ρ 0 := ρ , defined by converges to a unique fixed point ρ N = I X N F β (u(ρ N )) in B t N (ρ ) [102,104]. That is, ρ N − ρ ∞ t N 2b N . Here, we have used the fact that 1 − √ 1 − x x for all 0 x 1.
Proof of Proposition 2.9. We proceed in the same way as in the proof of Theorem 2.6. In particular, since ρ N is stable, if ρ 0 − ρ N ∞ is sufficiently small, Here, we have used that Therefore, as long as ρ 0 − ρ N ∞ is sufficiently small, we may apply the Newton iteration starting from ρ 0 to conclude.
Proof of Corollary 2.7. As a direct consequence of (4.51), we have that Here, we have applied the standard convergence result (Theorem 2.3) with fixed effective potential. Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A. Notation
Here we summarise the key notation: Suppose γ N (r c ) and γ def N (r c ) are the rates of approximation from Theorems 2.4 and 2.5 when applied to H r c . Then γ N (r c ) → γ N and γ def N (r c ) → γ def N as r c → ∞, with an exponential rate.
Proof. We first note that Therefore, applying (TB), we obtain To conclude we choose a suitable contour C and apply the Combes-Thomas estimate (Lemma 1) together with (B.4): As a direct consequence of (B.4), we have also have [56]. This means that for sufficiently large r c , we obtain the same rates of approximation when applying Theorems 2.4 and 2.5 to H r c .

B.2. Truncation
One downside of the banded approximation is that the truncation radius depends on the maximal polynomial degree (e.g. see (B.2)). In this section, we consider truncation schemes that only depend on finitely many atomic sites independent of the polynomial degree: where the restriction of the Hamiltonian has been introduced in (2.26). On defining the quantities where the operators I X N are given by Theorem 2.3, we obtain a sparse representation of the N -body approximation depending only on finitely many atomic sites, independently of the maximal body-order N .
Proposition B.2. Suppose u satisfies Definition 1. Fix 0 < β ∞ and suppose that, if β = ∞, then g, g def > 0. Then, Proof. Applying the Hermite integral formula (4.1) directly, we conclude that I X N O β (z) is bounded uniformly in N along a suitably chosen contour C := {g E = γ } (examples of such contours are given in Figure 5). It is important to note that the contour C must be chosen to encircle both σ (H) and σ ( H r c ).
In the following, we let γ CT be the Combes-Thomas exponent from Lemma 1 corresponding to H.
Similarly to (B.7), we obtain This concludes the proof.
The fact that the exponents of Proposition B.2 are independent of the defect states within the band gap is in the same spirit to the improved locality estimates of [17].

Remark 19.
(Divide-and-conquer Methods) This truncation scheme is closely related to the divide-and-conquer method for solving the electronic structure problem [103]. In this context the system is split into many subsystems that are only related through a global choice of Fermi level. In our notation, this method consists of constructing N DAC smaller Hamiltonians H r c , j centred on the atoms j (for j = 1, . . . , N DAC ) and approximating the quantities O (u) for in a small neighbourhood of j by calculating tr O H r c , j . That is, the eigenvalue problem for the whole system is approximated by solving N DAC smaller eigenvalue problems in parallel. In particular, this method leads to linear scaling algorithms [42]. Theorem B.2 then ensures that the error in this approximation decays exponentially with the distance between and the exterior of the subsystem centred on j .
A similar error analysis in the context of divide-and-conquer methods in Kohn-Sham density functional theory can be found in [18].

Remark 21.
(Non-linear schemes) One may be tempted to approximate the Hamiltonian with the truncation, H r c , and then apply the nonlinear scheme of Theorem 2.5. In doing so, we obtain the following error estimates: A problem with this analysis is that the constant γ N (r c ) in (B.13) arises by applying Theorem 2.5 to H r c rather than the original system H. In particular, this means that γ N (r c ) depends on the spectral properties of H r c rather than H. Since spectral pollution is known to occur when applying naive truncation schemes [64], the choice of H r c is important for the analysis. In particular, it is not clear that γ N (r c ) → γ N in general. This is in contrast the the result of Proposition B.1.

Appendix C. Convergence of Derivatives in the Nonlinear Approximation Scheme
As mentioned in Remark 10, the results of this section depend on the "regularity" properties of D : where g ν g E is the minimal carrier Green's function of ν [85].
Under the regularity condition of Definition 2, we obtain results analogous to (2.21): Theorem C.1. Suppose that u satisfies Definition 1 and ∈ is such that D ∈ Reg. Then, with the notation of Theorem 2.5, we in addition have More generally, if the regularity assumption is not satisfied, it may still be the case that Theorem C.1 holds but with reduced locality exponent η. To formulate this result, we require the notion of minimal carrier capacity: (ii) If c ν > 0, then there exists a minimial carrier equilibrium distribution ω ν , a (uniquely defined) unit measure with supp ω ν ⊂ E satisfying (v) Suppose c ν > 0. Then, on defining ν n to be the discrete unit measure giving equal weight to each of the zeros of p n ( · ; ν), the condition that where ω E is the equilibrium distribution for E, is equivalent to ν ∈ Reg [85,Thm. 3.1.4]. In particular, this justifies (4.10).
We therefore arrive at the corresponding result for ∈ for which the corresponding LDOS has positive minimal carrier capacity: Proposition C.2. Suppose that u satisfies Definition 1 and ∈ such that c D > 0. Then, with the notation of Theorem 2.5, we in addition have and η > 0 is the constant from Theorem C.1.
The proofs of Theorem C.1 and Proposition C.2 follow from the following estimates on the derivatives of the recursion coefficients {a n , b n }, and the locality of the tridiagonal operators T N , together with the asymptotic upper bounds (i.e. Definition 2 or Remark 22).
In the following, we denote by T ∞ the infinite symmetric matrix on N 0 with diagonal (a n ) n∈N 0 and off-diagonal (b n ) n∈N .
(i) For each r ∈ N, we have γ r,N ∼ d N as d N → 0.
Remark 24. The fact that g σ (T ∞ ) does not depend on the discrete eigenvalues of T ∞ means that asymptotically the locality estimates do not depend on defect states in the band gap arising due to perturbations satisfying Proposition 2.1, for example. Indeed, this has been shown more generally for operators with off-diagonal decay [17]. We show an alternative proof using logarithmic potential theory.
We will assume Lemmas C.4 and C.3 for now and return to their proofs below. We first add on a constant multiple of the identity, cI , to the operators {T N } so that the spectra are contained in an interval bounded away from {0}. Moreover, we translate the integrand by the same constant: O(z) : = O(z − c). Then, we extend T N to an operator on 2 (N 0 ) by defining [T N ψ] i = N j=0 [T N ] i j ψ j for 0 i N and [T N ψ] i = 0 otherwise. We therefore choose a simple closed contour (or system of contours) C encircling N σ (T N ) so that Therefore, applying Lemma C.3, a simple calculation reveals that Proof of Lemma C. 3. The proof follows from the following identities: To do this, it will be convenient to renormalise the orthogonal polynomials as in Remark 15 (that is, we consider P n (x) := b n p n (x)). Moreover, we define b −1 := 1. Using the shorthand ∂ := ∂ ∂ u m , we therefore obtain: ∂b −1 = ∂b 0 = 0, ∂ P −1 (x) = ∂ P 0 (x) = 0, and for all n 0. By noting ∂ P 1 (x) = −∂a 0 and applying (C.8), we can see that ∂ P n is a polynomial of degree n − 1 for all n 0. Therefore, since P n is orthogonal to all polynomials of degree n − 1, we have which concludes the proof of (C.7).
To prove a similar formula for the derivatives of a n , we first state a useful identity which will be proved after the conclusion of the proof of (C.7): Therefore, we have that (C.12) Applying (C.11) for k n − 1, we can see that ∂a n can be written as for some coefficients d 1,k , d 0,k . Using (C.11) and assuming the result for k n −1, we have for all k n − 1.
Proof of (C.11). We have that where l.o.t. ("lower order term") denotes a polynomial of degree strictly less than n that changes from one line to the next. That is, since c 11 = −∂a 0 = ∂ a 0 b 0 b 0 , we apply an inductive argument to conclude that Proof of Lemma C.4. The first statement is the Combes-Thomas resolvent estimate (Lemma 1) for tridiagonal operators (which, in particular, satisfy the off-diagonal decay assumptions of Lemma 1).
To obtain the asymptotic estimates of (ii), we apply a different approach based on the banded structure of the operators. Since T N is tri-diagonal, [(T N ) n ] i j = 0 if |i − j| > n. Therefore, for any polynomial P of degree at most |i − j| − 1, we have [9] . (C. 19) We may apply the results of logarithmic potential theory (see (4.15)), to conclude. Here, it is important that |σ (T ∞ ) \ σ (T N )| remains bounded independently of N so that, asymptotically, (C.19) has exponential decay with exponent g σ (T ∞ ) . The proof that |σ (T ∞ ) \ σ (T N )| is uniformly bounded can easily be shown when considering the sequence of orthogonal polynomials generated by T ∞ . A full proof is given in parts (ii) and (iv) of Lemma D.1.
Proof. The idea behind the proofs are standard in the theory of Gauss quadrature (e.g. see [40]) but, for the convenience of the reader, they are collected together in D.3.

Remark 25.
The quadrature rule discussed in this section can be seen as the exact integral with respect to the following approximate LDOS This measure has unit mass by Lemma D.

D.1. Error Estimates.
Applying Remark 25, together with (2.14), we have: for every polynomial P 2N +1 of degree at most 2N + 1, Now, since σ H ⊂ I − ∪{λ j }∪ I + where {λ j } is a finite set, we may apply part (iv) of Lemma D.1 to conclude that the number of points in X N \ I − ∪ I + is bounded independently of N . Accordingly, we may apply (4.15) with E = I − ∪ I + , to obtain the following asymptotic bound where O is analytic on {z : g E (z) < γ }.
In particular, we obtain the stated asymptotic behaviour.  [24,85]. In this paper, we only require the much milder property that the number of eigenvalues in the gap remains uniformly bounded in the limit N → ∞. For a more general discussion of spectral pollution, see [13,64].

D.3. Proof of Lemma D.1
Proof of (i). First note that p 0 p 1 dD = 0. We assume that p 0 , . . . , p n are mutually orthogonal with respect to D , and note that, and applying (D.5). Equation (D.5) also justifies the tri-diagonal structure (4.31).
Proof of (ii). We may rewrite the recurrence relation (4.29) as x p(x) = T N p(x)+ b N +1 p N +1 (x)e N where p(x) := 1, p 1 (x), . . . , p N (x) T , [e N ] j = δ j N , and T N is the tri-diagonal matrix (4.31). In particular, each ε j ∈ X N is an eigenvalue of T N (with eigenvector p(ε j )).
Proof of (iii). Since T N is symmetric, the spectrum is real. Now, for each ε j ∈ X N = σ (T N ), the matrix (T N − ε j ) ¬N ¬0 formed by removing the N th row and 0 th column is lower-triangular with diagonal (b 1 , . . . , b N ). Since each b i > 0, (T N − ε j ) ¬N ¬0 has full rank and thus ε j is a simple eigenvalue of T N .
Proof of (iv). Suppose that (after possibly relabelling) ε 0 , ε 1 ∈ X N ∩ [a, b]. After defining R(x) := N j=2 (x − ε j ), a polynomial of degree N − 1, and noting (x − ε 0 )(x − ε 1 ) > 0 on supp D , we obtain contradicting part (i). Proof of (v). We may write P 2N +1 = p N +1 q N + r N where q N , r N are polynomials of degree at most N and note that [ p N +1 (H)q N (H)] = 0 by (i) and P 2N +1 (ε j ) = r N (ε j ) since X is the set of zeros of p N +1 . Therefore, 10) In (D.9) we have used the fact that polynomial interpolation in N + 1 distinct points is exact for polynomials of degree at most N . Proof of (vi). j (x) 2 is a polynomial of degree 2N and so, by (v), we have Moreover, N j=0 j (x) is a polynomial of degree N equal to one on X N (a set of N + 1 distinct points) and so N j=0 j (x) ≡ 1. Finally, N j=0 w j = N j=0 j (x) d D (x) = 1.

Appendix E. Numerical Bond-Order Potentials (BOP)
In mathematical terms, the idea behind BOP methods is to replace the local density of states (LDOS) with an approximation using only the information from the truncated tri-diagonal matrix T N (and possibly additional hyper-parameters). Since the first N coefficients contain the same information as the first 2N + 1 moments H , . . . , [H 2N +1 ] , this approach is closely related to the method of moments [22].
Equivalently, the resolvent [(z − H) −1 ] , which can be written conveniently as the continued fraction expansion is replaced with an approximation G N only involving the coefficients from T N . For example, for fixed terminator t ∞ , we may define .

(E.2)
Truncating (E.1) to level N , which is equivalent to replacing the far-field of the linear chain with vacuum and choosing t ∞ = 0, results in a rational approximation to the resolvent and thus a discrete approximation to the LDOS. We have seen that truncation of the continued fraction in this way leads to an approximation scheme satisfying Theorem 2.5.
Alternatively, the far-field may be replaced with a constant linear chain with a N + j = a ∞ and b N + j = b ∞ for all j 1 leading to the square root terminator [38,49,97]. More generally, one may choose any "approximate" local density of states D and construct a corresponding terminator that encodes the information from D [52,65]. For example, D (x) := 1 b ∞ π 1 − x−a ∞ 2b ∞ 2 results in the square root terminator. While we are unaware of any rigorous results, there is numerical evidence [52] to suggest that the error in the approximation scheme is related to the smoothness of the difference D − D . Equivalently, we may choose any bounded symmetric tri-diagonal (Jacobi) operator T N with diagonal a 0 , a 1 , . . . , a N , a N +1 , . . . and off-diagonal b 1 , . . . , b N ,  b N +1 , . . . . That is, we may evaluate the recursion method exactly to level N and append the far-field boundary condition { a n , b n } n N +1 to the semi-infinite linear chain. This approach also includes the case t ∞ = 0 as in § 4.3 by choosing a n = b n = 0 for all n.
With this in hand, we define where D 2N +1,BOP is the appropriate spectral measure corresponding to T N .

E.1. Error estimates
Since [( T N ) n ] 00 = [(T N ) n ] 00 = [(T ∞ ) n ] 00 is independent of the far-field coefficients { a j , b j } for all n 2N + 1, we can immediately see that the first 2N + 1 moments of D 2N +1,BOP agree with those of D . In particular, we may immediately apply (2.14) to obtain error estimates that depend on supp D − D 2N +1,BOP .
Therefore, as long as the far-field boundary condition is chosen so that there are only finitely many discrete eigenvalues in the band gap independent of N , the more complicated BOP schemes converge at least as quickly as the t ∞ = 0 case. Intuitively, if the far-field boundary condition is chosen to capture the behaviour of the LDOS (e.g. the type and location of band-edge singularities), then the integration against the signed measure D − D 2N +1,BOP as in (2.14) may lead to improved error estimates. A rigorous error analysis to this affect is left for future work.

E.2. Analyticity
Since T N is bounded and symmetric, the spectrum σ ( T N ) is contained in a bounded interval of the real line. In particular, we can apply the same arguments as in (4.44) to conclude that (E.3) defines a nonlinear approximation scheme given by an analytic function on an open subset of C 2N +1 .

Appendix F. Kernel Polynomial Method & Analytic Bond Order Potentials
We first introduce the Kernel Polynomial Method (KPM) for approximating the LDOS [82,83,98]. In this section, we scale the spectrum and assume that σ (H) ⊂ [−1, 1].
For a sequence of kernels K N (x, y), we define the approximate quantities of interest However, truncation of the Chebyshev series in this way leads to artificial oscillations in the approximate LDOS known as Gibbs oscillations [43]. Moreover, without damping these oscillations, the approximate LDOS need not be positive. However, on defining we obtain a positive approximate LDOS [98] where the damping coefficients d n := (1 − n N ) reduce the effect of Gibbs ringing. In practice, one may instead choose the Jackson kernel [47].
The problem with the above analysis in practice is that the damping factors that we have introduced mean that more moments [H n ] are required in order to obtain good approximations to the LDOS. Instead, analytic BOP methods [74,79] compute the first N rows of the tridiagonal operator T ∞ , thus obtaining the first 2N + 1 moments exactly. Then, a far-field boundary condition (such as a constant infinite linear chain) is appended to form a corresponding Jacobi operator T N as in Appendix E. Now, since higher order moments of T N can be efficiently computed, we may evaluate the following approximate LDOS Efficient implementation of analytic BOP methods can be carried out using the BOPfox program [47].