Canonical quantum observables for molecular systems approximated by ab initio molecular dynamics

It is known that ab initio molecular dynamics based on the electron ground state eigenvalue can be used to approximate quantum observables in the canonical ensemble when the temperature is low compared to the first electron eigenvalue gap. This work proves that a certain weighted average of the different ab initio dynamics, corresponding to each electron eigenvalue, approximates quantum observables for any temperature. The proof uses the semiclassical Weyl law to show that canonical quantum observables of nuclei-electron systems, based on matrix valued Hamiltonian symbols, can be approximated by ab initio molecular dynamics with the error proportional to the electron-nuclei mass ratio. The result covers observables that depend on time-correlations. A combination of the Hilbert-Schmidt inner product for quantum operators and Weyl's law shows that the error estimate holds for observables and Hamiltonian symbols that have three and five bounded derivatives, respectively, provided the electron eigenvalues are distinct for any nuclei position and the observables are in the diagonal form with respect to the electron eigenstates.

Background. Given a quantum system defined by the HamiltonianĤ(x, −i ∇) acting on L 2 (R N ) the quantum canonical ensemble at the inverse temperature β = 1/(k B T ) is described by the density operatorρ = e −βĤ . A quantum observable is defined by a Hermitian, densely defined operatorÂ on L 2 (R N ) and the quantum canonical ensemble average is obtained from the normalized trace of the product as Tr (ρÂ) Trρ . The Weyl quantization establishes a connection between the operator HamiltonianĤ(x, −i ∇) and its real-valued symbol function H(x, p) defined on the classical phase space R N × R N . The semiclassical analysis for → 0 shows that the quantum observables can be approximated by the classical Gibbs ensemble average where the function A(·, ·) is the symbol of the Weyl quantized operatorÂ. The first mathematical result of such a semiclassical limit was obtained by Wigner [17]. Wigner introduced the "Wigner"-function, based on solutions to the Schrödinger equation with scalar potentials, and made an expansion in the Planck constant to relate the canonical quantum observable to the classical Gibbs phase-space average.
We study molecular dynamics approximations of canonical ensemble averages for quantum observables of the nuclei-electrons system. The role of the semiclassical parameter is played by the mass ratio m e /m n between electrons mass m e (the light particles) and the nuclei mass m n (the heavy particles). We assume the atomic units (a.u.) in which = 1 and the mass of electrons m e = 1, thus our small parameter is 1/M ≪ 1 where M is the mass of nuclei in atomic units. Note that in atomic units the proton mass is m p ≈ 1836. The Boltzmann constant is k B = 1 in atomic units hence β = 1/T . The Hamiltonian of this system consists of the kinetic energy of nuclei and the electronic kinetic energy operator together with the operator describing interaction between electrons and nucleî In this work we treat the electronic kinetic energy operator and the interaction operator,Ĥ e = − 1 2 ∆ xe +V e (x, x e ) as a matrix valued potential V : R N → C d defining the new Hamiltonian where I is the d×d identity matrix. The matrix-valued potential V is then obtained by approximat-ingĤ e on a finite dimensional subspace of electronic states. We assume that this approximation results in the Hermitian matrix valued confining potential V with non-degenerate eigenvalues. By including the electron part as a matrix-valued operator one can derive the limit as the electronnuclei mass ratio 1/M tends to zero, see [15] which, in Section 6, includes an overview of previous results. This limit can then be approximated by ab initio molecular dynamics simulations for nuclei, with the potential generated by the electron eigenvalue problem, see [10,12], based on the nuclei and electron scale separation using the Born-Oppenheimer approximation [13,14]. Such molecular dynamics simulations have the benefit to require less computational effort than to solve the Schrödinger equation with time dependent electrons. If the temperature is small in comparison to the minimal difference of the second and first eigenvalue of the electron potentialV , the probability for the quantum system to be in excited electron states is negligible and the canonical ab initio molecular dynamics based on the electron ground state yields accurate approximations. When the temperature is not small compared to this electron eigenvalue gap, the probability to be in excited states is substantial and the molecular dynamics associated with the electronic ground state energy will not yield accurate approximation of quantum observables.

1.2.
Overview of results. We address an important question which seems mathematically open: How to modify the canonical ab initio molecular dynamics in order to accurately approximate quantum observables based on matrix valued potentials and all temperatures ?
We derive molecular dynamics methods that accurately, in mathematical sense, approximate a quantum observable also in the case where the temperature can be large compared to the first spectral gap. The approximation consists of a weighted sum of molecular dynamics observables for the scalar Hamiltonians which are the eigenvalues of the original matrix-valued Hamiltonian symbol. Furthermore, the weights, which are the probabilities to be in the corresponding electron states, are determined precisely as molecular dynamics observables. For instance, molecular systems with light atoms and applications with laser heating have been simulated more accurately by taking several electron states into account, see e.g. [1], [8] and Section 5.3.4 in [12].
Section 2 presents analysis with quantum observables not depending on time. In Section 3 we study observables that are correlations in time or depending on time correlations. The main result is that the weighted sums of molecular dynamics observables approximate canonical quantum observables, based on Schrödinger Hamiltonians with matrix-valued potentials for any positive temperature. The approximation error is bounded by the square root of electron-nuclei mass ratio, M −1/2 , times a constant, provided the observable symbols are in the diagonal form with respect to the electron eigen-states and have up to three derivatives bounded in L 2 (R N ×R N ), the Hamiltonian symbol has five derivatives and the electron eigenvalues are distinct for any nuclei position. An improved approximation error O(M −1 ) holds with additional assumptions. The main mathematical tool is the semiclassical Weyl law, as formulated e.g. in [18,15].
The semiclassical Weyl law has been used before, see [15], to approximate canonical quantum observables, based on general matrix valued Hamiltonians, including infinite dimension d = ∞, by phase-space averages. For general Hamiltonian operators the classical approximation with O(M −1 ) accuracy includes O(M −1/2 ) perturbations of the leading order Hamiltonian dynamics, as described in [15] by insightful analysis of the Hamiltonian flow based on perturbed Hamiltonian functions and symplectic forms.
With the aim to obtain sharper error estimates, we focus on a simpler case from the analysis perspective, when the nuclei kinetic energy contribution to the Hamiltonian operator is defined by the Laplacian, as in (1.2), and d is finite. We exploit this simplified structure and the assumption of non crossing electron eigenvalues, to construct global projections to the electron states related to the adiabatic approximation. We show how these projections, in the special case of the Hamiltonian (1.2), can be determined by a nonlinear eigenvalue problem solved by a power expansion provided by Cauchy-Kovalevsky's theorem. This new approach with a non-linear eigenvalue problem has an advantage in providing more precise error estimates. The construction is based on a convergent power series and not only on a semiclassical expansion which has been used to derive semiclassical limits elsewhere.
Another novelty of our approach is that our error estimates are not using the Calderon Vaillancourt theorem that bounds the L 2 operator norm of a symbol by L ∞ estimates of derivatives of 3 order N of the symbol. Instead we take advantage of the Hamiltonian form (1.2) to derive error estimates that combine the Hilbert-Schmidt norm of quantum observables and Weyl's law to obtain new bounds in terms of the L 2 (R N × R N ) norm of remainder symbols based on three and five derivatives of the observables and the Hamiltonian, respectively. The new bounds of the remainder symbols given in Lemmas 3.11 and 3.12 are based on Hermitian properties of the Moyal composition. The constant in our O(M −1 ) approximation result in Theorem 3.2 depends e.g. on the L 2 (R N × R N ) norm of three derivatives of the observable, while estimates based on the Calderon-Vaillancourt theorem require similar bounds on derivatives of order N ≫ 1, which in practise can be very large so that O(M −1 ) would not lead to useful error estimates for computational approximations with realistic values of M .
In conclusion, we make a step in the direction of precisely estimating constants for semiclassical approximations. There are two main new mathematical ideas that take advantage of the particular form of the Hamiltonian (1.2): a non-linear eigenvalue problem that avoids the traditional asymptotic expansion and an error estimate of remainder terms that use up to fourth order derivatives only. We think these two ideas can be useful to further trace constants in situations with e.g. avoided crossings, degenerate and crossing electron eigenvalues, vanishing temperature and more general Hamiltonians.
At the core of the canonical ensemble is the Gibbs distribution, which has the important property that it is the marginal distribution for a subsystem weakly coupled to a heat bath, where the composite system is assumed to have the microcanonical distribution, see e.g. [4]. In Appendix A we present a variant of this property, assuming instead that the marginal distribution of the subsystem is determined by the subsystem Hamiltonian. Another basic property is that the Gibbs density is a time independent solution of the Liouville-von Neumann equation. Appendix A also includes a comparison of the classical and quantum Gibbs densities with respect to the classical and quantum Liouville equations. In Section 2.3 and Appendix B we present numerical results of simple model problems where the quantum and molecular dynamics observables are compared.

The Schrödinger equation and Gibbs ensembles
2.1. Problem formulation and Weyl quantization. We consider the matrix valued Schrödinger operatorĤ where V : R N → C d×d is a Hermitian matrix valued confining potential and I ⊗ ∆ is the d × d identity matrix times the Laplacian on R N , modeling the nuclei kinetic energy. Hence, the quantum model consists of N/3 nuclei whose coordinates are in R 3 and the wave functions Φ n ∈ L 2 (R N , C d ) are vector-valued with d components. We use the notation H := Approximation of the electronic part of the Hamiltonian using the electron eigenvalue problem gives the matrix potential (operator) V (x) defined for each nuclei configuration x ∈ R N . Here M ≫ 1 is the nuclei-electron mass ratio, assumed to be much larger than one. The setting with individual nuclei masses and a diagonal mass matrix M can be transformed to the form (2.1) by introducing the new coordinates M 1/2 1x = M 1/2 x, which transforms the Hamiltonian into To handle the spectrum ofĤ we assume that the smallest eigenvalue λ 1 (x) of V (x) tends to infinity as |x| → ∞. This assumption implies that the spectrum ofĤ is discrete, see [3]. More precisely, let Φ n : R N → C d and E n ∈ R for n = 1, 2, 3, . . . be solutions of the eigenvalue problem (2.1)ĤΦ n = E n Φ n , then the set of eigenfunctions {Φ n } ∞ n=1 forms a complete basis of the Hilbert space H. To have a complete set of eigenfunctions in H is used for the analysis of the canonical quantum ensemble average in this work, although it is not crucial. The approach we present is based on the concept of the trace of quantum operators introduced by von Neumann, [16].
Notation and Weyl calculus. Since our analysis provides error estimates for approximating quantum observables with classical ones it is natural to use tools of Weyl calculus that defines correspondence between operators on H and their symbols in a suitable functional space. Here we introduce the notation used throughout this paper.
For functions u, v ∈ H we denote the scalar product and the corresponding norm u 2 H = R N |u(x)| 2 dx. The space of smooth rapidly decaying matrixvalued functions, i.e., Schwartz space, is denoted S(R N × R N , C d×d ) and it is abbreviated as S. We define the Fourier transform F : We emphasize that the Fourier transform of a symbol A is denoted by F[A] whileÂ denotes the Weyl quantization of the symbol A. We also use the notation (A) instead of the simpleÂ, in particular in long expressions such as (Ae −βH ) . We define the Weyl quantization operator of the matrix-valued symbol A ∈ S as the mapping A →Â that assigns to the symbol A the linear operatorÂ : For example, the symbol H(x, p) : The definition (2.3) implies that any quantisationÂ is an integral operator The expression above shows that K A is the Fourier transform in the second argument of the symbol A(x, p) and consequently the Weyl quantization is well defined for symbols in S ′ , the space of tempered distributions. An important property of the Weyl quantization is given by the Moyal product A#B of two symbols A, B The Moyal product provides correspondence between the algebra of operators and the algebra of their symbols by identifying composition of two operators with the Weyl quantization of the Moyal product of their symbols, more precisely Further properties and background on the Weyl calculus are presented, e.g., in [18,6].
The principal idea in this work is to study the trace of operators on H with kernels K A defined by (2.4). The trace is a composition of the trace in L 2 (R N ) and the trace of d × d matrices. The two different traces are defined by TrB := ∞ n=1 Φ n ,BΦ n , for an operatorB on H, The analysis here uses the fact that the H-trace of a Weyl operator based on a d × d matrix valued symbol is equal to the phase-space average of its symbol trace. Indeed we have by (2.4) We introduce the coordinate z = (x, p) ∈ R 2N in the phase space and its Lebesgue measure dz = dx dp. In fact also the composition of two Weyl operators is determined by the phase-space average as follows.
Lemma 2.1. The composition of two Weyl operatorsÂ andB, with A ∈ S and B ∈ S satisfies where A(z)B(z) is the matrix product of the two d × d matrices A(z) and B(z).
Although this result is not new, see [15], we include a proof since it is important for the work here.
Proof. The kernel of the composition is × e iM 1/2 (x−x ′ )·p+(x ′ −y)·p ′ dp ′ dp dx ′ so that the trace of the composition becomes Tr A(y, p)B(y, p ′ ) e iM 1/2 y ′ ·(p−p ′ ) dp ′ dp dy ′ dy Tr A(y, p)B(y, p) dp dy , using the change of variables (y, y ′ ) = (x + x ′ )/2, x − x ′ , which verifies the claim. 6 The isometry between Weyl operators with the Hilbert-Schmidt inner product, Tr (Â * B ), and the corresponding L 2 (R N × R N , C d×d ) symbols obtained by Lemma 2.1 shows how to extend from symbols in S to L 2 (R N × R N , C d×d ) by density of S in L 2 (R N × R N , C d×d ), see [15]. We will use the Hilbert-Schmidt norm Â 2 HS = Tr (Â * Â ) = Tr (Â 2 ), to estimate Weyl operators.

2.2.
Gibbs density operator and its approximation. The quantum statistical average of the (time-independent) observableÂ in the canonical ensemble at the inverse temperature β is given by Similarly the time-dependent or time-correlation observables becomes , and related Tr (Â τ (B 0ρ +ρB 0 ) Tr (ρ) , (2.9) whereĈ τ := e iτ M 1/2ĤĈ e −iτ M 1/2Ĥ is the quantum observable at time τ withĈ 0 =Ĉ, as presented more precisely in Section 3.
The Weyl quantization provides a correspondence between quantum operatorsÂ and their classical symbols A. The quantum canonical ensemble is described by the density matrix operatorρ = e −βĤ while the classical canonical ensemble is defined by the Gibbs distribution µ(dz) ∼ e −βH dz on the phase space z = (x, p).
Lemma 2.1 suggests that the correspondence between the quantum and classical Gibbs observables can be achieved if we use as the density matrix operator the Weyl quantization e −βH . In Section 3 we derive error estimates that show that quantum observables such as those in (2.9) can be approximated by a classical Gibbs observables in (1.1), derived from the Hamiltonian symbol H, and thus linked to the classical molecular dynamics. Theorems 3.2, 3.6 and 3.7 prove error estimates both if the density operator used in (2.9) isρ = e −βĤ and if it is replaced with Weyl quantization e −βH .
Replacing the standard density operator e −βĤ with the operator e −βH raises a question which one should be taken as a reference for error analysis. We show in the proof of Theorem 3.6 that the observables based on these two operators differ by a quantity of order O(M −1 ) when the number of particles, N , is small compared to M . Thus in this case one can use either of them. However, the standard density operator e −βĤ is the stationary solution of the (quantum) Liouvillevon Neumann equation while the corresponding symbol is not the time-independent solution of the classical Liouville equation. On the other hand starting with the classical Gibbs density e −βH the corresponding Weyl quantization gives the proposed density operator e −βH which is not a stationary solution of the Liouville-von Neumann equation. We discuss this issue in Appendix A.
The trace property of Lemma 2.1 shows that an approximation of the Gibbs observable, where the order of the operations of exponentiation and quantization have been reversed, satisfies and in the normalized form Tr ( e −βH ) We will use the diagonalized form of the Hamiltonian symbol and similarly of transformed observables. More precisely, letÃ : R 2N → C d×d be a symbol in the Schwartz class and consider symbols in the matrix product form A(z) =Ψ(x)Ã(z)Ψ * (x), whereΨ(x) is the d × d matrix composed of the eigenvectors to V (x) as columns, i.e., andλ k (x), k = 1, . . . , d, denote the eigenvalues of V (x) in the increasing order. HereΨ * (x) is the Hermitian adjoint ofΨ(x) in C d and δ jk is the Kronecker delta. ThenΨ(x) diagonalizes H(z) and it and A(z) satisfy: (2.12) The diagonal property of e −βH shows that the trace satisfies which only requires the diagonal part ofÃ. We obtain by (2.10) the approximate Gibbs quantum observable as a sum R 2N e −βH jj (z) dz and each term can be written in canonical ensemble form: Lemma 2.2. The approximate canonical ensemble average satisfies, for β > 0, (2.13) Tr ( e −βHÂ ) R 2N e −βH jj (z ′ ) dz ′ with the weights given by respective probability to be in the state j (2.14) In Section 3 we analyze the trace based on the time independent Gibbs density operator e −βĤ and on e −βH , using instead the transformationÂ =ΨÂΨ * for a diagonal symbolĀ : [0, ∞)×R 2N → C d×d with an orthogonal matrix Ψ : R N → C d×d diagonalizing a non-linear perturbation of the eigenvalue problem (2.11). Remark 2.3. Since Ψ ·j (Ψ ·j ) * is the projection to the electron state j, a method to handle projections to electron states is to normalize with respect to that state and use

Computational demonstration.
In order to demonstrate computational consequences of the presented analysis we formulate a simple model problem for comparing quantum observables with observables obtained from molecular dynamics. We consider a model HamiltonianĤ and I is the 2 × 2 identity matrix. The corresponding Schrödinger eigenvalue equationĤΦ n = E n Φ n , where E n ∈ R and Φ n : R → R 2 represents a model with a heavy particle with coordinate x ∈ R and a two state light electron particle. The details of the example, numerical implementation, and specific values of the parameters are described in Appendix B.
2.3.1. Equilibrium observables. First we focus on equilibrium position observables. We demonstrate numerically an estimate of the difference of the quantum and classical canonical observables, given by (2.8) and the right hand side in (2.13), respectively. We view the averaging for the quantum, Here (E n , Φ n (x)) denotes an eigen-pair of the operatorĤ and λ k (x), k = 1, 2 is the k-th eigenvalue of the matrix-valued function V (x), such that λ 1 (x) ≤ λ 2 (x). Figure 2a shows the classical density µ cl from molecular dynamics on the electron ground state, corresponding to q 1 = 1 and q 2 = 0, and the density given by the values of q k in (2.16). For the choice of parameters these probabilities are q 1 = 0.8 and q 2 = 0.2. Figure 3a shows the Schrödinger and classical densities µ qc (x) and µ cl (x), computed using (2.15) and (2.16). We note that the classical density computed using (2.16) approximates the quantum density quite well, whereas the Figure 2a shows that the classical density computed using only the electron ground state is not close to the quantum density. The potential V (x) that was used in these numerical tests is described in Appendix B, and has eigenvalues λ k (x) depicted in Figure 1.
Results from numerical simulations demonstrate the error analysis of Section 3, in particular that the L 1 as well as L ∞ difference between µ qc and µ cl decreases as O(M −1 ). We note the striking agreement between the Schrödinger and molecular dynamics equilibrium densities for M = 1000 in Figure 3a. The pointwise difference for M = 1, 10, 100, 1000 is given in Figure 3b. Figure 2b shows the increasing variation of the molecular dynamics density as the temperature decreases. We conclude that although, the densities vary substantially for different temperature, the molecular dynamics density approximates the canonical Schrödinger density well, with the error O(M −1 ) shown also in Figure 4. Molecular dynamics approximation of quantum observables in the micro canonical ensemble typically have larger errors, see [2].

2.3.2.
Time-correlated observables. Next we demonstrate that the derived method and analysis is also applicable to observables depending on time correlations. In particular, we test the position observable x 0 at time t = 0 with the position observable x τ at time t = τ . The time evolution of the position observable operator is given, in Heisenberg representation, aŝ Applied to the special case in this example, withĤ as in Section 2.3.1, Theorems 3.2, 3.6 and 3.7 show that with the initial condition z j 0 = (x 0 , p 0 ) = z 0 , and λ j as in Figure 1, precisely defined in Appendix B.1. With the same values for T and δ as in the case of equilibrium observables we show the position correlation observable for different correlation times τ , and three different mass ratios M , in Figure 5a. Figure 5b demonstrates that the L ∞ -error is inverse proportional to the mass ratio M in agreement with the error analysis of Section 3.

Time correlated observables
In this section we study canonical quantum observables for correlations in time, namely and the related variant Tr Â τ (B 0 e −βĤ + e −βĤB 0 ) , based on the time dependent operatorÂ τ , which for τ ∈ R is defined by Example. For instance, the observable for the diffusion constant uses the time-correlationx(τ ) ·x(0) whereÂ τ =x τ I andB 0 =x 0 I and To analyze the time evolution ofÂ τ we use transformed variables: assume that Ψ : R N → C d×d and Ψ(x) is an orthogonal matrix with the Hermitian transpose Ψ * (x) and defineĀ : The matrix Ψ(x) we will use is defined precisely below and it approximates the matrixΨ(x) that diagonalizes the potential matrix V (x) in the sense that Ψ(x)−Ψ(x) = O(M −1 ). We also assume thatB 0 =Ψ B 0Ψ * , and we restrict our study to the case where the d × d matrix symbolsĀ(0, ·) =Ā 0 andB 0 are diagonal, as motivated in (3.7). Let α be any complex number and define for t ∈ R the exponential Differentiation shows that and we conclude that The composition rule A power expansion of the exponential in (3.5), see [18], yields the semiclassical expansion This expansion is defined for symbols in the Schwartz class and Lemma 3.11 provides an extension to a larger set of symbols. Due to the special from of the Hamiltonian studied here, namely H(x, p) = |p| 2 2 I + V (x), all terms of the semiclassical expansion for (3.6) with higher powers than M −1 drop out and we obtain a simple sum of only two terms: Lemma 3.1. Any orthogonal two times differentiable matrix valued mapping Ψ : Proof. The composition in (3.5) shows that which by the property Ψ * Ψ = I can be written which by (3.3) and (3.4) impliesΨ * e tαĤΨ = e tαĤ and we obtain by (3.1) and (3.2) that The next step is to determine Ψ so thatH = Ψ * #H#Ψ is diagonal (or approximately diagonal in (3.18) ), in order to makeHĀ t −Ā tH small since it appears in the expansion of the compositions in . To haveHĀ t −Ā tH small then also requiresĀ t to be diagonal (or almost diagonal). In the case whenH is diagonal, the quantizationĤ is diagonal and thenÂ remains diagonal if it initially is diagonal, since then d dtÂ Consequently we restrict our study to observables whereĀ 0 =Ā(0) are diagonal.
We haveH Therefore the aim is to choose the orthogonal matrix Ψ so that it is a solution or an approximate solution to the non linear eigenvalue problem where Λ is diagonal with Λ jj =: λ j . Such a transformation Ψ is an O(M −1 ) perturbation of the eigenvectors to V (x), provided the eigenvaluesλ j (x) of V (x) do not cross and M is sufficiently large. Section 3.1 shows that (3.8) has a unique solution, if the potential V is real analytic, and that the solution can be approximated by an asymptotic expansion.
3.1. Solution of the nonlinear eigenvalue problem. This section presents a version of the standard regular perturbation analysis of matrix eigenvalue problems, cf. [9], which shows that the nonlinear eigenvalue problem (3.8) can be written as a nonlinear system of first order partial differential equations solved by a power expansion provided by Cauchy-Kovalevsky's theorem or by an asymptotic expansion.
We define for small ǫ ∈ R the matrix whereV andB are real symmetric d × d matrices, depending on a parameter x ∈ R N and on another parameter M ∈ R. The non linear eigenvalue problem (3.8) can then be written as (3.10) We assume that the matricesV andB are m times continuously differentiable as a function of x.
Differentiating with respect to ǫ, the eigenvalue problem with the eigenvalues λ k (ǫ) ∈ R and the corresponding normalized real valued eigenvectors (3.11) The remaining component in the ψ k (ǫ) direction becomes zero by the normalization In particular, the non linear eigenvalue problem (3.8) is in the form of (3.9), based on (3.10), hence it can be written as the system of d 2 + d partial differential equations with the initial data and 0 < ǫ < 1 4M . There is a power series solution for large M in the case that V is real analytic and the eigenvaluesλ k are distinct for all x: since then alsoλ andΨ are real analytic and Cauchy-Kovalesky's theorem, see [5], yields a local solution to the nonlinear system of partial differential equations (3.13).
The assumption of distinct eigenvalues λ 1 (0) < λ 2 (0) < . . . < λ d (0) and the combination of (3.12), (3.11), and establishes, for sufficiently small ǫ, and differentiation of (3.11), with respect to the parameter x ∈ R N , yields The induction (3.16) uses this estimate for ǫ = 1/(4M ) in (3.15) and (3.17). An alternative to solve (3.8) exactly, which requires less regularity on V , is the following approximate solution based on fixed point iterations to obtain an asymptotic expansion. Let S(C) denote an orthogonal matrix of eigenvectors to a d × d Hermitian matrix C, with the columns in the order of the eigenvalues, so that e.g. S(V (x)) = Ψ 1 (x)Ψ 2 (x) . . .Ψ d (x) as in (2.11). Let Ψ[1] = S(V (x)) and define (3.14) Ψ Assume that the eigenvaluesλ j of V are distinct and V ∈ [C m (R N )] d 2 . The regular perturbation theory of real symmetric matrices in (3.11) shows that for sufficiently large M and any k ≤ m where O k denotes an order relation that is allowed to depend on k. Then induction in j, for j ≤ k, shows that as follows: we have so that regular perturbation theory implies that the left hand side is diagonalized by and there is a constant K k,j such that is diagonal withΛ jj =λ j as in (2.11) and r 0 is the term with the factor 1 , which only depends on the x-coordinate. Here κ ≤ m, where V ∈ [C m (R N )] d 2 and we remind that We see that consists of a diagonal part and a small coupling O(M −κ+1/2 ) part. This asymptotic recursion for Ψ[j] is typically not convergent, therefore the error term r 0 = O(M −κ ) may be large if κ is large, unless M is very large, which is a reason to avoid large values of κ.

3.2.
Approximation in time of observables. The transformation to diagonalH andĀ 0 yields restrictions to the set of observables A 0 that we can analyse. The aim of this section is to describe these restrictions and present the time dependent molecular dynamics observable that will be used to approximateĀ t .
3.2.1. Time dependent molecular dynamics. The symbolȂ : approximatesĀ t , as we shall see in Lemma 3.9 below. By writing the Poisson bracket in the right hand side explicitly, we see that equation (3.19) is a scalar linear hyperbolic partial differential equation for each component: This partial differential equation can be solved by the method of characteristics, which generates molecular dynamics paths as follows. LetȂ jj be constant along the characteristic with initial data (x 0 , p 0 ) = z 0 and the Hamiltonian (H 0 (x, p)) jj = ( |p| 2 2 I + Λ(x)) jj = |p| 2 2 + λ j (x). For each j we have where the equality in the left hand side holds because the Hamiltonian system is autonomous and the Poisson bracket in the right hand side is obtained from the chain rule differentiation at s = 0. We conclude that (3.19) holds forȂ constructed by (3.21).

3.2.2.
The set of allowed observables. Our approximation of canonical quantum observables becomes implicit in the following sense. Given a Hamiltonian symbol H we can determine electron states Ψ so thatH = Ψ * #H#Ψ is diagonal. For any diagonal initial symbolsĀ 0 andB 0 , we will for instance show that the molecular dynamics observable approximates the quantum observable Tr (Â τB0 e −βĤ ), where To approximate a given quantum observable is in this formulation possible only ifĀ 0 = Ψ * #A 0 #Ψ andB 0 = Ψ * #B 0 #Ψ are diagonal (or almost diagonal). From a given diagonalĀ 0 it is therefore direct to determine A 0 but the opposite to first choose A 0 then requires to verify ifĀ 0 is diagonal. The set of such A 0 is in the special case whereĀ 0 only depends on x given by A 0 = ΨĀ 0 Ψ * , withĀ 0 any diagonal matrix. If in addition A 0 (x) = a(x)I, for any scalar function a : R N → R, we obtain A 0 (x) =Ā 0 (x) = a(x)I and similarly for B 0 .
We note that if eigenvalue surfaces cross, i.e. if λ j (x) = λ j+1 (x) for some j and x, then ∇Ψ(x) may not be in L 2 (R N ). We have assumed that the observable symbolsĀ 0 andB 0 are diagonal in the same coordinate transformation Ψ[κ] that approximately diagonalizes the Hamiltonian, in the composition way (3.18). Example of observables that cannot be diagonalized by the same transformation as the Hamiltonian are A 0 (z) = xΨ ·1 (x)(Ψ ·2 ) * (x) and B 0 (z) = xΨ ·1 (x)(Ψ ·2 ) * (x) and the correlation based on these observables are then not applicable in Theorems 3.2, 3.6 and 3.7, in contrast to the projections Ψ ·j (Ψ ·j ) * in Remark 2.3.

3.3.
Assumptions and theorems. This section states our main results on molecular dynamics approximation of quantum observables in the canonical ensemble. The molecular dynamics is based on the Hamiltonian system formulated in (3.21-3.23). The observableĀ jj (0, z j t (z 0 )), for each j ∈ {1, . . . , d}, is constant along the molecular dynamics path and provides the classical approximation of the corresponding quantum observable as we shall see in Theorems 3.2, 3.6 and 3.7. By assuming that the potential V is real analytic, the nonlinear eigenvalue problem (3.8) generates a matrix Ψ of eigenvectors and eigenvalues λ j that yields a diagonal HamiltonianH =H 0 with vanishing remainder r 0 = 0, as defined in (3.18 then there is a constant c, depending on C, such that the canonical ensemble average satisfies where z j τ solves (3.22).
We note that Tr ( e −βH ) and that although V is assumed to be analytic, this assumption is used only to construct Ψ and λ. For instance, the constant c does not depend on the size of the fourth order derivatives of λ.
Here we have used the quantization of the classical density, e −βH , as discussed in Section 2.2. Our related results for the density e −βĤ require similar but more assumptions on the regularity of the data given below. We note that all assumptions are based on a fixed number of derivatives not depending on N . Assumption 3.3. Assume κ = 1 or κ = 2 in (3.18) defines the remainder r 0 , the eigenvalues Λ, the HamiltonianH 0 and that there is a constant C such that Assumption 3.5. Assume κ = 2 in (3.18) defines the remainder r 0 , the eigenvalues Λ, the Hamil-tonianH 0 and that there is a constant C such that  Tr (e −βĤ ) and q j = q j (H) is defined by (2.14). If in addition Assumption 3.5 holds then the estimate in By comparing instead to the Weyl quantized classical Gibbs density e −βH 0 we have the following more accurate error estimate, that only requires Assumption 3.4.

31)
where z j τ solves (3.22) and q j = q j (H 0 ) is defined by (2.14). 3.4. Structure of the proofs. It is useful to split our estimation into two parts where the first part is the approximation error of the Gibbs density operator, which is estimated in Lemma 3.8, and the second part is the approximation error of the dynamics of the observable, which is estimated in Lemma 3.9.   The results in Lemmas 3.8 -3.10 have clear limitations since the error estimate of the approximation of the Gibbs density operator in Lemmas 3.8 and 3.10 is not uniform in N and T −1 and the approximation error of the observable dynamics in Lemma 3.9 is not uniform in τ . This means that many particles and low temperature yields a large approximation error of the density operator. The approximation error of the observable dynamics depends exponentially on time, e cτ , but c is uniform in N provided the assumptions in Theorem 3.2 hold uniformly in N . In conclusion by combining (2.13), (3.32) and Lemmas 3.8 -3.10, we obtain the theorems.
The three proofs of the lemmas, in Section 4, have the same structure with three steps -find an error representation, estimate remainder terms in Moyal expansions and evaluate the tracedescribed roughly as follows.

3.4.2.
Estimation of remainder terms and evaluation of the trace. We will use the composition rule (3.5) to estimate remainder terms in the error representation (3.34). Expansion of the exponential in the composition rule leads to the so called Moyal expansion. The usual estimates of the remainder terms in Moyal expansions determine the L 2 (R N ) operator norm from L ∞ norm estimates of order N derivatives of the remainder symbol, using the Calderon-Vaillancourt theorem, see [18,Theorem 4.23]. To avoid derivatives of high order, if N is large, we instead estimate the remainder terms in the form Tr (RĈ), for Hermitian operatorsR andĈ on L 2 (R N ), by the L 2 norms of their symbols. We use the Hilbert-Schmidt inner product, Tr (R * Ĉ ), and the corresponding Hilbert-Schmidt norm, R 2 HS = Tr (R * R ) = Tr (R 2 ), and Lemma 2.1 as follows Tr C 2 (z) dz .
where the last steps indicated by " = . . . ≤ " are explained in the proof of Lemmas 3.11 and 3.12 using the Fourier transform of the Dirac measure δ(z − z ′ ), that i∇ x · ∇ p is anti-Hermitian (so that (e i 2M 1/2 ∇x·∇p ) * = e − i 2M 1/2 ∇x·∇p is unitary) and applying Young's inequality to convolutions of Fourier transforms. Here F denotes the Fourier transform (2.2). In our proof of Theorems 3.7 and 3.2 and to prove (3.30) in Theorem 3.6 we need this estimate only in the special case where one function depends only on the x coordinate, i.e. A(x), and then the right hand side becomes is related to 2(N + 1) derivatives of A in L 2 (R 2N ), see (3.37). The special form of the Hamiltonian, namely H(x, p) = |p| 2 2 I + V (x), is essential to only obtain the case A(x)#C(x, p) L 2 (R 2N ) in our analysis for the O(M −1/2 ) bound in (3.30), which by Lemma 3.12 is bounded by A L ∞ (R N ) C L 2 (R 2N ) , and not from (3.40). An L 1 -bound on the Fourier transform of a function, which is required in (3.28) and (3.29) to obtain the accuracy O(M −1 ) in (3.30), is more demanding on regularity than the 22 L ∞ -norm of the function. For instance, we have (3.37) The eigenvalue functionsλ j (x) and their Laplacian are typically proportional to the number of particles, since the Hamiltonian is the energy of the system. Therefore, the corresponding estimates in the first row of (3.25) are bounded by a constant proportional to N , while (3.27) can be uniform with respect to N . Also the remainder term r 0 , related to 4M −1 ∇Ψ * · ∇Ψ in (3.18), may be proportional to N . Therefore also the estimate λ j −λ j L ∞ (R N ) = O(M −1 ), obtained from (3.18) with κ = 1, may have a constant proportional to N .

3.5.
Remainder terms in the Moyal composition. The following two lemmas estimate remainder terms in the Moyal expansions of the compositions that we will use below. Here F[C] denotes the standard Fourier transform of C, see 2.2. We also use the abbreviation z ≡ ( Lemma 3.11. Assume C : R 2N → C d×d and D : R 2N → C d×d and that there exist constants M γ , N γ such that for integer multi-indices γ = (γ 1 , . . . , γ 2N ) for all |γ| ≤ m + 1 , then the composition has the expansion where the remainder r ∈ L 2 (R 2N ) satisfies If C(x, p) = A(x) depends only on the x-coordinate and

38)
or if C(x, p) = A(p) depends only on the p-coordinate and ∂ γ p A L ∞ (R N ) ≤ M γ , for all |γ| ≤ m + 1 then (3.39) Lemma 3.12. Assume C : R 2N → C d×d and D : R 2N → C d×d belong to L 2 (R 2N ) and in addition one of these functions has its Fourier transform in L 1 (R 2N ), then and if A : R N → C d×d depends only on the x-coordinate (or only on the p-coordinate) and is bounded in L ∞ (R N ) then Proof. The proof has three steps: error representations, estimation of remainder terms and evaluation of the trace.

(4.4)
We have r 2 = (r ′ 2 ) * and the next step shows that Step 2. Estimation of remainder terms. The remainder representations (3.38) and (3.39) applied to the x and p dependent terms inH 0 separately implies which can be written (4.5) We have by (3.41) in Lemma 3.12, where A is the function of x and D is the function of (x ′ , p ′ ) in the estimates of r 0 and r 2 in (3.18) and (4.5), Here we see that r 0 depends only on the x-coordinate and the composition in r 2 has one factor that also depends only on the x-coordinate. Therefore, by Lemma 3.11 and (3.41) we obtain provided there holds (4.8) Step 3. Evaluation of the trace. The Hilbert-Schmidt inner product, Tr (B * Ĉ ), for symmetric operators on L 2 (R N ), and its Cauchy's inequality imply together with (4.1)  Tr Ā 2 (0, z) dz and we assume that the initial data satisfies (4.10) The basis {Φ n } ∞ n=1 of solutions to the Schrödinger equation (2.1) implies In the special case whereĀ 0 =B 0 = I, we similarly obtain and by (2.7) Tr (e −βH 0 (z) ) dz , Tr (e −βH 0 (z) ) dz + O(M −1/2 ) .

Proof of Lemma 3.9.
Proof. This proof is analogous to the proof of Lemma 3.8 and has three similar steps: error representation, estimation of remainder terms, and evaluation of the trace.

(4.21)
We note that the assumption on λ j is compatible with the assumption thatλ j (x) tends to infinity as |x| → ∞, which is imposed to have a discrete spectrum ofĤ. The choice Ψ = Ψ [2] yields λ j =λ j + O(M −1 ). The eigenvalues λ j have four bounded derivatives if the eigenvaluesλ j of the potential V are distinct and also the corresponding eigenvectors Ψ [1] and Ψ [2] have four bounded derivatives, which requires the fifth order derivatives of the potential V to be bounded.
Step 3. Evaluation of the trace. Define the diagonal matrix

4.4.
Weyl quantization estimates: proof of Lemmas 3.11 and 3.12. The purpose of this section is to prove Lemmas 3.11 and 3.12 that estimate the remainder terms r i , i = 0, 2 in the L 2 (R 2N ) norm, and avoid derivatives of high order N , using that the Hilbert-Schmidt norm of operators can be bounded based on the L 2 (R 2N ) norm of their symbols, as illustrated in (3.35) and (3.36). The precise estimates of remainder terms in the Moyal expansion of the composition of two Weyl quantizations are presented here using Hermitian properties of the operator valued The Moyal expansions, see [18], , (4.26) are well defined for the symbols A, C and D in the Schwartz class, viewing the exponential as a Fourier multiplicator. We begin with the first expansion. 4.4.1. The case A(x)#D(x, p). We study the remainder term for the expansion of the exponential using and apply the Fourier transform F defined for f (x ′ , p ′ ) by The remainder can be evaluated by the inverse Fourier transform and Taylor expansion of the exponential function . (4.28) The remainder is therefore r(x, p) (4.29) 4.4.2. The case C(x, p)#D(x, p). As above we obtain We can therefore write the remainder as R(x, p)  |e Integration by parts, property (4.32) and the fact that the differential operators ∇ x ′ ·∇ p and ∇ x ·∇ p ′ commute and are symmetric in then by (4.30) we can expand the derivative inR(z, z ′ , s) and collect the derivatives with respect to z and z ′ in the functionsC andD as The Fourier transform in the u-direction is the convolution The right hand side in (4.33) becomes which can be written |r n (ξ, that proves (3.40) in Lemma 3.12 and combined with (4.31), (4.33) and (4.34) it also establishes Lemma 3.11.

38
The estimate (3.41) of A(x)#D(x, p) L 2 (R 2N ) follows similarly from (4.29) by integration by parts . The case A(p)#D(x, p) is analogous to A(x)#D(x, p).
In conclusion, these estimates bound the remainder symbols r 0 and r 2 used in Lemmas 3.8-3.10. Finally, using that Schwartz functions are dense in L p (R 2N ), for any p ≥ 1, we have proved Lemmas 3.11 and 3.12.
The corresponding density matrix symbol ρ q is not a time-independent solution to the classical Liouville equation, since 0 = iM 1/2 (ρ q #H −H#ρ q ) = {ρ q ,H}, and the classical Gibbs density is not a time-independent solution to the quantum Liouville-von Neumann equation, since iM 1/2 (e −βH 0 #H 0 −H 0 #e −βH 0 ) = {e −βH 0 ,H 0 } = 0. We are lead to the question which Gibbs density to use and why use any Gibbs measure. This question is analyzed regarding the timedependence and the classical behavior in the following two subsections.
A.1. Why should we use the Gibbs density? In Statistical Mechanics books the Gibbs density is often derived as the marginal distribution of a subsystem weakly coupled to a heat bath, where the composite system is assumed to have the microcanonical distribution, see [4]. Here we give a variant of this derivation, assuming instead that the marginal distribution of the subsystem is determined by the subsystem Hamiltonian.
In molecular dynamics simulations one often wants to determine properties of a large macroscopic system with many particles, say N ∼ 10 23 . Such large particle systems cannot yet be simulated in a computer and one may then ask for a setting where a smaller system has similar properties as the large. Therefore, we seek an equilibrium density matrix that has the property that the marginal distribution for a subsystem has the same density as the whole system. We will below motivate how this assumption together with an assumption of weak coupling between the subsystem and the whole system leads to the Gibbs measure; in fact it is enough to assume that the marginal distribution for the subsystem has an equilibrium density which is a function of the Hamiltonian for the subsystem.
Assume that the Hamiltonian symbol has been diagonalized and consider one component so that the Hamiltonians are scalar valued, with coordinates z s ∈ R 2n and Hamiltonian H s (z s ) in the subsystem and coordinates z b ∈ R 2(N −n) and Hamiltonian H b (z b , z s ) for a large heat bath environment system, i.e. N ≫ n. The whole system has the Hamiltonian H(z) = H s (z s )+H b (z b , z s ) with z = (z s , z b ) ∈ R 2N . On the classical side, F (H) for any differentiable function F : R → (0, ∞) yields a non normalized solution to the time independent Liouville equation. We assume that The non normalized marginal distribution for the subsystem is then which by the mean value theorem is equal to for somez b ∈ R 2(N −n) , that depends on H s (z s ). We note thatz b may be non unique. For given F and H b we introduce the notationz If the heat bath is uncoupled to the system, the valuez b does not depend on H s , i.e.
We make two additional assumptions.
Assumption A.2. The coupling between the heat bath and the system is weak in the sense that This assumption expresses that the coupling energy between the subsystem and the heat bath is much smaller than the subsystem energy H s (z s ). The second assumption is that the marginal distribution for the subsystem is a function of the subsystem Hamiltonian. In conclusion, molecular dynamics simulations often seek the property that the classical equilibrium for the subsystem is the same as for the larger environment, since the subsystem is supposed to model a larger system. We have shown that the classical Gibbs density is the only differentiable function with this property, in the sense of the derivation based on Assumption A.1, A.2 and A.3, while we do not know if the symbol for the quantum density matrix ρ q has the same property for a large number of particles. Therefore, we prefer to use the Weyl quantized classical Gibbs density e −βH 0 as our reference density, as in Theorem 3.7 and 3.2, since its drawback of being non time-independent solution to the quantum Liouville-von Neumann equation is mild, in the sense that the time dependent perturbation is small for long time, as shown in Section A.2.
Remark A.5. Let us finally informally motivate why it seems difficult to find time independent solutions to the quantum Liouville equation that are not functions of the Hamiltonian. An equilibrium density operator must commute with the Hamiltonian operator, by the Liouville-von Neumann equation, and consequently it is diagonalized by the same transformation as the Hamiltonian. The diagonalized density operator is then a function of the eigenvalues of the Hamiltonian operator if they are distinct, by mapping the eigenvalues of the Hamiltonian to the eigenvalues of the density operator. Assume that this mapping can be extended to a continuous mapping F : R → R + . We can then write the density operator as a function of the Hamiltonian, namelyρ = F (Ĥ).
A.2. Time-dependent density operators. One criterion for a density operator is that it is a time independent or approximately time-independent solution to the quantum Liouville-von Neumann equation, so that measurements of the observable at different times vary little. Here we show that the time dependent perturbation of observables based on the initial density matrix e −βH , namely the quantization of the classical Gibbs density, is small up to time t ≪ M , which in some sense justifies to use the density matrix e −βH . We include a motivation for an estimate which is uniform in the total number of particles N , based on a small system weakly coupled to a large heat bath.
As in (3.7) we see that v is diagonal since ρ 0 is diagonal. By Duhamel's principle we havê The remainder h includes error terms, as (p · ∇ x ) 3 Λ and (p · ∇)∆Λ, that typically are large proportional to N . To obtain estimates that do not increase with N , we need a new setting: we consider as in Section A.1 a smaller system, with coordinates (x s , p s ) = z s ∈ R 2n , weakly coupled to a large heat bath, with coordinates (x b , p b ) = z b ∈ R 2(N −n) , where n ≪ N . We assume weak coupling, as in (A.3), which here means that Λ(x s , x b ) = Λ s (x s ) + Λ b (x s , x b ) satisfies (A.6) |∇ xs Λ b (x s , x b )| ≪ 1 .
Let z = (z s , z b ) ∈ R 2N . To understand the heat bath perturbation with respect to the system Hamiltonian,H s , we can, on the operator level, study e −γĤs e −γĤ b = e −γ(Ĥs+Ĥ b )−γĤ δ based on the Baker-Campel-Hausdorff expansion forĤ δ , which to leading order satisfieŝ This perturbation is small in the sense that it is bounded with respect to N provided the coupling (A.6) is weak. Here γ = β or γ = iM 1/2 t for β, t ∈ R. Does the remainder e −βĤ also give a perturbation that is bounded with respect to N ? Integration by parts as follows in fact shows that the perturbation caused by h is bounded with respect to N . Introduce the notation D σ := A τ +σ #B σ (z) . We have by Lemma 3.11