1 Introduction

Realistic physical systems are far too large to be treated at the level of single particles in full resolution, and thus, a common approach is to consider small systems whose computation can be carried out at reasonable computational cost. The implicit assumption is that the corresponding statistical mechanics and thermodynamics represent the true physical situation within an acceptable degree of precision when the finite size effects are negligible in comparison with some reference quantity of interest. While this situation occurs in many fields of physics, chemistry and materials science, a field in which such an approximation is routinely used is molecular simulation [1, 2]. Molecular simulation has made enormous progress in recent decades in successfully studying classical and quantum particle systems, but without the possibility of simulating a small system as a representative of an ideal infinite system, its power would have been modest due to the limitation of computational resources and the difficulties of data storage [3].

When replacing an infinite or very large system by a considerably smaller subsystem, the modelling error can be large and so can be the statistical error when averages over finitely many particles are considered [4, 5]. From this perspective, criteria that allow for a precise estimate of the finite size effects play a key role in the assessment of the quality of a simulation study. In a previous work [6], we have derived two-sided Bogoliubov bounds for the interface free energy required for the separation of a classical infinite system into weakly interacting small subsystems that can be used as an error indicator of the model fidelity as discussed above. The upper and lower bounds of the interface free energy provide quantitative and computable error bounds to quantify the relevance of the size effects. It is moreover possible to derive tight variational versions of the bounds that can be the basis for systematic improvements of available approximate bounds.

The aim of this paper is to generalise the bounds of Ref. [6] for classical particle systems to quantum systems. The classical Bogoliubov bounds rely on a change of measure of the underlying equilibrium probability measure and the non-negativity of the relative entropy between these probability measures; the difficulty in the quantum case is the non-commutativity of quantum mechanical observables and density operators, which prevents a straightforward extension of the classical reasoning. Our approach, which yields an exact quantum mechanical analogue of the classical bounds, is based on the non-negativity of the relative entropy and additional trace inequalities for non-commuting self-adjoint operators. In contrast to the classical framework, there are several different (and sensible) notions of relative entropy between the statistical distributions of quantum systems [7,8,9]; see also Eq. (1.29) in Ref. [10]. It turns out that the natural relative entropy analogue for our purposes is the von Neumann relative entropy introduced by Umegaki [11], since (a) it yields formally the same bounds as in the classical case and (b) it can be estimated by Monte Carlo methods. (We emphasise that there are different notions of divergence between probability measures in the classical case, too, beyond the relative entropy, which is also known as the Kullback–Leibler divergence; cf. Ref. [12].)

The paper is organised as follows: We first review the results of Ref. [6] in Sect. 2 and then introduce the von Neumann relative entropy between the statistical operators associated with two quantum systems in Sect. 3. Based on the properties of the von Neumann relative entropy, we derive bounds for the interface free energy in the quantum mechanical canonical ensemble in Sect. 4. A computational protocol for molecular simulations is sketched in Sect. 5. The results are summarised and briefly discussed in Sect. 6.

2 Two-sided Bogoliubov inequality for classical systems

In this section, we review the key concepts and results of Ref. [6] that are needed for the extension to the quantum case. The interface energy resulting from the separation of a large system into independent subsystems, at positive temperature \(T > 0\), is defined as the difference between the free energy of the system and the free energy of the uncoupled subsystems. Specifically, we consider a classical system confined to a volume \(\Omega \subset \mathbb {R}^n\) which is described by a Hamiltonian \(H : \Omega \rightarrow \mathbb {R}\). Assume that H can be decomposed according to \(H = H_0 + U\) where \(H_0 := \sum _{i=1}^{d} H_i\), \(H_i : \Omega _i \rightarrow \mathbb {R}\), \(\Omega _i \subset \mathbb {R}^n\), \(\bigcup _{i=1}^{d} \Omega _i = \Omega \), is the Hamiltonian describing the \(d \in \mathbb {N}\) non-interacting subsystems and \(U : \Omega \rightarrow \mathbb {R}\) is the coupling energy between those systems. It will be assumed that the functions \(H_0\) and U are continuous, sufficiently fast growing at infinity and bounded from below. The partition functions Z and \(Z_0\) associated with H and \(H_0\) are given by

$$\begin{aligned} Z = \int \limits _{\Omega } \mathrm {e}^{-\beta H(x)} \, \mathrm {d}^n x \quad \text {and}\quad Z_0 = \int \limits _{\Omega } \mathrm {e}^{-\beta H_0(x)} \, \mathrm {d}^n x . \end{aligned}$$

We will refrain from indicating the dependency of the partition function on \(\beta \), \(\Omega \) and the particle numbers. In the following, we provide definitions and properties that are required for the final derivation of the upper and lower bounds of the interface energy.

Definition 2.1

(Interface energy) The difference in free energy \(\Delta F\) between the coupled system described by H and the uncoupled subsystems described by \(H_0\) is called interface energy, and it is given by

$$\begin{aligned} \Delta F := F - F_0 = - \beta ^{-1} \log \left( \frac{Z}{Z_0}\right) \ . \end{aligned}$$

We briefly review the situation for a classical statistical ensemble. To this end, let \((\Omega , \Sigma , P)\) be a probability space (or: ensemble) where \(\Sigma =\mathcal {B}(\Omega )\) denotes the \(\sigma \)-algebra of Borel subsets of \(\Omega \). For convenience, we consider only probability measures with a probability density function (pdf); specifically, we assume that there is an integrable, nonnegative function \(p :\Omega \rightarrow [0,\infty )\) such that for all \(A \in \Sigma \), it holds that

$$\begin{aligned} P(A) = \int \limits _{A} p(x) \, \mathrm {d}^n x . \end{aligned}$$

Definition 2.2

(Relative entropy) Let \(f, g : \Omega \rightarrow [0, \infty )\) be two pdfs on \(\Omega \). Assume that

$$\begin{aligned} \int \limits _{\{x \in \Omega \ : \ g(x) = 0\}} f(x) \, \mathrm {d}^n x = 0 . \end{aligned}$$

The relative entropy (also known as Kullback–Leibler divergence \(\mathrm {KL}(f \Vert g)\)) between f and g is then defined as

$$\begin{aligned} R(f, g) := \int \limits _{\Omega } \log \left( \frac{f(x)}{g(x)}\right) f(x) \, \mathrm {d}^n x . \end{aligned}$$

In case that the integral of f over the set of zeros of g does not vanish (i.e. if \(g \ne 0\) does not hold almost everywhere on \(\Omega \) with respect to the probability measure P induced by the density f), one defines \(R(f, g) := \infty \). Note that the definition of R is based on the limit \(\lim _{x \rightarrow 0} x \log (x) = 0\).

It is a simple consequence of Jensen’s inequality that \(R(f, g) \ge 0\), with equality if and only if \(f = g\) holds P-almost everywhere [6].
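The properties of the relative entropy stated above can be illustrated on a finite state space. The following is a minimal sketch, assuming that the integrals in Definition 2.2 are replaced by finite sums; the function name `relative_entropy` and the probability vectors are hypothetical choices made for illustration only.

```python
import math

def relative_entropy(f, g):
    """Discrete analogue of R(f, g) = sum_x f(x) log(f(x) / g(x))."""
    total = 0.0
    for fx, gx in zip(f, g):
        if fx == 0.0:
            continue             # convention 0 * log(0) = 0
        if gx == 0.0:
            return math.inf      # f puts mass on the zero set of g
        total += fx * math.log(fx / gx)
    return total

# Hypothetical probability vectors on three states.
f = [0.5, 0.3, 0.2]
g = [0.2, 0.5, 0.3]

r_fg = relative_entropy(f, g)                   # strictly positive since f != g
r_gf = relative_entropy(g, f)                   # differs from r_fg: R is not symmetric
r_ff = relative_entropy(f, f)                   # vanishes for identical arguments
r_inf = relative_entropy(f, [0.5, 0.5, 0.0])    # infinite: g vanishes where f > 0
```

The sketch also makes the lack of symmetry of R visible, which is the reason why interchanging the arguments yields two different bounds.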

2.1 Two-sided Bogoliubov inequality

In the following, it will always be assumed that the first argument of the relative entropy is strictly positive. This implies that null sets of the measure P are Lebesgue null sets. Let us denote by p and \(p_0\) the pdfs of the canonical ensemble associated with the Hamiltonians H and \(H_0\), i.e.

$$\begin{aligned} p := \frac{1}{Z} \, \mathrm {e}^{-\beta H} \quad \text {and}\quad p_0 := \frac{1}{Z_0} \, \mathrm {e}^{-\beta H_0} . \end{aligned}$$
(1)

The expectation of any integrable random variable (or: observable) O in the respective ensemble can then be written as

$$\begin{aligned} {\mathbf {E}}_p[O] := \int \limits _{\Omega } O(x) p(x) \, \mathrm {d}^n x \quad \text {and}\quad {\mathbf {E}}_{p_0}[O] := \int \limits _{\Omega } O(x) p_0(x) \, \mathrm {d}^n x . \end{aligned}$$

Theorem 2.3

(Two-sided Bogoliubov inequality) If the previous assumptions are satisfied, it follows that

$$\begin{aligned} {\mathbf {E}}_p[U] \le \Delta F \le {\mathbf {E}}_{p_0}[U] . \end{aligned}$$
(2)

For the full details of the proof, we refer the reader to Ref. [6]; here we only sketch its essence:

Proof

As \(H_0\) and U are continuous functions growing sufficiently fast at infinity, it follows that U is integrable with respect to the densities p and \(p_0\), i.e. the expectation values are well-defined. Furthermore, p and \(p_0\) are strictly positive by construction. The non-negativity of the relative entropy implies:

$$\begin{aligned} 0 \le R(p, p_0)&= \int \limits _{\Omega } \log \left( \frac{p(x)}{p_0(x)}\right) p(x) \, \mathrm {d}^n x\\&= \int \limits _{\Omega } \left[ \log \left( \frac{\mathrm {e}^{-\beta H(x)}}{\mathrm {e}^{-\beta H_0(x)}}\right) + \log \left( \frac{Z_0}{Z}\right) \right] p(x) \, \mathrm {d}^n x \\&= - \beta \int \limits _{\Omega } \Bigl (H(x) - H_0(x)\Bigr ) p(x) \, \mathrm {d}^n x - \log \left( \frac{Z}{Z_0}\right) \int \limits _{\Omega } p(x) \, \mathrm {d}^n x \\&= - \beta \int \limits _{\Omega } U(x) p(x) \, \mathrm {d}^n x - \log \left( \frac{Z}{Z_0}\right) \ , \end{aligned}$$

that is

$$\begin{aligned} {\mathbf {E}}_p[U] = \int \limits _{\Omega } U(x) p(x) \, \mathrm {d}^n x \le - \beta ^{-1} \log \left( \frac{Z}{Z_0}\right) = \Delta F . \end{aligned}$$

Similarly, by interchanging the arguments of the relative entropy R, one obtains

$$\begin{aligned} 0&\le R(p_0, p) = \beta \int \limits _{\Omega } U(x) p_0(x) \, \mathrm {d}^n x + \log \left( \frac{Z}{Z_0}\right) = \beta \, {\mathbf {E}}_{p_0}[U] + \log \left( \frac{Z}{Z_0}\right) \ , \end{aligned}$$

i.e. the desired upper bound

$$\begin{aligned} \Delta F \le {\mathbf {E}}_{p_0}[U] . \end{aligned}$$

\(\square \)

Theorem 2.3 is a rigorous and powerful criterion to estimate the amount of statistical errors stemming from the microscopic nature of a system divided into non-interacting subsystems. It allows for a quantitative justification of the computation for a small system instead of a computationally unfeasible ideal system: simulating representative small subsystems in lieu of the fully coupled system is justified if the interface energy \(\Delta F\) is negligible compared to the energy scale of each subsystem. If the criterion holds, then the complexity of the molecular simulation is reduced from, roughly, \(\mathcal {O}\bigl ((3N)^{2}\bigr )\) to \(\mathcal {O}\bigl ((3N_1)^{2} + \cdots + (3N_d)^{2}\bigr )\) where \(N_k\), \(k = 1, \dotsc , d\), is the number of particles in the k-th subsystem and \(N = N_1 + \cdots + N_d\). If the criterion does not hold at a satisfactory level, then one has to revise the model of the system by modifying the interaction potential U or by changing the size of the subsystems in order to incorporate effects resulting from the environment [13].
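The two-sided bound (2) can be checked numerically on a toy model. The sketch below assumes two one-dimensional harmonic degrees of freedom with a bilinear coupling and replaces the phase-space integrals by sums over a grid; all names and parameter values (`grid`, `h0`, `u`, the coupling strength 0.3) are hypothetical. Since the proof only uses the non-negativity of the relative entropy, the inequality holds for the discretised ensemble as well.

```python
import math

beta = 1.0
grid = [-3.0 + 0.1 * i for i in range(61)]    # shared 1D grid for x1 and x2

def h0(x1, x2):
    """Uncoupled Hamiltonian H_0 = H_1 + H_2 (two harmonic wells)."""
    return 0.5 * x1 * x1 + 0.5 * x2 * x2

def u(x1, x2):
    """Hypothetical coupling energy U between the two subsystems."""
    return 0.3 * x1 * x2

states = [(x1, x2) for x1 in grid for x2 in grid]
w0 = [math.exp(-beta * h0(*s)) for s in states]               # weights of p_0
w = [math.exp(-beta * (h0(*s) + u(*s))) for s in states]      # weights of p
z0, z = sum(w0), sum(w)

delta_f = -math.log(z / z0) / beta                            # interface energy
e_p_u = sum(wi * u(*s) for wi, s in zip(w, states)) / z       # E_p[U]
e_p0_u = sum(wi * u(*s) for wi, s in zip(w0, states)) / z0    # E_{p_0}[U]
```

In this toy model the upper bound `e_p0_u` is close to zero by symmetry, while `e_p_u` and `delta_f` are negative; the width of the interval `[e_p_u, e_p0_u]` indicates how strongly the coupling affects the free energy.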

Remark 2.4

The upper bound on \(\Delta F\) is the well-known Bogoliubov inequality, also called the Peierls–Bogoliubov or Gibbs–Bogoliubov inequality [14,15,16]. It is possible to improve the bounds using the Gibbs variational principle; specifically, for any integrable random variable \(\phi \) and any positive pdf f, it holds that

$$\begin{aligned} {\mathbf {E}}_{p}[U + \beta ^{-1} \phi ] - \beta ^{-1} \log \bigl ({\mathbf {E}}_{p_0}[\mathrm {e}^{\phi }]\bigr ) \le \Delta F \le {\mathbf {E}}_{f}[U] + \beta ^{-1} R(f, p_0) . \end{aligned}$$
(3)
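That the variational bounds (3) are tight can be seen by inserting \(\phi = -\beta U\) on the left and \(f = p\) on the right, in which case both sides equal \(\Delta F\). The following sketch verifies this on a four-state system, assuming finite sums in place of integrals; the energies in `h0` and `u` as well as the helper names are hypothetical.

```python
import math

beta = 2.0
h0 = [0.0, 0.5, 1.0, 1.5]    # hypothetical uncoupled energies
u = [0.2, -0.1, 0.4, 0.0]    # hypothetical coupling energies

w0 = [math.exp(-beta * e) for e in h0]
w = [math.exp(-beta * (e + c)) for e, c in zip(h0, u)]
z0, z = sum(w0), sum(w)
p0 = [x / z0 for x in w0]
p = [x / z for x in w]
delta_f = -math.log(z / z0) / beta

def expect(pdf, obs):
    return sum(q * o for q, o in zip(pdf, obs))

def rel_ent(f, g):
    return sum(a * math.log(a / b) for a, b in zip(f, g))

def lower(phi):
    """Left-hand side of (3) for a given random variable phi."""
    shifted = [c + ph / beta for c, ph in zip(u, phi)]
    log_mgf = math.log(expect(p0, [math.exp(ph) for ph in phi]))
    return expect(p, shifted) - log_mgf / beta

def upper(f):
    """Right-hand side of (3) for a given positive pdf f."""
    return expect(f, u) + rel_ent(f, p0) / beta

plain = (lower([0.0] * 4), upper(p0))                # bounds of Theorem 2.3
tight = (lower([-beta * c for c in u]), upper(p))    # both collapse onto delta_f
```

The optimal choices require knowledge of U under p and are therefore not directly available in practice; the sketch only illustrates that the gap can be closed in principle.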

3 Statistical operator and quantum relative entropy

The bounds of Theorem 2.3 were proved in the framework of classical statistical mechanics. The natural question that arises at this point is whether an extension to quantum systems within the framework of quantum statistical mechanics is possible. This problem is addressed in this and the following section. Specifically, in this section we elaborate the framework of quantum statistics that allows the generalisation of the previous results. The key point in using the statistical or density operator (also known as density matrix) as the analogue of the phase-space pdf in the classical case is the availability of various trace inequalities that allow us to extend the concept of relative entropy to an equivalent quantum definition. The notion of relative entropy for quantum systems is not unique (e.g. Refs. [7, 8]), and it turns out that the suitable concept for our purposes is the classical definition of Umegaki [11], also termed the von Neumann relative entropy in quantum information theory [17]; cf. also Ref. [18].

3.1 Some density matrix theory

We start by recapitulating the key concepts of statistical quantum mechanics, referring to the standard textbook of Zeidler [19, Ch. 5.17]. To begin with, we consider a quantum system on a complex separable Hilbert space \(\mathcal {H}\). Given an orthonormal basis \((\psi _n)_{n \in \mathbb {N}} \subset \mathcal {H}\) and a sequence \((p_n)_{n \in \mathbb {N}} \subset \mathbb {R}\) of nonnegative real numbers with the properties

$$\begin{aligned} \forall n \in \mathbb {N}\ : \ 0 \le p_n \le 1 \quad \text {and}\quad \sum _{n=1}^{\infty } p_n = 1 \ , \end{aligned}$$

we refer to \(\Psi := (\psi _n, p_n)_{n \in \mathbb {N}}\) as a statistical state of the system where \(p_n\) is interpreted as the probability of finding the system in the state \(\psi _n\). In the following, only mixed states, characterised by \(p_n < 1\) for all \(n \in \mathbb {N}\), are of interest. If \(T : \mathcal {H}\supset {{\,\mathrm{dom}\,}}(T) \rightarrow \mathcal {H}\) is a self-adjoint linear operator representing an observable, we define the expectation of T in the statistical state \(\Psi \) by

$$\begin{aligned} {\mathbf {E}}_{\Psi }[T] := \sum _{n=1}^{\infty } p_n \left\langle \psi _n, T\psi _n\right\rangle . \end{aligned}$$

Note that the expectation comprises statistical averaging over the weights \(p_n\) resulting from the statistical nature of the state \(\Psi \) as well as quantum-mechanical averaging \(\left\langle \psi _n, T \psi _n\right\rangle \) resulting from the non-deterministic nature of quantum theory [20, Ch. 2.1.1].

Definition 3.1

(Statistical operator) A bounded self-adjoint linear operator \(\rho : \mathcal {H}\rightarrow \mathcal {H}\) is called statistical operator if there are numbers \((p_n)_{n \in \mathbb {N}} \subset [0, 1]\) with the property \(\sum _{n=1}^{\infty } p_n = 1\) and an orthonormal basis \((\psi _n)_{n \in \mathbb {N}} \subset \mathcal {H}\) such that the action of \(\rho \) on \(\psi \in \mathcal {H}\) is given by

$$\begin{aligned} \rho \psi := \sum _{n=1}^{\infty } p_n \left\langle \psi _n, \psi \right\rangle \psi _n \ . \end{aligned}$$
(4)

Note that for all \(m \in \mathbb {N}\), \(\psi _m\) is an eigenfunction of \(\rho \) with corresponding eigenvalue \(p_m\), i.e. \(\rho \psi _m = p_m \psi _m\). Furthermore, there is a one-to-one correspondence between statistical operators \(\rho \) and statistical states \(\Psi \) given by Eq. (4) (see [19, Ch. 5.17, Prop. 2]). Statistical operators are trace-class, hence compact. Therefore, they possess a discrete spectrum of eigenvalues with a corresponding orthonormal system of eigenvectors \((e_n)_{n \in \mathbb {N}} \subset \mathcal {H}\) such that

$$\begin{aligned} {{\,\mathrm{Tr}\,}}(\rho ) = \sum _{n=1}^{\infty } \left\langle e_n, \rho e_n\right\rangle = \sum _{n=1}^{\infty } \sum _{m=1}^{\infty } p_m \left\langle \psi _m, e_n\right\rangle \left\langle e_n, \psi _m\right\rangle = \sum _{m=1}^{\infty } p_m = 1 . \end{aligned}$$

Another relevant operator in this context, which is related to the concept of entropy, is

$$\begin{aligned} \rho \log \rho : \mathcal {H}\rightarrow \mathcal {H}, \quad (\rho \log \rho ) \psi := \sum _{n=1}^{\infty } p_n \log (p_n) \left\langle \psi _n, \psi \right\rangle \psi _n . \end{aligned}$$
(5)

(The mapping \(\rho \mapsto \rho \log \rho \) is operator convex.) The expectation of an observable T in a statistical state \(\Psi \) can now be expressed in terms of the statistical operator \(\rho \) as follows:

$$\begin{aligned} {\mathbf {E}}_{\rho }[T] = \sum _{n=1}^{\infty } p_n \left\langle \psi _n, T \psi _n\right\rangle = \sum _{n=1}^{\infty } \left\langle \rho \psi _n, T \psi _n\right\rangle = \sum _{n=1}^{\infty } \left\langle \psi _n, \rho T \psi _n\right\rangle = {{\,\mathrm{Tr}\,}}(\rho T) . \end{aligned}$$
(6)
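Identity (6) can be made concrete for a two-level system. The sketch below assumes a real two-dimensional Hilbert space, so that no complex conjugation is needed; the basis angle `theta`, the weights `probs` and the observable `obs` are hypothetical values chosen for illustration.

```python
import math

theta = 0.4
psi = [[math.cos(theta), math.sin(theta)],     # orthonormal basis vectors psi_1, psi_2
       [-math.sin(theta), math.cos(theta)]]
probs = [0.7, 0.3]                             # weights p_n of the statistical state
obs = [[1.0, 0.5], [0.5, -1.0]]                # symmetric matrix representing T

def matvec(m, v):
    return [sum(m[i][j] * v[j] for j in range(2)) for i in range(2)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Statistical operator rho = sum_n p_n |psi_n><psi_n|, cf. Eq. (4).
rho = [[sum(p * v[i] * v[j] for p, v in zip(probs, psi)) for j in range(2)]
       for i in range(2)]

# Expectation as a weighted sum of quantum-mechanical averages ...
mixture_average = sum(p * dot(v, matvec(obs, v)) for p, v in zip(probs, psi))
# ... and as a trace, Tr(rho T).
trace_rho_t = sum(rho[i][j] * obs[j][i] for i in range(2) for j in range(2))
trace_rho = rho[0][0] + rho[1][1]
```

Both ways of computing the expectation agree up to rounding, and the trace of `rho` equals one.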

Given a Hamiltonian \(H : \mathcal {H}\supset {{\,\mathrm{dom}\,}}(H) \rightarrow \mathcal {H}\) with associated partition function \(Z = {{\,\mathrm{Tr}\,}}(\mathrm {e}^{-\beta H})\) which we assume to be finite, the canonical ensemble is described by the statistical operator

$$\begin{aligned} \rho := \frac{ \mathrm {e}^{-\beta H}}{{{\,\mathrm{Tr}\,}}(\mathrm {e}^{-\beta H})} = \frac{\mathrm {e}^{-\beta H}}{Z} . \end{aligned}$$

This is the quantum-mechanical generalisation of the pdf p defined in Eq. (1). A representation of \(\rho \) of the form (4) is given in terms of the eigenfunctions \((\phi _n)_{n \in \mathbb {N}}\) of the Hamiltonian: letting \(H \phi _n = E_n \phi _n\), we have \(p_n = \mathrm {e}^{- \beta E_n} / \sum _{m=1}^{\infty } \mathrm {e}^{-\beta E_m} = \mathrm {e}^{- \beta E_n} / Z\).

The two-sided Bogoliubov inequality is basically a consequence of the non-negativity of the quantum-mechanical relative entropy that we define next. For a detailed overview of the history and the current mathematical status of the Bogoliubov inequality for quantum systems, we refer the reader to the textbook by Zagrebnov on Gibbs semigroups [21].

Definition 3.2

(Relative entropy [11]) Let \(\rho , \sigma : \mathcal {H}\rightarrow \mathcal {H}\) be two statistical operators. Define the quantum-mechanical relative entropy \(\mathcal {R}\) between \(\rho \) and \(\sigma \) to be

$$\begin{aligned} \mathcal {R}(\rho , \sigma ) := {{\,\mathrm{Tr}\,}}(\rho \log \rho ) - {{\,\mathrm{Tr}\,}}(\rho \log \sigma ) . \end{aligned}$$

The non-negativity of the relative entropy is a direct consequence of Klein’s inequality that, for two positive trace-class operators \(A,B: \mathcal {H}\rightarrow \mathcal {H}\) with \({{\,\mathrm{Tr}\,}}(A\log A)<\infty \), reads (see [22, Lem. 14])

$$\begin{aligned} {{\,\mathrm{Tr}\,}}(B\log B) \ge {{\,\mathrm{Tr}\,}}(B\log A + B - A). \end{aligned}$$
(7)

(For an elementary proof in the finite-dimensional case, see [10, App. A].) If one sets \(B=\rho \) and \(A=\sigma \), then, using \({{\,\mathrm{Tr}\,}}(\rho )={{\,\mathrm{Tr}\,}}(\sigma )=1\) for statistical operators and assuming that \({{\,\mathrm{Tr}\,}}(\sigma \log \sigma ) <\infty \) (trivially satisfied when \(\mathcal {H}\) is finite-dimensional, i.e. in applications of molecular simulation), one obtains

$$\begin{aligned} \mathcal {R}(\rho , \sigma ) = {{\,\mathrm{Tr}\,}}(\rho \log \rho ) - {{\,\mathrm{Tr}\,}}(\rho \log \sigma ) \ge 0 . \end{aligned}$$
(8)

It also follows from the strict concavity of the logarithm that \(\mathcal {R}(\rho , \sigma )=0\) if and only if \(\rho =\sigma \) in the sense that \(p_n=q_n\) for all \(n\in \mathbb {N}\) for which \(p_n\ne 0\), where \(p_n\) and \(q_n\) denote the eigenvalues of \(\rho \) and \(\sigma \), respectively. The crucial point is that the non-negativity of \(\mathcal {R}\) can be used to derive the two-sided Bogoliubov inequality for statistical operators, as we will discuss next.
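The non-negativity (8) can be checked for non-commuting density matrices. The following sketch assumes real symmetric, full-rank 2x2 matrices, for which functions of a matrix can be written in closed form through the two spectral projectors; the helper `sym_fun` and the matrices `rho` and `sigma` are hypothetical and not part of any library.

```python
import math

def sym_fun(m, f):
    """Apply the scalar function f to a real symmetric 2x2 matrix m."""
    (a, b), (_, d) = m
    mean, r = 0.5 * (a + d), math.hypot(0.5 * (a - d), b)
    if r == 0.0:                      # m is a multiple of the identity
        return [[f(mean), 0.0], [0.0, f(mean)]]
    fp, fm = f(mean + r), f(mean - r)             # f on the two eigenvalues
    s, c = 0.5 * (fp + fm), 0.5 * (fp - fm) / r
    # f(m) = s * I + c * (m - mean * I)
    return [[s + c * (a - mean), c * b], [c * b, s + c * (d - mean)]]

def trace_prod(x, y):
    """Tr(x y) for 2x2 matrices."""
    return sum(x[i][j] * y[j][i] for i in range(2) for j in range(2))

def rel_entropy(rho, sigma):
    """R(rho, sigma) = Tr(rho log rho) - Tr(rho log sigma)."""
    log_rho = sym_fun(rho, math.log)
    log_sigma = sym_fun(sigma, math.log)
    return trace_prod(rho, log_rho) - trace_prod(rho, log_sigma)

# Hypothetical density matrices (unit trace, positive eigenvalues, non-commuting).
rho = [[0.7, 0.2], [0.2, 0.3]]
sigma = [[0.5, -0.1], [-0.1, 0.5]]

r_rs = rel_entropy(rho, sigma)
r_sr = rel_entropy(sigma, rho)
r_rr = rel_entropy(rho, rho)
```

As in the classical case, the two orderings of the arguments give different positive values, and the relative entropy of a state with respect to itself vanishes.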

4 Two-sided quantum Bogoliubov inequality

Assume again that the Hamiltonian H can be decomposed according to \(H := H_0 + U\) and define

$$\begin{aligned} \rho _0 := \frac{\mathrm {e}^{-\beta H_0}}{Z_0},\; Z_0 := {{\,\mathrm{Tr}\,}}(\mathrm {e}^{-\beta H_0}) \quad \text {and}\quad \rho := \frac{\mathrm {e}^{-\beta H}}{Z},\; Z := {{\,\mathrm{Tr}\,}}(\mathrm {e}^{-\beta H}). \end{aligned}$$
(9)

Using linearity of the trace and \({{\,\mathrm{Tr}\,}}(\rho _0) = 1\), we observe that

$$\begin{aligned} \mathcal {R}(\rho _0,\rho )&= {{\,\mathrm{Tr}\,}}\left[ \rho _0 \Bigl (\log (\mathrm {e}^{-\beta H_0}) - \log (Z_0)\Bigr )\right] - {{\,\mathrm{Tr}\,}}\left[ \rho _0 \Bigl (\log (\mathrm {e}^{-\beta H}) - \log (Z)\Bigr )\right] \\&= - \beta {{\,\mathrm{Tr}\,}}(\rho _0 H_0) - \log (Z_0) + \beta {{\,\mathrm{Tr}\,}}(\rho _0 H) + \log (Z) \\&= \beta {{\,\mathrm{Tr}\,}}\Bigl (\rho _0 (H - H_0)\Bigr ) + \log \left( \frac{Z}{Z_0}\right) \\&= \beta {{\,\mathrm{Tr}\,}}(\rho _0 U) + \log \left( \frac{Z}{Z_0}\right) \end{aligned}$$

which, together with inequality (8) implies the upper bound \(\Delta F \le {\mathbf {E}}_{\rho _0}[U]\), with \(\Delta F = -\beta ^{-1}\log (Z/Z_0)\). This is the famous Peierls–Bogoliubov inequality [14,15,16]; see also [10, App. A].

The proof of the lower bound of the two-sided Bogoliubov inequality proceeds along the same line, using the reversed relative entropy \(\mathcal {R}(\rho ,\rho _0)\ge 0\):

$$\begin{aligned} \mathcal {R}(\rho ,\rho _0)&= {{\,\mathrm{Tr}\,}}\left[ \rho \Bigl (\log (\mathrm {e}^{-\beta H}) - \log (Z)\Bigr )\right] - {{\,\mathrm{Tr}\,}}\left[ \rho \Bigl (\log (\mathrm {e}^{-\beta H_0}) - \log (Z_0)\Bigr )\right] \\&= \beta {{\,\mathrm{Tr}\,}}\Bigl (\rho (H_0 - H)\Bigr ) + \log \left( \frac{Z_0}{Z}\right) \\&= - \beta {{\,\mathrm{Tr}\,}}(\rho U) - \log \left( \frac{Z}{Z_0}\right) . \end{aligned}$$

This entails the lower bound \({\mathbf {E}}_{\rho }[U] \le \Delta F\). We summarize the calculation in the following theorem; an alternative proof, based on the differentiation of the exponential operator, can be found in [21, Sec. 3.4; cf. Cor. 3.22 and Remark 3.23].

Theorem 4.1

Let the partition functions \(Z_0,Z>0\) in (9) be finite and \(U=H-H_0\), with \({\mathbf {E}}_{\rho _0}[U]<\infty \) and \({\mathbf {E}}_{\rho }[U]<\infty \). Then,

$$\begin{aligned} {\mathbf {E}}_{\rho }[U] \le \Delta F\le {\mathbf {E}}_{\rho _0}[U]. \end{aligned}$$
(10)

Whichever proof one chooses, an aspect worth emphasising is that Theorem 4.1 can actually be applied to molecular simulations of quantum systems to quantify the error due to finite size approximations, which are unavoidable in simulations (above all in simulations of quantum systems). In the language of simulations, the theorem prescribes the calculation of the average energy at the interface of the subsystems into which a large reference system is divided. The calculation must be carried out in two cases: (a) when the subsystems interact through the standard particle-particle interactions (which essentially corresponds to the calculation of an ideal surface energy in the large reference system) and (b) when the subsystems are non-interacting (which corresponds to effectively running separate simulations of smaller sizes). In Sect. 5, we will discuss the main features of a computational protocol for typical situations occurring in molecular simulations and specify explicitly the quantities involved, highlighting the practical utility of the result above.
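The statement of Theorem 4.1 can be verified on the smallest non-trivial example. The sketch below assumes a two-level toy model with real symmetric matrices, which is not a partitioned many-body system but suffices to exercise the inequality for non-commuting \(H_0\) and U; the helpers `sym_fun`, `trace_prod`, `gibbs` and all matrix entries are hypothetical.

```python
import math

def sym_fun(m, f):
    """Apply the scalar function f to a real symmetric 2x2 matrix m."""
    (a, b), (_, d) = m
    mean, r = 0.5 * (a + d), math.hypot(0.5 * (a - d), b)
    if r == 0.0:
        return [[f(mean), 0.0], [0.0, f(mean)]]
    fp, fm = f(mean + r), f(mean - r)
    s, c = 0.5 * (fp + fm), 0.5 * (fp - fm) / r
    return [[s + c * (a - mean), c * b], [c * b, s + c * (d - mean)]]

def trace_prod(x, y):
    """Tr(x y) for 2x2 matrices."""
    return sum(x[i][j] * y[j][i] for i in range(2) for j in range(2))

def gibbs(h, beta):
    """Return (exp(-beta h) / Z, Z) for a real symmetric 2x2 Hamiltonian h."""
    w = sym_fun(h, lambda e: math.exp(-beta * e))
    z = w[0][0] + w[1][1]
    return [[x / z for x in row] for row in w], z

beta = 1.0
h0 = [[0.0, 0.0], [0.0, 1.0]]     # hypothetical reference Hamiltonian H_0
u = [[0.2, 0.3], [0.3, -0.1]]     # hypothetical coupling U, not commuting with H_0
h = [[h0[i][j] + u[i][j] for j in range(2)] for i in range(2)]

rho0, z0 = gibbs(h0, beta)
rho, z = gibbs(h, beta)
delta_f = -math.log(z / z0) / beta
lower = trace_prod(rho, u)        # E_rho[U], coupled ensemble
upper = trace_prod(rho0, u)       # E_rho0[U], uncoupled ensemble
```

The two expectation values correspond to the two calculations (a) and (b) described above, and `delta_f` lies between them.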

4.1 Obtaining sharper bounds

The inequalities can be sharpened by using the upper bound of the Peierls–Bogoliubov inequality and the Golden–Thompson trace inequality. Specifically, we have:

Lemma 4.2

(Gibbs variational principle) Let \(M_1(\mathcal {H})\) be the set of Hermitian positive trace-class operators on \(\mathcal {H}\) with unit trace (i.e. statistical operators). Further, let \(\sigma \in M_1(\mathcal {H})\) and W be any self-adjoint positive operator on a suitable (dense) subspace of \(\mathcal {H}\), with compact resolvent, such that \(-\beta ^{-1}\log \sigma +W\) is a self-adjoint positive operator on the domain of \(-\beta ^{-1}\log \sigma \). Then,

$$\begin{aligned} -\beta ^{-1}\log {{\,\mathrm{Tr}\,}}\!\left( \mathrm {e}^{\log \sigma -\beta W}\right) = \inf _{\gamma \in M_1(\mathcal {H})} \left\{ {{\,\mathrm{Tr}\,}}\!\left( \gamma W\right) + \beta ^{-1}\mathcal {R}(\gamma ,\sigma )\right\} . \end{aligned}$$
(11)

If \({{\,\mathrm{Tr}\,}}(\sigma \log \sigma )<\infty \) and \({{\,\mathrm{Tr}\,}}(\mathrm {e}^{-\beta W}W)<\infty \), the infimum is attained at

$$\begin{aligned} \gamma ^* = \frac{\mathrm {e}^{\log \sigma -\beta W}}{{{\,\mathrm{Tr}\,}}\!\left( \mathrm {e}^{\log \sigma -\beta W}\right) }. \end{aligned}$$
(12)

Proof

We consider the upper bound \(\Delta F\le {\mathbf {E}}_{\rho _0}[U]\) in (10) where, without loss of generality, we may assume that \({{\,\mathrm{Tr}\,}}(\mathrm {e}^{-\beta H_0})=1\), such that \(Z_0=1\).

The upper bound can then be recast as

$$\begin{aligned} -\beta ^{-1}\log {{\,\mathrm{Tr}\,}}\!\left( \mathrm {e}^{\log \rho _0 - \beta U}\right) \le {{\,\mathrm{Tr}\,}}\!\left( \rho _0 U\right) . \end{aligned}$$
(13)

Introducing the new potential \(V = U - \beta ^{-1}\log \rho _0\) turns the last inequality into

$$\begin{aligned} -\beta ^{-1}\log {{\,\mathrm{Tr}\,}}\!\left( \mathrm {e}^{- \beta V}\right) \le {{\,\mathrm{Tr}\,}}\!\left( \rho _0 V\right) + \beta ^{-1}{{\,\mathrm{Tr}\,}}\!\left( \rho _0\log \rho _0\right) . \end{aligned}$$
(14)

Note that \(\rho _0\) is arbitrary, in that the inequality holds for any combination of density operators \(\rho _0\in M_1(\mathcal {H})\) and semibounded observable V on \(\mathcal {H}\); therefore, we write (14) in what follows as

$$\begin{aligned} -\beta ^{-1}\log {{\,\mathrm{Tr}\,}}\!\left( \mathrm {e}^{- \beta V}\right) \le {{\,\mathrm{Tr}\,}}\!\left( \gamma V\right) + \beta ^{-1}{{\,\mathrm{Tr}\,}}\!\left( \gamma \log \gamma \right) \, \end{aligned}$$
(15)

for any density operator \(\gamma \), where equality is obtained either by setting \(\gamma =\mathrm {e}^{-\beta V}/{{\,\mathrm{Tr}\,}}(\mathrm {e}^{-\beta V})\) for a given V, or by setting \(V=-\beta ^{-1}\log \gamma \) when \(\gamma \) is given. Shifting the potential by \(V\mapsto V + \beta ^{-1}\log \sigma =:W\) for some density operator \(\sigma \in M_1(\mathcal {H})\) and applying the Golden–Thompson rule

$$\begin{aligned} {{\,\mathrm{Tr}\,}}\!\left( \mathrm {e}^{-(A+B)}\right) \le {{\,\mathrm{Tr}\,}}\!\left( \mathrm {e}^{-A}\mathrm {e}^{-B}\right) \end{aligned}$$
(16)

that holds for every pair A, B of self-adjoint, positive operators on a suitable (dense) subspace of \(\mathcal {H}\), such that B is relatively bounded by A, with A-bound for B being less than 1 and both \(\mathrm {e}^{-A}\) and \(\mathrm {e}^{-B}\) being trace-class (see [23, Thm. 1]; cf. [24, Thm. 4] or [25, Thm. 2]), we obtain

$$\begin{aligned} -\beta ^{-1}\log {{\,\mathrm{Tr}\,}}\!\left( \sigma \mathrm {e}^{- \beta W}\right) \le -\beta ^{-1}\log {{\,\mathrm{Tr}\,}}\!\left( \mathrm {e}^{\log \sigma - \beta W}\right) \le {{\,\mathrm{Tr}\,}}\!\left( \gamma W\right) + \beta ^{-1}\mathcal {R}(\gamma ,\sigma ), \end{aligned}$$

where we used Eq. (15) in the second step. Hence,

$$\begin{aligned} -\beta ^{-1}\log {{\,\mathrm{Tr}\,}}\!\left( \sigma \mathrm {e}^{-\beta W}\right) \le {{\,\mathrm{Tr}\,}}\!\left( \gamma W\right) + \beta ^{-1}\mathcal {R}(\gamma ,\sigma ). \end{aligned}$$
(17)

To show that equality can be attained, note that the right-hand side is operator convex in \(\gamma \) and consider a non-decreasing sequence \((W_n)_{n\in \mathbb {N}}\) of bounded self-adjoint operators with \(W_n\rightarrow W\) in the sense of strong resolvent convergence [26]. Further assume \({{\,\mathrm{Tr}\,}}(\sigma \log \sigma )<\infty \) and \({{\,\mathrm{Tr}\,}}(\mathrm {e}^{-\beta W}W)<\infty \), and define the sequence \((\gamma _n)_{n\in \mathbb {N}}\) of density operators

$$\begin{aligned} \gamma _n = \frac{\mathrm {e}^{\log \sigma -\beta W_n}}{{{\,\mathrm{Tr}\,}}\!\left( \mathrm {e}^{\log \sigma -\beta W_n}\right) },\quad n\ge 1. \end{aligned}$$
(18)

By [27, Prop. 10.1.13] and the boundedness assumption for \(W_n\), strong resolvent convergence implies strong convergence \(\Vert W_n h - Wh\Vert \rightarrow 0\) for any \(h\in \mathcal {H}\), with \(\Vert \cdot \Vert \) denoting the norm on \(\mathcal {H}\). Then, \(\mathcal {R}(\gamma _n,\sigma )<\infty \) for all \(n\ge 1\), and, by Fatou’s Lemma (that entails lower semi-continuity of the trace), we have

$$\begin{aligned} \inf _{n\ge 1}\left\{ {{\,\mathrm{Tr}\,}}\!\left( \gamma _n W\right) + \beta ^{-1}\mathcal {R}(\gamma _n,\sigma )\right\}&= \inf _{n\ge 1} \left\{ \frac{{{\,\mathrm{Tr}\,}}\!\left( \mathrm {e}^{\log \sigma -\beta W_n} (W-W_n)\right) }{{{\,\mathrm{Tr}\,}}\!\left( \mathrm {e}^{\log \sigma -\beta W_n}\right) } -\beta ^{-1}\log {{\,\mathrm{Tr}\,}}\!\left( \mathrm {e}^{\log \sigma -\beta W_n}\right) \right\} \\&\quad \le -\beta ^{-1}\log \liminf _{n\rightarrow \infty }{{\,\mathrm{Tr}\,}}\!\left( \mathrm {e}^{\log \sigma -\beta W_n}\right) \\&\quad \le -\beta ^{-1}\log {{\,\mathrm{Tr}\,}}\!\left( \mathrm {e}^{\log \sigma -\beta W}\right) . \end{aligned}$$

This, together with (17), shows that the infimum in (11) is attained at

$$\begin{aligned} \gamma ^* = \frac{\mathrm {e}^{\log \sigma -\beta W}}{{{\,\mathrm{Tr}\,}}\!\left( \mathrm {e}^{\log \sigma -\beta W}\right) }. \end{aligned}$$
(19)

\(\square \)

Remark 4.3

The relative boundedness assumption underlying the Golden–Thompson inequality (16) basically states that the extra potential W in (17) is a small perturbation of the Hamiltonian \(-\beta ^{-1}\log \sigma \) (e.g. for \(\sigma \propto \exp (-\beta H_0)\)) that preserves self-adjointness.

The Gibbs variational principle (11) has a dual form, known as the Donsker–Varadhan variational principle, that expresses the relative entropy as a maximisation over observables:

Lemma 4.4

Under the assumptions of Lemma 4.2, it holds for all \(\gamma ,\sigma \in M_1(\mathcal {H})\) with finite relative entropy \(\mathcal {R}(\gamma ,\sigma )<\infty \) that

$$\begin{aligned} \mathcal {R}(\gamma ,\sigma ) = \sup _{{\theta \le 0}}\left\{ {{\,\mathrm{Tr}\,}}(\gamma \theta ) - \log {{\,\mathrm{Tr}\,}}\left( \mathrm {e}^{\log \sigma +\theta }\right) \right\} \, \end{aligned}$$
(20)

where the supremum is over all self-adjoint negative operators defined on a dense subspace of \(\mathcal {H}\).

Proof

Setting \(\theta =-\beta W\) in (17) and noting that the resulting lower bound for \(\mathcal {R}(\gamma ,\sigma )\) is operator concave in \(\theta \) yields the desired statement. \(\square \)

We can now combine Lemmas 4.2 and 4.4 with the inequality

$$\begin{aligned} \Delta F\le {\mathbf {E}}_\gamma [U]+\beta ^{-1}\mathcal {R}(\gamma ,\rho _0)\ , \end{aligned}$$

that, by Lemma 4.2, holds for an arbitrary density matrix \(\gamma \). Then by Theorem 4.1, we obtain after setting \(\sigma =\rho _0\) and \(W=U\) in (17) and (20):

Corollary 4.5

Under the assumptions of Theorem 4.1, it holds

$$\begin{aligned} \sup _{{V\ge 0}}\left\{ {\mathbf {E}}_\rho [U - V] - \beta ^{-1}\log {{{\,\mathrm{Tr}\,}}\!\left( \mathrm {e}^{\log \rho _0-\beta V}\right) } \right\} = \Delta F = \inf _{\gamma \in M_1(\mathcal {H})}\left\{ {\mathbf {E}}_{\gamma }[U] + \beta ^{-1}\mathcal {R}(\gamma ,\rho _0)\right\} . \end{aligned}$$
(21)

In particular, we have the family of two-sided bounds that is valid for any positive observable V on \(\mathcal {H}\) and any density matrix \(\gamma \in M_1(\mathcal {H})\):

$$\begin{aligned} {\mathbf {E}}_{\rho }[U-V] - \beta ^{-1}\log {{{\,\mathrm{Tr}\,}}\left( \mathrm {e}^{\log \rho _0-\beta V}\right) } \le \Delta F\le {\mathbf {E}}_{\gamma }[U] + \beta ^{-1}\mathcal {R}(\gamma ,\rho _0). \end{aligned}$$
(22)

By the Golden–Thompson inequality, Eq. (22) implies

$$\begin{aligned} {\mathbf {E}}_{\rho }[U-V] - \beta ^{-1}\log {\mathbf {E}}_{\rho _0}\!\left[ \mathrm {e}^{-\beta V}\right] \le \Delta F\le {\mathbf {E}}_{\gamma }[U] + \beta ^{-1}\mathcal {R}(\gamma ,\rho _0) \end{aligned}$$
(23)

where the lower bound can in general not be attained, unless \(H_0\) and V commute, since in this case it holds that \({{\,\mathrm{Tr}\,}}(\mathrm {e}^{\log \rho _0-\beta V}) = {\mathbf {E}}_{\rho _0}[\mathrm {e}^{-\beta V}]\). If the operators do not commute, the left-hand side of this identity may be strictly smaller than the right-hand side, so Eq. (23) yields a slightly weaker lower bound than Eq. (22). Finally, note that the bounds of Theorem 4.1 are a special case obtained by setting \(V=0\) and \(\gamma =\rho _0\).
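The relation between the bounds (22) and (23) can be illustrated numerically. The sketch below assumes a two-level toy model with real symmetric matrices and positive U and V that do not commute with \(H_0\); the helpers (`sym_fun`, `lin`, `gibbs`, `lower_22`, `lower_23`, `upper`) and all matrix entries are hypothetical. It evaluates both lower bounds, whose difference is the Golden–Thompson gap, and checks that the choices \(V = U\) and \(\gamma = \rho \) close the bounds in (22).

```python
import math

def sym_fun(m, f):
    """Apply the scalar function f to a real symmetric 2x2 matrix m."""
    (a, b), (_, d) = m
    mean, r = 0.5 * (a + d), math.hypot(0.5 * (a - d), b)
    if r == 0.0:
        return [[f(mean), 0.0], [0.0, f(mean)]]
    fp, fm = f(mean + r), f(mean - r)
    s, c = 0.5 * (fp + fm), 0.5 * (fp - fm) / r
    return [[s + c * (a - mean), c * b], [c * b, s + c * (d - mean)]]

def lin(x, y, cy):
    """Return the matrix x + cy * y."""
    return [[x[i][j] + cy * y[i][j] for j in range(2)] for i in range(2)]

def trace(m):
    return m[0][0] + m[1][1]

def trace_prod(x, y):
    return sum(x[i][j] * y[j][i] for i in range(2) for j in range(2))

def gibbs(h, beta):
    w = sym_fun(h, lambda e: math.exp(-beta * e))
    z = trace(w)
    return [[x / z for x in row] for row in w], z

beta = 1.0
h0 = [[0.0, 0.0], [0.0, 1.0]]     # hypothetical H_0
u = [[0.3, 0.2], [0.2, 0.4]]      # hypothetical positive coupling U
rho0, z0 = gibbs(h0, beta)
rho, z = gibbs(lin(h0, u, 1.0), beta)
delta_f = -math.log(z / z0) / beta
log_rho0 = sym_fun(rho0, math.log)

def lower_22(v):
    """Lower bound of Eq. (22): uses Tr exp(log rho0 - beta V)."""
    tilted = trace(sym_fun(lin(log_rho0, v, -beta), math.exp))
    return trace_prod(rho, lin(u, v, -1.0)) - math.log(tilted) / beta

def lower_23(v):
    """Lower bound of Eq. (23): uses E_rho0[exp(-beta V)]."""
    e_v = sym_fun(v, lambda e: math.exp(-beta * e))
    return trace_prod(rho, lin(u, v, -1.0)) - math.log(trace_prod(rho0, e_v)) / beta

def upper(gamma):
    """Upper bound of Eqs. (22) and (23): E_gamma[U] + R(gamma, rho0) / beta."""
    rel = trace_prod(gamma, lin(sym_fun(gamma, math.log), log_rho0, -1.0))
    return trace_prod(gamma, u) + rel / beta

v = [[0.4, 0.1], [0.1, 0.2]]          # hypothetical positive observable V
gamma = [[0.6, 0.1], [0.1, 0.4]]      # hypothetical trial density matrix
l23, l22, up = lower_23(v), lower_22(v), upper(gamma)
tight_low, tight_up = lower_22(u), upper(rho)    # both equal delta_f
```

For this choice of V the ordering `l23 <= l22 <= delta_f <= up` holds, with the first inequality strict because V and \(H_0\) do not commute.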

5 Sketch of the computational protocol for molecular simulations

A typical situation where an optimal criterion for the separation of a large system into smaller independent subsystems is of particular importance occurs in the determination of the optimal size of the simulation box for a molecular liquid at a given molecular density. In principle, one needs a large number of molecules so that at the electronic level, microscopic properties such as spectroscopic responses linked to, e.g. molecular bonding are well-described. However, the cost of a large simulation often goes beyond the available computational resources, and thus, one needs to choose a system as small as possible while still being able to reasonably reproduce the properties of interest.

The criterion given in Theorem 4.1 can be used to define the optimal size of the simulation box for electronic properties of a molecular liquid. Figure 1 illustrates a typical setup of this kind for a static situation, and the example below treats the simple case of partitioning a large system into two smaller subsystems; the extension to several subsystems is straightforward. In the current example, the electronic Hamiltonian (in atomic units) of the whole system in the domain \(\Omega \) takes the form

$$\begin{aligned} H=-\frac{1}{2}\sum _{i=1}^{N}\nabla _{i}^{2}+\sum _{1 \le i < j \le N}\frac{1}{|\mathbf{r}_{i}-\mathbf{r}_{j}|}-\sum _{I=1}^{M}\sum _{i=1}^{N}\frac{Z_{I}}{|\mathbf{R}_{I}-\mathbf{r}_{i}|} \end{aligned}$$
(24)

where N is the total number of electrons, M the total number of nuclei and \(Z_{I}\) the charge of the I-th nucleus; \({\mathbf {r}}_i\) and \({\mathbf {R}}_I\) denote the positions of the electrons and nuclei, respectively. The question is whether considering only one of the subsystems, e.g. the one defined in the domain \(\Omega _{1}\), is sufficient to properly describe the local electronic properties, so that one can avoid including the rest of the box, which occupies the domain \(\Omega _{2}\).

Fig. 1 The simulation box of a molecular liquid (e.g. water). The upper part schematically illustrates the large system with domain \(\Omega \) and full Hamiltonian H, while the lower part illustrates the partitioning of the large system into two independent subsystems with domains \(\Omega _{1}\) and \(\Omega _{2}\) and corresponding Hamiltonians \(H^{0}_{1}\) and \(H^{0}_{2}\). The two subsystems can be treated in separate simulations.

The above question is equivalent to the problem of determining the degree of independence of the two subsystems with respect to the larger system of reference as expressed by Theorem 4.1. In this context, the Hamiltonian for the system in the domain \(\Omega _{1}\) reads

$$\begin{aligned} H^{0}_{1}=-\frac{1}{2}\sum _{i=1}^{n}\nabla _{i}^{2}+\sum _{1 \le i < j \le n}\frac{1}{|\mathbf{r}_{i}-\mathbf{r}_{j}|}-\sum _{I=1}^{W}\sum _{i=1}^{n}\frac{Z_{I}}{|\mathbf{R}_{I}-\mathbf{r}_{i}|} \end{aligned}$$
(25)

where n is the number of electrons and W is the number of nuclei in \(\Omega _{1}\). Similarly, the Hamiltonian for the system in the domain \(\Omega _{2}\) is given by

$$\begin{aligned} H^{0}_{2}=-\frac{1}{2}\sum _{i=1}^{m}\nabla _{i}^{2}+\sum _{1 \le i < j \le m}\frac{1}{|\mathbf{r}_{i}-\mathbf{r}_{j}|}-\sum _{I=1}^{Y}\sum _{i=1}^{m}\frac{Z_{I}}{|\mathbf{R}_{I}-\mathbf{r}_{i}|} \end{aligned}$$
(26)

where m is the number of electrons and Y is the number of nuclei in \(\Omega _{2}\). Then, the operator U appearing in Theorem 4.1 for the given instantaneous partitioning of the system takes the form

$$\begin{aligned} U=\sum _{i=1}^{n}\sum _{k=1}^{m}\frac{1}{|\mathbf{r}_{i}-\mathbf{r}_{k}|}-\sum _{K=1}^{Y}\sum _{i=1}^{n}\frac{Z_K}{|\mathbf{R}_{K}-\mathbf{r}_{i}|}-\sum _{I=1}^{W}\sum _{k=1}^{m}\frac{Z_I}{|\mathbf{R}_{I}-\mathbf{r}_{k}|} \end{aligned}$$
(27)

with \(\mathbf{r}_{i},\mathbf{R}_{I} \in \Omega _{1}\) for all \(i=1, \dotsc , n\) and \(I=1, \dotsc , W\), and \(\mathbf{r}_{k},\mathbf{R}_{K} \in \Omega _{2}\) for all \(k=1, \dotsc , m\) and \(K=1, \dotsc , Y\).

With the partitioning of \(\Omega \) defined above, the quantities in Theorem 4.1 can be calculated from two density matrices. The first is the density matrix \(\rho _{\Omega }\) of the full reference system, corresponding to the Hamiltonian H in the whole domain \(\Omega \). The second is the density matrix of the two non-interacting subsystems, \(\rho _{1,2}=\rho _{\Omega _{1}}\otimes \rho _{\Omega _{2}}\), where \(\rho _{\Omega _{1}}\) and \(\rho _{\Omega _{2}}\) are calculated in two separate simulations on \(\Omega _{1}\) with \(H^{0}_{1}\) and on \(\Omega _{2}\) with \(H^{0}_{2}\), respectively. Equation (10) then implies

$$\begin{aligned} {\mathbf {E}}_{\rho _{\Omega }}[U] \le \Delta F\le {\mathbf {E}}_{\rho _{1,2}}[U]. \end{aligned}$$
(28)

Furthermore, the definition of the operator U shows that the interactions with respect to the electronic degrees of freedom are only of one-body and two-body form; thus, the density matrix representations needed for the calculations are one-body and two-body reduced density matrix terms, that is, three-dimensional electron densities and the two-body electron–electron correlation (for example given by the electron radial distribution function \(g(|\mathbf{r}-\mathbf{r'}|)\)). Such quantities are routinely computed in electronic structure calculations and numerical schemes for quantum chemistry, e.g. Kohn–Sham Density Functional Theory [28], Quantum Monte Carlo [29] and high-level quantum-chemical techniques [30]. In molecular dynamics simulations, the statistics are improved by repeating the procedure for several instantaneous, uncorrelated configurations along the molecular trajectory of the system.
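The two-sided bound (28) can be illustrated on a toy model in which each "subsystem" is a single two-level system. This is only a sketch under strong simplifying assumptions: the Hamiltonians `H1` and `H2`, the coupling `U` and all numerical values are hypothetical stand-ins for \(H^{0}_{1}\), \(H^{0}_{2}\) and the operator of Eq. (27), and \(\Delta F\) is taken to be the free energy difference between the coupled and the uncoupled system.

```python
import numpy as np

def gibbs(H, beta):
    """Gibbs state exp(-beta H)/Z of a real symmetric Hamiltonian."""
    w, v = np.linalg.eigh(H)
    p = np.exp(-beta * (w - w.min()))      # shift the spectrum for numerical stability
    return (v * (p / p.sum())) @ v.T

def free_energy(H, beta):
    """Free energy -log(Z)/beta, evaluated with the same spectral shift."""
    w = np.linalg.eigvalsh(H)
    return w.min() - np.log(np.exp(-beta * (w - w.min())).sum()) / beta

sx = np.array([[0.0, 1.0], [1.0, 0.0]])
sz = np.array([[1.0, 0.0], [0.0, -1.0]])
I2 = np.eye(2)

beta = 2.0                                 # inverse temperature (arbitrary units)
H1 = 0.5 * sz + 0.3 * sx                   # toy stand-in for H^0_1 on Omega_1
H2 = 0.7 * sz - 0.2 * sx                   # toy stand-in for H^0_2 on Omega_2
U = 0.1 * np.kron(sx, sx)                  # toy coupling between the two parts

H0 = np.kron(H1, I2) + np.kron(I2, H2)     # non-interacting reference
H = H0 + U                                 # full system on Omega

rho_full = gibbs(H, beta)                             # rho_Omega
rho_12 = np.kron(gibbs(H1, beta), gibbs(H2, beta))    # rho_{1,2} from two separate runs

dF = free_energy(H, beta) - free_energy(H0, beta)
lower = np.trace(rho_full @ U)             # E_{rho_Omega}[U]
upper = np.trace(rho_12 @ U)               # E_{rho_{1,2}}[U]

print("E_full[U] =", lower)
print("Delta F   =", dF)
print("E_12[U]   =", upper)
```

In this sketch the product state `rho_12` is built from two independent calculations, mirroring the two separate simulations on \(\Omega _{1}\) and \(\Omega _{2}\), and coincides with the Gibbs state of the uncoupled Hamiltonian `H0`.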

If the mean coupling energies \({\mathbf {E}}_{\rho _{\Omega }}[U]\) and \({\mathbf {E}}_{\rho _{1,2}}[U]\) per molecule (i.e. divided by the number of molecules) have values of the order of the characteristic energy scale of the quantity of interest, such as the molecule–molecule bonding energy per molecule, then one can conclude that the model error due to the chosen size of the (isolated) simulation box is too large. Conversely, when \({\mathbf {E}}_{\rho _{\Omega }}[U]\) and \({\mathbf {E}}_{\rho _{1,2}}[U]\) have values much smaller than the physical quantity of reference, one can reasonably trust the chosen simulation setup. Once an optimal box size is determined according to the protocol suggested here, this information can be used in all subsequent simulations for the same quantities of interest. Looking ahead, one may be able to extend this idea to the definition of the optimal size of the quantum region in quantum mechanical/molecular mechanical simulations, where a quantum region is embedded into a larger classical molecular system [31]. Another possible extension is the determination of the corrections required in the computational technique of molecular fragments, where large polyatomic molecules such as polymers are divided into fragments that are treated independently via quantum-chemical calculations [32], a technique which seems to be very promising for calculations on (futuristic) quantum computers [33].
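The acceptance criterion described above can be written as a short decision routine. The sketch below is one possible reading of the protocol, not a prescription from the text: the function names, the relative threshold `tol` and the sample values are hypothetical placeholders (they are not simulation results), and the comparison of "much smaller than the reference energy" with a fixed fraction is an assumption.

```python
import math
from statistics import mean, stdev

def per_molecule(samples, n_molecules):
    """Mean and standard error of per-snapshot coupling energies, per molecule."""
    scaled = [s / n_molecules for s in samples]
    sem = stdev(scaled) / math.sqrt(len(scaled)) if len(scaled) > 1 else 0.0
    return mean(scaled), sem

def box_is_acceptable(lower_samples, upper_samples, n_molecules, e_ref, tol=0.05):
    """Return True if both bounds of Eq. (28), per molecule, stay below tol * |e_ref|.

    lower_samples: E_{rho_Omega}[U] for uncorrelated snapshots of the trajectory
    upper_samples: E_{rho_{1,2}}[U] for the same snapshots
    e_ref:         characteristic energy of the quantity of interest (same units)
    tol:           hypothetical relative threshold, to be chosen by the user
    """
    lo, lo_err = per_molecule(lower_samples, n_molecules)
    hi, hi_err = per_molecule(upper_samples, n_molecules)
    worst = max(abs(lo) + lo_err, abs(hi) + hi_err)   # pessimistic estimate
    return worst <= tol * abs(e_ref)

# Placeholder numbers for illustration only
lower_samples = [-0.012, -0.010, -0.015]
upper_samples = [0.008, 0.011, 0.009]
e_ref = 0.008

print(box_is_acceptable(lower_samples, upper_samples, n_molecules=64, e_ref=e_ref))
print(box_is_acceptable(lower_samples, upper_samples, n_molecules=8, e_ref=e_ref))
```

Adding the standard error to the magnitude of each bound makes the test conservative when only a few snapshots are available; other choices, such as comparing the width of the interval in Eq. (28) with the reference energy, are equally conceivable.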

6 Discussion and conclusions

Large systems of particles in fully atomistic resolution are a computational challenge for numerical simulations. The routinely used approximation is to treat small systems as representatives of large systems under the assumption that the influence of finite size effects is negligible in the computation of physical and chemical quantities of interest. The latter assessment requires precise and rigorous criteria for controlling these effects; otherwise, modelling artefacts may easily deteriorate the predictions that can be obtained from finite systems. This work continues the efforts that were undertaken in a previous paper, in which a general criterion to precisely estimate the effect of the finiteness of the system was developed for classical systems. Here, the extension to quantum systems is made by introducing the operator formalism for the equivalent classical quantities, namely, the statistical operator, which is formally equivalent to the classical phase-space probability distribution, and the von Neumann relative entropy, which is commonly used in quantum information theory.

In doing so, we have proved a two-sided Hilbert space version of the well-known Bogoliubov inequality that is applicable to simulation of infinite-dimensional quantum systems, regardless of whether these systems are fermionic or bosonic. The bounds can be useful in connection with electronic structure calculations for open systems where finite size effects are often a major burden in the development of efficient computational models [34,35,36], or they can be used for bosonic and semiclassical systems in path integral molecular dynamics simulations, in which the control of finite size effects is the current bottleneck for applications in many fields of interest [37,38,39,40].