A variational framework for the inverse Henderson problem of statistical mechanics

The inverse Henderson problem refers to the determination of the pair potential which specifies the interactions in an ensemble of classical particles in continuous space, given the density and the equilibrium pair correlation function of these particles as data. For a canonical ensemble in a bounded domain it has been observed that this pair potential minimizes a corresponding convex relative entropy functional, and that the Newton iteration for minimizing this functional coincides with the so-called inverse Monte Carlo (IMC) iterative scheme. In this paper we show that in the thermodynamic limit analogous connections exist between the specific relative entropy introduced by Georgii and Zessin and a proper formulation of the IMC iteration in the full space. This provides a rigorous variational framework for the inverse Henderson problem, valid within a large class of pair potentials, including, for example, Lennard-Jones type potentials. It is further shown that the pressure is strictly convex as a function of the pair potential and the chemical potential, and that the specific relative entropy at fixed density is a strictly convex function of the pair potential. At a given reference potential and a corresponding density in the gas phase we determine the gradient and the Hessian of the specific relative entropy, and we prove that the Hessian extends to a symmetric positive semidefinite quadratic functional in the space of square integrable perturbations of this potential.

1. Introduction. Numerical simulation has established itself as an independent and indispensable branch of research in the natural sciences, on equal footing with theory and experiment. To be truly useful, numerical approaches have to face and master the multiscale nature which is ubiquitous in almost all relevant applications. This is particularly true for soft matter, where spatial scales may bridge from the electron scale up to the millimetre scale of biomaterials or polymers. Concerning examples we refer to the excellent survey by Noid [31], the collected volume edited by Monticelli and Salonen [28], or the recent special issue [40] of Journal of Physics: Condensed Matter.
An important technique to advance numerical algorithms to the particular needs of multiscale applications consists in coarse-graining, cf., e.g., Noid [32], or Peter and Kremer [33]: small-scale features are discarded on the coarser scale by replacing detailed descriptions of molecules or matter by artificial beads of a certain shape. The simulation then focusses only on these beads and their interaction with the other constituents of the system. When and where necessary, fine details can be reinserted back into the simulation for better accuracy as, e.g., in the AdResS scheme ( [34,6]).
Of course, to evaluate the equations of motion for the coarse-grained model it is necessary to derive the prevailing effective forces on these beads. Concerning the transition from an atomic microscale to a molecular macroscale in thermodynamic equilibrium, for example, there are essentially two ways to settle this problem. On the one hand one can employ an ab initio bottom-up approach and evaluate and assemble the resulting forces from the eliminated details of the fine-grained description, cf. Ercolessi and Adams [5], or run a fine-grained simulation, compute the corresponding forces, and somehow approximate them on the coarse level as suggested, e.g., by Izvekov and Voth [20], and Wang et al [46]. The other alternative is to follow a top-down strategy and use the given structural information about the location of the beads on the coarse scale to formulate an inverse problem: Which are the appropriate forces or interactions on the coarse level that define ensembles with the same structural properties?
In this work we treat one of the simplest incarnations of the latter approach. Let us presume that the probabilities of the snapshots of the coarse-grained bead ensembles are in good agreement with a model which only uses additive pairwise translation invariant interactions of the beads. Such interactions can be formulated in terms of a scalar pair potential which only depends on the relative position of the respective pair of beads. The pair correlation function, which measures the empirical likelihood to observe two beads at a given relative position, appears to be an adequate piece of data to be used for finding the corresponding potential, because both the data and the unknown consist of a scalar function of the space variable in that case. In fact, in a celebrated paper, Henderson [18] argued that the pair potential is uniquely specified this way, i.e., under given coditions of temperature and density, no two different pair potentials can give rise to the same piece of empirical data. Finding a pair potential that reproduces the given pair correlation function is therefore sometimes called the inverse Henderson problem.
The numerical solution of the inverse Henderson problem is demanding because of the lack of a mathematical formula for computing the pair correlation function for a given pair potential, or vice versa. In the old days people have developed approximate identities like the Percus-Yevick or hypernetted chain integral equations for this purpose, cf. Hansen and McDonald [17], or have used parametric ansatz functions like, e.g., Lennard-Jones potentials, and optimized the corresponding parameters numerically. Today, the state of the art is to use non-parametric potentials and employ iterative schemes which, in each iteration, simulate the equilibrium ensemble for the current guess of the pair potential, and use the associated data fit to somehow generate a new guess. Well-known examples are the iterative Boltzmann inversion (IBI), cf. Soper [42] and Reith, Pütz, and Müller-Plathe [35], and the inverse Monte-Carlo method (IMC) by Lyubartsev and Laaksonen [26]. We recommend the valuable reviews by Toth [44] and Rühle et al [37] for a comparison of these and further methods.
We emphasize that the setting of the inverse Henderson problem is generally accepted to be far too simplistic to capture all the relevant features of a real system, cf., e.g., [25,32,45], mostly because multibody interactions are neglected. In particular, thermodynamic properties of the coarse-grained model may differ from the real system, especially at other temperatures or densities.
But its simplicity offers a great opportunity for a mathematical analysis, which in turn may lead to a better understanding of other, more flexible coarse-graining techniques that are routinely being employed in practice. Still, only few rigorous mathematical results have yet been obtained, e.g., in [2,23,24,30,16,7], the reason being, again, the lack of explicit formulae to attack the problem.
The aim of this paper is to point out and advocate an alternative access point for theoretical investigations, which goes back to a nice observation by Shell [41] from within the chemical physics community: He argues that the Henderson potential minimizes the (information theoretic) relative entropy where -in his words -the summation is over all possible (coarse grained) ensemble configurations γ, P * (γ) denotes the target (or observed) probability of γ, and P(γ) is the corresponding probability of a model with a given pair potential. (Compare, for example, Georgii [12] for background on the concept of relative entropy in stochastics.) Subsequently, Murtola, Karttunen, and Vattulainen [29] pointed out that the functional (1.1) is convex, and that the Newton iteration for minimizing the relative entropy coincides with the aforementioned IMC iteration (see also Rosenberger et al [36]). Like Henderson's paper [18], the results in [29,36,41] lack mathematical rigor, because their arguments are restricted to bounded domains -whereas an unambiguous definition of a "translation invariant ensemble" is only possible in the full space. Concerning the Henderson theorem this shortcoming has recently been fixed in [7], building on fundamental work by Ruelle and by Georgii. Here we focus on the proposal by Shell and his colleagues, provide a rigorous justification of their results, and elaborate further on them.
To be specific we outline in Section 2 that the (appropriately formulated) relative entropy (1.1) divided by the volume of the bounded domain converges in the thermodynamic limit to the specific relative entropy, first introduced for continuum systems by Gallavotti and Miracle-Sole [8] in the case of hard-core interactions, and further investigated by Georgii and Zessin in [13,9,10] for general additive pair interactions. Under mild assumptions on the model and target ensembles to be utilized we prove that this specific relative entropy is a strictly convex functional on the corresponding set of pair potentials (which include Lennard-Jones type and hard-core potentials), and that its (unique) minimizer is the particular potential which solves the inverse Henderson problem (see Sections 3 and 4).
From Section 5 onwards we restrict ourselves to low densities (the "gas phase"), where the specific relative entropy is a differentiable function of the pair potential. We calculate its Hessian, and in Section 6 we verify that the Newton iteration for minimizing the specific relative entropy does indeed coincide with the IMC iteration formulated in the thermodynamic limit. In Section 7 we investigate the Hessian in more detail and show that it can be represented by a selfadjoint positive semidefinite operator in L 2 . The mapping properties of this operator can be analyzed somewhat further for the particular class of Lennard-Jones type pair potentials; we conclude with a corresponding result in Section 8.
We hope that this variational framework for the inverse Henderson problem opens a possibility to discuss the convergence of the IMC iteration, or to come up with measures to stabilize or regularize this popular iterative scheme.
with density ρ * and finite locally second moments, compare Georgii [9]. For the model ensemble we restrict ourselves to ensembles of classical particles with additive pairwise interactions defined by a measurable even pair potential u : Concerning the latter we assume that there exists r 0 > 0 and decreasing positive functions ϕ : (0, r 0 ) → R + 0 and ψ : holds true almost everywhere; for our convenience we take ψ to be a bounded and decreasing function defined for all r ≥ 0. We denote by U 0 the subset of the above pair potentials, which also belong to L ∞ loc (R 3 \ {0}), and define U to be the union of U 0 with the hard-core potentials, which satisfy (2.2) with ψ as above and ϕ replaced by +∞; accordingly, r 0 is taken to be the hard-core radius in this case. Technically we do not distinguish between potentials (and functions in general) which differ on sets of Lebesgue measure zero.
Remark 2.1. The set U is convex. To see this let u i ∈ U , i = 1, 2, be such that (2.1) and (2.2) hold for r 0,i > 0 and decreasing functions ϕ i and ψ i satisfying (2.1), respectively. Without loss of generality we may assume that r 0,1 ≤ r 0,2 . Let u = tu 1 + (1 − t)u 2 for some fixed t ∈ (0, 1). We distinguish two cases. If u 2 is a hard-core potential, then u is also a hard-core potential with hard-core radius r 0,2 . In this case we can choose r 0 = r 0,2 , ϕ(r) = +∞ for 0 < r < r 0,2 , ψ(r) = tψ 1 (r) + (1 − t)ψ 2 (r) for r ≥ 0 , to achieve the assumption (2.2) for u. On the other hand, if u 2 is no hard-core potential, then u 2 is bounded on [r 0,1 , r 0,2 ), and there exists c ≥ 1, such that It follows that u satisfies the inequalities (2.2) with r 0 = r 0,1 and In either case we have verified that u ∈ U , hence U is convex.
In analogy to Shell, who considered the relative entropy framework for canonical ensembles in Λ, we take the restriction P * ,Λ of P * as target, and the grand canonical ensemble in Λ with chemical potential µ ∈ R and pair potential u ∈ U as model. For a configuration γ ∈ Γ of N = N (γ) particles located at {x 1 , . . . , x N } ⊂ Λ the probability density of the model is thus given by where β > 0 is the inverse temperature, is the interaction energy, and is the corresponding grand canonical partition function. Note that most of the quantities we deal with depend on β; however, since we will keep the temperature fixed throughout this paper, we refrain from making this dependency explicit in our notation.
Let S(P * ,Λ ) denote the entropy associated with P * ,Λ . In the spirit of (1.1) we then specify the relative entropy where E * ,Λ [ · ] denotes expectation with respect to P * ,Λ . Take note that the entropy and the expected interaction energy may be +∞. Dividing by β|Λ| we arrive at Now we want to derive the analog of this identity for the thermodynamic limit ℓ → ∞. Concerning this limit it is known that there is a sequence (ℓ k ) k with ℓ k → ∞, such that the probability densities associated with the corresponding grand canonical ensembles converge in a local topology to a translation invariant tempered Gibbs measure on Γ, cf. Ruelle [39], for brevity called (µ, u)-Gibbs measure in the sequel. In general different sequences may lead to different (µ, u)-Gibbs measures, e.g., when the system exhibits a phase transition. For any such sequence, however, the limit is always the same well-defined and nonnegative finite number, namely the pressure of the ensemble in the thermodynamic limit, cf. Ruelle [38]. Therefore, passing in (2.4) to the thermodynamic limit ℓ → ∞, we can (uniquely) define the relative entropy S rel (µ, u) of all these (µ, u)-Gibbs measures with respect to the target model P * as where the individual terms on the right-hand side of (2.6) are the corresponding limits of the respective terms in (2.4): S * = S(P * ) is the specific entropy and E(u, P * ) is the specific interaction energy (with respect to the potential u) of the target ensemble, both of which have been shown to be well-defined in R ∪ {+∞}, cf. [11,13]. If P * satisfies a Ruelle condition (compare (A.1) in the appendix) then the specific entropy is finite; this is the case, e.g., when P * is a (µ * , u * )-Gibbs measure for some u * ∈ U and µ * ∈ R, compare [39,Corollary 5.3]. For this latter particular case it has further been shown in [7] that where ρ (2) * (x) is the pair correlation function associated with P * . Here, and throughout this paper, we deliberately use the short-hand notation ρ (2) (x) instead of ρ (2) (x, 0) for the pair correlation function of a translation invariant point process.
In [10] Georgii studied the relative entropy S rel (µ, u) in detail and established the following Gibbs variational principle: Theorem A (Gibbs variational principle). Let P * be a translation invariant probability measure on Γ with finite locally second moments. Then the relative entropy (2.6) is nonnegative real or +∞ for every u ∈ U and µ ∈ R. There holds S rel (µ, u) = 0, if and only if P * is a (µ, u)-Gibbs measure.
We have utilized this result in [7] to prove a rigorous version of the Henderson theorem: Theorem B (Henderson theorem). Let u 1 , u 2 ∈ U and µ 1 , µ 2 ∈ R. If P 1 and P 2 are (µ 1 , u 1 )and (µ 2 , u 2 )-Gibbs measures, respectively, which share the same density and the same pair correlation function, then u 1 = u 2 and µ 1 = µ 2 .
Combining Georgii's Gibbs variational principle and the Henderson theorem we can formulate an alternative version of the Gibbs variational principle. This version is the one that we will mostly use below.
Theorem 2.2 (Gibbs variational principle, alternative form). If the target P * is a (µ * , u * )-Gibbs measure for some u * ∈ U and µ * ∈ R, then the relative entropy S rel (µ, u) becomes minimal, if and only if u = u * and µ = µ * .
Proof. According to Theorem A it remains to investigate the case when S rel (µ, u) = 0 for some u ∈ U and µ ∈ R. The Gibbs variational principle states that P * then is a (µ * , u * )-and (µ, u)-Gibbs measure at the same time. In particular this means that these two Gibbs measures share the same density and pair correlation function. The assertion thus follows from the Henderson theorem.
Georgii investigated the relative entropy for fixed interaction and chemical potentials u and µ, and varied the target model P * . In connection with the inverse Henderson problem our interest is sort of dual to this: we assume that P * is fixed, and consider the relative entropy as a function of µ and u.
3. Strict convexity of the pressure and the relative entropy. It is wellknown that the pressure is a convex function of the chemical potential, cf. [38, Theorem 3.4.6]. This convexity is strict, whenever a Gibbs variational principle is valid, cf., e.g., Hughes [19,Section 4.3]. In the sequel we show that under our assumptions the pressure is also strictly convex in µ and u.
Theorem 3.1. The pressure p = p(µ, u) of (2.5) is a strictly convex function of µ ∈ R and u ∈ U . Moreover, for u ∈ U 0 there holds Proof. To begin with we first observe that R×U is convex by virtue of Remark 2.1. Now let µ 1 , µ 2 ∈ R and u 1 , u 2 ∈ U be arbitrarily chosen, not both being equal at the same time, and define for some fixed t ∈ (0, 1). When choosing for P * a corresponding (µ, u)-Gibbs measure, then it follows from (2.6) and the Gibbs variational principle of Theorem 2.2 that In particular, the specific entropy S * is finite because P * is a Gibbs measure, and hence, so is the specific interaction energy. Likewise we obtain which gives because of the linearity of the specific interaction energy. A comparison with (3.2) thus shows that the pressure is strictly convex. Consider now a fixed pair potential u ∈ U 0 . It has been shown in the proof of [9, Lemma 7.1] that for any ρ > 0 there exists a translation invariant probability measure P ρ on Γ with density ρ, such that S(P ρ ) < ∞, and E(u, P ρ ) < ∞. Choosing P * = P ρ in (2.6), Georgii's Gibbs variational principle (Theorem A) yields the inequality for every µ ∈ R. In other words, and hence, Since ρ > 0 has been arbitrary, this implies (3.1). We thus have shown that the relative entropy (2.6) is the sum of a strictly convex funtional and an affine function of (µ, u). Accordingly, the relative entropy is also a strictly convex functional of µ and u, as long as it is finite.
Another immediate consequence of Theorem 2.2 is the following inequality for the pressure.
Proof. The two inequalities (3.4) follow readily from Theorem 2.2 by choosing for P * the corresponding (µ i , u i )-Gibbs measures, respectively. They are strict, unless µ 1 = µ 2 and u 1 = u 2 .
4. The relative entropy functional for fixed density. Returning to the inverse Henderson problem formulated in the introduction we now constrain our model ensembles to have the same density ρ * as the target ensemble. Since the attainable densities for hard-core potentials are bounded we need to distinguish the case whether u is a hard-core potential or not. We focus our analysis on the latter case and mention the necessary modifications for hard-core potentials in Remark 4.4 later in this section.
The first fundamental problem to settle concerns the question whether and how the prescribed density ρ * can be attained by some (µ, u)-Gibbs measure for a given u ∈ U 0 . When the chemical potential is sufficiently small, i.e., when the system is in the gas phase (see Section 5 for a specification of this term), then it is known that there is a one-to-one relation between the corresponding chemical potentials and the associated densities, and in this case the inverse map ρ → µ can even be computed by means of cluster expansions, cf., e.g., Jansen, Kuna, and Tsagkarogiannis [22]. Outside the gas phase, when phase transitions may occur, the problem becomes more difficult. Adopting a method from Chayes and Chayes [2] we can establish the following result, which holds for the full range of possible densities.
Proof. For every µ ∈ R and the given u ∈ U 0 let P µ,u be a (µ, u)-Gibbs measure, and denote by p(µ) and ρ(µ) the associated pressure and density, respectively.
It is well-known, cf. [38,Theorem 4.3.1], that for small chemical potentials the pressure is a differentiable function with In other words, if ρ * is sufficiently small then the corresponding chemical potential µ * from the formulation of the theorem is given as the unique minimum of the function the latter being strictly convex by virtue of Theorem 3.1. We will proceed by showing that the minimizer of Θ is also the appropriate chemical potential to choose for larger values of ρ * . To see this we first observe that because the pressure is nonnegative, whereas for any suitable ε > 0 and corresponding constant c (1+ε)ρ * by virtue of (3.3) (with ρ = (1 + ε)ρ * ). This shows that Θ is bounded from below and that Therefore Θ attain its minimum for a uniquely defined value µ = µ * . For any µ ∈ R, µ = µ * , we now conclude from (3.4) and (4.2) that This means that Now let (µ + k ) k be a strictly decreasing sequence and (µ − k ) k a strictly increasing sequence of chemical potentials, both of which converge to µ * By virtue of Lemma A.1 from the appendix there exist (µ * , u)-Gibbs measures P − and P + with densities The inequalities (4.4) imply that Since the set of (µ * , u)-Gibbs measures is convex, P has all the desired properties from the statement of this theorem, and the proof is done. In the light of the above theorem we can now restrict our attention to (µ, u)-Gibbs measures with density ρ * , i.e., with chemical potential µ = µ * (u), when looking at the relative entropy functional. Further, we drop the constant offset S * in (2.6), as it is independent of u. This leads to the functional for brevity. By a slight abuse of wording we will keep calling Φ the relative entropy functional. As we will see next, although p * may fail to be convex, in general, the functional Φ is strictly convex, again. Theorem 4.2. Let P * be as in Theorem A. Then the functional Φ : U 0 → R of (4.5) is strictly convex as long as the specific interaction enery E(u, P * ) is finite.
Proof. Let u 1 and u 2 be two different pair potentials from U 0 . For any fixed 0 < t < 1 define and, as in Theorem 4.1, denote by µ = µ * (u) the chemical potential associated with u and density ρ * . Finally, let ρ (2) be the pair correlation function of the associated (µ, u)-Gibbs measure constructed in Theorem 4.1.
Then we obtain from (4.6) and (3.4 Note that this inequality is strict because u is different from u 1 and u 2 by construction.
Reordering terms we thus arrive at It thus follows from (4.5) and (4.7) that if the specific interaction energy E( · , P * ) stays finite then Φ is strictly convex, because E( · , P * ) is linear in the first argument.
Theorem 4.2 implies that the relative entropy functional has at most one local minimizer, which is then also a global one. Concerning this minimizer we have the following result. Theorem 4.3. Let P * be a (µ, u * )-Gibbs measure for some u * ∈ U 0 and µ ∈ R, and let ρ * be its density. Then the relative entropy function (4.5) attains its minimum for u = u * .
Proof. The given Gibbs measure P * has finite specific entropy S * , and hence, (2.6) implies that for every u ∈ U 0 . Since P * has density ρ * the chemical potential µ associated with P * must be given by µ = µ * (u * ) according to Theorem 4.1. From (4.8) and the Gibbs variational principle (Theorem A and Theorem 2.2) therefore follows that for every u ∈ U 0 \ {u * }, and this was to be shown. Note from (4.9) that the minimal value of Φ depends on the specific entropy of the target Gibbs measure, and is therefore unknown in general.
Remark 4.4. For a hard-core potential u ∈ U the density ρ of any associated (µ, u)-Gibbs measure is bounded from above by a finite closest-packing density ρ cp = ρ cp (u): This bound only depends on the hard-core radius r 0 of (2.2) and is a decreasing function of r 0 , cf. [9, Section 7]. We formally set ρ cp (u) = +∞ for u ∈ U 0 .
If the given density ρ * of the target happens to be below ρ cp (u) for a given u ∈ U , then Theorem 4.1 is still valid for this pair potential; to establish (4.3) in its proof, ε > 0 must be so small that (1 + ε)ρ * is still below ρ cp (u), because this is needed for [9, Lemma 7.1], and hence, for (3.3). The relative entropy functional (4.5), however, is only well-defined on the domain but this is again a convex set, compare Remark 2.1, and Φ happens to be strictly convex on D(Φ). Theorem 4.3 extends literally to every u * ∈ U with this understanding of the domain of Φ. ⋄ Theorem 4.3 is the basis for a variational setting of the inverse Henderson problem: If the density and the pair correlation function of a (µ * , u * )-Gibbs measure target P * are given, then the unique minimizer of the strictly convex relative entropy functional (4.5) with E(u, P * ) of (2.7) yields the corresponding pair potential u * , i.e., the solution of the inverse Henderson problem. Nevertheless, this approach also has some pitfalls as discussed in the following remark.
Remark 4.5. If the target P * fails to be some (µ, u)-Gibbs measure then it is not clear to us whether the relative entropy functional Φ will still be bounded from below on U , and even if it may, its infimum need not be attained on U .
Vice versa, if u * ∈ U happens to be the miminizer of Φ, then this does not imply that P * is a (µ * (u * ), u * )-Gibbs measure. To see the latter, consider the following example: Let u * be any hard-core potential in U and ρ * < ρ cp (u * ); compare Remark 4.4. Then, provided ρ * is sufficiently small, a result by Kuna, Lebowitz, and Speer [24,Corollary 4.3] states that there exist uncountably many distinct translation invariant probability measures with finite specific entropy and density ρ * , which share the pair correlation function ρ (2) * with the (µ * (u * ), u * )-Gibbs measure P, but P is the only (µ, u)-Gibbs measure among them according to the Henderson theorem. Because the specific interaction energy is given by (2.7) for all of these probability measures, the relative entropy functional does not differ, and hence, u * is its unique minimizer, regardless which of them has been the target P * .
Finally we mention that even if the target P * is a (µ, u * )-Gibbs measure, and hence, the functional (4.5) is minimized by u * according to Theorem 4.3, then this does not seem to imply that the model and the target have the same pair correlation function. To illustrate this, imagine that a fluid corresponding to a pair potential u * ∈ U exhibits a so-called triple point (compare, e.g., [17]), where three different phases coexist at the same thermodynamical state point, i.e., for the same values of pressure (or chemical potential µ * , say) and temperature (or inverse temperature β). It can be expected that the different phases have linearly independent pair correlation functions; taking convex combinations of the corresponding Gibbs measures one can thus determine two (µ * , u * )-Gibbs measures which have the same density ρ * , but have different pair correlation functions. One of them could be the target and the other one the minimizing model. We will see in Section 5 that if the density ρ * belongs to the gas phase of the minimizer of Φ then the pair correlation function of the minimizing Gibbs measure always coincides with the given ρ (2) * ; see Remark 5.5. ⋄ 5. Differentiability of the relative entropy functional. In the remainder of this paper we further analyze the case when Φ is a differentiable function of u. For this we build upon our earlier work [14,15]. * A conceptual difficulty in this context is the fact that U lacks a universal topology. Therefore, following [14], we consider a tailored neighborhood for any given u 0 ∈ U by introducing a corresponding Banach space V of perturbations, consisting of all even measurable functions v : R d → R, for which the associated norm is finite. Here, ψ 0 is the majorant ψ of (2.1) associated with u 0 . Note that the norm (5.1) is somewhat stronger than the one employed in [14] because it is more restrictive in the core region 0 < r < r 0 , and hence, the resulting space of perturbations is smaller. We believe that this restriction allows a more natural formulation of our results. Remark 5.1. It can easily be seen that for any v ∈ V the sum u 0 + v belongs to U : If u 0 is a hard-core potential, then r 0 is its hard-core radius and the corresponding function ϕ from (2.1) equals +∞, and we have Otherwise, there exists r 1 ∈ (0, r 0 ] such that ϕ(r) ≥ v V ψ 0 (0) for 0 < r ≤ r 1 , and therefore In either case this shows that u 0 + v ∈ U .

⋄
We now consider the ball around u 0 for a suitable radius δ 0 with 0 < δ 0 < 1; we further specify δ 0 in the context of (5.7) below. As has been verified in [14, Proposition 2.1], there are positive constants c β and B, such that every u ∈ B V (u 0 ) satisfies given by a certain infinite-dimensional matrix of integral operators, compare [38,14], and is Fréchet differentiable with respect to u ∈ B V (u 0 ). We fix and call the range µ ≤ µ 0 of chemical potentials the gas phase associated with the pair potentials in B V (u 0 ). In this gas phase the Kirkwood-Salsburg equations (5.5) have a unique solution ρ(µ, u), which can be developed in a converging Neumann series in X ; in particular, this means that for u ∈ B V (u 0 ) there exists only one (µ, u)-Gibbs measure for each chemical potential µ within the gas phase. The individual components ρ (m) (µ, u) of ρ(µ, u) are Fréchet differentiable (in L ∞ ) with respect to u, analytic with respect to µ < µ 0 and continuous in µ ≤ µ 0 . It is also easy to see that ρ is a C 1 function of µ ≤ µ 0 and u ∈ B V (u 0 ). Further, for fixed u ∈ B V (u 0 ) the density ρ = ρ(µ, u) of the associated Gibbs measure, i.e., the first entry of ρ(µ, u), is differentiable and strictly increasing in µ up to µ = µ 0 by virtue of Corollary 3.2. Since it is also continuous in u ∈ B V (u 0 ) we can choose δ 0 in (5.2) so small that for any given ε > 0, and then every density ρ ∈ (0, ρ 0 ) belongs to the gas phase of all pair potentials in B V (u 0 ). Finally, for fixed u ∈ B V (u 0 ), the pressure p(µ, u) is also differentiable and strictly increasing in µ up to µ = µ 0 , and its derivative is given by the density, cf. (4.1). Proposition 5.2. Let 0 < ρ * < ρ 0 with ρ 0 as in (5.7). Then the chemical potential µ * = µ * (u) defined in Theorem 4.1 is differentiable with respect to u ∈ B V (u 0 ) with derivative µ ′ * (u) ∈ V ′ given by where ∂ µ ρ and ∂ u ρ denote the partial derivatives of ρ(µ, u), and V ′ is the dual space of V . Proof. Since the pressure and the density of a fixed pair potential u ∈ B V (u 0 ) are strictly increasing and differentiable functions of the chemical potential in the gas phase, we can rewrite each of these functions as a strictly increasing function of any of the other ones. The pressure, for example, can be written as a function of density, which we call π to avoid any confusion, i.e., p(µ, u) = π ρ(µ, u), u .
It thus follows from (5.10) that ∂ µ ρ µ * (u), u > 0, so that the implicit function theorem is applicable to the equation near v = 0. From this we readily obtain (5.8); moreover, µ ′ * (u) belongs to V ′ because ∂ u ρ ∈ V ′ . Now we return to our analysis of the relative entropy functional (4.5). Theorem 5.3. Let u 0 ∈ U be a fixed pair potential and B V (u 0 ) ⊂ U be defined as in (5.2) with δ 0 so small that (5.7) holds. Furthermore, let the target P * of the relative entropy functional have density ρ * < ρ 0 , and assume that the associated specific interaction energy E( · , P * ) is bounded on for v ∈ V , where ρ (2) is the pair correlation function of the associated (µ * (u), u)-Gibbs measure.
Proof. Let u ∈ B V (u 0 ) and v ∈ V . We apply Corollary 3.2 to u 1 = u and u 2 = u + v with µ 1 = µ * (u) and µ 2 = µ * (u + v). This yields where ρ (2) denotes the pair correlation function of the corresponding (µ * (u+v), u+v)-Gibbs measure. It follows that i.e., that This shows that by virtue of (2.1). We finally observe that as v V → 0 because of Proposition 5.2 and the smoothness of ρ (2) as a function of µ and u. We have therefore established that as v V → 0, which yields (5.11). Since E( · , P * ) is bounded on B V (u 0 ), it is easy to see that E( · , P * ) ∈ V ′ , and hence, it follows in the same way as above that This concludes the proof. Remark 5.4. We emphasize that the dual space V ′ of V is a complicated Banach space. However, the representation (5.11), in combination with (2.7), reveals that if P * has a bounded and measurable pair correlation function ρ (2) * , then Φ ′ (u) can be identified with when using the natural dual pairing This is the case, for example, if P * is a (µ, u * )-Gibbs measure for some u * ∈ U (with µ = µ * (u * )), and then (5.12) can further be rewritten as where It follows from (5.17) below that ∇Φ(u) then also belongs to L 1 (R d ), and hence, to L 2 (R d ) as well. Note that V is embedded in L 2 (R) for the same reason. We refer to Section 7 for further arguments which support the choice of the L 2 -pairing. ⋄ Remark 5.5. Assume that the target P * has a bounded and measurable pair correlation function ρ (2) * , and that the relative entropy functional Φ is minimized by some u * ∈ U . Assume further that the given density ρ * belongs to the gas phase of u * . Then it follows from Theorem 5.3 that Φ is differentiable at u * , and that ∇Φ(u * ) = 0. From (5.12) we thus conclude that the target and the model share the same pair correlation function in this case; compare Remark 4.5. ⋄ Remark 5.6. For a (µ, u)-Gibbs measure the cluster functions, sometimes also called Ursell functions (e.g., Stell [43], and [15]), or truncated correlation functions (e.g. Jansen [21]), are defined recursively by the constant function ω (1) = ρ and, for m ≥ 2, by cf. [38, p. 87], where Π m k denotes the set of partitions π of x 1 , . . . , x m into k subsets x π1 , . . . , x π k . Note that it immediately follows from (5.15) that each cluster function inherits the translation and permutation invariance of the correlation functions.
For m = 2 we recover from (5.15) the definition (5.14), and for fixed µ and u we adopt the short-hand notation for ω (2) (x, 0) from the pair correlation function. Note that for every x ∈ R d (5. 16) because of the translation invariance. We will make repeatedly use of [38,Theorem 4.4.8] which states that provided that u ∈ U and µ ≤ µ 0 ; see also Lemma A.4 in the appendix. In particular, (5.17) shows that ω (2) ∈ L 1 (R d ).

⋄
Based on Theorem 5.3 we now compute the Hessian of Φ. Theorem 5.7. Under the assumptions of Theorem 5.3 the relative entropy functional (4.5) is twice differentiable, and its Hessian at u ∈ B V (u 0 ) is given by in terms of the dual pairing (5.13) of L 2 (R d ), where v, v ∈ V , and the partial derivatives ∂ u ρ and ∂ u ρ (2) are evaluated at the pair potential u and the chemical potential µ * (u). Proof. We differentiate (5.11) with respect to u in the direction v ∈ V . Keeping in mind that ρ (2) in (5.11) stands for ρ (2) (µ * (u), u) the chain rule gives where we have adopted the notation (5.13), and the partial derivatives of ρ (2) -which belong to L ∞ (R d ) -are evaluated at the pair potential u and the chemical potential µ * (u).
To determine ∂ µ ρ (2) (µ, u) for an arbitrary chemical potential µ < µ 0 we apply Corollary 3.2 with u 1 = u and u 2 = u + w for any w ∈ V , and with µ 1 = µ 2 = µ: As in the proof of Theorem 5.3 we conclude that the pressure is differentiable with respect to u with derivative ∂ u p(µ, u) ∈ V ′ given by where ρ (2) is the pair correlation function of the (µ, u)-Gibbs measure. Schwarz's theorem and (4.1) therefore yield Inserting this into (5.19) we obtain and the assertion (5.18) thus follows from Proposition 5.2. Note that Φ ′′ (u) is a continuous bilinear form: This follows readily from the result in [14] that ∂ u ρ ∈ V ′ and ∂ u ρ (2) ∈ L V , L ∞ (R d ) . We further observe that Φ ′′ is continuous in u because ρ and ρ (2) are C 1 functions of µ and u. This implies that Φ ′′ (u) is a symmetric bilinear form. Finally, this bilinear form is positive semidefinite, because Φ is strictly convex.
6. The connection to the IMC iterative scheme. In the preceding section we have seen that the relative entropy functional is twice differentiable at any u 0 ∈ U , provided that the target density ρ * belongs to its gas phase. The convex quadratic approximation valid for small enough v ∈ V , attains its minimum for every solution v 0 ∈ V of the variational problem 2) and the well-known Newton scheme for minimizing Φ consists in updating u 0 via to obtain a (hopefully) better approximation of the global unique minimizer of Φ. Unfortunately, however, we only know that Φ ′′ (u 0 ) is semidefinite; it is therefore not clear whether (6.2) has a solution, and if it does, whether it is unique. The bilinear form Φ ′′ (u 0 ) defines a linear operator A : V → V ′ via the identity As in Remark 5.4 the dual form on the right-hand side can be replaced by the dual pairing induced by L 2 (R d ), when A is identified with the (bounded) operator where ∇ u ρ ∈ L ∞ (R d ) denotes the representative of the functional ∂ u ρ ∈ V ′ for this pairing. This is an immediate consequence of Theorem 5.7. According to [15,Section 6] there holds is given in terms of the second and third cluster functions associated with µ and u; compare Remark 5.6. Using the translation and permutation invariance of ω (3) this can be rewritten as Note that it has been shown in [15,Proposition 4.2] that the integral in (6.5) is absolutely convergent and that it defines a bounded function of x ∈ R d . Since ω (2) ∈ L 1 (R d ) it thus follows from (6.6) that and inserting (6.6) into (6.5), we finally arrive at the representation which will be used later on. Now we introduce the (nonlinear) "Henderson operator" F : u → ρ (2) µ * (u), u , (6.9) which maps u ∈ B V (u 0 ) to its associated pair correlation function at the given density ρ * , and which is an injective operator according to the Henderson theorem. Since F (u) + 2∇Φ(u) is independent of u according to (5.11), see also (5.12), it follows that This shows that if the target P * has a pair correlation function ρ (2) * ∈ L ∞ (R d ) then the variational problem (6.2) is equivalent to the linear operator equation The step (6.3) for minimizing Φ with Newton's method is thus equivalent to one iteration of the IMC method, where the update v 0 is determined by solving (6.11). This is the thermodynamic limit analog of the observation in [29,36], referred to in the introduction. Remark 6.1. As follows from Theorem 5.3 and (2.7) the Henderson operator satisfies v, F (u 0 ) = 2 E(v, P 0 ) , where P 0 denotes the corresponding (µ * (u 0 ), u 0 )-Gibbs measure. In the IMC scheme, as introduced in [26], this interaction energy is approximated by the corresponding expectation value of the canonical ensemble corresponding to the pair potential u 0 at density ρ * in the bounded box Λ ⊂ R d , where V is the observable associated with the perturbation v ∈ V , compare (2.3). Likewise, the derivative F ′ (u 0 ) is approximated in the IMC scheme via the cross correlation matrix It has to be emphasized that the density constraint ρ(u) = ρ * is inherently built into the canonical ensemble. Accordingly, this approximation of F ′ (u 0 ) does respect this constraint. When working in the grand canonical ensemble instead (or in the thermodynamic limit) this constraint necessitates a correction term to project the unconstrained derivative into the tangent plane of the manifold of pair correlation functions corresponding to particle systems with the correct density. This is the reason for the first term on the right-hand side of (5.18) which does not (and need not) occur in the original IMC scheme. ⋄ 7. Extension of the Hessian to a semidefinite bilinear form on L 2 (R d ).
In the remainder of this paper we study the Hessian of the relative entropy functional somewhat further. More precisely, we examine the mapping properties of the Jacobian of the Henderson map, which is connected to the Hessian Φ ′′ via (6.10).
To begin with, recall from Remark 5.4 that the right-hand side of (6.11) belongs to L ∞ (R d ) ∩ L 1 (R d ), when the target P * is a (µ, u * )-Gibbs measure. As our first result of this section we show that F ′ (u 0 ) has matching mapping properties.
Proof. We already know from Section 6 that F ′ (u 0 ) = −2A ∈ L (V , L ∞ (R d )). It thus remains to establish that F ′ (u 0 ) ∈ L (V , L 1 (R d )). To this end we rewrite the Henderson operator in terms of the cluster function (5.14), i.e., which yields where µ * = µ * (u 0 ). The partial derivative of the cluster function with respect to the chemical potential is given by From (5.20) and (6.8) we conclude that and hence, together with Lemma A.2 it follows that In particular, this shows that ∂ µ ω (2) ∈ L 1 (R d ) by virtue of (5.17). Together with Proposition 5.2 and (7.1) we thus obtain because it has been proved in [15] that ∂ u ω (2) ∈ L (V , L 1 (R d )). This shows that As V is a dense subspace of L 2 (R d ), and because we have repeatedly identified the dual pairing V × V ′ with the dual pairing of L 2 (R d ), this raises the question whether F ′ (u 0 ) extends to a bounded operator in L 2 (R d ), or, in terms of the relative entropy functional, whether Φ ′′ (u 0 ) extends to a quadratic form on L 2 (R d ). This will be answered to the affirmative in the remainder of this section.
Lemma 7.2. Under the assumptions of Theorem 5.3 the Jacobian of the Henderson map is given by for v ∈ V and x ∈ R d . Proof. According to (6.4) we have by virtue of [15,Eq. (6.9)]. Concerning the nested integral in the second line of (7.5) it has further been shown in [15,Proposition 4.3] that the inner integral converges absolutely and defines an L ∞ function of x and x ′ . It follows that We now rewrite the individual terms of this kernel function. To begin with, we recall from (5.20) and (7.2) that The partial derivatives with respect to µ in the final three terms of the right-hand side can be eliminated with the help of (7.3) and Lemma A.2: This yields Concerning the second term of the integral kernel (7.7) we apply the definition (5.15) of the cluster functions to rewrite where we have also used (5.16).
In the same way we can rewrite the integrand of the integral in (7.7) in terms of the cluster functions: where we have used in the second step that the cluster functions are translation and permutation invariant. Integrating over x ′′ ∈ R d we thus obtain where all the terms on the right-hand side are bounded functions of x and x ′ : This follows from (6.7) and the fact that ω (2) Inserting (7.8), (7.9), and (7.10) into (7.7) we conclude that Note that the final two terms of this integral kernel representation define the same convolution operator because every v ∈ V is an even function. The assertion (7.4) thus follows readily from (7.6). Now we can prove the main result of this section. Theorem 7.3. Under the assumptions of Theorem 5.3 the operator F ′ (u 0 ) extends to a selfadjoint negative semidefinite operator on L 2 (R d ).
Proof. We already know from Theorem 7.1 that F ′ (u 0 ) ∈ L (V , L 2 (R d )). Since V is a dense subset of L 2 (R d ), it remains to discuss the continuity from L 2 (R d ) to L 2 (R d ) of each of the terms defined in lines (7.4a)-(7.4e) of Lemma 7.2.
For the multiplication operator in (7.4a) this is true because the pair correlation function ρ (2) is bounded. The autoconvolution ω (2) * ω (2) of the cluster function ω (2) ∈ L 1 (R d ) belongs to L 1 (R d ) again, hence the two convolution operators in (7.4b) are bounded operators in L 2 (R d ). From (7.3) it follows that ∂ µ ω (2) belongs to L 1 (R d ) ∩ L ∞ (R d ): the L 1 property has been verified in the argument following (7.3); the L ∞ property of the second term in (7.3) has been established in (6.7). This shows that ∂ µ ω (2) ∈ L 2 (R d ), and the continuity of the operator in (7.4c) is therefore a consequence of the Cauchy-Schwarz inequality. By virtue of (5.17) the cluster function ω (3) ( · , 0, · ) belongs to L 1 (R d × R d ). Since it is bounded, it also belongs to L 2 (R d × R d ). Therefore, (7.4d) defines a compact operator in L (L 2 (R d )). Concerning (7.4e) we have already pointed out after (7.10) that the inner integral of ω (4) is a bounded function of x and x ′ . It also belongs to L 1 (R d × R d ) by virtue of (5.17), again. The continuity of the operator defined in (7.4e) therefore follows in the same way as in the previous case.
In summary this shows that F ′ (u 0 ) extends to an operator in L (L 2 (R d )). By virtue of (6.10) this extension is selfadjoint and negative semidefinite.
Concerning the Hessian of the relative entropy functional we readily conclude from (6.10) the following corollary.
Corollary 7.4. Under the assumptions of Theorem 5.3 the Hessian Φ ′′ (u 0 ) of the relative entropy functional (4.5) defines a symmetric and positive semidefinite bilinear form on L 2 (R d ).
As indicated before we leave it as an open problem whether F ′ (u 0 ) is injective, i.e., whether the local quadratic approximation (6.1) of the relative entropy functional is strictly convex.
8. Lennard-Jones type pair potentials. In this final section we establish a stronger regularity result for the derivative of the Henderson map, respectively the Hessian of the relative entropy functional, which is valid when the potential u 0 ∈ U satisfies (2.2) with a majorant of the form ψ 0 (r) = C(1 + r 2 ) −α/2 (8.1) for some C > 0 and α > d. This class of potentials includes the so-called potentials of Lennard-Jones type, cf. [38]. It it known that in the gas phase the cluster function ω (2) corresponding to such a pair potential satisfies the same rate of decay near infinity as ψ 0 does. This entails the following result.
Proof. The aforementioned result about the cluster function states that ω (2) ∈ V under the given assumptions; compare Lemma A.4 in the appendix. Now we discuss the continuity of each of the operators corresponding to the individual terms in (7.4). Continuity obviously holds for the multiplication operator in (7.4a), because ρ (2) is bounded. Concerning the convolutions in (7.4b) we refer to [16,Proposition 4.1] for the fact that the convolution is a continuous bilinear mapping from V ×V to V . Accordingly, the two terms in (7.4b) also define operators in L (V ).
For the remaining terms we make use of Lemma A.4 again. For m = 3 this auxiliary result gives and since Much the same argument applies to the term in (7.4e): for every x ∈ R d , and the right-hand side again belongs to V as a function of x by virtue of Lemma A.4 for m = 4. Since V is continuously embedded in L ∞ (R d ) this shows that the two terms in (7.4d) and (7.4e) correspond to operators in L (V ). Finally, concerning the term in (7.4c) we conclude from (7.3), (8.2), and the fact that ω (2) ∈ V that ∂ µ ω (2) belongs to V as well. Accordingly, this term also represents a bounded operator from V to V , and the proof is done.
Note that if the target P * and the model P 0 are Gibbs measures corresponding to pair potentials u * and u 0 in U and their associated chemical potentials, where u * and u satisfy (2.2) for the same majorant ψ = ψ 0 given by (8.1), and if the density ρ * belongs to the gas phase of both pair potentials, then the cluster functions ω (2) * and ω (2) 0 of P * and P 0 , respectively, both belong to V . It thus follows from Remark 5.4 that the gradient ∇Φ(u 0 ) of the relative entropy functional belongs to V as well, and therefore the Newton equation (6.11) of the IMC iterative scheme is an operator equation in V by virtue of Theorem 8.1. As shown in [16, Remark 6.5], a valid choice u 0 ∈ U , which satisfies (2.2) for the same majorant (8.1), is given by the potential of mean force, provided the density ρ * is sufficiently small. Moreover, u * − u 0 belongs to V for this initial guess.
Appendix. In this appendix we provide some auxiliary results. In doing so we will repeatedly make use of the short-hand notation The first result is needed for the proof of Theorem 4.1, where we construct a (µ, u)-Gibbs measure for a pair potential u ∈ U with prescribed density ρ * < ρ cp (u).
Lemma A.1. Let u ∈ U and assume that the sequence (µ k ) k ⊂ R satisfies µ k → µ ∈ R as k → ∞. Furthermore, for every k ∈ N let P k be a (µ k , u)-Gibbs measure. Then (P k ) k has an accumulation point with respect to the local topology, and every such accumulation point is a (µ, u)-Gibbs measure.
Proof. Let (ρ (m) k ) m be the sequence of correlation functions of P k . Since the sequence (µ k ) k is bounded, the measures P k satisfy a uniform Ruelle condition by virtue of [39,Corollary 5.3], i.e., there exists ξ > 0, such that This implies that for fixed m ∈ N the sequence (ρ (m) k ) k of correlation functions has a weak * convergent subsequence in L ∞ ((R d ) m ) with limit ρ (m) , say. With a diagonalisation argument we can thus find a subsequence of indices k ∈ N, for which holds for every m ∈ N. This is equivalent to saying that (P k ) k has a subsequence which converges to some probability measure P in the local topology. To simplify the notation we assume in the sequel that the entire sequence (P k ) k is convergent. Since every P k is a Gibbs measure, the respective correlation functions satisfy the Kirkwood-Salsburg equations, cf., e.g., [ is the well-known Mayer-function, and the latter being taken to be zero for m = 0. Note that the right-hand side of (A.3) converges in L ∞ ((R d ) m+1 ) by virtue of the Ruelle condition (A.1) and the fact that the Mayer function (A.4) belongs to L 1 (R d ); compare (5.3). Moreover, since the Ruelle condition is uniform in k, this convergence is also uniform with respect to k, and hence, the weak * convergence (A.2) and the convergence µ k → µ imply that the Kirkwood-Salsburg equations (A.3) also hold true for the limiting correlation functions (ρ (m) ) m . It thus follows from [39,Corollary 5.3] again that P is a tempered (µ, u)-Gibbs measure. Finally, since ρ (m) k is translation invariant for every k ∈ N and every m ∈ N, its weak * limits ρ (m) must also be translation invariant, and hence P is translation invariant.
The remaining two results of the appendix concern properties of the cluster functions (see Remark 5.6) within the gas phase. The proofs make use of their cluster expansions (compare [38,Section 4.4] and [43]) where f is, again, the Mayer function (A.4), and C n denotes the set of undirected connected graphs with vertices {1, . . . , n} and edges (i, j) ∈ C, which connect vertices i and j; for n = m the integral in (A.5) has to be replaced by the value φ (m) (x m ) of its integrand. These cluster expansions are absolutely convergent for u 0 ∈ U and µ ≤ µ 0 with µ 0 of (5.6). Lemma A.2. Let u 0 ∈ U and µ < µ 0 with µ 0 as in (5.6). Then the density ρ of the corresponding Gibbs measure is differentiable with respect to µ and the derivative is given by f (x i − x j ) dx 2 · · · dx n .
Differentiating with respect to µ we obtain Using the short-hand notation for ω (2) and (5.16), the assertion follows.
We refer to [3,4,27] for further results on the decay of the higher order cluster functions.