1 Introduction and motivations

Monte-Carlo methods are widely used in theoretical physics, statistical mechanics and condensed matter (for an overview, see e.g. [1]). Since the inception of the field [2], most applications have relied on importance sampling, which allows one to evaluate multi-dimensional integrals of localised functions stochastically and with a controllable error. These methods have immediate applications when one needs to compute thermodynamic properties, since statistical averages of (most) observables can be computed efficiently with importance sampling techniques. Similarly, in lattice gauge theories, most quantities of interest can be expressed in the path integral formalism as ensemble averages over a positive-definite (and sharply peaked) measure, which, once again, provides an ideal scenario for applying importance sampling methods.

However, there are notable cases in which Monte-Carlo importance sampling methods are either very inefficient or produce inherently wrong results for well understood reasons. Among those cases, some of the most relevant situations include systems with a sign problem (see [3] for a recent review), direct computations of free energies (comprising the study of properties of interfaces), systems with strong metastabilities (for instance, a system with a first order phase transition in the region in which the phases coexist) and systems with a rough free energy landscape. Alternatives to importance sampling techniques do exist, but generally they are less efficient in standard cases and hence their use is limited to ad hoc situations in which more standard methods are inapplicable. Notable exceptions are micro-canonical methods, which have experienced a surge in interest in the past 15 years. Most of the growing popularity of those methods is due to the work of Wang and Landau [4], who provided an efficient algorithm to access the density of states in a statistical system with a discrete spectrum. Once the density of states is known, the partition function (and from it all thermodynamic properties of the system) can be reconstructed by performing one-dimensional numerical integrals. Histogram-based straightforward generalisations of the Wang–Landau algorithm to models with a continuous spectrum have been shown to break down even on systems of moderate size [5, 6]; hence more sophisticated techniques have to be employed, as done for instance in [7], where the Wang–Landau method is used to compute the weights for a multi-canonical recursion (see also [8]).

A very promising method, here referred to as the logarithmic linear relaxation (LLR) algorithm, was introduced in [9]. The potential of the method was demonstrated in subsequent studies of systems afflicted by a sign problem [10, 11], in the computation of the Polyakov loop probability distribution function in two-colour QCD with heavy quarks at finite density [12] and—rather unexpectedly—even in the determination of thermodynamic properties of systems with a discrete energy spectrum [13].

The main purpose of this work is to discuss in detail some improvements of the original LLR algorithm and to formally prove that expectation values of observables computed with this method converge to the correct result, which fills a gap in the current literature. In addition, we apply the algorithm to the study of compact U(1) lattice gauge theory, a system with severe metastabilities at its first order phase transition point that make the determination of observables near the transition very difficult from a numerical point of view. We find that in the LLR approach correlation times near criticality grow at most quadratically with the volume, as opposed to the exponential growth that one expects with importance sampling methods. This investigation shows the efficiency of the LLR method when dealing with systems having a first order phase transition. These results suggest that the LLR method can be efficient at overcoming numerical metastabilities in other classes of systems with a multi-peaked probability distribution, such as those with rough free energy landscapes (as commonly found, for instance, in models of protein folding or spin glasses).

The rest of the paper is organised as follows. In Sect. 2 we cover the formal general aspects of the algorithm. The investigation of compact U(1) lattice gauge theory is reported in Sect. 3. A critical analysis of our findings, our conclusions and our future plans are presented in Sect. 4. Finally, some technical material is discussed in the appendix. Some preliminary results of this study have already been presented in [14].

2 Numerical determination of the density of states

2.1 The density of states

Owing to formal similarities between the two fields, the approach we are proposing can be applied to both statistical mechanics and lattice field theory systems. In order to keep the discussion as general as possible, we shall introduce notations and conventions that can describe simultaneously both cases. We shall consider a system described by the set of dynamical variables \(\phi \), which could represent a set of spin or field variables and are assumed to be continuous. The action (in the field theory case) or the Hamiltonian (for the statistical system) is indicated by S and the coupling (or inverse temperature) by \(\beta \). Since the product \(\beta S\) is dimensionless, without loss of generality we will take both S and \(\beta \) dimensionless.

We consider a system with a finite volume V, which will be sent to infinity in the final step of our calculations. The finiteness of V in the intermediate steps allows us to define naturally a measure over the variables \(\phi \), which we shall call \(\mathcal{D} \phi \). Properties of the system can be derived from the function

$$\begin{aligned} Z(\beta )=\int \mathcal{D} \phi \; \mathrm {e}^{\beta S[\phi ]}, \end{aligned}$$

which defines the canonical partition function for the statistical system or the path integral in the field theory case. The density of states (which is a function of the value of \(S[\phi ] = E\)) is formally defined by the integral

$$\begin{aligned} \rho (E) = \int \mathcal{D} \phi \; \delta \Bigl ( S[\phi ]-E \Bigr ) . \end{aligned}$$
(2.1)

In terms of \(\rho (E)\), Z takes the form

$$\begin{aligned} Z(\beta )=\int \mathrm{d}E \; \rho (E) \; \mathrm {e}^{\beta E}\,. \end{aligned}$$

The vacuum expectation value (or ensemble average) of an observable \(\mathcal {O}\) that is a function of E can be written as

$$\begin{aligned} \langle \mathcal {O}\rangle =\frac{1}{Z(\beta )}\int \mathrm{d}E \; \mathcal {O}(E) \; \rho (E) \; \mathrm {e}^{\beta E}\,. \end{aligned}$$
(2.2)

Hence, a numerical determination of \(\rho (E)\) would enable us to express Z and \(\langle \mathcal {O} \rangle \) as numerical integrals of known functions in the single variable E. This approach is inherently different from conventional Monte-Carlo calculations, which rely on the concept of importance sampling, i.e. the configurations contributing to the integral are generated with probability

$$\begin{aligned} P_\beta (E) \; = \; \rho (E) \; \mathrm {e}^{\beta E} / Z(\beta ) \,. \end{aligned}$$

Owing to this conceptual difference, the method we are proposing can overcome notorious drawbacks of importance sampling techniques.
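As an illustration of this point, the following Python sketch shows how, once \(\ln \rho (E)\) is available on a grid, Z and ensemble averages reduce to one-dimensional sums; the grid and the toy \(\ln \rho \) are illustrative assumptions standing in for a measured density of states, and the shift by the maximum is the standard log-sum-exp guard needed because \(\rho (E)\) spans many orders of magnitude.

```python
# Illustrative sketch: ensemble averages from a tabulated density of states.
# The energy grid and toy ln_rho are assumptions, not measured LLR output.
import numpy as np

E = np.linspace(0.0, 1.0, 2001)          # energy grid
ln_rho = -40.0 * (E - 0.5) ** 2          # toy ln rho(E)

def ensemble_average(O_vals, ln_rho, E, beta):
    """<O>(beta) = int dE O(E) rho(E) e^{beta E} / Z(beta), cf. eq. (2.2).

    On a uniform grid the quadrature weight dE cancels in the ratio."""
    ln_w = ln_rho + beta * E
    w = np.exp(ln_w - ln_w.max())        # log-sum-exp shift avoids overflow
    return (O_vals * w).sum() / w.sum()

beta = 1.5
print(ensemble_average(E, ln_rho, E, beta))       # <E>(beta)
print(ensemble_average(E ** 2, ln_rho, E, beta))  # <E^2>(beta)
```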

2.2 The LLR method

We will now detail our approach to the evaluation of the density of states by means of a lattice simulation. Our initial assumption is that the density of states is a regular function of the energy that can always be approximated in a finite interval by a suitable functional expansion. If we consider the energy interval \([E_k,E_k+\delta _E]\), under the physically motivated assumption that the density of states is a smooth function in this interval, its logarithm can be written, using Taylor’s theorem, as

$$\begin{aligned} \ln \, \rho (E)= & {} \ln \, \rho \left( E_k +\frac{\delta _E}{2} \right) \; + \; \frac{ \mathrm{d}\;\ln \, \rho }{\mathrm{d}E } \Big \vert _{E=E_k+\delta _E/2} \;\nonumber \\&\times \left( E- E_k -\frac{\delta _E}{2} \right) \; + \; R_k(E) , \\ R_k(E)= & {} \frac{1}{2} \, \frac{ \mathrm{d}^2 \ln \, \rho }{\mathrm{d}E^2 } \Big \vert _{E_k +\delta _E/2} \; \left( E- E_k -\frac{\delta _E}{2} \right) ^2 \; + \; \mathcal{O}(\delta _E^3).\nonumber \end{aligned}$$
(2.3)

Here, for a given value E of the action, the integer k is chosen such that

$$\begin{aligned} E _k \le E \le E_k \, + \, \delta _E, \quad E_k \; = \; E_0 \; + \; k \, \delta _E. \end{aligned}$$

Our goal will be to devise a numerical method to calculate the Taylor coefficients

$$\begin{aligned} a_k := \frac{ \mathrm{d} \ln \, \rho }{\mathrm{d}E } \Big \vert _{E=E_k+\delta _E/2} \end{aligned}$$
(2.4)

and to reconstruct from these an approximation for the density of states \(\rho (E)\). Introducing the intrinsic thermodynamic quantities \(T_k\) (temperature) and \(c_k\) (specific heat) via

$$\begin{aligned} \frac{ \mathrm{d} \ln \, \rho }{\mathrm{d}E } \Big \vert _{E=E_k+\delta _E/2}= & {} \frac{1}{T_k} \; = \; a_k , \nonumber \\ \quad \frac{ \mathrm{d}^2 \ln \, \rho }{\mathrm{d}E^2 } \Big \vert _{E=E_k+\delta _E/2}= & {} - \frac{1}{T_k^2 \, c_k} \, \frac{1}{V} , \end{aligned}$$
(2.5)

we expose the important feature that the target coefficients \(a_k\) are independent of the volume while the correction \(R_k(E) \) is of order \(\delta _E^2/V\). In all practical applications, \(R_k\) will be numerically much smaller than \(a_k \, \delta _E\). For a certain parameter range (i.e., for the correlation length smaller than the lattice size), we can analytically derive this particular volume dependence of the density derivatives. Details are left to the appendix.

Using the trapezium rule for integration, we find in particular

$$\begin{aligned} \ln \, \frac{ \rho (E_{k+1} +\delta _E/2) }{ \rho (E_k +\delta _E/2)}= & {} \int _{E_k+\frac{\delta _E}{2}}^{E_{k+1}+\frac{\delta _E}{2}} \frac{ \mathrm{d} \, \ln \rho }{\mathrm{d}E} \; \mathrm{d}E \nonumber \\= & {} \frac{ \delta _E}{2} \, [ a_k + a_{k+1} ] \; + \; \mathcal{O}(\delta _E^3) . \end{aligned}$$
(2.6)

Using this equation recursively, we find

$$\begin{aligned} \ln \frac{ \rho (E_N +\frac{\delta _E}{2}) }{\rho (E_0 +\frac{\delta _E}{2})}= & {} \frac{a_0}{2} \, \delta _E\; + \; \sum _{k=1}^{N-1} a_k \, \delta _E\;\nonumber \\&+ \; \frac{a_N}{2} \, \delta _E\; + \; \mathcal{O}(\delta _E^2) . \end{aligned}$$
(2.7)

Note that \(N \, \delta _E= \mathcal{O}(1)\). Exponentiating (2.3) and using (2.7), we obtain

$$\begin{aligned} \rho (E)= & {} \rho \left( E_N +\frac{\delta _E}{2}\right) \, \exp \Bigl \{ a_N \, (E-E_N -\delta _E/2) + \mathcal{O}(\delta _E^2) \Bigr \} \nonumber \\\end{aligned}$$
(2.8)
$$\begin{aligned}= & {} \rho _0 \left( \prod _{k=1}^{N-1} \mathrm {e}^{a_k \delta _E} \right) \; \exp \left\{ a_N \, \left( E -E_N \right) \; + \; \mathcal{O}(\delta _E^2) \right\} , \nonumber \\ \end{aligned}$$
(2.9)

where we have defined an overall multiplicative constant by

$$\begin{aligned} \rho _0 \; = \; \rho \left( E_0 +\frac{\delta _E}{2}\right) \, \mathrm {e}^{a_0 \delta _E/2} . \end{aligned}$$

We are now in a position to introduce the piecewise-linear and continuous approximation of the density of states by

$$\begin{aligned} \tilde{\rho } (E) \; = \; \rho _0 \left( \prod _{k=1}^{N-1} \mathrm {e}^{a_k \delta _E} \right) \; \mathrm {e}^{a_N (E-E_N) } , \end{aligned}$$
(2.10)

with N chosen in such a way that \( E_N \le E < E_N + \delta _E\) for a given E. With this definition, we obtain the remarkable identity

$$\begin{aligned} \rho (E) \; = \; \tilde{\rho } \left( E \right) \; \exp \Bigl \{ \mathcal{O}(\delta _E^2) \Bigr \} \; = \; \tilde{\rho } \left( E \right) \Bigl [ 1 \; + \; \mathcal{O}(\delta _E^2) \Bigr ] , \end{aligned}$$
(2.11)

which we will extensively use below. We will observe that \(\rho (E)\) spans many orders of magnitude. The key observation is that our approximation implements exponential error suppression, meaning that \(\rho (E)\) can be approximated with nearly constant relative error even though it may span thousands of orders of magnitude:

$$\begin{aligned} 1 - \frac{\tilde{\rho }(E)}{\rho (E)} \; = \; \mathcal{O}\left( \delta _E^2 \right) . \end{aligned}$$
(2.12)
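To make the reconstruction concrete, here is a minimal Python sketch of (2.7) and (2.10): given the slopes \(a_k\), \(\ln \tilde{\rho }\) is accumulated by the trapezium sum and extended linearly inside each interval. The overall constant \(\rho _0\) is set to one (it drops out of expectation values), and the toy slopes are an illustrative assumption.

```python
# Illustrative sketch: piecewise-linear ln rho-tilde from the slopes a_k,
# eqs. (2.7) and (2.10).  rho_0 is set to 1; the slopes below are toy input.
import numpy as np

def ln_rho_tilde(E, E0, delta_E, a):
    """ln rho-tilde(E), with N such that E_N <= E < E_N + delta_E."""
    a = np.asarray(a)
    N = int((E - E0) // delta_E)
    E_N = E0 + N * delta_E
    # a_0 delta_E / 2 from rho_0, plus the product over k = 1 .. N-1 in (2.10)
    ln_val = 0.5 * a[0] * delta_E + a[1:N].sum() * delta_E
    return ln_val + a[N] * (E - E_N)     # linear continuation in the N-th interval

a = np.linspace(3.0, -3.0, 51)           # toy slopes (would come from the LLR runs)
print(ln_rho_tilde(0.73, E0=0.0, delta_E=0.02, a=a))
```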

We will now present our method to calculate the coefficients \(a_k\). To this aim, we introduce the action restricted and re-weighted expectation values [9] with a being an external variable:

$$\begin{aligned}&\left\langle \left\langle W[\phi ] \right\rangle \right\rangle _{k} (a) = \frac{1}{\mathcal{N}_k} \int \mathcal{D} \phi \; \theta _{[E_k,\delta _E]}(S[\phi ]) \; W[\phi ] \; \, \; \; \mathrm {e}^{-a S[\phi ] } , \nonumber \\\end{aligned}$$
(2.13)
$$\begin{aligned}&\mathcal{N}_k = \int \mathcal{D} \phi \; \theta _{[E_k,\delta _E]}(S[\phi ]) \; \, \; \; \mathrm {e}^{-a S[\phi ] } \; = \; \int _{E_k}^{E_k+\delta _E} \mathrm{d}E \, \rho (E) \,\mathrm {e}^{-aE} , \nonumber \\ \end{aligned}$$
(2.14)

where we have used (2.1) to express \(\mathcal{N}_k\) as an ordinary integral. We also introduced the modified Heaviside function,

$$\begin{aligned} \theta _{[E_k,\delta _E]} (S) \; = \; \left\{ \begin{array}{l@{\quad }l} 1 &{} \hbox {for} \; \; \; E_k \le S \le E_k + \delta _E\\ 0 &{} \hbox {otherwise . } \end{array} \right. \end{aligned}$$

If the observable only depends on the action, i.e., \( W[\phi ] = O(S[\phi ])\), (2.13) simplifies to

$$\begin{aligned} \left\langle \left\langle O \right\rangle \right\rangle _{k} (a) \; = \; \frac{1}{\mathcal{N}_k} \int _{E_k}^{E_k+\delta _E} \mathrm{d}E \, \rho (E) \; O(E) \; \,\mathrm {e}^{-aE} . \end{aligned}$$
(2.15)

Let us now consider the specific action observable

$$\begin{aligned} \Delta E = S - E_k - \frac{\delta _E}{2} , \end{aligned}$$
(2.16)

and the solution \(a^*\) of the non-linear equation

$$\begin{aligned} \left\langle \left\langle \Delta E \right\rangle \right\rangle _k (a=a^*) \; = \; 0 . \end{aligned}$$
(2.17)

Inserting \(\rho (E)\) from (2.8) into (2.15) and defining \(\Delta a = a_k - a\), we obtain

$$\begin{aligned}&\left\langle \left\langle \Delta E \right\rangle \right\rangle _k (a)\nonumber \\&\quad = \frac{ \rho (E_k+\delta _E/2) \, \int _{E_k}^{E_k+\delta _E} \mathrm{d}E \; (E- E_k - \delta _E/2) \; \mathrm {e}^{ \Delta a \, (E-E_k) } \; \mathrm {e}^{\mathcal{O}(\delta _E^2)} }{ \rho (E_k+\delta _E/2) \, \int _{E_k}^{E_k+\delta _E} \mathrm{d}E \; \mathrm {e}^{ \Delta a \, (E-E_k) } \; \mathrm {e}^{\mathcal{O}(\delta _E^2)} } \nonumber \\&\quad = \frac{ \int _{E_k}^{E_k+\delta _E} {\mathrm{d}E} \; (E- E_k - \delta _E/2) \; \mathrm {e}^{ \Delta a \, (E-E_k) } }{ \int _{E_k}^{E_k+\delta _E} {\mathrm{d}E} \; \mathrm {e}^{ \Delta a \, (E-E_k) } \; } \; \end{aligned}$$
(2.18)

Let us consider for the moment the function

$$\begin{aligned} F(\Delta a) \; := \; \frac{ \int _{E_k}^{E_k+\delta _E} \mathrm{d}E \; (E- E_k - \delta _E/2) \; \mathrm {e}^{ \Delta a \, (E-E_k) } }{ \int _{E_k}^{E_k+\delta _E} \mathrm{d}E \; \mathrm {e}^{ \Delta a \, (E-E_k) } \; } . \end{aligned}$$

It is easy to check that F is monotonic and vanishing for \(\Delta a=0\):

$$\begin{aligned} F^\prime (\Delta a) \; > \; 0 , \quad F(\Delta a=0) \; = \; 0 . \end{aligned}$$

Since (2.18) approximates \(\left\langle \left\langle \Delta E \right\rangle \right\rangle _k (a)\) up to \(\mathcal{O}(\delta _E^2)\), we conclude that

$$\begin{aligned} a^* \; = \; a_k \; +\;\mathcal{O}\Bigl (\delta _E^2\Bigr ) = \; \frac{ \mathrm{d} \ln \, \rho }{\mathrm{d}E }\Big \vert _{E=E_k +\frac{\delta _E}{2}}\; + \; \mathcal{O}\Bigl (\delta _E^2\Bigr ) \; . \end{aligned}$$
(2.19)

The latter equation is at the heart of the LLR algorithm: it details how we can obtain the derivative of \(\ln \rho \) by calculating the Monte-Carlo average \(\left\langle \left\langle \Delta E \right\rangle \right\rangle _k (a)\) (using (2.13)) and solving a non-linear equation, i.e. (2.17), with the identification \(a^* \equiv a_k\) (justified by the order of our approximation).

In the following, we will discuss the practical implementation by addressing two questions: (i) How do we solve the non-linear equation? (ii) How do we deal with the statistical uncertainty since the Monte-Carlo method only provides stochastic estimates for the expectation value \(\left\langle \left\langle \Delta E \right\rangle \right\rangle _k (a)\)?

Let us start with the standard Newton–Raphson method to answer question (i). Starting from an initial guess \(a^{(0)}_k\) for the solution, this method produces a sequence

$$\begin{aligned} a^{(0)}_{k} \; \rightarrow \; a^{(1)}_{k} \rightarrow \; a^{(2)}_k \rightarrow \; \cdots \; \rightarrow \; a^{(n)}_{k} \rightarrow \; a^{(n+1)}_{k} \cdots , \end{aligned}$$

which converges to the true solution \(a_k\). Starting from the estimate \(a^{(n)}_k\), we would like to derive an equation that generates a value \(a^{(n+1)}_k\) that is even closer to the true solution:

$$\begin{aligned}&\left\langle \left\langle \Delta E \right\rangle \right\rangle _k \Bigl (a^{(n+1)}_k\Bigr ) \; = \; \left\langle \left\langle \Delta E \right\rangle \right\rangle _k \Bigl ( a^{(n)}_k \Bigr ) \;\nonumber \\&\quad + \; \frac{d}{da} \left\langle \left\langle \Delta E \right\rangle \right\rangle _k \Bigl ( a^{(n)}_k \Bigr ) \; \Bigl ( a^{(n+1)}_k - a^{(n)}_k \Bigr ) \; = \; 0 . \end{aligned}$$
(2.20)

Using the definition of \(\left\langle \left\langle \Delta E \right\rangle \right\rangle _k \Bigl (a^{(n+1)}\Bigr ) \) in (2.18) with reference to (2.16) and (2.15), we find

$$\begin{aligned} \frac{\mathrm{d}}{\mathrm{d}a} \left\langle \left\langle \Delta E \right\rangle \right\rangle _k (a)= & {} - \; \Bigl [ \left\langle \left\langle \Delta E^2 \right\rangle \right\rangle _k (a) \; - \; \left\langle \left\langle \Delta E \right\rangle \right\rangle _k^2 (a) \, \Bigr ] \;\nonumber \\=: & {} - \; \sigma ^2 (\Delta E; a) . \end{aligned}$$
(2.21)

We thus find for the improved solution:

$$\begin{aligned} a^{(n+1)}_k \; = \; a^{(n)}_k \; + \; \frac{ \left\langle \left\langle \Delta E \right\rangle \right\rangle _k (a^{(n)}_k)}{ \sigma ^2 (\Delta E; a^{(n)}_k)} . \end{aligned}$$
(2.22)

We can convert the Newton–Raphson recursion into a simpler fixed point iteration if we assume that the choice \(a^{(n)}_k\) is sufficiently close to the true value \(a_k\) such that

$$\begin{aligned} \delta _E\; \Bigl ( a^{(n)}_k - a_k \Bigr ) \; \ll \; 1 . \end{aligned}$$

Without affecting the precision with which the solution \(a_k\) of (2.18) can be obtained, we replace \(\sigma ^2 \) with its first order Taylor expansion around \(a=a_k\)

$$\begin{aligned} \sigma ^2 (\Delta E; a) \; = \; \frac{1}{12}\, \delta _E^2 \; \Bigl [ 1 \; + \; \mathcal{O} \Bigl ( \delta _E\Delta a \Bigr )^2 \Bigr ] \; \; \Bigl [ 1 + \mathcal{O}(\delta _E) \Bigr ] . \end{aligned}$$
(2.23)

Hence, the Newton–Raphson iteration is given by

$$\begin{aligned} a^{(n+1)}_k \; = \; a^{(n)}_k \; + \; \frac{12 }{\delta _E^2 } \; \left\langle \left\langle \Delta E \right\rangle \right\rangle _k (a^{(n)}_k) \end{aligned}$$
(2.24)

We point out that one fixed point of the above iteration, i.e., \(a^{(n+1)}_k=a^{(n)}_k=a_k\), is attained for

$$\begin{aligned} \left\langle \left\langle \Delta E \right\rangle \right\rangle _k (a_k) = 0 , \end{aligned}$$

which, indeed, is the correct solution. We have already shown that the above equation has only one solution. Hence, if the iteration converges at all, it necessarily converges to the true solution. Note that convergence can always be achieved by suitable choice of under-relaxation. We here point out that the solution to question (ii) above will involve a particular type of under-relaxation.

Let us address question (ii) now. We have already pointed out that we only have a stochastic estimate for the expectation value \(\left\langle \left\langle \Delta E \right\rangle \right\rangle _k (a) \), and the convergence of the Newton–Raphson method is necessarily hampered by the inevitable statistical error of the estimator. This problem, however, has already been solved by Robbins and Monro [15].

For completeness, we shall now give a brief presentation of the algorithm. The starting point is a function M(x) and a constant \(\alpha \) such that the equation \(M(x) = \alpha \) has a unique root at \(x=\theta \). M(x) is only available by stochastic estimation using the random variable N(x):

$$\begin{aligned} \mathbb E[N(x)] = M(x) , \end{aligned}$$

with \(\mathbb E[N(x)]\) being the ensemble average of N(x). The iterative root finding problem is of the type

$$\begin{aligned} x_{n+1} \; = \; x_n \; + \; c_n \, (\alpha -N(x_n)) \end{aligned}$$
(2.25)

where \(c_n\) is a sequence of positive step sizes satisfying the requirements

$$\begin{aligned} \sum ^{\infty }_{n=0}c_n = \infty \quad \text{ and } \quad \sum ^{\infty }_{n=0}c^2_n < \infty \end{aligned}$$
(2.26)

It is possible to prove that, under certain assumptions [15] on the function M(x), the sequence \(x_n\) converges in \(L^2\), and hence in probability, to the true value \(\theta \). A major advance in understanding the asymptotic properties of this algorithm was the main result of [15]. If we restrict ourselves to the case

$$\begin{aligned} c_n=\frac{c}{n+1} \end{aligned}$$
(2.27)

one can prove that \(\sqrt{n}(x_n -\theta )\) is asymptotically normal with variance

$$\begin{aligned} \sigma ^2_x=\frac{c^2 \sigma ^2_\xi }{2\,c\,M'(\theta )-1} , \end{aligned}$$
(2.28)

where \(\sigma _\xi ^2\) is the variance of the noise. Hence, the optimal value of the constant c, which minimises the variance, is given by

$$\begin{aligned} c=\frac{1}{M'(\theta )} . \end{aligned}$$
(2.29)

Adapting the Robbins–Monro approach to our root finding iteration in (2.24), we finally obtain an under-relaxed Newton–Raphson iteration

$$\begin{aligned} a^{(n+1)}_k \; = \; a^{(n)}_k \; + \; \frac{12 }{\delta _E^2 \; (n+1) } \; \left\langle \left\langle \Delta E \right\rangle \right\rangle _k (a^{(n)}_k) , \end{aligned}$$
(2.30)

which is optimal with respect to the statistical noise during iteration.
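The essence of the Robbins–Monro scheme is easy to demonstrate in isolation. The short Python sketch below solves \(M(x)=\alpha \) for a toy \(M(x)=\tanh x\) observed through Gaussian noise, with the schedule \(c_n = c/(n+1)\) of (2.27); the target function and the noise level are illustrative assumptions, not part of the LLR setup itself.

```python
# Illustrative sketch of the Robbins-Monro iteration (2.25)-(2.27):
# root finding when M(x) is only available through noisy estimates N(x).
import numpy as np

rng = np.random.default_rng(0)

def noisy_M(x, sigma_noise=0.3):
    """Stochastic estimator N(x) with E[N(x)] = M(x) = tanh(x) (a toy choice)."""
    return np.tanh(x) + sigma_noise * rng.normal()

alpha = 0.0          # solve M(x) = alpha; the true root is theta = 0
c = 1.0              # optimal c = 1/M'(theta), eq. (2.29); here M'(0) = 1
x = 2.0              # initial guess
for n in range(10000):
    x += c / (n + 1) * (alpha - noisy_M(x))     # eq. (2.25) with c_n = c/(n+1)
print(x)             # converges (in probability) to theta = 0
```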

2.3 Observables and convergence with \(\delta _E\)

We have already pointed out that expectation values of observables depending on the action only can be obtained by a simple integral over the density of states (see (2.2)). Here we develop a prescription for determining the values of expectations of more general observables by folding with the numerical density of states and analyse the dependence of the estimate on \(\delta _E\).

Let us denote a generic observable by \(B[\phi ]\). Its expectation value is defined by

$$\begin{aligned} \langle B[\phi ] \rangle \; = \; \frac{1}{Z(\beta )} \; \int \mathcal{D}\phi \; B[\phi ] \; \mathrm {e}^{\beta S[\phi ] } \end{aligned}$$
(2.31)

In order to relate to the LLR approach, we break up the latter integration into energy intervals:

$$\begin{aligned} \langle B[\phi ] \rangle \; = \; \frac{1}{Z(\beta )} \; \sum _i \int \mathcal{D}\phi \; \theta _{[E_i, \delta _E]} \; B[\phi ] \; \; \mathrm {e}^{\beta S[\phi ] } . \end{aligned}$$
(2.32)

Note that \(\langle B[\phi ] \rangle \) does not depend on \(\delta _E\).

We can express \(\langle B[\phi ] \rangle \) in terms of a sum over double-bracket expectation values by choosing

$$\begin{aligned} W := B[\phi ] \;\exp \{ (\beta + a_i) S[\phi ] \} \end{aligned}$$

in (2.13). Without any approximation, we find

$$\begin{aligned}&\langle B[\phi ] \rangle = \frac{1}{Z(\beta )} \; \sum _i \mathcal{N}_i \, \mathrm {e}^{a_iE_i} \;\nonumber \\&\quad \times \left\langle \left\langle B[\phi ] \; \exp \{ \beta S [\phi ]\, + \, a_i ( S[\phi ] - E_i) \} \right\rangle \right\rangle \; (E_i) , \end{aligned}$$
(2.33)
$$\begin{aligned}&Z(\beta ) = \sum _i \mathcal{N}_i \, \mathrm {e}^{a_iE_i} \;\nonumber \\&\quad \times \left\langle \left\langle \; \exp \{ \beta S [\phi ]\, + \, a_i ( S[\phi ] - E_i) \} \right\rangle \right\rangle (E_i) . \end{aligned}$$
(2.34)

where \(\mathcal{N}_i = \mathcal{N}_i (a_i)\) is defined in (2.14). The above result can be further simplified by using (2.10) and (2.11):

$$\begin{aligned} \mathcal{N}_i \, \mathrm {e}^{a_iE_i}= & {} \int _{E_i}^{E_i+\delta _E} \mathrm{d}E\; \rho (E) \; \exp \{-a_i (E-E_i) \} \;\nonumber \\= & {} \; \mathrm {e}^{ \mathcal{O}(\delta _E^2) } \int _{E_i}^{E_i+\delta _E} \mathrm{d}E\; \tilde{\rho } (E) \; \exp \{-a_i (E-E_i) \} \nonumber \\= & {} \mathrm {e}^{ \mathcal{O}(\delta _E^2) } \; \tilde{\rho } (E_i) \; \int _{E_i}^{E_i+\delta _E} \mathrm{d}E\; = \; \delta _E\; \tilde{\rho } \left( E_i\right) \; \mathrm {e}^{ \mathcal{O}(\delta _E^2) } \nonumber \\= & {} \delta _E\; \tilde{\rho } \left( E_i \right) \; \Bigl [ 1 \; + \; \mathcal{O}( \delta _E^2) \, \Bigr ] . \end{aligned}$$
(2.35)

We now define the approximation to \(\langle B[\phi ] \rangle \) by

$$\begin{aligned} \langle B[\phi ] \rangle _\mathrm {app}= & {} \frac{1}{Z(\beta )} \, \sum _i \delta _E\;\tilde{\rho } \left( E_i\right) \;\nonumber \\&\times \left\langle \left\langle B[\phi ] \, \exp \{ \beta S[\phi ]\, + \, a_i ( S[\phi ] - E_i) \} \right\rangle \right\rangle \nonumber \\\end{aligned}$$
(2.36)
$$\begin{aligned} Z(\beta ):= & {} \sum _i\; \delta _E\; \tilde{\rho }\left( E_i \right) \;\nonumber \\&\times \left\langle \left\langle \; \exp \{ \beta S [\phi ]\, + \, a_i ( S[\phi ] - E_i) \} \right\rangle \right\rangle . \end{aligned}$$
(2.37)

Since the double-bracket expectation values do not produce a singularity if \(\delta _E\rightarrow 0\), i.e.,

$$\begin{aligned} \lim _{\delta _E\rightarrow 0} \, \left\langle \left\langle B[\phi ] \, \exp \{ \beta S [\phi ]\, + \, a_i ( S[\phi ] - E_i) \} \right\rangle \right\rangle \; = \; \hbox {finite} , \end{aligned}$$

using (2.35), from (2.33) and (2.34) we find that

$$\begin{aligned} \langle B[\phi ] \rangle&= \langle B[\phi ] \rangle _\mathrm {app} + \sum _i \mathcal{O}(\delta _E^3)\nonumber \\&= \langle B[\phi ] \rangle _\mathrm {app} \; + \; \mathcal{O}(\delta _E^2) . \end{aligned}$$
(2.38)

The latter formula together with (2.36) provides access to all types of observables using the LLR method at little additional computational cost: once the Robbins–Monro iteration (2.30) has settled on an estimate of the coefficient \(a_k\), the Monte-Carlo simulation simply continues to derive estimators for the double-bracket expectation values in (2.36) and (2.37).

With the further assumption that the double-bracket expectation values are (semi-)positive, an even better error estimate is produced by our approach:

$$\begin{aligned} \langle B[\phi ] \rangle= & {} \langle B[\phi ] \rangle _\mathrm {app} \; + \; \sum _i \mathcal{O}(\delta _E^3) \\= & {} \langle B[\phi ] \rangle _\mathrm {app} \; \Bigl [ 1 \; + \; \mathcal{O}(\delta _E^2) \; \Bigr ]. \end{aligned}$$

This implies that the observable \(\langle B[\phi ] \rangle \) can be calculated with a relative error of order \(\delta _E^2\). Indeed, we find from (2.33), (2.34) and (2.35) that

$$\begin{aligned} \langle B[\phi ] \rangle= & {} \frac{1}{Z(\beta )} \, \sum _i \delta _E\; \tilde{\rho } \left( E_i\right) \; \left\langle \left\langle B[\phi ] \, \exp \{ \beta S [\phi ] \nonumber \\&+\, a_i ( S[\phi ] - E_i) \} \right\rangle \right\rangle \exp \Bigl \{ \mathcal{O}(\delta _E^2) \, \Bigr \} , \end{aligned}$$
(2.39)
$$\begin{aligned} Z(\beta ):= & {} \sum _i\; \delta _E\; \tilde{\rho } \left( E_i \right) \; \left\langle \left\langle \; \exp \{ \beta S [\phi ]\!+\! a_i ( S[\phi ] \!-\! E_i) \} \right\rangle \right\rangle . \nonumber \\ \end{aligned}$$
(2.40)

Here we have used

$$\begin{aligned} \left| \sum _i a_i \, \exp \Bigl \{ c_i \delta _E^2 \Bigr \} \right|\le & {} \sum _i \vert a_i \vert \, \left| \exp \{ c_i \delta _E^2 \} \right| \\\le & {} \sum _i \vert a_i \vert \, \exp \{ c_\mathrm {max} \delta _E^2 \} \\= & {} \exp \{ c_\mathrm {max} \delta _E^2 \} \; \sum _i a_i \\= & {} \exp \Bigl \{ \mathcal{O}(\delta _E^2) \, \Bigr \} \times \, \sum _i a_i . \end{aligned}$$

The assumption of (semi-)positive double-expectation values is true for many action observables, and possibly also for Wilson loops, whose re-weighted and action restricted double-expectation values might turn out to be positive (as is the case for their standard expectation values). In this case, our method would provide an efficient determination of those quantities. This is important in particular for large Wilson loop expectation values, since they are notoriously difficult to measure with importance sampling methods (see e.g. [16]). We also note that, in order to have an accurate determination of a generic observable, any Monte-Carlo estimate of the double-expectation values must be obtained to a precision dictated by the size of \(\delta _E\). A detailed numerical investigation of these and related issues is left to future work.

For the specific case that the observable \(B[\phi ]\) only depends on the action \(S[\phi ]\), we circumvent this problem and evaluate the double-expectation values directly. To this aim, we introduce for the general case \( \left\langle \left\langle W[\phi ] \right\rangle \right\rangle _k\) the generalised density \(w_k(E)\) by

$$\begin{aligned} \rho (E) \; w_k (E) \; = \; \int \mathcal{D} \phi \; \theta _{[E_k,\delta _E]} (S[\phi ]) \; W[\phi ] \; \delta \Bigl ( E - S[\phi ] \Bigr ) . \end{aligned}$$
(2.41)

We then point out that if \(W[\phi ]\) depends on the action only, i.e., \(W[\phi ] = f(S[\phi ])\), we obtain

$$\begin{aligned} w_k(E) \; = \; f(E) \; \theta _{[E_k,\delta _E]} (E) . \end{aligned}$$

With the definition of the double-expectation value (2.13), we find

$$\begin{aligned} \left\langle \left\langle W[\phi ] \right\rangle \right\rangle _k (a_k) \; = \; \frac{ \int _{E_k}^{E_k+\delta _E} \mathrm{d}E \; \rho (E) \; \mathrm {e}^{-a_k E} \; w_k(E) }{ \int _{E_k}^{E_k+\delta _E} \mathrm{d}E \; \rho (E) \; \mathrm {e}^{-a_k E} } . \end{aligned}$$
(2.42)

Rather than calculating \(\left\langle \left\langle W[\phi ] \right\rangle \right\rangle _k\) by Monte-Carlo methods, we can evaluate this quantity analytically (up to order \(\mathcal{O}(\delta _E^2)\)). Using the observation that for any smooth (\(C^2\)) function g

$$\begin{aligned} \int _{E_k}^{E_k+\delta _E} \mathrm{d}E \; g(E) \; = \; \delta _E\; g \left( E_k + \frac{ \delta _E}{2} \right) \; + \; \mathcal{O} \Bigl ( \delta _E^3 \Bigr ) , \end{aligned}$$

and applying this relation to both the numerator and the denominator of (2.42), we conclude that

$$\begin{aligned} \left\langle \left\langle W[\phi ] \right\rangle \right\rangle _k (a_k) \; = \; w_k \left( E_k + \frac{ \delta _E}{2} \right) \; + \; \mathcal{O} \Bigl ( \delta _E^2 \Bigr ) . \end{aligned}$$
(2.43)

Let us now specialise to the case that is relevant for (2.39) with B depending on the action only:

$$\begin{aligned} W[\phi ]= & {} b\Bigl ( S[\phi ] \Bigr ) \,\exp \{ \beta S [\phi ] + a_i ( S[\phi ] - E_i) \} , \nonumber \\ w_i(E)= & {} b(E) \, \exp \{ \beta E + a_i ( E - E_i) \} . \end{aligned}$$
(2.44)

This leaves us with

$$\begin{aligned} \left\langle \left\langle W[\phi ] \right\rangle \right\rangle _i (a_i) \; = \; b\Bigl ( E_i + \frac{\delta _E}{2} \Bigr ) \; \mathrm {e}^{\beta ( E_i + \frac{\delta _E}{2} ) } \; \mathrm {e}^{a_i \frac{\delta _E}{2} } \; + \; \mathcal{O} \Bigl ( \delta _E^2 \Bigr ) . \end{aligned}$$
(2.45)

Inserting (2.43) together with (2.44) into (2.36), we find

$$\begin{aligned} \langle B[\phi ] \rangle= & {} \frac{1}{Z(\beta )} \; \sum _i \delta _E\; \tilde{\rho } \left( E_i + \frac{\delta _E}{2} \right) \;\nonumber \\&\times b \Bigl (E_i + \frac{\delta _E}{2} \Bigr ) \; \mathrm {e}^{\beta (E_i + \frac{\delta _E}{2})} \; + \; \mathcal{O}\Bigl ( \delta _E^2 \Bigr ) , \end{aligned}$$
(2.46)
$$\begin{aligned} Z(\beta )= & {} \sum _i \delta _E\; \tilde{\rho } \left( E_i + \frac{\delta _E}{2} \right) \; \mathrm {e}^{\beta (E_i + \frac{\delta _E}{2})} . \end{aligned}$$
(2.47)

Below, we will numerically test the quality of expectation values obtained by the LLR approach using action observables only, i.e., \(B[\phi ] = O(S[\phi ])\). We will find that we indeed achieve the predicted precision in \(\delta _E^2 \) for this type of observable (see Fig. 6 below).
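For action observables, (2.46) and (2.47) reduce the whole computation to sums over interval midpoints. The sketch below, under illustrative assumptions for the slopes, accumulates \(\ln \tilde{\rho }\) at the midpoints via the trapezium relation (2.6) and evaluates \(\langle B \rangle (\beta )\); the common factor \(\delta _E\) cancels between numerator and denominator.

```python
# Illustrative sketch: an action observable <b(S)>(beta) from LLR output,
# via the sums (2.46)-(2.47).  The slopes a_k below are toy input.
import numpy as np

def action_observable(b, a, E0, delta_E, beta):
    a = np.asarray(a)
    k = np.arange(len(a))
    E_mid = E0 + (k + 0.5) * delta_E
    # ln rho-tilde at the midpoints: cumulative trapezium sum, eq. (2.6)
    ln_rho_mid = np.concatenate(([0.0],
                                 np.cumsum(0.5 * (a[:-1] + a[1:]) * delta_E)))
    ln_w = ln_rho_mid + beta * E_mid
    w = np.exp(ln_w - ln_w.max())        # guard: rho spans many orders of magnitude
    return (b(E_mid) * w).sum() / w.sum()

a = np.linspace(3.0, -3.0, 51)           # toy slopes from the Robbins-Monro recursion
print(action_observable(lambda E: E, a, E0=0.0, delta_E=0.02, beta=1.0))
```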

2.4 The numerical algorithm

So far, we have shown that a piecewise continuous approximation of the density of states that is linear in intervals of sufficiently small amplitude \(\delta _E\) allows us to obtain a controlled estimate of averages of observables, and that the angular coefficients \(a_i\) of the linear approximations can be computed in each interval i using the Robbins–Monro recursion (2.30). Imposing the continuity of \(\ln \rho (E)\), one can then determine the latter quantity up to an additive constant, which does not play any role in cases in which observables are standard ensemble averages.

The Robbins–Monro recursion can easily be implemented in a numerical algorithm. Ideally, the recurrence would be stopped when a tolerance \(\epsilon \) for \(a_i\) is reached, i.e. when

$$\begin{aligned} \left| a^{(n+1)}_i- a^{(n)}_i \right| = \frac{12~\left| \Delta E_i(a^{(n)}_i) \right| }{(n+1)~\delta _E^2} \le \epsilon \ , \end{aligned}$$
(2.48)

with (for instance) \(\epsilon \) set to the precision of the computation. When this condition is fulfilled, we can set \(a_i = a^{(n+1)}_i\). However, one has to take into account the fact that the computation of \(\Delta E_i\) requires an averaging over Monte-Carlo configurations. This brings into play considerations about thermalisation (which has to be taken into account each time we send \(a^{(n)}_i \rightarrow a^{(n+1)}_i\)), the number of measurements used for determining \(\Delta E_i\) at fixed \(a^{(n)}_i\) and—last but not least—fluctuations of the \(a^{(n)}_i\) themselves.

Following those considerations, an algorithm based on the Robbins–Monro recursion relation should depend on the following input (tuneable) parameters:

  • \(N_{\mathrm {TH}}\), the number of Monte-Carlo updates in the restricted energy interval before starting to measure expectation values;

  • \(N_{\mathrm {SW}}\), the number of iterations used for computing expectation values;

  • \(N_{\mathrm {RM}}\), the number of Robbins–Monro iterations for determining \(a_i\);

  • \(N_B\), the number of final values from the Robbins–Monro iteration subjected to a subsequent bootstrap analysis.

The version of the LLR method proposed and implemented in this paper is reported in an algorithmic fashion in the box Algorithm 1. This implementation differs from that provided in [9, 10] by the replacement of the originally proposed root finding procedure based on a deterministic Newton–Raphson like recursion with the Robbins–Monro recursion, which is better suited to the problem of finding zeroes of stochastic equations.

Algorithm 1 The LLR method based on the Robbins–Monro recursion (2.30) (pseudocode box)
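The following Python sketch condenses the structure of Algorithm 1 for a single interval; it is a toy rendering under stated assumptions, not our production code. The action \(S[\phi ]=\sum _i \phi _i^2\) and the restricted Metropolis update are placeholders; for this toy model \(\rho (E)\propto E^{V/2-1}\), so the recursion should return \(a_k \approx (V/2-1)/E \approx 0.4\) at \(E=10\) with \(V=10\) degrees of freedom.

```python
# Illustrative rendering of Algorithm 1 for one interval [E_k, E_k + delta_E].
# The toy action and the restricted update are assumptions, not the paper's code.
import numpy as np

rng = np.random.default_rng(1)

def action(phi):
    return np.dot(phi, phi)                  # toy action S[phi] = sum_i phi_i^2

def restricted_update(phi, a, E_k, delta_E, step=0.3):
    """One Metropolis sweep with weight e^{-a S}, restricted to the interval."""
    for i in range(phi.size):
        trial = phi.copy()
        trial[i] += step * rng.normal()
        S_old, S_new = action(phi), action(trial)
        if E_k <= S_new <= E_k + delta_E and rng.random() < np.exp(-a * (S_new - S_old)):
            phi = trial
    return phi

def llr_coefficient(phi, E_k, delta_E, a0, N_TH, N_SW, N_RM):
    """Robbins-Monro determination of a_k, eq. (2.30)."""
    a = a0
    for n in range(N_RM):
        for _ in range(N_TH):                # re-thermalise after each change of a
            phi = restricted_update(phi, a, E_k, delta_E)
        dE = 0.0
        for _ in range(N_SW):                # estimate <<Delta E>>_k(a)
            phi = restricted_update(phi, a, E_k, delta_E)
            dE += action(phi) - E_k - 0.5 * delta_E
        a += 12.0 * (dE / N_SW) / (delta_E ** 2 * (n + 1))   # eq. (2.30)
    return a

phi = np.full(10, 1.0)                       # S = 10, inside the chosen interval
print(llr_coefficient(phi, E_k=9.5, delta_E=1.0, a0=0.0,
                      N_TH=5, N_SW=50, N_RM=200))   # expect roughly 0.4
```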

Since the \(a_i\) are determined stochastically, a different run of the algorithm with different starting conditions and different random seeds would produce a different value for the same \(a_i\). The stochastic nature of the process implies that the distribution of the \(a_i\) found in different runs is Gaussian. The generated ensemble of the \(a_i\) can then be used to determine the error of the estimate of observables using analysis techniques such as jackknife and bootstrap.

The parameters \(E_{\mathrm {min}}\) and \(E_{\mathrm {max}}\) depend on the system and on the phenomenon under investigation. In particular, standard thermodynamic considerations on the infinite volume limit imply that if one is interested in a specific range of temperatures and the studied observables can be written as statistical averages with Gaussian fluctuations, it is possible to restrict the range of energies between the energy that is typical of the smallest considered temperature and the energy that is typical of the highest considered temperature. Determining a reasonable value for the amplitude of the energy interval \(\delta _E\) and the other tuneable parameters \(N_{\mathrm {SW}}\), \(N_{\mathrm {TH}}\), \(N_{\mathrm {RM}}\) and \(N_{B}\) requires a modest amount of experimenting with trial values. In our applications we found that the results were very stable over wide ranges of values of those parameters. Likewise, \(\bar{a}_i\), the initial value for the Robbins–Monro recursion in interval i, does not play a crucial role; when required and possible, an initial value close to the expected result can be inferred by inverting \(\langle E (\beta )\rangle \), which can be obtained with a quick study using conventional techniques.

The average \(\left\langle \left\langle \dots \right\rangle \right\rangle \) requires an update that restricts configurations to those with energies in a specific range. In most of our studies, we have imposed the constraint analytically at the level of the generation of the newly proposed variables, which results in a performance that is comparable with that of the unconstrained system. Using a more direct (if simple-minded) approach, in which one imposes the constraint after the generation of the proposed new variable, we found that in most cases the efficiency of Monte-Carlo algorithms did not drop drastically as a consequence of the restriction, and even for systems like SU(3) (see Ref. [9]) we were able to keep an efficiency of at least 30 % and in most cases no less than 50 % with respect to the unconstrained system.

2.5 Ergodicity

Fig. 1 Left: for contiguous energy intervals, if a transition between configurations with energy in the same interval requires going through configurations whose energy lies outside that interval, the simulation might get trapped in one of the allowed regions. Right: for overlapping energy intervals with replica exchange, the simulation can travel from one allowed region to the other through excursions to the upper interval

Our implementation of the energy restricted average \(\left\langle \left\langle \cdots \right\rangle \right\rangle \) assumes that the update algorithm is able to generate all configurations with energy in the relevant interval starting from configurations that have energy in the same interval. This assumption might be too strong when the update is local in the energy (i.e. each elementary update step changes the energy by a quantity of order one for a system with total energy of order V) and there are topological excitations that can create regions with the same energy that are separated by high energy barriers. In these cases, which are rather common in gauge theories and statistical mechanics, in order to go from one acceptable region to the other one generally has to travel through a region of energies that is forbidden by an energy-restricted update method such as the LLR. Hence, by construction, in such a scenario our algorithm will get trapped in one of the allowed regions and the update will not be ergodic.

In order to solve this problem, one can use an adaptation of the replica exchange method [17], as first proposed in [18]. The idea is that instead of dividing the whole energy interval into contiguous sub-intervals overlapping only at one point (in the following simply referred to as contiguous intervals), one can divide it into sub-intervals overlapping in a finite energy region (this case will be referred to as overlapping intervals). With the latter prescription, after a fixed number of iterations of the Robbins–Monro procedure, we can check whether for any pair of overlapping intervals \((I_1, I_2)\) the energies of both corresponding configurations lie in the common region. For pairs fulfilling this condition, we can propose an exchange of the configurations with a Metropolis probability

$$\begin{aligned} P_{\mathrm {swap}} = \mathrm {min}\left( 1,e^{\left( a^{(n)}_{I_1} - a^{(n)}_{I_2}\right) \left( E_{C_1} - E_{C_2}\right) }\right) \ , \end{aligned}$$
(2.49)

where \(a^{(n)}_{I_1}\) and \(a^{(n)}_{I_2}\) are the values of the parameter a at the current n-th iteration of the Robbins–Monro procedure in intervals \(I_1\) and \(I_2\) respectively, and \(E_{C_1}\) (\(E_{C_2}\)) is the value of the energy of the current configuration \(C_1\) (\(C_2\)) of the replica in the interval \(I_1\) (\(I_2\)). If the proposed exchange is accepted, \(C_1 \rightarrow C_2\) and \(C_2 \rightarrow C_1\). With repeated exchanges of configurations between neighbouring intervals, the system can now travel through the whole configuration space. A schematic illustration of how this mechanism works is provided in Fig. 1.

As already noticed in [18], the replica exchange step is amenable to parallelisation and hence can be conveniently deployed in calculations on massively parallel computers. Note that the replica exchange step adds another tuneable parameter to the algorithm, namely the number \(N_{\mathrm {SWAP}}\) of configuration swaps proposed at a given Monte-Carlo step. A modification of the LLR algorithm that incorporates this step can easily be implemented.
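A minimal sketch of the swap test (2.49) between two replicas is given below; the bookkeeping of intervals, energies and the current \(a^{(n)}\) values is an illustrative assumption, not a prescription from the paper.

```python
# Illustrative sketch of the replica-exchange step, eq. (2.49).
import numpy as np

rng = np.random.default_rng(2)

def try_swap(E1, E2, a1, a2, I1, I2):
    """Propose swapping the configurations of two overlapping intervals.

    E1, E2: current energies; a1, a2: current a^{(n)} values;
    I1, I2: (lower, upper) bounds of the two intervals."""
    lo, hi = max(I1[0], I2[0]), min(I1[1], I2[1])     # common energy region
    if not (lo <= E1 <= hi and lo <= E2 <= hi):
        return False                                  # both replicas must sit in the overlap
    p_swap = min(1.0, np.exp((a1 - a2) * (E1 - E2)))  # eq. (2.49)
    return rng.random() < p_swap

# Example: two intervals overlapping in [1.0, 1.5]
print(try_swap(E1=1.2, E2=1.4, a1=0.8, a2=0.6, I1=(0.5, 1.5), I2=(1.0, 2.0)))
```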

2.6 Reweighting with the numerical density of states

In order to screen our approach outlined in Sects. 2.2 and 2.3 for ergodicity violations, and to propose an efficient procedure for calculating any observable once an estimate for the density of states has been obtained, we here introduce, as an alternative to the replica exchange method discussed in the previous section, an importance sampling algorithm with re-weighting with respect to the estimate \(\tilde{\rho }\). This algorithm features short correlation times even near critical points. Consider for instance a system described by the canonical ensemble. We define a modified Boltzmann weight \(W_B(E)\) as follows:

$$\begin{aligned} W_B(E) = \left\{ \begin{array}{ll} e^{ \beta _1 E + c_1} &{}\quad \text{ for } \ E < E_{\mathrm {min}}\ ; \\ 1/\tilde{\rho }(E) &{}\quad \text{ for } \ E_{\mathrm {min}}\le E \le E_{\mathrm {max}}\ ; \\ e^{ \beta _2 E + c_2} &{}\quad \text{ for } \ E > E_{\mathrm {max}}\ . \end{array} \right. \end{aligned}$$
(2.50)

Here \(E_{\mathrm {min}}\) and \(E_{\mathrm {max}}\) are two values of the energy that are far from the typical energy of interest E:

$$\begin{aligned} E_{\mathrm {min}}\ll E \ll E_{\mathrm {max}}\ . \end{aligned}$$
(2.51)

If conventional Monte-Carlo simulations can be used for numerical studies of the given system, we can choose \(\beta _1\) and \(\beta _2\) from the conditions

$$\begin{aligned} \langle E(\beta _1) \rangle = E_{\mathrm {min}}\ , \qquad \langle E(\beta _2) \rangle = E_{\mathrm {max}}\ . \end{aligned}$$
(2.52)

If importance sampling methods are inefficient or unreliable, \(\beta _1\) and \(\beta _2\) can be chosen to be the micro-canonical values \(\beta _{\mu }\) corresponding to the density of states at \(E_{\mathrm {min}}\) and \(E_{\mathrm {max}}\) respectively. These \(\beta _{\mu }\) are outputs of our numerical determination of \(\tilde{\rho }(E)\). The two constants \(c_1\) and \(c_2\) are determined by requiring continuity of \(W_B(E)\) at \(E_{\mathrm {min}}\) and at \(E_{\mathrm {max}}\):

$$\begin{aligned} \lim _{E \rightarrow E_{\mathrm {min}}^-} W_B(E)= & {} \lim _{E \rightarrow E_{\mathrm {min}}^+} W_B(E)\quad \text{ and } \quad \lim _{E \rightarrow E_{\mathrm {max}}^-} W_B(E)\nonumber \\= & {} \lim _{E \rightarrow E_{\mathrm {max}}^+} W_B(E) \ . \end{aligned}$$
(2.53)

Let \(\rho (E)\) be the correct density of states of the system. If \(\tilde{\rho }(E) = \rho (E)\), then for \(E_{\mathrm {min}}\le E \le E_{\mathrm {max}}\)

$$\begin{aligned} \rho (E) W_B(E) = 1 \ , \end{aligned}$$
(2.54)

and a Monte-Carlo update with weights \(W_B(E)\) drives the system in configuration space following a random walk in the energy. In practice, since \(\tilde{\rho }(E)\) is determined numerically, upon normalisation

$$\begin{aligned} \rho (E) W_B(E) \simeq 1 \ , \end{aligned}$$
(2.55)

and the random walk is only approximate. However, if \(\tilde{\rho }(E)\) is a good approximation of \(\rho (E)\), possible free energy barriers and metastabilities of the canonical system can be successfully overcome with the weights (2.50). Values of observables for the canonical ensemble at temperature \(T = 1/\beta \) can be obtained using re-weighting:

$$\begin{aligned} \langle O (\beta ) \rangle = \frac{\langle O e^{ \beta E} (W_B(E))^{-1} \rangle _W}{\langle e^{ \beta E} (W_B(E))^{-1} \rangle _W} \ , \end{aligned}$$
(2.56)

where \(\langle \ \rangle \) denotes the average over the canonical ensemble and \(\langle \ \rangle _W\) the average over the modified ensemble defined by (2.50). The weights \(W_B(E)\) guarantee ergodic sampling with small auto-correlation time for the configurations with energies E such that \(E_{\mathrm {min}}\le E \le E_{\mathrm {max}}\), while suppressing excursions to energies \(E \ll E_{\mathrm {min}}\) and \(E \gg E_{\mathrm {max}}\). Hence, as long as for a given \(\beta \) of the canonical system \(\overline{E} = \langle E \rangle \) and the energy fluctuation \(\Delta \overline{E} = \sqrt{\langle E^2 \rangle - \langle E \rangle ^2 }\) are such that

$$\begin{aligned} E_{\mathrm {min}}\ll \overline{E} - \Delta \overline{E} \quad \text{ and } \quad \overline{E} + \Delta \overline{E} \ll E_{\mathrm {max}}\ , \end{aligned}$$
(2.57)

the re-weighting (2.56) does not present any overlap problem. The role of \(E_{\mathrm {min}}\) and \(E_{\mathrm {max}}\) is to restrict the approximate random walk to energies that are physically interesting, in order to save computer time. Hence, the choice of \(E_{\mathrm {min}}\), \(E_{\mathrm {max}}\) and of the corresponding \(\beta _1\), \(\beta _2\) does not need to be fine-tuned, the only requirement being that Eq. (2.57) hold. These conditions can be verified a posteriori. Obviously, choosing the smallest interval \(E_{\mathrm {max}}- E_{\mathrm {min}}\) where the conditions (2.57) hold optimises the computational time required by the algorithm. The weights (2.50) can easily be imposed using a Metropolis or a biased Metropolis update [19]. Again, due to the absence of free energy barriers, no ergodicity problems are expected to arise. This can be checked by verifying that in the simulation there are several tunnellings (i.e. round trips) between \(E_{\mathrm {min}}\) and \(E_{\mathrm {max}}\) and that the frequency histogram of the energy is approximately flat between \(E_{\mathrm {min}}\) and \(E_{\mathrm {max}}\). Reasonable requirements are to have \(\mathcal{O}(100{-}1000)\) tunnellings and a histogram that is flat within 15–20 %. These criteria can be used to confirm that the numerically determined \(\tilde{\rho }(E)\) is a good approximation of \(\rho (E)\). The flatness of the histogram is not influenced by the \(\beta \) of interest in the original multi-canonical simulation. This is particularly important for first order phase transitions, where traditional Monte-Carlo algorithms have a tunnelling time that grows exponentially with the volume of the system. Since the modified ensemble relies on a random walk in energy, the tunnelling time between two fixed energy densities is expected to grow only as the square root of the volume.
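A compact sketch of the weight (2.50) and the re-weighting (2.56) follows, written in log space since \(\tilde{\rho }\) spans many orders of magnitude; the toy \(\ln \tilde{\rho }\) and the matching slopes \(\beta _1\), \(\beta _2\) are illustrative assumptions.

```python
# Illustrative sketch: modified Boltzmann weight (2.50) and re-weighting (2.56).
import numpy as np

def ln_W_B(E, E_min, E_max, beta1, beta2, ln_rho_t):
    """ln W_B(E); the constants c1, c2 are fixed by continuity, eq. (2.53)."""
    if E < E_min:
        return beta1 * (E - E_min) - ln_rho_t(E_min)
    if E > E_max:
        return beta2 * (E - E_max) - ln_rho_t(E_max)
    return -ln_rho_t(E)

def reweighted_average(O, E, lnW, beta):
    """<O>(beta) from samples of the modified ensemble, eq. (2.56)."""
    ln_r = beta * E - lnW                    # ln of e^{beta E} / W_B(E)
    r = np.exp(ln_r - ln_r.max())            # overflow guard
    return (O * r).sum() / r.sum()

ln_rho_t = lambda E: -40.0 * (E - 0.5) ** 2  # toy ln rho-tilde
# beta1, beta2: micro-canonical slopes of the toy density at E_min, E_max
print(ln_W_B(0.1, E_min=0.2, E_max=0.8, beta1=24.0, beta2=-24.0, ln_rho_t=ln_rho_t))
```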

This procedure of using a modified ensemble followed by re-weighting is inspired by the multi-canonical method [20], the only substantial difference being the recursion relation for determining the weights. Indeed, for U(1) lattice gauge theory a multi-canonical update in which the weights are determined starting from a Wang–Landau recursion is discussed in [7]. We also note that the procedure used here to restrict ergodically the energy interval between \(E_{\mathrm {min}}\) and \(E_{\mathrm {max}}\) can also easily be implemented in the replica exchange method analysed in the previous subsection.

3 Application to compact U(1) lattice gauge theory

3.1 The model

Compact U(1) lattice gauge theory is the simplest gauge theory based on a Lie group. Its action is given by

$$\begin{aligned} S = \beta \sum _{x, \mu < \nu } \cos (\theta _{\mu \nu }(x) ) \ , \end{aligned}$$
(3.1)

where \(\beta = 1/g^2\), with \(g^2\) the gauge coupling, x is a point of a d-dimensional lattice of size \(L^d\) and \(\mu \) and \(\nu \) indicate two lattice directions, with indices running from 1 to d (for simplicity, in this work we shall consider only the case \(d = 4\)). \(\theta _{\mu \nu }\) plays the role of the electromagnetic field tensor: if we associate the compact angular variable \(\theta _{\mu }(x) \in [ - \pi ; \pi [\) with the link stemming from x in direction \(\hat{\mu }\), then

$$\begin{aligned} \theta _{\mu \nu }(x) = \theta _{\mu }(x) + \theta _{\nu }(x + \hat{\mu }) - \theta _{\mu }(x + \hat{\nu }) - \theta _{\nu }(x) \ . \end{aligned}$$
(3.2)
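For concreteness, the field tensor (3.2) on a periodic lattice can be evaluated with shifted copies of the link field; the Python sketch below, with the link angles stored as an array of shape (4, L, L, L, L), is an illustrative assumption and not the code used for the simulations in this paper.

```python
# Illustrative sketch: plaquette angle (3.2) and action (3.1) on a periodic L^4
# lattice; theta has shape (4, L, L, L, L), axis mu of each component is x_mu.
import numpy as np

def plaquette_angle(theta, mu, nu):
    """theta_{mu nu}(x) = theta_mu(x) + theta_nu(x+mu) - theta_mu(x+nu) - theta_nu(x)."""
    return (theta[mu]
            + np.roll(theta[nu], -1, axis=mu)   # field at x + mu-hat
            - np.roll(theta[mu], -1, axis=nu)   # field at x + nu-hat
            - theta[nu])

L = 4
rng = np.random.default_rng(3)
theta = rng.uniform(-np.pi, np.pi, size=(4, L, L, L, L))
S = sum(np.cos(plaquette_angle(theta, mu, nu)).sum()
        for mu in range(4) for nu in range(mu + 1, 4))   # eq. (3.1) with beta = 1
print(S / (6 * L ** 4))                                  # average plaquette
```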

The path integral of the theory is given by

$$\begin{aligned} Z = \int \mathcal{D} \theta _{\mu }\; \mathrm {e}^{S} , \quad \mathcal{D} \theta _{\mu } = \prod _{x,\mu } \frac{\text{ d } \theta _{\mu }(x)}{2 \pi } \ , \end{aligned}$$
(3.3)

the latter identity defining the Haar measure of the U(1) group.

The connection with the general framework of lattice gauge theories is better elucidated if we introduce the link variable

$$\begin{aligned} U_{\mu }(x) = e^{i \theta _{\mu }(x)} \ . \end{aligned}$$
(3.4)

With this definition, S can be rewritten as

$$\begin{aligned} S = \beta \sum _{x, \mu < \nu } \mathrm {Re} \, P_{\mu \nu } (x)\ , \end{aligned}$$
(3.5)

with

$$\begin{aligned} P_{\mu \nu }(x) = U_{\mu }(x) U_{\nu }(x + \hat{\mu }) U_{\mu }^{*} (x + \hat{\nu }) U_{\nu }^*(x) \end{aligned}$$

the plaquette variable, and \(U^{*}_{\mu }(x)\) is the complex conjugate of \(U_{\mu }(x)\). Working with the variables \(U_{\mu }(x)\) allows us to show immediately that S is invariant under U(1) gauge transformations, which act as

$$\begin{aligned} U_{\mu }(x) \mapsto \Lambda ^{*}(x) \; U_{\mu }(x) \; \Lambda (x + \hat{\mu }) \ , \qquad \Lambda (x) = e^{i \lambda (x)} \ , \end{aligned}$$
(3.6)

with \(\lambda (x) \in [-\pi ; \ \pi [\) a function defined on lattice points.

The connection with U(1) gauge theory in the continuum can be shown by introducing the lattice spacing a and the non-compact gauge field \(a \, A_{\mu }(x) = \theta _{\mu }(x)/ g\), so that

$$\begin{aligned} U_{\mu }(x) = e^{i g a \, A_{\mu }(x)} \ . \end{aligned}$$
(3.7)

Taking the lattice spacing a small and expanding the cosine leads to

$$\begin{aligned} S= & {} - \frac{1}{4} a^4 \sum _{x, \mu , \nu } \left( \Delta _{\mu } A_{\nu }(x) - \Delta _{\nu } A_{\mu }(x) \right) ^2\nonumber \\&+ \,\mathcal{O}(a^6) \, + \, \hbox {constant} , \end{aligned}$$
(3.8)

with \(\Delta _{\mu }\) the forward difference operator. In the limit \(a \rightarrow 0\), we finally find

$$\begin{aligned} S \simeq - \frac{1}{4} \int \text{ d }^4 x F_{\mu \nu }(x) ^2 , \end{aligned}$$
(3.9)

with \(F_{\mu \nu }\) being the usual field strength tensor. This shows that in the classical \(a \rightarrow 0\) limit S becomes the Euclidean action of a free gas of photons, with interactions being related to the neglected lattice corrections. It is worth remarking that this classical continuum limit is not the continuum limit of the full theory. In fact, the classical continuum limit is spoiled by quantum fluctuations. These prevent the system from developing a second order transition point in the \(a \rightarrow 0\) limit, which is a necessary condition for removing the ultraviolet cutoff introduced with the lattice discretisation. The lack of a continuum limit is related to the fact that the theory is strongly coupled in the ultraviolet. Despite the non-existence of a continuum limit for compact U(1) lattice gauge theory, this lattice model is still interesting, since it provides a simple realisation of a weakly first order phase transition. This bulk phase transition separates a confining phase at low \(\beta \) (whose existence was pointed out by Wilson [21] in his seminal work on lattice gauge theory) from a deconfined phase at high \(\beta \), with the transition itself occurring at a critical value of the coupling \(\beta _c \simeq 1\). Rather unexpectedly at first sight, importance sampling Monte-Carlo studies of this phase transition turned out to be demanding and not immediate to interpret, with the order of the transition having been debated for a long time (see e.g. [22–31]). The issue was settled only relatively recently, with investigations that made crucial use of supercomputers [32, 33]. What makes the transition difficult to observe numerically is the role played in the deconfinement phase transition by magnetic monopoles [34], which condense in the confined phase [34, 35].

The existence of topological sectors and the presence of a transition with exponentially suppressed tunnelling times can provide robust tests for the efficiency and the ergodicity of our algorithm. This motivates our choice of compact U(1) for the numerical investigation presented in this paper.

3.2 Simulation details

The study of the critical properties of U(1) lattice gauge theory is presented in this section. In order to test our algorithm, we investigated the behaviour of the specific heat as a function of the volume. This quantity has been carefully investigated in previous studies, and as such provides a stringent test of our procedure. In order to compare data across different sizes, our results will often be provided normalised to the number of plaquettes \(6 L^4 = 6V\).

We studied lattices of size ranging from \(8^4\) to \(20^4\), and for each lattice size we computed the density of states \(\rho (E)\) in the interval \(E_{\mathrm {min}}\le E \le E_{\mathrm {max}}\) (see Table 1). The rationale behind the choice of the energy region is that it must be centred around the critical energy and it has to be large enough for studying all the critical properties of the theory, i.e. every observable evaluated has to have support in this region and receive virtually no correction from the choice of the energy boundaries.

Table 1 Values of the tuneable parameters of the LLR algorithm used in our numerical investigation; in the last column we report the total number of global MC steps needed to perform the entire investigation

We divided the energy interval into steps of \(\delta _E\), and for each of the sub-intervals we repeated the entire generation of the log-linear density of states function and the evaluation of the observables \(N_{B}=20\) times, to create the bootstrap samples for the estimate of the errors. The values of the other tuneable parameters of the algorithm used in our study are reported in Table 1. An example of the determination of one of the \(a_i\) is reported in Fig. 2. The plot shows the rapid convergence to the asymptotic value and the negligible amplitude of the residual fluctuations. Concerning the cost of the simulations, we found that accurate determinations of observables can be obtained with modest computational resources compared to those needed in investigations of the system with importance sampling methods. For instance, the most costly simulation presented here, the investigation of the \(20^4\) lattice, was performed on 512 cores of Intel Westmere processors in about five days. This needs to be contrasted with the fact that in the early 2000s only lattices up to \(18^4\) could be reliably investigated with importance sampling methods, with the largest sizes requiring supercomputers [32, 33].

Fig. 2 Estimated \(a_i\) as a function of the Robbins–Monro iteration, on a \(20^4\) lattice and for action \(E/(6V) = 0.59009548\) at the centre of the interval, with \(\delta _E/V=1.91\times 10^{-4}\)

Fig. 3 Comparison between the plaquette computed with the LLR algorithm (see Sect. 2.2) and via re-weighting with respect to the estimate \(\tilde{\rho }\) (see Sect. 2.6) for an \(L=12\) lattice

One of our first analyses was a screening for potential ergodicity violations of the LLR approach. As detailed in Sect. 2.5, these can emerge for LLR simulations using contiguous intervals, as is the case for the U(1) study reported in this paper. To this aim, we calculated the action expectation value \(\langle E \rangle \) on a \(12^4\) lattice for several values of \(\beta \), both using the LLR method and using re-weighting with respect to the estimate \(\tilde{\rho }\). Since the latter approach is conceptually free of ergodicity issues, any violation by the LLR method would be flagged by a discrepancy. Our findings are summarised in Fig. 3 and the corresponding table. We find good agreement between the results from both methods. This suggests that topological objects do not generate energy barriers that trap our algorithm in a restricted section of configuration space. In other words, for this system the LLR method using contiguous intervals seems to be ergodic.

3.3 Volume dependence of \(\log \tilde{\rho }\) and computational cost of the algorithm

As a first investigation, we have studied the scaling properties of the \(a_i\) as a function of the volume; their behaviour with the lattice volume is shown in Fig. 4. The estimates are done at fixed \(\delta _E/V\), where the chosen value of the ratio fulfils the requirement that, within errors, all our observables do not vary as \(\delta _E\rightarrow 0\) (we report on the study of the \(\delta _E\rightarrow 0\) limit in Sect. 3.5). As is clearly visible from the plot, the data scale towards an infinite volume estimate of the \(a_i\) at fixed energy density.

Fig. 4

Estimate of \(a_i\) as a function of the energy density for various volumes. The right panel is a zoom of the interesting region

As mentioned before, the issues facing importance sampling studies at first order phase transitions are connected with tunnelling times that grow exponentially with the volume. With the LLR method, the algorithmic cost is expected to grow with the size of the system as \(V^2\): one factor of V comes from the increase of the system size, and the other from the need to keep the width of the energy interval per unit of volume, \(\delta _E/V\), fixed, since in the large-volume limit only intensive quantities are expected to determine the physics. One might wonder whether this apparently simplistic argument fails at the first order phase transition point, as would be the case if the dynamics developed an additional slowing down at criticality. For compact U(1), over the range of lattice sizes studied here, we have found that the computational cost of the algorithm is compatible with a quadratic increase with the volume.

3.4 Numerical investigation of the phase transition

Using the density of states it is straightforward to evaluate, by direct integration (see Sect. 2.3), the expectation value of any power of the energy, and hence thermodynamical quantities like the specific heat

$$\begin{aligned} C_V(\beta ) = \langle E^2(\beta )\rangle -\langle E(\beta )\rangle ^2. \end{aligned}$$
(3.10)
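To illustrate the direct integration, here is a minimal Python sketch, under the assumption that the output of the LLR runs is the set of slopes \(a_i\) of the piecewise-linear approximation of \(\log \tilde{\rho }\) on intervals of width \(\delta _E\); since \(\rho (E)\) spans many orders of magnitude, one works with logarithms and subtracts the maximum before exponentiating (all names are illustrative):

```python
import numpy as np

def log_rho_edges(E_edges, a):
    """log rho~ at the interval edges: log rho~ is linear on each
    interval [E_i, E_{i+1}] with slope a_i (overall constant irrelevant)."""
    return np.concatenate(([0.0], np.cumsum(a * np.diff(E_edges))))

def moment(E_edges, a, beta, n, pts_per_interval=32):
    """<E^n> at coupling beta, from rho~(E) e^{beta E} by direct integration."""
    E = np.linspace(E_edges[0], E_edges[-1],
                    pts_per_interval * (len(E_edges) - 1))
    log_w = np.interp(E, E_edges, log_rho_edges(E_edges, a)) + beta * E
    w = np.exp(log_w - log_w.max())      # subtract max to avoid overflow
    return np.trapz(w * E**n, E) / np.trapz(w, E)

def specific_heat(E_edges, a, beta):
    # Eq. (3.10): C_V = <E^2> - <E>^2
    return moment(E_edges, a, beta, 2) - moment(E_edges, a, beta, 1) ** 2
```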

As usual, we define the pseudo-critical coupling \(\beta _c(L)\) as the coupling at which the peak of the specific heat occurs at fixed volume. The peak of the specific heat has been located using our numerical procedure, and the error bars have been computed using the bootstrap method. Our results are summarised in Table 2, together with a comparison with the values of [32]. Once again, the agreement testifies to the good ergodic properties of the algorithm.

Table 2 \(\beta _c(L)\) evaluated with the LLR algorithm and reference data from [32]

Using our data, it is possible to make a precise estimate of the infinite volume critical coupling \(\beta _c\) by means of a finite size scaling analysis. The finite size scaling of the pseudo-critical coupling is given by

$$\begin{aligned} \beta _{c}(L)=\beta _{c}+\sum _{k=1}^{k_{max}} B_k L^{-4 k} , \end{aligned}$$
(3.11)

where \( \beta _{c} \) is the critical coupling. We fit our data with the function in Eq. (3.11); the results are reported in Table 3.
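As an illustration, a minimal sketch of such a fit with scipy, for the \(k_{max}=1\) truncation of Eq. (3.11); the data arrays below are synthetic placeholders standing in for the values and errors of Table 2:

```python
import numpy as np
from scipy.optimize import curve_fit

def fss_form(L, beta_c_inf, B1):
    """Eq. (3.11) truncated at k_max = 1."""
    return beta_c_inf + B1 * L ** (-4.0)

# Synthetic placeholder data; in practice these come from Table 2.
L_sizes  = np.array([8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0])
beta_c_L = fss_form(L_sizes, 1.0111, 0.5)      # illustrative values only
err      = np.full_like(L_sizes, 2.0e-5)

popt, pcov = curve_fit(fss_form, L_sizes, beta_c_L, sigma=err,
                       absolute_sigma=True, p0=(1.0, 0.0))
beta_c_inf, B1 = popt
beta_c_inf_err = np.sqrt(pcov[0, 0])           # error on the extrapolation
```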

Table 3 Estimates of \(\beta _c\) for various choices of the fit parameters. The best fits are shown in bold

Another easily accessible quantity is the latent heat, which can be related to the height of the peak of the specific heat at the critical temperature through

$$\begin{aligned} \frac{C_{V}(\beta _c(L))}{6L^4}=\frac{G^{2}}{4}+ \sum _{k=1}^{k_{max}} C_k L^{-4k} , \end{aligned}$$
(3.12)

where G is the latent heat. Our results for this observable are reported in Table 4; the corresponding fits to Eq. (3.12) are summarised in Table 5.

Table 4 \(C_V(\beta _c(L))\) evaluated with the LLR algorithm and reference data from [32]. Results for a \(20^4\) lattice have never been reported before in the literature
Table 5 Estimates of G for various choices of the fit parameters. The best fits are shown in bold

The latent heat can also be obtained from the location of the peaks of the probability density at the infinite volume \(\beta _c\): in this case the latent heat equals the energy gap between the peaks. This direct measurement can be used as a cross-check of the previous analysis. In the language of the density of states, the probability density is simply given by

$$\begin{aligned} P_\beta (E) = \frac{1}{Z} \rho (E) e^{\beta E}. \end{aligned}$$
(3.13)
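Locating the two maxima of \(P_\beta (E)\) is then a one-dimensional problem. A minimal sketch, reusing the piecewise-linear reconstruction of \(\log \tilde{\rho }\) from the snippet of Sect. 3.4 (names are again illustrative); working with \(\log P\) keeps the sign convention of Eq. (3.13) and avoids overflow:

```python
import numpy as np
from scipy.signal import find_peaks

def peak_energies(E_edges, a, beta, pts_per_interval=32):
    """Maxima of P_beta(E) ~ rho~(E) exp(beta E), Eq. (3.13); log rho~ is
    piecewise linear with slopes a_i, so we work directly with log P."""
    log_rho = np.concatenate(([0.0], np.cumsum(a * np.diff(E_edges))))
    E = np.linspace(E_edges[0], E_edges[-1],
                    pts_per_interval * (len(E_edges) - 1))
    log_P = np.interp(E, E_edges, log_rho) + beta * E   # unnormalised log P
    idx, _ = find_peaks(log_P)     # near beta_c two maxima are expected
    return E[idx]
```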
Fig. 5

Probability density for \(L=20\) at \(\beta _c\). The probability is plotted at the infinite volume \(\beta _c\); hence the peaks are not of equal height

We have studied the location in energy of the two peaks of \(P_{\beta _c}(E)\) (an example is displayed in Fig. 5); the results are reported in Table 6. Also in this case we have performed a finite size scaling analysis to extract the infinite volume behaviour:

$$\begin{aligned} E_i(L)/(6V)=\epsilon _i +a_i e^{-b_i\,L}. \end{aligned}$$
(3.14)

A fit of the values in Table 6 yields \(\chi ^{2}_{red,1}=0.67, \ \epsilon _1 =0.6279(9)\) and \(\chi ^{2}_{red,2}=0.2, \ \epsilon _2=0.65485(4)\). The latent heat can then be evaluated as \(G=\epsilon _2-\epsilon _1=0.0270(9)\), in perfect agreement with the estimates obtained from the scaling of the specific heat.
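For completeness, the uncertainty quoted on G follows from combining the two fit errors in quadrature (treating the determinations of \(\epsilon _1\) and \(\epsilon _2\) as independent):

$$\begin{aligned} \sigma _G=\sqrt{\sigma _{\epsilon _1}^{2}+\sigma _{\epsilon _2}^{2}}=\sqrt{(9\times 10^{-4})^{2}+(4\times 10^{-5})^{2}}\simeq 9\times 10^{-4}. \end{aligned}$$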

Table 6 Location of the peaks of the probability density in the two meta-stable phases

3.5 Discretisation effects

In this section we address the dependence of our observables on the size of the energy interval \(\delta _E\). To quantify this effect, we study the dependence of the peak of the specific heat \(C_{V,\mathrm {peak}}\) on \(\delta _E\) for various lattice sizes, namely \(L=8, 10, 12, 14, 16\). In Table 7 we report the lattice sizes and the corresponding values of \(\delta _E\) used in this investigation. For each pair of \(\delta _E\) and volume we have repeated all our simulations and analyses with the same simulation parameters reported in Table 1.

Table 7 Values of \(\delta _E\) used to perform the study of the discretisation effects. The other simulation parameters are kept identical to the ones reported in Table 1

The choice of the specific heat as the observable for this investigation is easily justified: we found that the specific heat is much more sensitive to discretisation effects than other, simpler observables such as the plaquette expectation value. In Fig. 6 we report an example of this study for \(L=8\).

Fig. 6

The peak of \( C_{V}(\beta _c(L)) \) as a function of \(\delta _E\)

We can confirm that all our data scale quadratically in \(\delta _E\), consistent with our findings in Sect. 2.3. Indeed, fitting our data with the form

$$\begin{aligned} C_V(\beta _c(L),\delta _E)=C_V(\beta _c(L)) +6 V b_{dis} \delta _E^{2}, \end{aligned}$$
(3.15)

we found \(\chi ^{2}_{\mathrm {red}} \sim 1 \) for all lattice sizes we investigated. The values of \(b_{dis}\) are reported in Table 8. Note that the numerical values used in our finite size scaling analysis of the peak of \(C_V\) in the previous section are compatible with the \(\delta _E= 0\) extrapolations obtained here.
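Since Eq. (3.15) is linear in \(\delta _E^{2}\), the \(\delta _E\rightarrow 0\) extrapolation reduces to a straight-line fit; a minimal unweighted sketch (the arrays stand in for the data behind Fig. 6):

```python
import numpy as np

def extrapolate_cv_peak(delta_E, cv_peak, volume):
    """Fit Eq. (3.15), C_V(delta_E) = C_V(0) + 6 V b_dis delta_E^2,
    as a straight line in the variable x = delta_E^2."""
    x = np.asarray(delta_E, dtype=float) ** 2
    slope, intercept = np.polyfit(x, np.asarray(cv_peak, dtype=float), 1)
    return intercept, slope / (6.0 * volume)   # C_V(0) and b_dis
```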

Table 8 The coefficient \(b_{dis}\) for different lattice sizes

4 Discussion, conclusions and future plans

The density of states \(\rho (E)\) is a measure of the number of configurations on the hyper-surface of a given action E. Knowing the density of states reduces the calculation of the partition function to performing an ordinary integral. Wang–Landau type algorithms perform Markov chain Monte-Carlo updates with respect to \(\rho \) while improving the estimate of \(\rho \) during the simulation. The LLR approach, first introduced in [9], uses a non-linear stochastic equation (see (2.17)) for this task and is particularly suited for systems with continuous degrees of freedom. To date, the LLR method has been applied to gauge theories in several publications, e.g. [10–12, 14], and it has turned out in practice to be a reliable and robust method. In the present paper, we have thoroughly investigated the foundations of the method and have presented high-precision results for the U(1) gauge theory to illustrate the excellent performance of the approach.

Two key features of the LLR approach are:

(i) It solves an overlap problem, in the sense that the method can specifically target the action range that is of particular importance for an observable. This range might easily lie outside the regime for which standard MC methods are able to produce statistics.

(ii) It features exponential error suppression: although the density of states \(\rho \) spans many orders of magnitude, \(\tilde{\rho }\), the density of states defined from the piecewise linear approximation of its logarithm, has a nearly constant relative error (see Sect. 2.2), and the numerical determination of \(\tilde{\rho }\) preserves this level of accuracy.

We point out that feature (i) is not exclusive to the LLR method, but is quite generic for multi-canonical techniques [20], Wang–Landau type updates [4] or hybrids thereof [7].

A key ingredient of the LLR approach is the double-bracket expectation value [9] (see (2.13)). It is a standard Monte-Carlo expectation value restricted to a finite action interval of size \(\delta _E\) and with the density of states as a re-weighting factor. The derivative a(E) of the logarithm of the density of states emerges from an iteration involving these Monte-Carlo expectation values, which implies that their statistical errors interfere with the convergence of the iteration and might introduce a bias preventing it from converging to the true derivative a(E). We resolved this issue by using the Robbins–Monro formalism [15]: we showed that a particular type of under-relaxation produces a normal distribution of the determined values a(E), with the mean of this distribution coinciding with the correct answer (see Sect. 2.2).
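Schematically, and purely as an illustration (the precise step normalisation is fixed in Sect. 2.2, and the double-bracket estimator is left here as a stub), the under-relaxed iteration reads:

```python
def robbins_monro(a0, double_bracket_dE, n_iter=500):
    """Under-relaxed iteration for the slope a(E) on one action interval.

    double_bracket_dE(a) must return a Monte-Carlo estimate of
    <<E - E_centre>>_a, the restricted, re-weighted expectation value of
    the action fluctuation in the interval (see Eq. (2.13)).
    """
    a = a0
    for n in range(1, n_iter + 1):
        c_n = 1.0 / n      # Robbins-Monro conditions: sum c_n diverges,
                           # sum c_n^2 converges, so the MC noise averages out
        a += c_n * double_bracket_dE(a)
    # the iteration converges to the value of a for which
    # <<E - E_centre>>_a = 0, i.e. to the local slope of log rho(E)
    return a
```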

In this paper, we have also addressed two concerns raised in the wake of the publication of Ref. [9]:

(1) The LLR simulations restrict the Monte-Carlo updates to a finite action interval and might therefore be prone to ergodicity violations.

(2) The LLR approach might seem to be limited to the calculation of action-dependent observables.

To address the first issue, we have proposed in Sects. 2.5 and 2.6 two procedures that are conceptually free of ergodicity violations. The first method is based upon the replica exchange method [17, 18]: using overlapping action ranges during the calculation of the double-bracket expectation values offers the possibility of exchanging the configurations of neighbouring action intervals with the appropriate probability (see Sect. 2.5 for details). The second method is a standard Monte-Carlo simulation with the inverse of the estimated density of states, i.e. \(\tilde{\rho }^{-1}(E)\), as the re-weighting factor. The latter approach falls into the class of ergodic Monte-Carlo update techniques and is not limited by a potential overlap problem: if the estimate \(\tilde{\rho }\) is close to the true density \(\rho \), the Monte-Carlo simulation is essentially a random walk in configuration space sweeping the action range of interest.
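A minimal sketch of the Metropolis step of this re-weighted simulation, assuming a symmetric proposal and a routine returning \(\log \tilde{\rho }\) at a given action (all names are illustrative); canonical averages at coupling \(\beta \) are then recovered by re-weighting the recorded configurations with \(\tilde{\rho }(E)\,e^{\beta E}\):

```python
import numpy as np

rng = np.random.default_rng(1)

def accept(log_rho_tilde, E_old, E_new):
    """Metropolis test for the weight 1/rho~(E) (symmetric proposal assumed).
    The sampled energy histogram is ~ rho(E)/rho~(E): nearly flat when rho~
    is accurate, so the chain random-walks across the whole action range."""
    dlog = log_rho_tilde(E_old) - log_rho_tilde(E_new)
    return np.log(rng.random()) < min(0.0, dlog)
```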

To address issue (2), we first point out that the latter re-weighting approach produces a sequence of configurations that can be used to calculate any observable by averaging with the correct weight. Second, we have developed in Sect. 2.2 the formalism to calculate any observable by a suitable sum over a combination of the density of states and double-bracket expectation values involving the observable of interest. We were able to show that the order of convergence (with the size \(\delta _E\) of the action interval) for these observables is the same as for \(\rho \) itself (i.e., \(\mathcal{O}(\delta _E^2)\)).

In view of the features of the density of states approach, our future plans naturally involve investigations that either are enhanced by direct access to the partition function (such as the calculation of thermodynamical quantities) or are hampered by an overlap problem. These most notably include complex action systems, such as cold and dense quantum matter. The LLR method is very well equipped for this task, since it is based upon Monte-Carlo updates with respect to the positive (and real) estimate of the density of states, and it features an exponential error suppression that might beat the resulting overlap problem. Indeed, a strong sign problem was solved with LLR techniques using the original degrees of freedom of the \(Z_3\) spin model [10, 11]. We are currently extending these investigations to other finite density gauge theories: QCD at finite density with heavy quarks (HDQCD) is work in progress, and we plan to extend these studies to finite density QCD with moderate quark masses.