Emergence of AdS geometry in the simulated tempering algorithm

In our previous work [1], we introduced to an arbitrary Markov chain Monte Carlo algorithm a distance between configurations. This measures the difficulty of transition from one configuration to the other, and enables us to investigate the relaxation of probability distribution from a geometrical point of view. In this paper, we investigate the global geometry of a stochastic system whose equilibrium distribution is highly multimodal with a large number of degenerate vacua. We show that, when the simulated tempering algorithm is implemented to such a system, the extended configuration space has an asymptotically Euclidean anti-de Sitter (AdS) geometry. We further show that this knowledge of geometry enables us to optimize the tempering parameter in a simple, geometrical way.


Introduction
Let M = {x} be a configuration space, and S(x) an action. We are often concerned with calculating the vacuum expectation value (VEV) of an operator O(x) with respect to the action: In a Markov chain Monte Carlo (MCMC) simulation, we set up an algorithm that generates a stochastic process p n (x) → p n+1 (x) = dy P (x|y) p n (y) such that the probability distribution p n (x) relaxes to the desired equilibrium distribution e −S(x) /Z in the limit n → ∞.
In order for such an algorithm to be practically useful, the relaxation to equilibrium needs to be sufficiently rapid. However, when the equilibrium distribution is multimodal (i.e., when the action has very high potential barriers), transitions between configurations belonging to JHEP11(2018)060 different modes take extraordinarily long computational times, which delay the relaxation to equilibrium and make the MCMC simulation almost impractical.
To accelerate the transitions, there have been invented various methods, including the overrelaxation [2] and the simulated and the parallel tempering methods [3][4][5][6]. In the simulated tempering method [3], for example, as will be briefly reviewed in subsection 3.1, one picks up a parameter β of the model (such as the overall coefficient of the action) as a tempering parameter, and extends the configuration space by treating β as an additional dynamical variable. We then design a MCMC algorithm such that configurations belonging to different modes for the original action now can be connected easily by passing through configurations in the extended configuration space. In order for such an algorithm to work, transitions along the β direction must have significant acceptance rates, which enforces us to make a careful adjustment of the parameters that are additionally introduced when extending the configuration space.
In our previous work [1], we introduced to an arbitrary MCMC algorithm the notion of distance between configurations. This measures the difficulty of transition from one configuration to the other, and enables us to investigate the relaxation of probability distribution from a geometrical point of view. We also introduced the coarse-grained configuration space to a highly multimodal stochastic system, where configurations in the same mode are regarded as a single configuration [1]. This perspective facilitates the understanding of the global geometry of the configuration space, and is justified by the fact that distances between different modes take so large values that the difference of distances between two configurations in the same mode can be effectively neglected. In this paper, we investigate the global geometry of such a highly multimodal stochastic system with a large number of degenerate vacua. We show that, when the simulated tempering algorithm is implemented, an asymptotically Euclidean anti-de Sitter (AdS) geometry emerges in the extended, coarse-grained configuration space. We further show that this knowledge of geometry enables us to optimize the additional parameters in a simple, geometrical way. 1 This paper is organized as follows. In section 2, we give a brief review on the distance between configurations for a MCMC algorithm [1]. We mainly consider a system whose equilibrium distribution is highly multimodal with a large number of degenerate vacua. We also introduce the concept of the coarse-grained configuration space and define the metric on the space. In section 3, we implement the simulated tempering method by taking the tempering parameter to be the overall coefficient of the action, and show that the geometry of the extended, coarse-grained configuration space is given by an asymptotically Euclidean AdS space metric. In section 4, we optimize the tempering parameter such that the distances get minimized, and show that this can be easily done in a geometrical way. The conclusion is confirmed by direct numerical calculations. Section 5 is devoted to conclusion and outlook for future work.

JHEP11(2018)060 2 Distance between configurations in MCMC simulations
In this section, we give a brief review on the distance between configurations for a MCMC algorithm [1]. We mainly consider a system whose equilibrium distribution is highly multimodal with a large number of degenerate vacua. We then introduce the coarse-grained configuration space, where configurations in the same mode are regarded as a single configuration.

Definition of the distance between configurations
Let us consider a configuration space M = {x} with a real-valued action S(x). Suppose that we make a numerical simulation to estimate the VEVs of observables by using a given MCMC algorithm. We denote by P (x|y) = x|P |y the transition probability from a configuration y to a configuration x at a single Markov step. The Markov property then says that the transition probability from y to x at n steps is given by P n (x|y) = x|P n |y . We assume that the stochastic process has the unique equilibrium distribution e −S(x) /Z (Z = dxe −S(x) ) and P (x|y) satisfies the detailed balance condition P (x|y) e −S(y) = P (y|x) e −S(x) . We further assume that all the eigenvalues of P (x|y) are positive (see [1] for mathematical details).
The distance θ n (x, y) between two configurations x, y [1] is then defined by θ n (x, y) ≡ arccos P n (x|y) P n (y|x) P n (x|x) P n (y|y) . (2.1) One can easily show that θ n (x, y) satisfies the axioms of distance [1]:

4)
θ n (x, y) + θ n (y, z) ≥ θ n (x, z), (2.5) and vanishes in the limit n → ∞ for any two configurations x, y ∈ M: lim n→∞ θ n (x, y) = 0, (2.6) because P n (x|y) → e −S(x) /Z in the limit n → ∞. The distance θ n (x, y) actually measures the difficulty of transition from y to x at n steps in the sense that: • When configurations x and y belong to different modes of the equilibrium distribution, θ n (x, y) is large for a finite n.
• If x can be easily reached from y, θ n (x, y) is small even for a finite n.
One can further show that, as long as the chosen algorithm generates only local moves of configuration, the large scale structure of distance θ n (x, y) takes a universal form, in the sense that differences of distance between two such algorithms can always be absorbed into a rescaling of n [1].

JHEP11(2018)060
In [1] we introduced a few distances in addition to θ n (x, y), among which is the distance d n (x, y) that is defined by d n (x, y) = − ln P n (x|y) P n (y|x) P n (x|x) P n (y|y) (2.7) and is related with θ n (x, y) as cos θ n (x, y) = e −(1/2) d 2 n (x,y) .
(2.8) d n (x, y) agrees with θ n (x, y) when they take small values, and satisfies almost the same properties as θ n (x, y). The only exception is that d n (x, y) generically does not satisfy the triangle inequality for the original configuration space M. However, as will be discussed in subsection 2.3, d n (x, y) is more useful than θ n (x, y) for investigating the large scale geometry of the configuration space and does satisfy the triangle inequality when the configuration space is coarse-grained. Thus, we will mainly use d n (x, y) in the following discussions.

Coarse-grained configuration space
For a stochastic system whose equilibrium distribution is highly multimodal with a large number of degenerate vacua, distances d n between two different modes take so large values that the difference of distances between two configurations in the same mode can be effectively neglected. This leads us to introduce the coarse-grained configuration spaceM by regarding configurations in the same mode as a single configuration [1]. In the following, we assume that a way of separation between different modes is uniform and translationally invariant inM. A typical model which has this property can be given by the action 2 Here, the configuration space is a D-dimensional torus with the period 2L for every direction: M = {x = (x µ ) | −L < x µ ≤ L (µ = 1, . . . , D)} (we impose the periodic boundary condition). This action certainly gives a multimodal equilibrium distribution when β 0 1. The coarse-grained configuration spaceM is given by the D-dimensional lattice torusM = {n = (n µ ) | n µ = −L + 1, . . . , L − 1, L}, that consists of the degenerate classical vacua and has a translational invariance.

Distance on the coarse-grained configuration space
We here explain why we think that d n (x, y) is more suitable than θ n (x, y) for investigating the geometry ofM.  Firstly, d n (x, y) has a better resolution on the large scale structure of configuration space. In fact, as can be seen from its definition, when transitions between two configurations x and y happen only very rarely, d n (x, y) can take a large value without limitation, while the value of θ n (x, y) is saturated close to π/2. Secondly, the distance d n gives a flat Euclidean metric for a simple Gaussian distribution [1]. In fact, for a quadratic action S(x) = (ω/2) x 2 defined on the configuration space with the Langevin algorithm, the distance between x and y is given by where is the step size of the discretized fictitious time in the Langevin equation. θ n (x, y), to the contrary, has a complicated expression.
Lastly, the distance d n is expected to satisfy the triangle inequality on the coarsegrained configuration spaceM. Although we still do not have a rigorous proof on this statement, we can check it numerically for the model (2.9). To see this, we first note that the triangle inequality inM is equivalent to the concavity of d n (x, y) as a function of r = |x − y| (x, y ∈M) due to the translational symmetry of this model. Figure 1 shows d n (x, 0) for the action (2.9) with D = 1, L = 10 and β 0 = 3. As a MCMC algorithm, we use the Metropolis algorithm with the Gaussian proposal distribution with standard deviation σ x = 1.6/(2π √ β 0 ). 3 We discretize the original configuration space M with spacing 0.02, and numerically construct the transition matrix P (x|y). We then calculate d n (x, 0) for 0 ≤ x ≤ L with n = 2, 500. We see that, although d n (x, 0) does not satisfy concavity for generic configurations x ∈ M, it becomes a concave function of r = |x| when x is on the lattice of the coarse-grained configuration space (drawn as orange points in the figure).
We return to a general highly-multimodal configuration space with a large number of degenerate vacua, M = {x}, and define the metric ds 2 in the coarse-grained configuration JHEP11(2018)060 spaceM as where x and x + dx are nearby points residing onM [1]. For the translationally invariant, parity even action (2.9), the metric will take the following form for β 0 1: because the transition probability between two neighboring modes is given by O(e −const. β 0 ) as can be easily seen by an instanton calculus [1].

Emergence of the Euclidean AdS geometry
In this section, we implement the simulated tempering method for a highly multimodal system with degenerate vacua that are distributed uniformly. Universality ensures that the large-scale geometry can be studied by using the action (2.9). We take the tempering parameter to be the overall coefficient of the action, β 0 , and extend the configuration space so that the tempering parameter is also treated as a dynamical variable. We show that the global geometry of the extended configuration space is given by an asymptotically Euclidean AdS metric. In subsection 3.1 we briefly explain the simulated tempering algorithm, and in subsection 3.2 we write down the transition matrix explicitly for large β. In subsection 3.3 we prove that the geometry of the extended, coarse-grained configuration space is given by an AdS metric for large β. This statement is numerically verified in subsection 3.4.

Simulated tempering algorithm
We here give a brief review of the simulated tempering algorithm [3]. As is commented in Introduction, this is introduced to a multimodal stochastic system in order to accelerate the relaxation to global equilibrium. A MCMC simulation proceeds as follows: 1. We choose a tempering parameter (to be denoted by β 0 ) from parameters in the action, which we write as S(x; β 0 ) to indicate its dependence on β 0 .

We extend the original configuration space
4. We set up a Markov chain on the extended configuration space such that the global equilibrium distribution P eq (X) takes the form 4 P eq (X) = P eq (x, β a ) = w a e −S(x;βa) .
(3.1) 4 The weight wa will be chosen as in the following discussion to make configurations with βa appear with the same appearance ratio for all a.

JHEP11(2018)060
5. After the system is well regarded as reaching global equilibrium, we retrieve a subsample with β a=0 out of a full sample taken from the extended configuration space, and estimate the VEVs by sample averages with respect to the subsample.
In this paper, we realize the global equilibrium (3.1) by repeating the following steps: (1) We generate a local move in the x direction, X = (x, β a ) → X = (x , β a ), with the Metropolis algorithm. This is repeated 2k times so that local equilibrium is realized for fixed β a .
(2) We generate a local move in the β direction, X = (x, β a ) → X = (x, β a±1 ), using the Metropolis algorithm, where an adjacent tempering parameter is proposed with probability 1/2 and accepted with probability min(1, P eq (X )/P eq (X)). 5 We denote byP (1) the one-step transition matrix in the x direction, which will be repeated 2k times, and byP (2) that in the β direction. We regard as the transition matrix at a single Markov step. One can easily check thatP satisfies the detailed balance condition, X|P |X P eq (X ) = X |P |X P eq (X).
With the transition matrixP , the distance d n (X, Y ) is defined for the extended configuration space M × A as in (2.7): where P n (X|Y ) = X|P n |Y . In appendix A, we demonstrate that the distance d n (X, Y ) also satisfies the triangle inequality on the extended, coarse-grained configuration spacē M × A.

Matrix elements of the transition matrix
In what follows, we exclusively consider the action (2.9). In this subsection, we write down the matrix elements ofP =P k We first recall that k is taken to be sufficiently large such that the transition in the x direction,P k (1) , makes the system be well in local equilibrium. Thus, configurations will get distributed around local minima after the action ofP k (1) ,

JHEP11(2018)060
Here, [x] represents the local minimum (a lattice point) within the mode to which x belongs. The probability distribution of local equilibrium, P (loc) eq (x, β a ), should be well approximated by the Gaussian distribution around [x] for large β a : On the other hand, the matrix elements ofP (2) for a = b are given as follows (see subsection 3.1): Substituting (3.5)-(3.7) to (3.4) and making some calculation (see appendix B), we obtain the a = b matrix elements for β a , β b 1: where the explicit form of the function ∆(z) is given in (B.7). The remaining a = b elements can be determined from probability conservation, and we finally obtain the full matrix elements for β a , β b 1: x, β a |P | y, β b P (loc) eq (x, β a ) × Note that the matrix element x, β a |P | y, β b is factorized to the following form: where R a ≡ β a /β a+1 is the ratio of the parameters. One can further show that x, β a |P n | y, β b also takes a factorized form: In fact, in the equationP n =P k (1)P (2)P k (1)P n−1 , the matrixP k (1) projectsP n−1 to the local equilibrium distribution, and thus the rest calculation is the same as the case of n = 1.

Geometry of the extended, coarse-grained configuration space
We are now in a position to show that the extended, coarse-grained configuration spacē M × A for the action (2.9) has an asymptotic AdS geometry. 6 Here, the metric onM × A JHEP11(2018)060 is defined in a similar way to (2.11): where X = (x, β a ) and X + dX = (x + dx, β a+1 ) are nearby points onM × A. Since the action (2.9) is invariant under the lattice translation . Furthermore, since the action is also invariant under the reflection x µ → −x µ , h µ (β) must vanish. We thus conclude that the metric onM × A generally takes the following form for the action (2.9): We then note that g(β) should be an increasing function of β at least when β is large [1]. In fact, g(β) D µ=1 dx 2 µ expresses the squared distances between two different modes with fixed β, and it should increase as β gets larger because the transition between them becomes more difficult. Recalling that g(β) ∝ β (β 1) when the tempering is not implemented [see (2.12)], we expect that, with the implementation of the tempering, the increase of g(β) is milder than β [see figure 2]. In the following, we assume that g(β) has a powerlike increase, g(β) ∝ β q (β 1). In subsection 3.4, this assumption will be numerically verified, and q will be found to be in the range 0 < q < 1.

JHEP11(2018)060
Thus, the dependences on local equilibrium distribution disappear from the expression, only leaving a function that depends on the ratios R a . Therefore, f (β) dβ 2 is invariant under the scaling transformation β a → λ β a for large β, and thus we conclude that f (β) ∝ 1/β 2 (β 1). Putting everything together, we find that the metric on the extended, coarse-grained configuration spaceM × A takes the following form for β 1: Note that this is a Euclidean AdS metric, as can be easily seen by a coordinate transformation β → ( √ α qz/2) −2/q :

Numerical verification of the metric
In this subsection, we verify that the geometry ofM × A for large β is actually given by the metric (3.15), by numerically evaluating the distance d n (X, Y ) and comparing the result with a geodesic distance of AdS space which can be analytically calculated for the metric (3.15) as We consider a 2-dimensional (D = 2) configuration space that is compactified with the period 2L = 2000. We set the parameters to n = 100, k = 8. Transitions in the x direction at fixed β a are generated by the Metropolis algorithm using the Gaussian proposal distribution with standard deviation σ x (β a ) = 1.6/(2π √ 2β a ). 7 As for transitions in the β direction, in order to reduce boundary effects for a finite interval {β a }, we extend the set {β a } such that a runs from −200 to 200, and only consider the distances between the points X a ≡ (x, β a ) and Y a ≡ (0, β a ) with a = 0, 1, 2. The explicit functional form of β a = β(a) are chosen in two ways in order to show that the asymptotic AdS geometry is always obtained independently of the choice of β a 's. The first choice is an exponential form β a = exp(10.6 − 2.06a), and the other is a zigzag form β a = exp[10.6 − 2.06a + (−1) a ]. 7 The values of σx and k for large βa are set such thatP 2k (1) sufficiently realizes local equilibrium. Note that, when βa 1, local equilibrium distribution around a local minimum x = [x] can be well approximated by the Gaussian distribution [see (3.6)]: P . Therefore, when we propose from a configuration x a new configuration x = x + y by using the Gaussian distribution with standard deviation σx, the difference of the action, ∆S = S(x ) − S(x), appearing in the Metropolis acceptance probability [min(1, e −∆S )] has the expectation value ∆S 2π 2 βa y 2 2π 2 βa Dσ 2 x . We will set this to be an order-one constant (≡ α 2 /2) in order that the acceptance rate is significant and does not change largely for different βa's. This makes σx depend on βa as σx(βa) = α/(2π √ Dβa). In this paper, we set α = 1.6 and k = 8, which we confirm to give significant acceptance rates in actual Monte Carlo calculations. The exponential form given above is actually the optimized form that minimize distances, as will be found in subsection 4.3. With the above setting, we numerically estimate the distance d n (X a , Y a ) between X a = (x, β a ) and Y a = (0, β a ) (a = 0, 1, 2) as follows. For a fixed initial configuration Y a , we generate n-step paths sequentially, and count the number of paths that terminate at X a [to be denoted by N (X a , Y a )]. When N (X a , Y a ) becomes ten, we record two numbers: one is the number of paths that return to the initial configuration Y a [to be denoted by N (Y a , Y a )], and the other is the total number of paths that have been generated by this moment [to be denoted by N path ]. 8 Knowing N path , we then estimate P n (X a |Y a ) by

JHEP11(2018)060
Using the relation P n (X a |Y a ) = P n (Y a |X a ), which follows from the symmetry x µ → −x µ , we finally estimate the distance d n (X a , Y a ) by In order to obtain an estimate of d n (X a , Y a ) and its uncertainty [to be denoted by d n (X a , Y a ) and δd n (X a , Y a ), respectively] by using the Jackknife method, we prepare five independent sets for N (X a , Y a ), N (Y a , Y a ) by repeating the procedure in the previous paragraph five times. The results for the first choice of β a (exponential form) are depicted as dots in figure 3, while those for the second choice (zigzag form) in figure 4. Using them, we fit the parameters , α, q in (3.15) by minimizing the following χ 2 :

Optimization of the tempering parameter
In this section, we optimize the tempering parameter by demanding that the distances get minimized. We will show that a simple, geometrical consideration gives the conclusion that β a 's take an exponential form for large β a . We further will confirm this conclusion by numerically optimizing the tempering parameter set. We show that the optimized parameter set actually takes an exponential form for large β a and that they also exhibit a horizon for small β a , beyond which configurations can move freely in the x direction.

Need for the optimization of β a
The tempering parameter set A = {β a } must be chosen such that transitions in the β direction have significant acceptance rates and also that transitions in the x direction are easy for some β a . If β represents the overall coefficient of the action and we order A = {β a } as β 0 > β 1 > · · · > β A [as we have done for the action (2.9)], then the above requirement means that β a must be placed densely to some extent (and thus A must be large), and also that β a (a ∼ A) must be sufficiently small such that severe potential barriers no longer exist there. However, if one introduced too many β a 's, the size of the subsample with β a=0 would become small and the sample average would get a large statistical error. This enforces us to make a careful adjustment of the elements of the set A = {β a }. We will show in the next subsection that this adjustment can be carried out in a simple, geometrical way.

Geometrical optimization of the tempering parameter
In section 3, we showed that the metric in the extended, coarse-grained configuration space has the component:

JHEP11(2018)060
where we regard β(a) ≡ β a as a continuous function of a. To accelerate the relaxation of the probability distribution, we need to optimize the tempering parameter set {β a }. Since almost all of the transitions occur only in the β direction for large β a , the optimization should be made so that the transitions in the β direction has no obstacle for large β. Since it is the parameter a that is directly dealt with by a MCMC simulation, the above requirement can be re-expressed as a geometrical statement that the metric in the a direction is flat. This means the equality β (a) β(a) = const., (4.2) and thus we conclude that the optimized tempering parameter set should have an exponential form, β a = β 0 R −a (R: constant). Note that this form is expected only for large β a 's, because for small enough β a , transitions in the x direction come to occur frequently and thus the above argument no longer holds.

Numerical confirmation of the geometrical optimization
In this subsection, we numerically optimize the tempering parameter set {β a }, and show that the optimized set actually takes an exponential form for large β a . We also show that there exists a horizon for small β a , beyond which configurations can move freely in the x direction. Below is the numerical algorithm we take to optimize the tempering parameter set. We here fix β 0 (the overall coefficient of the original action), n (the number of steps), and A (the number of the additional parameters), and vary the parameters β ≡ {β a } a=1,...,A such that the distances between different modes X 1 = (x 1 , β 0 ), X 2 = (x 2 , β 0 ) ∈M × A are minimized. For given initial parameters β, we update them with the following steps: 9 1. We calculate the distance d n (X 1 , X 2 ; β) for β. 10 2. We propose a new parameter set β = {β 0 , . . . , β a−1 , β a , β a+1 , . . . , β A } by generating ln β a using the Gaussian distribution with mean ln β a and standard deviation σ β (a constant given in advance) for randomly selected a. If the obtained parameters are not ordered, we repeatedly generate another set until we get ordered ones.
4. We repeat steps 2 and 3. Since the calculated distance can be unintentionally small due to a statistical error, it can happen that we reject a set which should be accepted.
To avoid this to happen frequently, we recalculate the distance with a larger precision when proposed sets are rejected N discard times sequentially (N discard is a number given in advance). 9 To ensure transitions in the β direction to occur sufficiently, n should be set to be larger than O(A 2 ), as can be estimated by regarding the transitions as pure random walks. At the same time, however, n should not be taken too large in order to avoid a large accumulation of boundary effects. In the following calculation we will set n to n ∼ 1.5A 2 . 10 We here denote the distance by dn(X1, X2; β) to specify its dependence on the tempering parameter set β = {βa}.  . We use the Metropolis algorithm to generate a transition in the x direction, using the Gaussian distribution with standard deviation σ x (β a ) = 1.6/(2π √ 2β a ) as a proposal distribution. We calculate the distance d n=100 (X 1 , X 2 ; β) between X 1 = ((1, 0), β 0 ) and X 2 = ((0, 0), β 0 ) for given β. We first repeat steps 2 and 3 10,000 times, where at least 100 paths from X 1 to X 2 are generated to calculate the distance. Following this, we then repeat the same procedures 15,000 times, but this time at least 200 paths are generated. After we obtain the optimized values (drawn as orange points in figure 5), we fit the points in the region 1 ≤ a ≤ 4 with a linear function c 1 a + c 2 by minimizing χ 2 . We obtain c 1 = −2.06 (12), c 2 = 10.56(38) with χ 2 /(4 − 2) = 0.68. The points near the boundary a = 0 behaves differently which may be explained as boundary effects. We also see that β a (a = 5, 6, 7, 8) have almost the same value, β H ∼ 2. β H represents a "horizon", beyond which the potential barriers in the x direction disappear and configurations can move freely in the direction. 11 Thus, the value A = 8 is actually too large, and we find that five parameters β a (a = 1, . . . , 5) should be enough for this calculation.

JHEP11(2018)060
We thus arrive at the conclusion that the optimized tempering parameter is given by an exponential form, β a = β 0 R −a , except for those near boundaries. This confirms the geometrical optimization made in subsection 4.2.
We end this subsection with a comment that there are actually many metastable solutions for β and all such solutions also take exponential forms. The left panel of figure 6 is the history of d 2 n (X 1 , X 2 ; β) with respect to the updates of β. We there see several plateaus that correspond to metastable states. The right panel shows that the metastable solutions, each corresponding to a plateau, actually take exponential forms. 11 The position of the horizon, βH , depends on the algorithm, especially on the standard deviation σx of the Gaussian proposal for transitions in the x direction. This horizon differs from the horizon of a Euclidean blackhole where a single S 1 -cycle vanishes.

Conclusion and outlook
In this paper, we have investigated the global geometry of a stochastic system whose equilibrium distribution is highly multimodal with a large number of degenerate vacua that are distributed uniformly.
We implemented the simulated tempering algorithm to such a system by taking the tempering parameter to be the overall coefficient β 0 of the action. We showed that the geometry of the extended, coarse-grained configuration spaceM × A is given by an asymptotically AdS space. With this knowledge of geometry, we deduced the conclusion that the optimized tempering parameter set {β a } must take an exponential form β a = β 0 R −a for large β a . This conclusion was confirmed by numerical calculations, where it was also found that a horizon exists at small β, around which transitions in the x direction become easy.
As for the future work, it would be interesting to investigate the large scale geometry of the configuration space in the Yang-Mills theory, especially by coarse-graining the configuration space according to the topological charges [7].
It should also be important to construct a more convenient method to find the optimized parameters for a given MCMC algorithm. For the simulated tempering algorithm, for example, the optimization is to determine the functional form of β a = β(a). Thus, it should be nice if we can find a local functional of β(a) such that the optimized form is directly given by solving its Euler-Lagrange equation, and further if this local functional has some relationship with the Einstein-Hilbert action. We expect that this investigation may also give a clue to understanding the hidden stochastic character in general relativity.
In this paper, we have discussed only the case where the action S(x) is real. When the action takes complex values as in QCD at finite density or in the quantum Monte Carlo computation of the Hubbard model away from half filling, we cannot directly use the present definition of distance because the Boltzmann weight is complex-valued. It must be important to investigate whether nice distances can also be introduced to these algorithms. In particular, it must be interesting to investigate the large scale geometry of the  Table 1. d n (X, Y ) + d n (Y, Z) − d n (X, Z) for X = ((0, 0), β 0 ), Y = ((y, 0), β a ) and Z = ((3, 0), β 1 ). β a are set to β 0 = 100, β 1 50, β 2 25. All the entries are positive within statistical errors, which shows that the triangle inequality holds.
(generalized) Lefschetz thimble method [8][9][10][11][12][13][14][15] with the implementation of the tempering algorithm that takes the tempering parameter to be the flow time of the antiholomorphic gradient flow [14,15]. A study along these lines is now in progress and will be reported elsewhere. We are concerned with the case where β a , β b 1, and thus can approximate P eq (x, β a ) by the Gaussian distributions around local minima [see (3.6)]. Due to the translational invariance, we only need to consider the local minimum at the origin, for which the integrand becomes min (2πβ a ) D/2 e −2π 2 βar 2 , (2πβ b ) D/2 e −2π 2 β b r 2 (r 2 ≡ x 2 ).

(B.2)
Let us first consider the case β a < β b . The integral of (B.2) then can be evaluated as (see figure 7) I and depend on β a , β b only through the ratio β b /β a . The integral I for the case β a > β b can also be evaluated in the same way. We thus obtain I 1 − 1 Γ(D/2) γ(D/2,r 2 c ) − γ(D/2,s 2 c ) ≡ 1 − ∆(β b /β a ), (B.7) which is actually an expression valid for the both cases, β a ≶ β b .
Open Access. This article is distributed under the terms of the Creative Commons Attribution License (CC-BY 4.0), which permits any use, distribution and reproduction in any medium, provided the original author(s) and source are credited.