A Large Deviations Analysis of Certain Qualitative Properties of Parallel Tempering and Infinite Swapping Algorithms

Applied Mathematics & Optimization

Abstract

Parallel tempering, or replica exchange, is a popular method for simulating complex systems. The idea is to run parallel simulations at different temperatures, and at a given swap rate exchange configurations between the parallel simulations. From the perspective of large deviations it is optimal to let the swap rate tend to infinity and it is possible to construct a corresponding simulation scheme, known as infinite swapping. In this paper we propose a novel use of large deviations for empirical measures for a more detailed analysis of the infinite swapping limit in the setting of continuous time jump Markov processes. Using the large deviations rate function and associated stochastic control problems we consider a diagnostic based on temperature assignments, which can be easily computed during a simulation. We show that the convergence of this diagnostic to its a priori known limit is a necessary condition for the convergence of infinite swapping. The rate function is also used to investigate the impact of asymmetries in the underlying potential landscape, and where in the state space poor sampling is most likely to occur.

Acknowledgements

J. Doll: Research supported in part by the National Science Foundation (DMS-1317199) and the Defense Advanced Research Projects Agency (W911NF-15-2-0122). P. Dupuis: Research supported in part by the Department of Energy (DE-SC0010539), the National Science Foundation (DMS-1317199), and the Defense Advanced Research Projects Agency (W911NF-15-2-0122). P. Nyquist: Research supported in part by the National Science Foundation (DMS-1317199).

Author information

Corresponding author

Correspondence to P. Dupuis.

Ancillary Results

Lemma 7.1

For any integer \(K\ge 0\) and any sequence \(a_{1},a_{2},\dots ,a_{K}\) such that \(a_{i}\ge 0\) for all i,

$$\begin{aligned} \sum _{i=1}^{K}a_{i}=K \end{aligned}$$

and

$$\begin{aligned} \sum _{i=1}^{K}a_{i}^{1/2}>K \end{aligned}$$

cannot both be true.

Proof

We can assume without loss that \(K>0\). Suppose that \(\sum _{i=1}^{K}a_{i}=K\) and let \(b_{i}=a_{i}/K\), so that \(\left\{ b_{i},i=1,\ldots ,K\right\} \) is a probability vector. By Hölder’s inequality

$$\begin{aligned} \sum _{i=1}^{K}b_{i}^{1/2}\le K^{1/2}\left( \sum _{i=1}^{K}b_{i}\right) ^{1/2}=K^{1/2}. \end{aligned}$$

Since \(a_{i}^{1/2}=K^{1/2}b_{i}^{1/2}\), this gives \(\sum _{i=1}^{K}a_{i}^{1/2}\le K\), so the strict inequality cannot also hold, which completes the argument. \(\square \)
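As a numerical sanity check of Lemma 7.1, the following minimal sketch draws random non-negative vectors, rescales them so that their entries sum to K, and records the gap \(K-\sum _{i}a_{i}^{1/2}\). The function name `sqrt_sum_gap`, the choice \(K=5\) and the sampling scheme are illustrative assumptions, not part of the paper.

```python
import math
import random

def sqrt_sum_gap(a):
    """Return K - sum(sqrt(a_i)) after rescaling a so that sum(a) == K."""
    k = len(a)
    total = sum(a)
    scaled = [k * x / total for x in a]      # enforce sum(scaled) == K
    return k - sum(math.sqrt(x) for x in scaled)

random.seed(0)                                # fixed seed, for reproducibility only
gaps = [sqrt_sum_gap([random.random() for _ in range(5)]) for _ in range(1000)]
min_gap = min(gaps)                           # Lemma 7.1 predicts min_gap >= 0
equal_case = sqrt_sum_gap([1.0] * 5)          # equality case: all a_i equal to 1
```

The gap vanishes only when all \(a_{i}\) are equal, which is the equality case of Hölder’s inequality used in the proof.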

Lemma 7.2

Consider the ergodic control problem or equivalent minimization problem of Lemma 5.4, with \(h(\mathbf {x})=\lambda \rho (\mathbf {x})\). For two temperatures the optimal cost \(\gamma ^{*}\) satisfies

$$\begin{aligned} \gamma ^{*}<\frac{\lambda }{2}. \end{aligned}$$

In the general case with K temperatures, \(\gamma ^{*}<\lambda /K!\).

Proof

To simplify the notation we consider the case \(\lambda =1\). From Lemma 5.4 we know that there is a minimizing measure \(\nu ^{*}\) in (5.1) and that the optimal cost \(\gamma ^{*}\) satisfies

$$\begin{aligned} \gamma ^{*}=J(\nu ^{*})+\sum _{\mathbf {x}\in \mathcal {S}^{2}}\rho (\mathbf {x})\nu ^{*}(\mathbf {x}). \end{aligned}$$

Moreover, if \(W^{*}\) is defined as

$$\begin{aligned} W^{*}(\mathbf {x})=-\log \left[ \frac{d\nu ^{*}}{d\bar{\mu }}\right] ^{1/2}(\mathbf {x}),\ \ \mathbf {x}\in \mathcal {S}^{2}, \end{aligned}$$

then \((\gamma ^{*},W^{*})\) is a solution to the Bellman equation (4.9).

Suppose that \(W^{*}\) is a constant. Inserting this into the Bellman equation yields, for each \(\mathbf {x}\in \mathcal {S}^{2}\),

$$\begin{aligned} 0=-\gamma ^{*}+\rho (\mathbf {x}), \end{aligned}$$

which cannot hold since \(\rho \) is not a constant. Thus, \(W^{*}\) cannot be a constant function. This in turn implies that the likelihood ratio \([d\nu ^{*}/d\bar{\mu }]\) is not constant equal to 1 (the only possible constant value). Thus \(\nu ^{*}\) is not \(\bar{\mu }\), the invariant measure for the original symmetrized dynamics.

Inserting \(\bar{\mu }\) into the objective function in (5.1) gives

$$\begin{aligned} J(\bar{\mu })+\sum _{\mathbf {x}\in \mathcal {S}^{2}}\rho (\mathbf {x})\bar{\mu }(\mathbf {x})=\sum _{\mathbf {x}\in \mathcal {S}^{2}}\rho (\mathbf {x})\bar{\mu }(\mathbf {x})=\frac{1}{2}, \end{aligned}$$

where the first equality holds because \(J(\bar{\mu })=0\), and the second follows from \(\rho (\mathbf {x})+\rho (\mathbf {x}^{R})=1\) and the symmetry of \(\bar{\mu }\). Thus the cost associated with the uncontrolled dynamics is \(1/2\). Since \(\nu ^{*}\) is the unique minimizer in (5.1) and \(\nu ^{*}\ne \bar{\mu }\), it holds that

$$\begin{aligned} \gamma ^{*}=J(\nu ^{*})+\sum _{\mathbf {x}\in \mathcal {S}^{2}}\rho (\mathbf {x})\nu ^{*}(\mathbf {x})<J(\bar{\mu })+\sum _{\mathbf {x}\in \mathcal {S}^{2}}\rho (\mathbf {x})\bar{\mu }(\mathbf {x})=\frac{1}{2}. \end{aligned}$$

The argument for \(K>2\) temperatures is completely analogous. \(\square \)

Lemma 7.3

Assume S is a finite set and that \(\Gamma _{x,y}\), \(x,y\in S\) is the intensity matrix of an ergodic Markov chain on S with invariant probability distribution \(\bar{\mu }\). Let \(q(x)=\sum _{y\in S}\Gamma _{x,y}\), and for \(\nu \in \mathcal {P}(S)\) with \(\theta (x)=\nu (x)/\bar{\mu }(x)\) let

$$\begin{aligned} J(\nu )=\sum _{x\in S}q(x)\theta (x)\bar{\mu }(x)-\sum _{x,y\in S}\theta ^{1/2}(x)\theta ^{1/2}(y)\Gamma _{x,y}\bar{\mu }(x). \end{aligned}$$

Then \(J(\nu )\) is strictly convex on the relative interior of \(\mathcal {P}(S)\).

Proof

It is enough to show the strict convexity of

$$\begin{aligned} \theta (\cdot )\rightarrow -\sum _{x,y\in S}\theta ^{1/2}(x)\theta ^{1/2} (y)\Gamma _{x,y}\bar{\mu }(x) \end{aligned}$$

for \(\theta (x)\ge 0\), \(\sum _{x\in S}\theta (x)\bar{\mu }(x)=1\). Let \(\{x_{1},x_{2},\ldots ,x_{K}\}\) be an enumeration of the distinct elements of S, \(\theta _{i}=\theta (x_{i}),\bar{\mu }_{i}=\bar{\mu }(x_{i})\) and \(f_{i,j}(\theta )=-\theta _{i}^{1/2}\theta _{j}^{1/2}\). If \(M_{i,j}(\theta )\) denotes the matrix of second order partial derivatives of \(f_{i,j}(\theta )\) at a point \(\theta \) with \(\theta _{i},\theta _{j}>0\) and \(i\ne j\), then a straightforward calculation shows that the eigenvalue 0 is repeated \(K-1\) times, and \((\theta _{i}/\theta _{j}+\theta _{j}/\theta _{i})/(4\theta _{i}^{1/2}\theta _{j}^{1/2})\) is also an eigenvalue with eigenvector \(\theta _{j}e_{i}-\theta _{i}e_{j}\). Hence the null space of this matrix is the collection of vectors orthogonal to \(\theta _{j}e_{i}-\theta _{i}e_{j}\). Since this eigenvalue is positive, \(f_{i,j}(\theta )\) is strictly convex (as a function in \(\mathbb {R}^{K}\)) at \(\theta \) except in those directions orthogonal to \(\theta _{j}e_{i}-\theta _{i}e_{j}\).
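The eigen-structure used in this step can be checked directly on the only nonzero \(2\times 2\) block of the Hessian of \(f_{i,j}\), namely the block for the coordinates \((\theta _{i},\theta _{j})\). The sketch below is an illustration only: the helper names and the sample values of \(\theta _{i},\theta _{j}\) are assumptions. The nonzero eigenvalue it computes is \((\theta _{i}/\theta _{j}+\theta _{j}/\theta _{i})\) multiplied by the positive factor \(1/(4\theta _{i}^{1/2}\theta _{j}^{1/2})\).

```python
import math

def hessian_block(ti, tj):
    """Second derivatives of f(ti, tj) = -sqrt(ti * tj) with respect to (ti, tj)."""
    c = 0.25 / math.sqrt(ti * tj)
    return [[c * tj / ti, -c],
            [-c, c * ti / tj]]

def matvec(m, v):
    """Plain matrix-vector product for small nested-list matrices."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

ti, tj = 0.7, 1.9                      # arbitrary positive sample values (assumed)
H = hessian_block(ti, tj)

null_image = matvec(H, [ti, tj])       # (theta_i, theta_j) lies in the null space
v = [tj, -ti]                          # theta_j e_i - theta_i e_j
Hv = matvec(H, v)
lam = (ti / tj + tj / ti) / (4.0 * math.sqrt(ti * tj))
residual = max(abs(Hv[0] - lam * v[0]), abs(Hv[1] - lam * v[1]))
```

All other rows and columns of the full \(K\times K\) Hessian are zero, which accounts for the remaining \(K-2\) zero eigenvalues.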

Since \(\Gamma _{x,y}\) is ergodic all states communicate, and so there exists a sequence \(1=i_{1},i_{2},\ldots ,i_{K},i_{K+1}=1\) such that \(\Gamma _{x_{i_{k}},x_{i_{k+1}}}>0\) for \(k=1,\ldots ,K\). Thus \(-\sum _{x,y\in S}\theta ^{1/2}(x)\theta ^{1/2}(y)\Gamma _{x,y}\bar{\mu }(x)\) is strictly convex except in those directions that are orthogonal to each of \(\theta _{i_{k+1}}e_{i_{k}}-\theta _{i_{k}}e_{i_{k+1}}\), which is exactly the set of directions spanned by \((\theta _{1},\theta _{2},\ldots ,\theta _{K})\). Since this direction cannot be parallel to \(\{\theta :\sum _{k=1}^{K}\theta _{k}\bar{\mu }_{k}=1\}\), \(J(\nu )\) is strictly convex on this set. \(\square \)
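To make the statement of Lemma 7.3 concrete, the following sketch evaluates \(J(\nu )\) for a small three-state chain and compares the value at a midpoint with the average of the endpoint values. The chain is an assumption made for illustration: it is built from an invariant distribution `mu` and symmetric conductances `cond`, which makes it reversible, although the lemma itself does not require reversibility. Rates are taken to be off-diagonal only, so that \(q(x)\) is the total jump rate out of \(x\).

```python
import math

mu = [0.5, 0.3, 0.2]                        # invariant distribution (assumed)
cond = [[0.0, 0.2, 0.1],
        [0.2, 0.0, 0.3],
        [0.1, 0.3, 0.0]]                    # symmetric conductances (assumed)
# Off-diagonal jump rates; detailed balance mu[x] * G[x][y] == mu[y] * G[y][x].
G = [[cond[x][y] / mu[x] for y in range(3)] for x in range(3)]

def J(nu):
    """Evaluate J(nu) from Lemma 7.3 with theta = nu / mu."""
    theta = [n / m for n, m in zip(nu, mu)]
    q = [sum(row) for row in G]             # total jump rate out of each state
    first = sum(q[x] * theta[x] * mu[x] for x in range(3))
    second = sum(math.sqrt(theta[x] * theta[y]) * G[x][y] * mu[x]
                 for x in range(3) for y in range(3))
    return first - second

nu1 = [0.6, 0.1, 0.3]
nu2 = [0.2, 0.5, 0.3]
mid = [(a + b) / 2 for a, b in zip(nu1, nu2)]

j_at_mu = J(mu)                              # J vanishes at the invariant measure
convexity_gap = 0.5 * (J(nu1) + J(nu2)) - J(mid)   # positive under strict convexity
```

A single midpoint comparison is of course not a proof, but it is a quick way to catch sign or indexing errors when implementing \(J\).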

Remark 6.1

The proofs in Sect. 5 were largely confined to the setting of two temperatures \(\tau _{1},\tau _{2}\). This was to keep the notation simple; the results generalize to any number \(K\ge 2\) of temperatures. The only result that appears to rely substantially on having two temperatures is Lemma 5.3, and specifically the argument by contradiction. Here we outline how the proof would proceed in the general setting.

In the setting of K temperatures the assumption (5.7) becomes

$$\begin{aligned} \sum _{\sigma \in \Sigma _{K}}e^{-2W(\mathbf {x}^{\sigma })}-K!=0,\ \forall \mathbf {x} \in \mathcal {S}^{K}. \end{aligned}$$
(7.1)

Let \(\mathcal {D}\) denote the set of diagonal states: \(\mathcal {D}=\{\mathbf {x}:\mathbf {x}=\mathbf {x}^{\sigma }, \ \forall \sigma \in \Sigma _{K}\}\). The only such states are those for which all components are equal. The cost structure is such that \(h(\mathbf {x})=1/K!\) for \(\mathbf {x}\in \mathcal {D}\).

We still have that \(W(\mathbf {x})=0\) for \(\mathbf {x}\in \mathcal {D}\).

Consider the states that communicate directly with \(\mathcal {D}\), i.e., those only one step away from a diagonal state. Since the underlying processes only jump one at a time there can only be a difference in one component, the others remaining fixed. There are a total of K! possible permutations in \(\Sigma _{K}\), \((K-1)!\) of which keep a specific component fixed. Thus, for a state that is one step removed from the diagonal there are \((K-1)!\) permutations that result in the same state. Moreover, the diagonal state in question communicates directly with each of the K distinct states obtained by permuting the components.
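The counting in this paragraph can be checked by brute force. The sketch below takes a state with one component differing from a diagonal state, applies all K! permutations of its components, and counts how often each distinct state arises. The symbolic labels `"a"`, `"b"` and the choice \(K=4\) are illustrative assumptions.

```python
from collections import Counter
from itertools import permutations
from math import factorial

def permuted_states(state):
    """Count how many permutations of the components yield each distinct state."""
    return Counter(permutations(state))

K = 4
# A state one step from the diagonal state (a, a, a, a): one component differs.
y = ("a", "a", "b", "a")

counts = permuted_states(y)
num_distinct = len(counts)                 # expected: K distinct states
multiplicities = set(counts.values())      # expected: each arises (K-1)! times
total = sum(counts.values())               # all K! permutations are accounted for
```

Each distinct state corresponds to one position of the differing component, and the \((K-1)!\) permutations that rearrange the remaining equal components leave it unchanged.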

The states one step away from a specific diagonal point can be viewed as forming disjoint sets of states according to the previous description. For each state \(\mathbf {y}\) one step removed from a diagonal state \(\mathbf {x}\), there are K distinct states \(\mathbf {y}_{1},\dots ,\mathbf {y}_{K}\) that are permutations of \(\mathbf {y}\) and communicate directly with \(\mathbf {x}\). For each such collection of states we can pick one to represent the collection (it does not matter which). Let \(\mathcal {A}_{x}\) denote the collection of such representative states \(\mathbf {y}\). In the case of two temperatures this can be phrased as only looking at states above the diagonal.

The Bellman equation for a diagonal state \(\mathbf {x}\) takes the form

$$\begin{aligned} 0=\sum _{\mathbf {y}\in \mathcal {A}_{x}}\sum _{\sigma :\mathbf {y}^{\sigma } \ne \mathbf {y}}r(\mathbf {x},\mathbf {y})\left[ 1-e^{-W(\mathbf {y}^{\sigma })+W(\mathbf {x})}\right] -\gamma +\frac{1}{K!}. \end{aligned}$$

The rates \(r(\mathbf {x},\mathbf {y}^{\sigma })\) are all equal due to symmetry. Combined with \(W(\mathbf {x})=0\) for \(\mathbf {x}\in \mathcal {D}\), this allows the Bellman equation to be expressed as

$$\begin{aligned} 0=\sum _{\mathbf {y}\in \mathcal {A}_{x}}r(\mathbf {x},\mathbf {y})\left[ K-\sum _{\sigma :\mathbf {y}^{\sigma }\ne \mathbf {y}}e^{-W(\mathbf {y}^{\sigma })}\right] -\gamma +\frac{1}{K!}. \end{aligned}$$

Since \(\gamma <(1/K!)\) and the rates are all non-negative it must be the case that for at least one \(\mathbf {y}\in \mathcal {A}_{x}\)

$$\begin{aligned} \sum _{\sigma :\mathbf {y}^{\sigma }\ne \mathbf {y}}e^{-W(\mathbf {y}^{\sigma })}-K>0. \end{aligned}$$

For states one step from the diagonal, since \((K-1)!\) permutations will result in the same state, the condition (7.1) takes the form

$$\begin{aligned} (K-1)!\sum _{\sigma :\mathbf {x}^{\sigma }\ne \mathbf {x}}e^{-2W(\mathbf {x}^{\sigma })}-K!=0\Leftrightarrow \sum _{\sigma :\mathbf {x}^{\sigma }\ne \mathbf {x} }e^{-2W(\mathbf {x}^{\sigma })}-K=0. \end{aligned}$$

That is, we need only be concerned with the permutations that switch the location of the component that differs from the diagonal state (and the \(\sigma \) that corresponds to the identity map in \(\Sigma _{K}\)). There will then be \((K-1)!\) permutations that produce the exact same state, yielding the factor \((K-1)!\) in front of the sum.

For the reduced form of (7.1) and the Bellman equation to hold, we must have that

$$\begin{aligned} \sum _{\sigma :\mathbf {y}^{\sigma }\ne \mathbf {y}}e^{-2W(\mathbf {y}^{\sigma })}-K=0, \end{aligned}$$

for all \(\mathbf {y}\) that communicate with \(\mathbf {x}\), and for at least one such \(\mathbf {y}\),

$$\begin{aligned} \sum _{\sigma :\mathbf {y}^{\sigma }\ne \mathbf {y}}e^{-W(\mathbf {y}^{\sigma })}-K>0. \end{aligned}$$

This is precisely the setting of Lemma 7.1, with the \(a_{i}\) given by \(e^{-2W(\mathbf {y}^{\sigma })}\) for the K relevant permutations \(\sigma \). The lemma then implies that (7.1) is inconsistent with the Bellman equation and therefore cannot hold. This contradicts that \((M\bar{\nu })=\mu \).
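For illustration, the contradiction can be written out in the simplest case \(K=2\). The labels \(a_{1},a_{2}\) for the two values \(e^{-2W(\mathbf {y}^{\sigma })}\) are introduced here only for this worked instance. Condition (7.1) gives \(a_{1}+a_{2}=2\), while the Bellman equation requires \(a_{1}^{1/2}+a_{2}^{1/2}>2\). These are incompatible, since

$$\begin{aligned} \left( a_{1}^{1/2}+a_{2}^{1/2}\right) ^{2}=a_{1}+a_{2}+2a_{1}^{1/2}a_{2}^{1/2}\le 2(a_{1}+a_{2})=4, \end{aligned}$$

where the inequality uses \(2a_{1}^{1/2}a_{2}^{1/2}\le a_{1}+a_{2}\). Hence \(a_{1}^{1/2}+a_{2}^{1/2}\le 2\).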

Cite this article

Doll, J., Dupuis, P. & Nyquist, P. A Large Deviations Analysis of Certain Qualitative Properties of Parallel Tempering and Infinite Swapping Algorithms. Appl Math Optim 78, 103–144 (2018). https://doi.org/10.1007/s00245-017-9401-9

  • DOI: https://doi.org/10.1007/s00245-017-9401-9
