A Large Deviations Analysis of Certain Qualitative Properties of Parallel Tempering and Infinite Swapping Algorithms

Doll, J.; Dupuis, P.; Nyquist, P.

doi:10.1007/s00245-017-9401-9

Published: 08 February 2017

Volume 78, pages 103–144, (2018)
Applied Mathematics & Optimization

Abstract

Parallel tempering, or replica exchange, is a popular method for simulating complex systems. The idea is to run parallel simulations at different temperatures, and at a given swap rate exchange configurations between the parallel simulations. From the perspective of large deviations it is optimal to let the swap rate tend to infinity and it is possible to construct a corresponding simulation scheme, known as infinite swapping. In this paper we propose a novel use of large deviations for empirical measures for a more detailed analysis of the infinite swapping limit in the setting of continuous time jump Markov processes. Using the large deviations rate function and associated stochastic control problems we consider a diagnostic based on temperature assignments, which can be easily computed during a simulation. We show that the convergence of this diagnostic to its a priori known limit is a necessary condition for the convergence of infinite swapping. The rate function is also used to investigate the impact of asymmetries in the underlying potential landscape, and where in the state space poor sampling is most likely to occur.

Acknowledgements

J. Doll: Research supported in part by the National Science Foundation (DMS-1317199), and the Defense Advanced Research Projects Agency (W911NF-15-2-0122). P. Dupuis: Research supported in part by the Department of Energy (DE-SC0010539), the National Science Foundation (DMS-1317199), and the Defense Advanced Research Projects Agency (W911NF-15-2-0122). P. Nyquist: Research supported in part by National Science Foundation (DMS-1317199).

Author information

Authors and Affiliations

Department of Chemistry, Brown University, Providence, RI, 02912, USA
J. Doll
Division of Applied Mathematics, Brown University, Providence, RI, 02912, USA
P. Dupuis & P. Nyquist

Corresponding author

Correspondence to P. Dupuis.

Ancillary Results

Lemma 7.1

For any integer $K\ge 0$ and any sequence $a_{1},a_{2},\dots ,a_{K}$ such that $a_{i}\ge 0$ for all $i$,

$$\begin{aligned} \sum _{i=1}^{K}a_{i}=K \end{aligned}$$

and

$$\begin{aligned} \sum _{i=1}^{K}a_{i}^{1/2}>K \end{aligned}$$

cannot both be true.

Proof

We can assume without loss of generality that $K>0$ and that the first identity holds. Let $b_{i}=a_{i}/K$, so that $\left\{ b_{i},i=1,\ldots ,K\right\} $ is a probability vector. By Hölder’s inequality

$$\begin{aligned} \sum _{i=1}^{K}b_{i}^{1/2}\le K^{1/2}\left( \sum _{i=1}^{K}b_{i}\right) ^{1/2}=K^{1/2}. \end{aligned}$$

Since $a_{i}^{1/2}=K^{1/2}b_{i}^{1/2}$, this gives $\sum _{i=1}^{K}a_{i}^{1/2}\le K^{1/2}\cdot K^{1/2}=K$, which completes the argument. $\square $
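The bound in Lemma 7.1 is easy to check numerically. The following is a minimal sketch, not part of the original argument: the helper names `rescale_to_sum_k` and `sqrt_sum_bound` are hypothetical, and the random test sequences are purely illustrative.

```python
import math
import random


def rescale_to_sum_k(weights):
    """Rescale nonnegative weights so that they sum to K = len(weights)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    k = len(weights)
    return [k * w / total for w in weights]


def sqrt_sum_bound(a):
    """Return (sum of a_i, sum of sqrt(a_i), K) for a nonnegative sequence a."""
    if any(v < 0 for v in a):
        raise ValueError("entries must be nonnegative")
    return sum(a), sum(math.sqrt(v) for v in a), len(a)


random.seed(0)
worst_gap = float("inf")  # smallest observed value of K - sum sqrt(a_i)
for _ in range(1000):
    k = random.randint(1, 8)
    a = rescale_to_sum_k([random.random() + 1e-9 for _ in range(k)])
    _, sqrt_total, k = sqrt_sum_bound(a)
    worst_gap = min(worst_gap, k - sqrt_total)

# Equality case: all a_i equal to 1 gives sum sqrt(a_i) = K exactly.
_, sqrt_equal, k_equal = sqrt_sum_bound([1.0] * 5)
```

In this sketch `worst_gap` stays nonnegative up to rounding, and the equality case $a_{i}=1$ shows that the bound cannot be improved.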

Lemma 7.2

Consider the ergodic control problem or equivalent minimization problem of Lemma 5.4, with $h(\mathbf {x})=\lambda \rho (\mathbf {x})$. For two temperatures the optimal cost $\gamma ^{*}$ satisfies

$$\begin{aligned} \gamma ^{*}<\frac{\lambda }{2}. \end{aligned}$$

In the general case with $K$ temperatures, $\gamma ^{*}<\lambda /K!$.

Proof

To simplify the notation we consider the case $\lambda =1$. From Lemma 5.4 we know that there is a minimizing measure $\nu ^{*}$ in (5.1) and that the optimal cost $\gamma ^{*}$ satisfies

$$\begin{aligned} \gamma ^{*}=J(\nu ^{*})+\sum _{\mathbf {x}\in \mathcal {S}^{2}}\rho (\mathbf {x})\nu ^{*}(\mathbf {x}). \end{aligned}$$

Moreover, if $W^{*}$ is defined as

$$\begin{aligned} W^{*}(\mathbf {x})=-\log \left[ \frac{d\nu ^{*}}{d\bar{\mu }}\right] ^{1/2}(\mathbf {x}),\ \ \mathbf {x}\in \mathcal {S}^{2}, \end{aligned}$$

then $(\gamma ^{*},W^{*})$ is a solution to the Bellman equation (4.9).

Suppose that $W^{*}$ is a constant. Inserting this into the Bellman equation yields, for each $\mathbf {x}\in \mathcal {S}^{2}$,

$$\begin{aligned} 0=-\gamma ^{*}+\rho (\mathbf {x}), \end{aligned}$$

which cannot hold since $\rho $ is not a constant. Thus, $W^{*}$ cannot be a constant function. This in turn implies that the likelihood ratio $[d\nu ^{*}/d\bar{\mu }]$ is not constant equal to 1 (the only possible constant value). Thus $\nu ^{*}$ is not $\bar{\mu }$, the invariant measure for the original symmetrized dynamics.

Inserting $\bar{\mu }$ into the objective function in (5.1) gives

$$\begin{aligned} J(\bar{\mu })+\sum _{\mathbf {x}\in \mathcal {S}^{2}}\rho (\mathbf {x})\bar{\mu }(\mathbf {x})=\sum _{\mathbf {x}\in \mathcal {S}^{2}}\rho (\mathbf {x})\bar{\mu }(\mathbf {x})=\frac{1}{2}, \end{aligned}$$

where the second equality comes from $\rho (\mathbf {x})+\rho (\mathbf {x}^{R})=1=1/2+1/2$ and the symmetry of $\bar{\mu }$. Thus the cost associated with the uncontrolled dynamics is $1/2$. Since $\nu ^{*}$ is the unique minimizer in (5.1) and $\nu ^{*}\ne \bar{\mu }$, it holds that

$$\begin{aligned} \gamma ^{*}=J(\nu ^{*})+\sum _{\mathbf {x}\in \mathcal {S}^{2}}\rho (\mathbf {x})\nu ^{*}(\mathbf {x})<J(\bar{\mu })+\sum _{\mathbf {x}\in \mathcal {S}^{2}}\rho (\mathbf {x})\bar{\mu }(\mathbf {x})=\frac{1}{2}. \end{aligned}$$

The argument for $K>2$ temperatures is completely analogous. $\square $
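The value of the uncontrolled cost, $1/2$ for two temperatures and $1/K!$ in general, uses only that the weights $\rho $ sum to one over permutations and that $\bar{\mu }$ is permutation invariant. The sketch below illustrates this on a toy model. Everything specific in it is an assumption made for illustration: the finite state space, the potential values, the product Gibbs form of the weights, and the choice $\rho (\mathbf {x})=\mu (\mathbf {x})/\sum _{\sigma }\mu (\mathbf {x}^{\sigma })$.

```python
import itertools
import math


def symmetrized_cost(potential, temperatures):
    """Return sum_x rho(x) * mu_bar(x) for a toy K-temperature product model.

    potential: dict mapping each state of S to its energy (illustrative values).
    temperatures: list of K positive temperatures.
    """
    states = list(potential)
    k = len(temperatures)
    perms = list(itertools.permutations(range(k)))

    def weight(x):
        # Unnormalized product Gibbs weight: component i sits at temperature i.
        return math.exp(-sum(potential[s] / t for s, t in zip(x, temperatures)))

    configs = list(itertools.product(states, repeat=k))
    z = sum(weight(x) for x in configs)  # normalizing constant

    total = 0.0
    for x in configs:
        orbit = [weight(tuple(x[i] for i in p)) for p in perms]
        mu_bar = sum(orbit) / (len(perms) * z)  # symmetrized measure at x
        rho = weight(x) / sum(orbit)            # assumed form of the weight rho(x)
        total += rho * mu_bar
    return total


# Hypothetical three-state potential, used only for this illustration.
potential = {"a": 0.0, "b": 1.0, "c": 0.3}
cost_two = symmetrized_cost(potential, [1.0, 3.0])
cost_three = symmetrized_cost(potential, [1.0, 2.0, 4.0])
```

Under these assumptions `cost_two` equals $1/2$ and `cost_three` equals $1/3!$ up to rounding, independently of the potential and the temperatures chosen.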

Lemma 7.3

Assume S is a finite set and that $\Gamma _{x,y}$, $x,y\in S$ is the intensity matrix of an ergodic Markov chain on S with invariant probability distribution $\bar{\mu }$. Let $q(x)=\sum _{y\in S}\Gamma _{x,y}$, and for $\nu \in \mathcal {P}(S)$ with $\theta (x)=\nu (x)/\bar{\mu }(x)$ let

$$\begin{aligned} J(\nu )=\sum _{x\in S}q(x)\theta (x)\bar{\mu }(x)-\sum _{x,y\in S}\theta ^{1/2}(x)\theta ^{1/2}(y)\Gamma _{x,y}\bar{\mu }(x). \end{aligned}$$

Then $J(\nu )$ is strictly convex on the relative interior of $\mathcal {P}(S)$.

Proof

It is enough to show the strict convexity of

$$\begin{aligned} \theta (\cdot )\rightarrow -\sum _{x,y\in S}\theta ^{1/2}(x)\theta ^{1/2} (y)\Gamma _{x,y}\bar{\mu }(x) \end{aligned}$$

for $\theta (x)\ge 0$, $\sum _{x\in S}\theta (x)\bar{\mu }(x)=1$. Let $\{x_{1},x_{2},\ldots ,x_{K}\}$ be an enumeration of the distinct elements of S, $\theta _{i}=\theta (x_{i}),\bar{\mu }_{i}=\bar{\mu }(x_{i})$ and $f_{i,j}(\theta )=-\theta _{i}^{1/2}\theta _{j}^{1/2}$. If $M_{i,j}(\theta )$ denotes the matrix of second order partial derivatives of $f_{i,j}(\theta )$ at $\theta $, then a straightforward calculation shows that the eigenvalue 0 is repeated $K-1$ times, and that, up to the positive factor $\frac{1}{4}(\theta _{i}\theta _{j})^{-1/2}$, $(\theta _{i}/\theta _{j}+\theta _{j}/\theta _{i})$ is also an eigenvalue with eigenvector $\theta _{j}e_{i}-\theta _{i}e_{j}$. Hence the null space of this matrix is the collection of vectors orthogonal to $\theta _{j}e_{i}-\theta _{i}e_{j}$. Since $(\theta _{i}/\theta _{j}+\theta _{j}/\theta _{i})>0$, $f_{i,j}(\theta )$ is strictly convex (as a function in $\mathbb {R}^{K}$) at $\theta $ except in those directions orthogonal to $\theta _{j}e_{i}-\theta _{i}e_{j}$.
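For reference, the only nonzero block of $M_{i,j}(\theta )$ can be written out explicitly. The display below is a direct computation from $f_{i,j}(\theta )=-\theta _{i}^{1/2}\theta _{j}^{1/2}$ for $\theta _{i},\theta _{j}>0$, with rows and columns ordered as $(i,j)$; all other entries of $M_{i,j}(\theta )$ vanish.

$$\begin{aligned} \begin{pmatrix} \partial _{\theta _{i}}^{2}f_{i,j} &{} \partial _{\theta _{i}}\partial _{\theta _{j}}f_{i,j}\\ \partial _{\theta _{j}}\partial _{\theta _{i}}f_{i,j} &{} \partial _{\theta _{j}}^{2}f_{i,j} \end{pmatrix} =\frac{1}{4(\theta _{i}\theta _{j})^{1/2}} \begin{pmatrix} \theta _{j}/\theta _{i} &{} -1\\ -1 &{} \theta _{i}/\theta _{j} \end{pmatrix}. \end{aligned}$$

The determinant of the block is zero and $(\theta _{i},\theta _{j})$ lies in its kernel, while $(\theta _{j},-\theta _{i})$ is an eigenvector with eigenvalue $\frac{1}{4}(\theta _{i}\theta _{j})^{-1/2}(\theta _{i}/\theta _{j}+\theta _{j}/\theta _{i})>0$. The positive prefactor does not affect the sign, so the convexity conclusion is unchanged.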

Since $\Gamma _{x,y}$ is ergodic all states communicate, and so there exists a sequence $1=i_{1},i_{2},\ldots ,i_{K},i_{K+1}=1$ such that $\Gamma _{x_{i_{k}},x_{i_{k+1}}}>0$ for $k=1,\ldots ,K$. Thus $-\sum _{x,y\in S}\theta ^{1/2}(x)\theta ^{1/2}(y)\Gamma _{x,y}\bar{\mu }(x)$ is strictly convex except in those directions that are orthogonal to each of $\theta _{i_{k+1}}e_{i_{k}}-\theta _{i_{k}}e_{i_{k+1}}$, which is exactly the set of directions spanned by $(\theta _{1},\theta _{2},\ldots ,\theta _{K})$. Since this direction cannot be parallel to $\{\theta :\sum _{k=1}^{K}\theta _{k}\bar{\mu }_{k}=1\}$, $J(\nu )$ is strictly convex on this set. $\square $
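Strict convexity of $J$ can be probed numerically on a small example. The sketch below evaluates $J(\nu )$ from the formula in Lemma 7.3 and compares the value at a midpoint with the average of the endpoint values. The three-state chain, its rates, and the test measures are hypothetical choices made for illustration, and the code assumes $\Gamma _{x,x}=0$ so that $q(x)$ is the total jump rate out of $x$.

```python
import math


def rate_function(nu, gamma, mu_bar):
    """Evaluate J(nu) from Lemma 7.3, using only off-diagonal rates gamma[x][y]."""
    n = len(nu)
    theta = [nu[x] / mu_bar[x] for x in range(n)]
    # q(x): total jump rate out of x (diagonal entries are ignored by assumption).
    q = [sum(gamma[x][y] for y in range(n) if y != x) for x in range(n)]
    linear = sum(q[x] * theta[x] * mu_bar[x] for x in range(n))
    mixed = sum(
        math.sqrt(theta[x] * theta[y]) * gamma[x][y] * mu_bar[x]
        for x in range(n)
        for y in range(n)
        if y != x
    )
    return linear - mixed


# Hypothetical birth-death chain on three states.
gamma = [[0.0, 2.0, 0.0],
         [1.0, 0.0, 1.0],
         [0.0, 3.0, 0.0]]
# Detailed balance gives weights proportional to (1, 2, 2/3).
unnormalized = [1.0, 2.0, 2.0 / 3.0]
mu_bar = [w / sum(unnormalized) for w in unnormalized]

nu_one = [0.2, 0.5, 0.3]
nu_two = [0.6, 0.3, 0.1]
midpoint = [(a + b) / 2 for a, b in zip(nu_one, nu_two)]

j_one = rate_function(nu_one, gamma, mu_bar)
j_two = rate_function(nu_two, gamma, mu_bar)
j_mid = rate_function(midpoint, gamma, mu_bar)
j_invariant = rate_function(mu_bar, gamma, mu_bar)
```

For this example `j_mid` is strictly smaller than the average of `j_one` and `j_two`, and `j_invariant` vanishes up to rounding, consistent with $J(\bar{\mu })=0$ as used in the proof of Lemma 7.2.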

Remark 6.1

The proofs in Sect. 5 were largely confined to the setting of two temperatures $\tau _{1},\tau _{2}$. This was to keep the notation simple; the results generalize to any number $K\ge 2$ of temperatures. The only result that appears to rely substantially on there being two temperatures is Lemma 5.3, and specifically the argument by contradiction. Here we outline how the proof would proceed in the general setting.

In the setting of K temperatures the assumption (5.7) becomes

$$\begin{aligned} \sum _{\sigma \in \Sigma _{K}}e^{-2W(\mathbf {x}^{\sigma })}-K!=0,\ \forall \mathbf {x} \in \mathcal {S}^{K}. \end{aligned} \tag{7.1}$$

Let $\mathcal {D}$ denote the set of diagonal states: $\mathcal {D}=\{\mathbf {x}:\mathbf {x}=\mathbf {x}^{\sigma }, \ \forall \sigma \in \Sigma _{K}\}$. The only such states are those for which all components are equal. We still have that $W(\mathbf {x})=0$ for $\mathbf {x}\in \mathcal {D}$, and the cost structure is such that $h(\mathbf {x})=1/K!$ for $\mathbf {x}\in \mathcal {D}$.

Consider the states that communicate directly with $\mathcal {D}$, i.e., those only one step away from a diagonal state. Since the underlying processes only jump one at a time there can only be a difference in one component, the others remaining fixed. There are a total of $K!$ possible permutations in $\Sigma _{K}$, $(K-1)!$ of which keep a specific component fixed. Thus, for a state that is one step removed from the diagonal there are $(K-1)!$ permutations that result in the same state. Moreover, the diagonal state in question communicates directly with each of the $K$ distinct states obtained by permuting such a state.

The states one step away from a specific diagonal point can be viewed as forming disjoint sets of states according to the previous description. For each state $\mathbf {y}$ one step removed from a diagonal state $\mathbf {x}$, there are K distinct states $\mathbf {y}_{1},\dots ,\mathbf {y}_{K}$ that are permutations of $\mathbf {y}$ and communicate directly with $\mathbf {x}$. For each such collection of states we can pick one to represent the collection (it does not matter which one). Let $\mathcal {A}_{x}$ denote the collection of such representative states $\mathbf {y}$. In the case of two temperatures this can be phrased as only looking at states above the diagonal.
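The counting used here, $(K-1)!$ permutations fixing a state one step from the diagonal and $K$ distinct permuted states, can be verified by brute force. The following is a small sketch; the function name `orbit_and_stabilizer` and the string labels for the states are hypothetical.

```python
import itertools


def orbit_and_stabilizer(state):
    """Return (number of distinct permuted states, number of permutations fixing state)."""
    state = tuple(state)
    k = len(state)
    images = [tuple(state[i] for i in p) for p in itertools.permutations(range(k))]
    stabilizer = sum(1 for image in images if image == state)
    return len(set(images)), stabilizer


# K = 4: one component differs from the diagonal state ("a", "a", "a", "a").
orbit_size, stabilizer_size = orbit_and_stabilizer(("a", "a", "a", "b"))

# A diagonal state is fixed by every permutation.
diag_orbit, diag_stabilizer = orbit_and_stabilizer(("a", "a", "a", "a"))
```

For $K=4$ this yields $K=4$ distinct permuted states and $(K-1)!=6$ fixing permutations for the off-diagonal state, and a single state fixed by all $4!=24$ permutations on the diagonal.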

The Bellman equation for a diagonal state $\mathbf {x}$ takes the form

$$\begin{aligned} 0=\sum _{\mathbf {y}\in \mathcal {A}_{x}}\sum _{\sigma :\mathbf {y}^{\sigma } \ne \mathbf {y}}r(\mathbf {x},\mathbf {y})\left[ 1-e^{-W(\mathbf {y}^{\sigma })+W(\mathbf {x})}\right] -\gamma +\frac{1}{K!}. \end{aligned}$$

The rates $r(\mathbf {x},\mathbf {y}^{\sigma })$ are all equal due to symmetry. Combined with $W(\mathbf {x})=0$ for $\mathbf {x}\in \mathcal {D}$, this allows the Bellman equation to be expressed as

$$\begin{aligned} 0=\sum _{\mathbf {y}\in \mathcal {A}_{x}}r(\mathbf {x},\mathbf {y})\left[ K-\sum _{\sigma :\mathbf {y}^{\sigma }\ne \mathbf {y}}e^{-W(\mathbf {y}^{\sigma })}\right] -\gamma +\frac{1}{K!}. \end{aligned}$$

Since $\gamma <(1/K!)$ and the rates are all non-negative it must be the case that for at least one $\mathbf {y}\in \mathcal {A}_{x}$

$$\begin{aligned} \sum _{\sigma :\mathbf {y}^{\sigma }\ne \mathbf {y}}e^{-W(\mathbf {y}^{\sigma })}-K>0. \end{aligned}$$

For states one step from the diagonal, since $(K-1)!$ permutations will result in the same state, the condition (7.1) takes the form

$$\begin{aligned} (K-1)!\sum _{\sigma :\mathbf {x}^{\sigma }\ne \mathbf {x}}e^{-2W(\mathbf {x}^{\sigma })}-K!=0\Leftrightarrow \sum _{\sigma :\mathbf {x}^{\sigma }\ne \mathbf {x} }e^{-2W(\mathbf {x}^{\sigma })}-K=0. \end{aligned}$$

That is, we need only be concerned with the permutations that switch the location of the component that differs from the diagonal state (and the $\sigma $ that corresponds to the identity map in $\Sigma _{K}$). There will then be $(K-1)!$ permutations that produce the exact same state, yielding the factor $(K-1)!$ in front of the sum.

For the reduced form of (7.1) and the Bellman equation to hold, we must have that

$$\begin{aligned} \sum _{\sigma :\mathbf {y}^{\sigma }\ne \mathbf {y}}e^{-2W(\mathbf {y}^{\sigma })}-K=0, \end{aligned}$$

for all $\mathbf {y}$ that communicate with $\mathbf {x}$, and for at least one such $\mathbf {y}$,

$$\begin{aligned} \sum _{\sigma :\mathbf {y}^{\sigma }\ne \mathbf {y}}e^{-W(\mathbf {y}^{\sigma })}-K>0. \end{aligned}$$

This is precisely the setting of Lemma 7.1 with the $a_{i}$ represented by $e^{-2W(\mathbf {y}^{\sigma })}$ for the K relevant permutations $\sigma $. The lemma then implies that (7.1) is inconsistent with the Bellman equation and therefore cannot hold. This contradicts that $(M\bar{\nu })=\mu $.
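Spelled out, the contradiction is the following chain, written here as a sketch with $a_{\sigma }=e^{-2W(\mathbf {y}^{\sigma })}$ for the $K$ relevant permutations and with (7.1) read with the exponent $2W$ as in its statement. The first inequality is the one obtained from the Bellman equation, the middle step is the Hölder bound from the proof of Lemma 7.1, and the last equality is the reduced form of (7.1).

$$\begin{aligned} K<\sum _{\sigma }e^{-W(\mathbf {y}^{\sigma })}=\sum _{\sigma }a_{\sigma }^{1/2}\le K^{1/2}\left( \sum _{\sigma }a_{\sigma }\right) ^{1/2}=K^{1/2}\cdot K^{1/2}=K. \end{aligned}$$

Since $K<K$ is impossible, the two conditions cannot hold simultaneously.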

About this article

Cite this article

Doll, J., Dupuis, P. & Nyquist, P. A Large Deviations Analysis of Certain Qualitative Properties of Parallel Tempering and Infinite Swapping Algorithms. Appl Math Optim 78, 103–144 (2018). https://doi.org/10.1007/s00245-017-9401-9

Issue Date: August 2018
DOI: https://doi.org/10.1007/s00245-017-9401-9
