Abstract
Parallel tempering, or replica exchange, is a popular method for simulating complex systems. The idea is to run parallel simulations at different temperatures, and at a given swap rate exchange configurations between the parallel simulations. From the perspective of large deviations it is optimal to let the swap rate tend to infinity and it is possible to construct a corresponding simulation scheme, known as infinite swapping. In this paper we propose a novel use of large deviations for empirical measures for a more detailed analysis of the infinite swapping limit in the setting of continuous time jump Markov processes. Using the large deviations rate function and associated stochastic control problems we consider a diagnostic based on temperature assignments, which can be easily computed during a simulation. We show that the convergence of this diagnostic to its a priori known limit is a necessary condition for the convergence of infinite swapping. The rate function is also used to investigate the impact of asymmetries in the underlying potential landscape, and where in the state space poor sampling is most likely to occur.
Acknowledgements
J. Doll: Research supported in part by the National Science Foundation (DMS-1317199), and the Defense Advanced Research Projects Agency (W911NF-15-2-0122). P. Dupuis: Research supported in part by the Department of Energy (DE-SC0010539), the National Science Foundation (DMS-1317199), and the Defense Advanced Research Projects Agency (W911NF-15-2-0122). P. Nyquist: Research supported in part by the National Science Foundation (DMS-1317199).
Ancillary Results
Lemma 7.1
For any sequence \(a_{1},a_{2},\dots ,a_{K}\) such that \(a_{i}\ge 0\) for all i and \(K\in [0,\infty )\),
\[\sum _{i=1}^{K}a_{i}=K\]
and
\[\sum _{i=1}^{K}a_{i}^{1/2}>K\]
cannot both be true.
Proof
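We can assume without loss of generality that \(K>0\). Let \(b_{i}=a_{i}/K\), so that \(\left\{ b_{i},i=1,\ldots ,K\right\} \) is a probability vector. By Hölder’s inequality
\[\sum _{i=1}^{K}b_{i}^{1/2}\le \left( \sum _{i=1}^{K}b_{i}\right) ^{1/2}\left( \sum _{i=1}^{K}1\right) ^{1/2}=K^{1/2}.\]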
Using \(b_{i}=a_{i}/K\) gives \(\sum _{i=1}^{K}a_{i}^{1/2}\le K\), which completes the argument. \(\square \)
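The bound at the end of the proof can be checked numerically. The following is a minimal sketch, not part of the original argument: it draws random non-negative sequences, rescales them so that \(b_{i}=a_{i}/K\) is a probability vector, and records the smallest observed value of \(K-\sum _{i}a_{i}^{1/2}\). The function name `sqrt_sum_bound` and the sampling scheme are illustrative assumptions.

```python
import math
import random


def sqrt_sum_bound(a):
    """Rescale a so that sum(a) == K = len(a); return (sum of square roots, K)."""
    k = len(a)
    total = sum(a)
    scaled = [k * x / total for x in a]  # a_i / K is now a probability vector
    return sum(math.sqrt(x) for x in scaled), k


random.seed(0)
worst_gap = float("inf")  # smallest observed value of K - sum_i a_i^(1/2)
for _ in range(1000):
    k = random.randint(1, 10)
    # Hypothetical test data: strictly positive entries avoid division by zero.
    a = [random.random() + 1e-12 for _ in range(k)]
    s, k = sqrt_sum_bound(a)
    worst_gap = min(worst_gap, k - s)
```

Up to floating-point rounding, `worst_gap` should never be negative, and the gap is zero when all \(a_{i}\) are equal, which is the equality case of Hölder's inequality.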
Lemma 7.2
Consider the ergodic control problem or equivalent minimization problem of Lemma 5.4, with \(h(\mathbf {x})=\lambda \rho (\mathbf {x})\). For two temperatures the optimal cost \(\gamma ^{*}\) satisfies
\[\gamma ^{*}<\frac{\lambda }{2}.\]
In the general case with K temperatures, \(\gamma ^{*}<\lambda /K!\).
Proof
To simplify the notation we consider the case \(\lambda =1\). From Lemma 5.4 we know that there is a minimizing measure \(\nu ^{*}\) in (5.1) and that the optimal cost \(\gamma ^{*}\) satisfies
Moreover, if \(W^{*}\) is defined as
then \((\gamma ^{*},W^{*})\) is a solution to the Bellman equation (4.9).
Suppose that \(W^{*}\) is a constant. Inserting this into the Bellman equation yields, for each \(\mathbf {x}\in \mathcal {S}^{2}\),
which cannot hold since \(\rho \) is not a constant. Thus, \(W^{*}\) cannot be a constant function. This in turn implies that the likelihood ratio \([d\nu ^{*}/d\bar{\mu }]\) is not constant equal to 1 (the only possible constant value). Thus \(\nu ^{*}\) is not \(\bar{\mu }\), the invariant measure for the original symmetrized dynamics.
Inserting \(\bar{\mu }\) into the objective function in (5.1) gives
where the second equality comes from \(\rho (\mathbf {x})+\rho (\mathbf {x}^{R})=1=1/2+1/2\) and the symmetry of \(\bar{\mu }\). Thus the cost associated with the uncontrolled dynamics is 1/2. Since \(\nu ^{*}\) is the unique minimizer in (5.1) and \(\nu ^{*}\ne \bar{\mu }\), it holds that
\[\gamma ^{*}<\frac{1}{2}.\]
The argument for \(K>2\) temperatures is completely analogous. \(\square \)
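The value 1/2 for the uncontrolled cost can be illustrated on a small example. The sketch below rests on assumptions not stated in this appendix: that \(\rho (\mathbf {x})\) is the usual infinite swapping weight \(\mu (\mathbf {x})/(\mu (\mathbf {x})+\mu (\mathbf {x}^{R}))\) for a product Gibbs measure \(\mu \) at temperatures \(\tau _{1},\tau _{2}\), and that \(\bar{\mu }\) is the symmetrization \((\mu (\mathbf {x})+\mu (\mathbf {x}^{R}))/2\). The potential values and the function name are hypothetical.

```python
import math
from itertools import product


def check_uncontrolled_cost(potential, tau1, tau2):
    """Return sum_x rho(x) * mu_bar(x) on a finite state space (assumed forms)."""
    states = range(len(potential))
    # Unnormalized product Gibbs weights for the pair x = (x1, x2).
    w = {
        (x1, x2): math.exp(-potential[x1] / tau1 - potential[x2] / tau2)
        for x1, x2 in product(states, repeat=2)
    }
    z = sum(w.values())
    cost = 0.0
    for (x1, x2), wx in w.items():
        wr = w[(x2, x1)]              # weight of the reversed state x^R
        rho = wx / (wx + wr)          # assumed form of rho; rho(x) + rho(x^R) = 1
        mu_bar = (wx + wr) / (2 * z)  # assumed symmetrized invariant measure
        cost += rho * mu_bar
    return cost


# Hypothetical asymmetric potential on four states.
cost = check_uncontrolled_cost([0.0, 1.0, 0.3, 2.5], tau1=1.0, tau2=5.0)
```

Under these assumptions each term \(\rho (\mathbf {x})\bar{\mu }(\mathbf {x})\) equals \(\mu (\mathbf {x})/2\), so the sum is 1/2 regardless of the potential or the temperatures.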
Lemma 7.3
Assume S is a finite set and that \(\Gamma _{x,y}\), \(x,y\in S\) is the intensity matrix of an ergodic Markov chain on S with invariant probability distribution \(\bar{\mu }\). Let \(q(x)=\sum _{y\in S}\Gamma _{x,y}\), and for \(\nu \in \mathcal {P}(S)\) with \(\theta (x)=\nu (x)/\bar{\mu }(x)\) let
Then \(J(\nu )\) is strictly convex on the relative interior of \(\mathcal {P}(S)\).
Proof
It is enough to show the strict convexity of
\[-\sum _{x,y\in S}\theta ^{1/2}(x)\theta ^{1/2}(y)\Gamma _{x,y}\bar{\mu }(x)\]
for \(\theta (x)\ge 0\), \(\sum _{x\in S}\theta (x)\bar{\mu }(x)=1\). Let \(\{x_{1},x_{2},\ldots ,x_{K}\}\) be an enumeration of the distinct elements of S, \(\theta _{i}=\theta (x_{i}),\bar{\mu }_{i}=\bar{\mu }(x_{i})\) and \(f_{i,j}(\theta )=-\theta _{i}^{1/2}\theta _{j}^{1/2}\). If \(M_{i,j}(\theta )\) denotes the matrix of second order partial derivatives of \(f_{i,j}(\theta )\) at \(\theta \), then a straightforward calculation shows that the eigenvalue 0 is repeated \(K-1\) times, and that, up to the positive factor \(\frac{1}{4}(\theta _{i}\theta _{j})^{-1/2}\), \((\theta _{i}/\theta _{j}+\theta _{j}/\theta _{i})\) is also an eigenvalue with eigenvector \(\theta _{j}e_{i}-\theta _{i}e_{j}\). Hence the null space of this matrix is the collection of vectors orthogonal to \(\theta _{j}e_{i}-\theta _{i}e_{j}\). Since \((\theta _{i}/\theta _{j}+\theta _{j}/\theta _{i})>0\), \(f_{i,j}(\theta )\) is strictly convex (as a function in \(\mathbb {R}^{K}\)) at \(\theta \) except in those directions orthogonal to \(\theta _{j}e_{i}-\theta _{i}e_{j}\).
Since \(\Gamma _{x,y}\) is ergodic all states communicate, and so there exists a sequence \(1=i_{1},i_{2},\ldots ,i_{K},i_{K+1}=1\) such that \(\Gamma _{x_{i_{k}},x_{i_{k+1}}}>0\) for \(k=1,\ldots ,K\). Thus \(-\sum _{x,y\in S}\theta ^{1/2}(x)\theta ^{1/2}(y)\Gamma _{x,y}\bar{\mu }(x)\) is strictly convex except in those directions that are orthogonal to each of \(\theta _{i_{k+1}}e_{i_{k}}-\theta _{i_{k}}e_{i_{k+1}}\), which is exactly the set of directions spanned by \((\theta _{1},\theta _{2},\ldots ,\theta _{K})\). Since this direction cannot be parallel to \(\{\theta :\sum _{k=1}^{K}\theta _{k}\bar{\mu }_{k}=1\}\), \(J(\nu )\) is strictly convex on this set. \(\square \)
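The eigenstructure used in the proof can be verified directly on the only non-zero block of the matrix of second derivatives of \(f_{i,j}\), namely the \(2\times 2\) block in the coordinates \((\theta _{i},\theta _{j})\). The following is a small sketch with hand-computed derivatives; the helper names and the sample values of \(\theta _{i},\theta _{j}\) are illustrative assumptions.

```python
def hessian_block(ti, tj):
    """2x2 block of second derivatives of f = -sqrt(ti * tj) in (ti, tj)."""
    c = 0.25 / (ti * tj) ** 0.5  # common positive factor (1/4)(ti*tj)^(-1/2)
    return [[c * tj / ti, -c], [-c, c * ti / tj]]


def apply(m, v):
    """Multiply a 2x2 matrix by a 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


ti, tj = 0.7, 2.3  # hypothetical positive values of theta_i, theta_j
m = hessian_block(ti, tj)

# The direction (theta_i, theta_j) lies in the null space.
null_image = apply(m, [ti, tj])

# theta_j e_i - theta_i e_j is an eigenvector with a positive eigenvalue.
eigvec = [tj, -ti]
eig_image = apply(m, eigvec)
eigenvalue = 0.25 / (ti * tj) ** 0.5 * (ti / tj + tj / ti)
```

Here `null_image` should vanish and `eig_image` should equal `eigenvalue` times `eigvec`, up to rounding. The positive factor \(\frac{1}{4}(\theta _{i}\theta _{j})^{-1/2}\) does not affect the sign of the eigenvalue, which is all the convexity argument needs.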
Remark 6.1
The proofs in Sect. 5 were largely confined to the setting of two temperatures \(\tau _{1},\tau _{2}\). This was to keep the notation simple; the results generalize to any number \(K\ge 2\) of temperatures. The only result that appears to rely substantially on there being two temperatures is Lemma 5.3, specifically its argument by contradiction. Here we outline how the proof proceeds in the general setting.
In the setting of K temperatures the assumption (5.7) becomes
Let \(\mathcal {D}\) denote the set of diagonal states: \(\mathcal {D}=\{\mathbf {x}:\mathbf {x}=\mathbf {x}^{\sigma }, \ \forall \sigma \in \Sigma _{K}\}\). The only such states are those for which all components are equal. The cost structure is such that \(h(\mathbf {x})=1/K!\) for \(\mathbf {x}\in \mathcal {D}\).
We still have that \(W(\mathbf {x})=0\) for \(\mathbf {x}\in \mathcal {D}\).
Consider the states that communicate directly with \(\mathcal {D}\), i.e., those only one step away from a diagonal state. Since the underlying processes jump only one at a time, such a state differs from the diagonal state in exactly one component, the others remaining fixed. There are a total of K! permutations in \(\Sigma _{K}\), \((K-1)!\) of which keep a specific component fixed. Thus, for a state that is one step removed from the diagonal there are \((K-1)!\) permutations that result in the same state, leaving \(K!/(K-1)!=K\) distinct permuted states. Moreover, the diagonal state in question communicates directly with each of these K permuted states.
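The counting argument above can be checked by brute force. The following is a minimal sketch, with a hypothetical state whose components are the labels "a" and "b": it applies every permutation in \(\Sigma _{K}\) to a state one step from the diagonal and counts how often each distinct arrangement occurs.

```python
from collections import Counter
from itertools import permutations
from math import factorial


def permuted_state_counts(state):
    """Map each distinct permuted state to the number of permutations producing it."""
    return Counter(
        tuple(state[i] for i in sigma)
        for sigma in permutations(range(len(state)))
    )


K = 4
# Hypothetical state one step from the diagonal state (a, a, a, a).
state = ("a", "a", "a", "b")
counts = permuted_state_counts(state)

distinct_states = len(counts)              # expected: K
multiplicities = set(counts.values())      # expected: {(K - 1)!}
```

For this example the K! = 24 permutations yield K = 4 distinct states, each produced by (K-1)! = 6 permutations, whereas a diagonal state is fixed by all K! permutations.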
The states one step away from a specific diagonal point can be viewed as forming disjoint sets of states according to the previous description. For each state \(\mathbf {y}\) one step removed from a diagonal state \(\mathbf {x}\), there are K distinct states \(\mathbf {y}_{1},\dots ,\mathbf {y}_{K}\) that are permutations of \(\mathbf {y}\) and communicate directly with \(\mathbf {x}\). For each such collection of states we can pick one to represent the collection (it does not matter which one). Let \(\mathcal {A}_{x}\) denote the collection of such representative states \(\mathbf {y}\). In the case of two temperatures this can be phrased as only looking at states above the diagonal.
The Bellman equation for a diagonal state \(\mathbf {x}\) takes the form
The rates \(r(\mathbf {x},\mathbf {y}^{\sigma })\) are all equal due to symmetry. Combined with \(W(\mathbf {x})=0\) for \(\mathbf {x}\in \mathcal {D}\), this allows the Bellman equation to be expressed as
Since \(\gamma <(1/K!)\) and the rates are all non-negative it must be the case that for at least one \(\mathbf {y}\in \mathcal {A}_{x}\)
For states one step from the diagonal, since \((K-1)!\) permutations will result in the same state, the condition (7.1) takes the form
That is, we need only be concerned with the permutations that switch the location of the component that differs from the diagonal state (and the \(\sigma \) that corresponds to the identity map in \(\Sigma _{K}\)). There will then be \((K-1)!\) permutations that produce the exact same state, yielding the factor \((K-1)!\) in front of the sum.
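For the reduced form of (7.1) and the Bellman equation to hold, we must have that
\[\sum _{\sigma }e^{-2W(\mathbf {y}^{\sigma })}=K\]
for all \(\mathbf {y}\) that communicate with \(\mathbf {x}\), and for at least one such \(\mathbf {y}\),
\[\sum _{\sigma }e^{-W(\mathbf {y}^{\sigma })}>K,\]
where the sums run over the K relevant permutations \(\sigma \).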
This is precisely the setting of Lemma 7.1, with each \(a_{i}\) given by \(e^{-2W(\mathbf {y}^{\sigma })}\) for one of the K relevant permutations \(\sigma \). The lemma then implies that (7.1) is inconsistent with the Bellman equation and therefore cannot hold. This contradicts the assumption that \((M\bar{\nu })=\mu \).
Cite this article
Doll, J., Dupuis, P. & Nyquist, P. A Large Deviations Analysis of Certain Qualitative Properties of Parallel Tempering and Infinite Swapping Algorithms. Appl Math Optim 78, 103–144 (2018). https://doi.org/10.1007/s00245-017-9401-9
DOI: https://doi.org/10.1007/s00245-017-9401-9