In the standard Parallel Tempering algorithm, the swap proposal is drawn from a distribution that is independent of the current state of the process. The usual choice is a uniform draw from all possible pairs of temperatures, or only from pairs of adjacent temperatures. However, even in trivial one-dimensional examples one notices that swap probabilities depend on the current positions of the chains, cf. Fig. 1. Our study of swap strategies that depend on the current state is motivated by the seminal paper on the Equi-Energy sampler by Kou et al. (2006), later adapted to the Parallel Tempering algorithm by Baragatti et al. (2013).
The main idea behind these algorithms is to exchange states with similar energy (i.e. value of the log-density). The original Equi-Energy sampler is memory-intensive: it must store all points drawn from the differently tempered trajectories, which may become problematic in high-dimensional settings. The Equi-Energy sampler also targets a biased distribution instead of the proper one, because asymptotic values are used in the acceptance probability. Running the algorithm further requires specifying in advance the so-called Energy Rings, i.e. a partition of the state space into regions of similar energy. Schreck et al. (2013) address the problem of choosing the Energy Rings and provide an adaptive version of the algorithm.
The Parallel Tempering algorithm with Equi-Energy moves (PTEEM), proposed by Baragatti et al. (2013), also requires specification of the Energy Rings. To circumvent this requirement, we propose state-dependent swap strategies that perform Equi-Energy-like moves within the Parallel Tempering framework. Our approach is flexible and admits different strategies, e.g. promoting larger jumps or other features.
The general algorithm is as follows: given the state of the process \({ x}= ( { x}_1, \ldots , { x}_L )\) after the random walk phase, we propose at random a transposition
\(T_{{ ij}}( { x})\) from a state-dependent discrete distribution with probabilities \(p_{{ ij}}({ x})\) defined on the set of index pairs \(i<j\).
To ensure reversibility, the swap acceptance probability should be defined by
$$\begin{aligned} \alpha _{{ ij}}({ x}) := \frac{ p_{{ ij}}\big ( T_{{ ij}}( { x}) \big ) }{ p_{{ ij}}({ x}) } \left( \frac{ \pi ( { x}_i )}{ \pi ( { x}_j ) } \right) ^{ \beta _j-\beta _i } \wedge 1\;. \end{aligned}$$
(3)
The definition (3) of the acceptance probability ensures that the kernel \({\mathcal {S}}\) is reversible with respect to \(\varvec{\pi }\).
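As an illustration, the swap step with the state-dependent proposal and acceptance probability (3) can be sketched in Python as follows. This is a sketch under stated assumptions, not the authors' implementation: `log_pi` is the target log-density, `beta` the inverse temperatures, and `p(x)` an assumed helper returning a dict `{(i, j): probability}` over pairs `i < j`.

```python
import numpy as np

def swap_step(x, beta, log_pi, p, rng):
    """One state-dependent swap step (illustrative sketch).

    x       : array of L chain states, one per temperature level
    beta    : inverse temperatures beta_1, ..., beta_L
    log_pi  : log-density of the target pi
    p       : state-dependent proposal, p(x) -> {(i, j): probability}
    """
    probs = p(x)                       # proposal distribution at current state
    pairs = list(probs)
    weights = np.array([probs[ij] for ij in pairs])
    i, j = pairs[rng.choice(len(pairs), p=weights)]
    y = x.copy()
    y[[i, j]] = x[[j, i]]              # transposition T_ij(x)
    # log of the acceptance probability (3)
    log_ratio = (np.log(p(y)[(i, j)]) - np.log(probs[(i, j)])
                 + (beta[j] - beta[i]) * (log_pi(x[i]) - log_pi(x[j])))
    if np.log(rng.uniform()) < min(0.0, log_ratio):
        return y                       # swap accepted
    return x                           # swap rejected
```

With a state-independent proposal (constant `p`), the first two terms of `log_ratio` cancel and the step reduces to the standard PT swap.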
Proposition 1
Kernel \({\mathcal {S}}\) defined by (2) is reversible with respect to \(\varvec{\pi }( { x}) \propto \pi ({ x}_1)^{ \beta _1 } \times \cdots \times \pi ( { x}_L )^{\beta _L} \).
Proof
We need to show that for all \({ A}, { B}\in \varvec{\mathfrak {F}}\) we have
$$\begin{aligned} \int _{{ A}\times { B}}\varvec{\pi }( \mathrm {d}{ x}) {\mathcal {S}}( { x}, \mathrm {d}{ y}) = \int _{{ A}\times { B}}\varvec{\pi }( \mathrm {d}{ y}) {\mathcal {S}}( { y}, \mathrm {d}{ x}) \;. \end{aligned}$$
For all \(i<j\) let us define
$$\begin{aligned} {\mathcal {S}}_{{ ij}}( { x}, { A}) := \mathbf {1}_{\{ T_{{ ij}}( { x}) \in { A} \}} p_{{ ij}}( { x}) \alpha _{{ ij}}( { x})\;. \end{aligned}$$
It is enough to verify that for every \(i<j\)
$$\begin{aligned} \int _{{ A}\times { B}}\varvec{\pi }( \mathrm {d}{ x}) {\mathcal {S}}_{{ ij}}( { x}, \mathrm {d}{ y}) = \int _{{ A}\times { B}}\varvec{\pi }( \mathrm {d}{ y}) {\mathcal {S}}_{{ ij}}( { y}, \mathrm {d}{ x})\;. \end{aligned}$$
For arbitrarily chosen \(i<j\) define a measure \(\mu \) on \({\mathbb {R}}^{2dL}\) as follows: for \({ A}, { B}\in \varvec{\mathfrak {F}}\) let
$$\begin{aligned} \mu ( {{ A}\times { B}}) := \lambda _\text {Leb}\Big ( \big \{ { x}\in { A}\;:\; T_{{ ij}}( { x}) \in { B} \big \} \Big )\;, \end{aligned}$$
where \(\lambda _\text {Leb}\) denotes the Lebesgue measure on \({\mathbb {R}}^{dL}\).
Since \(T_{{ ij}}\big ( T_{{ ij}}( { x}) \big ) = { x}\), by symmetry of Lebesgue measure we get
$$\begin{aligned} \begin{aligned} \mu ( {{ A}\times { B}})&= \lambda _\text {Leb}\Big ( { A}\cap T_{{ ij}}({ B}) \Big ) \\&= \lambda _\text {Leb}\Big ( T_{{ ij}}({ A}) \cap { B}\Big ) = \mu ( {{ B}\times { A}})\;, \end{aligned} \end{aligned}$$
(4)
and by definition of \({\mathcal {S}}_{{ ij}}\) we obtain
$$\begin{aligned} \begin{aligned} \int _{{ A}\times { B}}&\varvec{\pi }( \mathrm {d}{ x}) {\mathcal {S}}_{{ ij}}( { x}, \mathrm {d}{ y}) \\&=\int _{{ A}\times { B}}\varvec{\pi }( { x}) \alpha _{{ ij}}({ x}) p_{{ ij}}({ x}) \mu ( \mathrm {d}{ x}, \mathrm {d}{ y})\\&=\int _{ { A}\cap T_{{ ij}}({ B}) } \varvec{\pi }( { x}) \alpha _{{ ij}}({ x}) p_{{ ij}}({ x}) \mathrm {d}{ x}\end{aligned} \end{aligned}$$
(5)
Now, using (3) together with the identity \(\varvec{\pi }\big ( T_{{ ij}}({ x}) \big ) = \varvec{\pi }({ x}) \big ( \pi ({ x}_i) / \pi ({ x}_j) \big )^{\beta _j-\beta _i}\), we find that
$$\begin{aligned} \varvec{\pi }({ x}) \alpha _{{ ij}}({ x}) p_{{ ij}}({ x}) = \varvec{\pi }\big ( T_{{ ij}}({ x}) \big ) \alpha _{{ ij}}\big ( T_{{ ij}}({ x}) \big ) p_{{ ij}}\big ( T_{{ ij}}({ x}) \big )\;, \end{aligned}$$
and substituting \({ y}=T_{{ ij}}({ x})\) in (5) and applying (4) we get
$$\begin{aligned} \begin{aligned} \int _{{ A}\times { B}}&\varvec{\pi }( \mathrm {d}{ x}) {\mathcal {S}}_{{ ij}}( { x}, \mathrm {d}{ y}) \\&= \int _{ T_{{ ij}}({ A}) \cap { B}} \varvec{\pi }( { x}) \alpha _{{ ij}}({ x}) p_{{ ij}}({ x}) \mathrm {d}{ x}\\&= \int _{{ A}\times { B}}\varvec{\pi }( { x}) \alpha _{{ ij}}({ x}) p_{{ ij}}({ x}) \mu ( \mathrm {d}{ y}, \mathrm {d}{ x}) \\&= \int _{{ A}\times { B}}\varvec{\pi }( \mathrm {d}{ y}) {\mathcal {S}}_{{ ij}}( { y}, \mathrm {d}{ x}), \\ \end{aligned} \end{aligned}$$
which completes the proof. \(\square \)
Remark 1
Thanks to the definition of the kernel \({\mathcal {S}}\), for any positive measurable function \(F:{\mathbb {R}}^{Ld}\rightarrow {\mathbb {R}}^+\) that is invariant under permutations we get \({\mathcal {S}}F({ x}) = F({ x})\), which, under some regularity conditions (Miasojedow et al. 2013a), implies that the Parallel Tempering algorithm with state-dependent swap steps is geometrically ergodic. Indeed, under the same set of assumptions, Theorem 1 of the above-mentioned paper remains valid with state-dependent swap steps.
To pursue Equi-Energy-type moves without the need to fine-tune additional parameters, such as specifying the Energy Rings, we propose setting the swap probabilities as follows: for \(i<j\) let
$$\begin{aligned} p_{{ ij}}({ x}) \propto \exp \big \{ -\big | \log \pi ({ x}_i)-\log \pi ({ x}_j) \big | \big \}\;. \end{aligned}$$
(6)
The normalising constant equals \(\sum _{i<j} \exp \big \{ -| \log \pi ({ x}_i)-\log \pi ({ x}_j) | \big \}\), which depends only on the multiset of energies and is therefore unchanged by any permutation of \({ x}\); the numerator in (6) for the pair \((i,j)\) is likewise unchanged by \(T_{{ ij}}\). Hence \(p_{{ ij}}({ x})=p_{{ ij}}\big (T_{{ ij}}({ x})\big )\) and the acceptance probability simplifies to that of the standard Parallel Tempering algorithm,
$$\begin{aligned} \alpha _{{ ij}}({ x}) = \left( \frac{ \pi ({ x}_i) }{ \pi ({ x}_j) } \right) ^{\beta _j-\beta _i} \wedge 1\;. \end{aligned}$$
(7)
Compared to standard PT, the state-dependent version preferentially proposes swaps between states whose energy levels are close, which increases the probability that a proposed swap is accepted. This leads to a larger number of accepted global moves in the algorithm. The simulation results presented below confirm this improvement.
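The proposal probabilities (6) can be computed directly; the following sketch (function names are illustrative, not from the paper) normalises the weights \(\exp \{ -| \log \pi ({ x}_i)-\log \pi ({ x}_j) | \}\) over all pairs.

```python
import numpy as np

def equi_energy_probs(x, log_pi):
    """Swap probabilities (6), as a sketch: pair (i, j) is proposed with
    probability proportional to exp(-|log pi(x_i) - log pi(x_j)|), so pairs
    with similar energies are favoured."""
    L = len(x)
    e = [log_pi(xi) for xi in x]       # energies (log-densities) per level
    pairs = [(i, j) for i in range(L) for j in range(i + 1, L)]
    w = np.array([np.exp(-abs(e[i] - e[j])) for i, j in pairs])
    return pairs, w / w.sum()          # normalised over all pairs i < j
```

Note that the normaliser depends only on the multiset of energies, so \(p_{{ ij}}\) is unchanged by the transposition \(T_{{ ij}}\), which is exactly why the acceptance probability simplifies to (7).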
Other strategies commonly used in the literature include a uniform random choice from the set of all possible pairs, i.e. \( p_{{ ij}}= \left( {\begin{array}{c}L\\ 2\end{array}}\right) ^{-1}\), and swapping only adjacent temperature levels chosen uniformly at random, i.e. \(p_{{ ij}}= (L-1)^{-1} \varvec{1}_{\{i=j-1\}}\). We shall refer to these strategies as RA and AL, respectively.
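Both state-independent strategies are straightforward to write down; a minimal sketch (function names are ours, for illustration only):

```python
from itertools import combinations

def ra_probs(L):
    """RA: uniform over all L-choose-2 pairs of temperature levels."""
    pairs = list(combinations(range(L), 2))
    return {ij: 1.0 / len(pairs) for ij in pairs}

def al_probs(L):
    """AL: uniform over the L - 1 adjacent pairs (i, i + 1)."""
    return {(i, i + 1): 1.0 / (L - 1) for i in range(L - 1)}
```

Since neither distribution depends on \({ x}\), the ratio \(p_{{ ij}}\big (T_{{ ij}}({ x})\big )/p_{{ ij}}({ x})\) in (3) equals one for both strategies.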
Remark 2
Note that the already mentioned PTEEM algorithm can be considered a special case of the state-dependent swap step. Let \(H_1,\ldots ,H_m\) be the Energy Rings and denote by \(H_x\) the ring \(H_i\) to which \(x\) belongs. Setting \(p_{{ ij}}({ x}) \propto \mathbf {1}_{\{ { x}_j \in H_{{ x}_i} \}}\), the swap acceptance probability reduces to (7), and we recover the algorithm proposed by Baragatti et al. (2013).
Therefore, the theoretical results presented here apply directly to PTEEM. In particular, PTEEM is geometrically ergodic under the same regularity conditions, so both the convergence result and the law of large numbers for PTEEM with an adaptive Metropolis step at each level follow readily.
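The PTEEM-style proposal of Remark 2 can be sketched in the same framework, assuming a helper `ring_index` that maps a state to its Energy Ring label (this helper and all names below are illustrative, not from the original paper):

```python
def ptee_probs(x, ring_index):
    """PTEEM-style proposal as a state-dependent swap (a sketch):
    propose uniformly among pairs whose states lie in the same Energy Ring."""
    L = len(x)
    pairs = [(i, j) for i in range(L) for j in range(i + 1, L)
             if ring_index(x[i]) == ring_index(x[j])]
    if not pairs:
        return {}                      # no equi-energy pair to propose
    return {ij: 1.0 / len(pairs) for ij in pairs}
```

A transposition permutes the ring labels but preserves the set of same-ring pairs, so \(p_{{ ij}}({ x})=p_{{ ij}}\big (T_{{ ij}}({ x})\big )\) here as well, consistent with the reduction to (7).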