Parallel simulation of two--dimensional Ising models using Probabilistic Cellular Automata

We perform a numerical investigation of the \emph{shaken dynamics}, a parallel Markovian dynamics for spin systems with local interaction and whose transition probabilities depend on two parameters, $q$ and $J$, that tune the geometry of the underlying lattice. We determine a phase transition curve, in the $(q, J)$ plane, separating the disordered phase from the ordered one, study the mixing time of the Markov chain and evaluate the spin-spin correlations as $q$ and $J$ vary. Further, we investigate the relation between the equilibrium measure of the shaken dynamics and the Gibbs measure for the Ising model. Two different approaches are considered for the implementation of the dynamics: a multicore CPU approach, with code written in Julia and a GPU approach with code written in CUDA.


Introduction
The Gibbs sampling of lattice spin models is a major task for statistical mechanics. The numerical techniques developed for its realization are based mainly on Markov chain dynamics for single and cluster spin flip [6][15] [16], and can be easily implemented by means of random mapping representation [8] techniques.
A theory of parallel Markov chains as a Probabilistic Cellular Automaton (PCA) dates back to 1989 [7]. These processes are characterized by a factorized transition matrix on the configuration space, and their simulation updating all spins by means of the same random map [2]. More recently a class of PCAs where transition probabilities are defined in terms of a pair Hamiltonian and where the spins are simultaneously updated at each time step has been the subject of several works, e. g., [4,10] where PCA are exploited to study the Ising model on planar graphs. We explore the computational possibilities of this pair Hamiltonian model to generalize the random sampling algorithms for Ising spin systems on a set of two-dimensional lattices.
Formally a PCA is a Markov Chain (X n ) n∈N whose transition probabilities are such that given two generic configurations τ = (τ 1 , . . . , τ k ) and σ = (σ 1 , . . . , σ k ) so that for each time n, the components of the "configuration" are independently updated. From a computational point of view, the evolution of a Markov Chain of this type is well suited to be simulated on parallel processors and GPUs.
In this framework, a new PCA parameterized by J and q, called shaken dynamics has recently been introduced [2]. The equilibrium measure of the shaken dynamics has been extensively investigated in [1] and a critical curve in the plane (q, J) has been explicitly determined.
In particular in [1] a model has been proposed where the configurational variables are split into two groups τ = (τ 1 , . . . , τ k ) and σ = (σ 1 , . . . , σ k ), where τ i , σ i ∈ {−1, 1} for each i, and are arranged on a bipartite graph. Different interactions among the τ and σ variables give rise to the possibility of interpolation among different lattice geometries.
The PCA we take into account is a parallel and irreversible version of the heath bath dynamics and is obtained by concatenating two different update rules [2]. By means of the Hamiltonian defined in [1], which depends on the (J, q) parameters, we identify numerically two regions of the space (J, q) characterized by different behaviors of the dynamics.
In this framework, a new PCA parameterized by J and q, called shaken dynamics, has recently been introduced in [2]. The equilibrium measure of the shaken dynamics has been extensively investigated in [1] and a critical curve in the plane (q, J) has been explicitly determined.
The elementary step of the shaken dynamics is naturally defined on the a finite subset Λ of the square lattice Z 2 and consists of a sequence of two inhomogeneous half steps. However, in both [2,1] it has been pointed out that the shaken dynamics can be seen as an alternate dynamics on a subset of the honeycomb lattice. The proposed dynamics, although not faster than ad hoc dynamics for specific models, allows to simulate a whole class of statistical mechanics models spanning from the one-dimensional Ising model to the square lattice and hexagonal one across all the intermediate models.
Depending on the values of J and q, the shaken dynamics "formalism" defined on the square lattice can be used to simulate a class of Ising models on the honeycomb lattice (as pointed out in [1]). Some of the values of J and q are particularly interesting because they allow to use the shaken dynamics to simulate • the Ising model on the isotropic hexagonal lattice for J = q • (an approximation to) the Ising model on the square lattice for q >> 1 • the Ising model on a collection of weakly interacting unidimensional systems for small values of q.
The numerical investigation we put forward is aimed at: • illustrating a simple heuristic method to numerically determine the critical curve • evaluating the mixing time of the chain as a function of J and q • studying the spin-spin correlations as a function of J and q.
Further, for J = q we compare the mixing time of the shaken dynamics with that of a single spin flip dynamics for the Ising model on the hexagonal lattice and, for q >> 1 we also compare the mixing time of the shaken dynamics with that of a single spin flip and an alternate parallel dynamics for the Ising model on the square lattice and evaluate the distance between the equilibrium measure of the shaken dynamics from the Gibbs measure for the Ising model on the square lattice.
The paper is organized as follows. In the next section we define the lattice spin 3 model and the PCA dynamics. In Section 3 we present the numerical findings concerning the model under investigation. Finally, in Section 4, we provide some details concerning the two implementations of the dynamics taken into account: the mutlicore CPU one and the GPU one.

The model
Consider the Ising Hamiltonian on a graph G(V, E) where σ x ∈ {−1, 1} for all the x ∈ V and J xy ∈ R + .
We assume that V = Λ 1 ∪ Λ 2 , where Λ 1 and Λ 2 are finite squared subsets of the square lattice with L 2 sites and periodic boundary conditions and all edges in E have one endpoint in Λ 1 and the other in Λ 2 . The σ and τ variables denote the Ising configuration on the vertices of Λ 1 and Λ 2 . Each σ u , with u ∈ Λ 1 , can be put in a one-to-one correspondence with τ u with the same index u ∈ Λ 2 .
Let x = (i, j) be a vector of coordinate on the torus (Z/LZ) 2 . Then are the coordinates of the four points at unit distance from x. Set J xy = J for all (x, u) ∈ E with x = u and J xy = q if x = y.
With this notation we obtain the Hamiltonian studied in [1,2] on the pairs of Ising configurations σ on Λ 1 and τ Λ 2 . The interactions of this Hamiltonian can be visualized on the induced bipartite graph represented in Fig. 1  and 3. The parameter q is also referred to as the self interaction parameter. Figure 1: The lattices Λ 1 , Λ 2 with the q (red) and J (black) interactions.
As pointed out in [1] a careful look to the Hamiltonian (5) and to the graph of Fig. 1 shows that the bipartite graph is isomorphic to the hexagonal lattice G (V, E) with edges J and q on whose vertices are arranged the variables σ and τ as shown in Fig. 2. The Gibbs measure at temperature 1/β for the Hamiltonian where The critical value of β c separates the ordered phase where all the spin have the same probability to take the values +1 or −1 from the ordered phase where the measure is polarized [5].
Rescaling the interactions J and q by β βJ → J, βq → q it has been proven in [1] that there exists a function J c (q), shown in Fig. 4, which separate the ordered phase from the disordered one.
The partition function of the Ising model on the honeycomb lattice G is The graph G is a weighted planar graph, non degenerate, finite and doubly periodic. The periodic boundary conditions for Λ 1 and Λ 2 guarantee that the graph G is immersed in the torus.
Introducing the following notation on the hexagonal lattice the critical curve J c (q) for the Hamiltonian (5) is the unique solution for J, q > 0 of the equation where E 0 (G) is the set of even subgraphs of G winding an even number of times around each direction of the torus, and The explicit form of the equation (10) is The solution of eq.(11) with respect to the J is plotted in Fig. 4.
We observe that is the critical value of β for the Ising model on the lattice square, while on the point J c (q) = q the equation (12) gives the critical value for the Ising model on the hexagonal lattice J = q = 0.6585. If q → 0, J → ∞ there are no phase transitions as in the unidimensional Ising model.
Following [2] we let the system evolve as a Markov chain where the spins in Λ 1 and in Λ 2 are alternatively updated with a probability proportional to the exponential of the Hamiltonian of the target configuration in X × X .
More precisely, using the notation we consider a Markov chain on X × X with transition probabilities given by at odd times and at even times where Z σ = η∈X e −H(σ,η) and Z τ = η∈X e −H(η,τ ) .
The factorization in eq. (15) and (16) and the mutual dependence of the variables σ and τ makes it quite easy the parallel numerical implementation of this dynamics.
In particular, to simulate the evolution of the chain it is possible to sample the value ζ ∈ {−1, 1} of the spin at site u with probability at even times independently for all u ∈ Λ.
In this framework, the shaken dynamics introduced in [2] is obtained by looking at the evolution of the spin configuration in Λ 1 . In other words, the shaken dynamics is the marginal of the alternate dynamics defined by eq. (15) and (16) and the shaken transition probabilties are In [2] it has been proven that the equilibrium measure of this dynamics is In the remainder of the paper, we use the wording shaken dynamics when we are interested in the evolution on the sub-lattice Λ 1 whereas we call the dynamics on the hexagonal lattice subject to the transition probabilities (15) and (16) the alternate parallel dynamics (on the hexagonal lattice).

Numerical estimation of critical curve
As stated before, the critical curve (12) is the function that separates the ordered and the disordered phases. Above this line the values of the spins tend to be highly correlated whereas on the opposite side the value assumed by each spin is weakly dependent on the values taken by other spins. To determine whether the system is in the ordered or disordered phase we compute, over a large number of iterations, the average and the variance of the magnetization on one of the two layer Λ i is where the magnetization m is defined as By Theorem 2.1 in [1] π s (m) = π 2 (m), that is, the average magnetization (in Λ 1 ) of the shaken dynamics is the same as the average magnetization of the parallel alternate dynamics (on the hexagonal lattice Λ 1 ∪ Λ 2 ).
We take Λ to be a 200×200 torus and simulate the evolution of the shaken dynamics starting from configuration σ 0 = {−1, −1, . . . − 1} for (J, q) ∈ {(0, 2) × (0, 2)} on a 80 × 80 grid. We first let the system run for a warm-up time of 300000 steps and then record the average and the variance of the magnetization for 300000 additional steps. The results obtained on the whole grid (q, J) are summarized in Figure 6.
It is known that, at equilibrium, the average value of the magnetization fluctuates heavily only close to the critical value of the interactions (see [14] for a reference).    Figure 7 shows that the variance of the magnetization is significantly different from zero only for points of the (q, J) plane in the vicinity of the pins on the curve (12). This show that, even for a small lattice, the magnetization fluctuates only close to the critical line and for the whole class of Ising models that can be described tuning the values of J and q.

Coalescence times and perfect sampling
To assess whether the number of steps for which a Markov chain is run is large enough for its distribution to be close to the equilibrium distribution, it is conve- nient to look at its mixing time. For a Markov chain (X n ) n∈N with state space X and stationary distribution π, the mixing time is defined as where µ n σ is the distribution of X n conditioned on X 0 = σ, µ − ν TV denotes the total variation distance between the probability measures µ and ν and ε is some "small" number (for instance e −1 ). For a reference on mixing times see, for instance, [11]. Determining useful bounds for the mixing time of a Markov chain is, in general, a quite challenging task. However, indication on the mixing time of a Markov chain can be gathered looking at the coalescence times (see [8] for a reference).
Consider two Markov chains (X n ) n∈N and (Y n ) n∈N living on the same state space X and consider the coupling (Z n ) n∈N = (X n , Y n ) obtained by letting X n and Y n to evolve with the same update function and the same sequence of random numbers (for an introduction on the coupling method see [12]). Further assume that the update function is chosen is a way such that P Z (X n = Y n ) → 1 as n → ∞.
We define the coalescence time T between X n and Y n as T = min{n ∈ N : X n = Y n }. Note that, since X n and Y n evolve with the same update function and the same sequence of random numbers X n = Y n for all n > T . This definition extends naturally to a collection of K chains X k n with k ∈ 1 . . . K.
The mixing time of the chain (X n ) n∈N is estimated by the coalescence time of the chains (X k n ) n∈N for k = 1 . . . |X |, all defined on the state space X , where chain X k n has initial distribution concentrated on state k.
To effectively determine the coalescence time of the shaken dynamics, however, it is not necessary to run 2 |Λ| copies of the Markov chains, but it is possible to use the so called sandwiching technique since the shaken dynamics preserves the partial ordering between configurations 1 . In other words, it can be directly checked that if X k 0 ≤ X l 0 than X k n ≤ X l n for all n > 0 (see, again, [8], for a reference). To determine the coalescence time it is therefore sufficient to look at the coalescence times of two chains starting, respectively, from σ top = {1, 1, . . . , 1} (the largest possible configuration) and σ bot = {−1, −1, . . . , −1} (the smallest possible one).
Further note that, leveraging on coupling between Markov chains it is possible to perform an unbiased sampling from the equilibrium distribution of a Markov chain using the Propp-Wilson algorithm, introduced in [13], which requires two copies of the Markov chain to be run with the same update function and the same sequence of random numbers.
We studied the coalescence times of the shaken dynamics. The simulations were run taking Λ to be a 32×32 square lattice. This means that the induced hexagonal lattice Λ 1 ∪ Λ 2 has 32 × 32 × 2 points.
We computed the average coalescence time for values of J and q close to the critical line J c (q). The results obtained are summarized in Fig. 8.
For J = q, the shaken dynamics is the marginal of the alternate dynamics on the isotropic hexagonal lattice. More properly, pairs of configurations (σ, τ ) with τ the configuration obtained from σ by performing the first half step of the shaken dynamics can be regarded as spin configurations on the honeycomb lattice. The equilibrium distribution of these pairs is the Gibbs measure of the Ising model on the isotropic hexagonal lattice (see Theorem 2.1 in [2]). Therefore it makes sense to compare the mixing time of the shaken dynamics with the mixing time of a single spin flip dynamics defined on the hexagonal lattice and whose stationary distribution is the Gibbs measure. As a reference we take the heat bath dynamics defined as follows: where σ x is the configuration obtained from σ by flipping the spin at site u and h u (σ) = y∼x Jσ y . Also the heat bath dynamics preserves the partial ordering between configurations and, hence, also in this case it is sufficient to simulate the evolution of two chains one starting from all spins set to +1 and one starting from all spins set to −1.
Note that it is possible to argue that the parallel alternate dynamics studied here is a parallel version of the single spin flip heat bath described above.
The results obtained, for several values of J (and, consequently, q) are presented in Fig. 9. Note that for the single spin flip dynamics the value shown in the chart is the number of steps divided by 2|Λ| so that, for both algorithms, we are comparing the total number of "attempted spin flips".
It appears that the parallel alternate dynamics is faster than the single spin flip one even if the single spin flip one is "renormalized" with the volume of the box as described above.
In [2], Theorem 2.3 it has been shown that, for large values of q, the equilibrium distribution of the shaken dynamics approaches the Gibbs measure for the Ising model on the square lattice. More precisely it has been proven that, if then, for J sufficiently large, where π G is the Gibbs measure for the Ising model on the square lattice. Therefore it makes sense to evaluate numerically the goodness of this approximation as q increases. To this purpose we consider two observable: the magnetization m and the energy H(σ). For both observable we compare their sample mean and sample standard deviation over samples drawn from the equilibrium distribution of the shaken dynamics with the sample mean and the sample standard deviation of two other reference dynamics having the Gibbs measure as stationary distribution. One of the two reference dynamics taken into account is, again, the heat bath dynamics. The other dynamics is a parallel version of the heat bath dynamics that updates, alternatively, the spins on the odd and the even sites of the lattice. The latter is the equivalent for the square lattice of the alternate parallel dynamics on the hexagonal lattice defined by equations 15 and 16. Theorem 2.2 in [2] states that the equilibrium measure of this dynamics is, indeed, the Gibbs measure on the square lattice. For all these dynamics, samples are drawn using the Propp-Wilson algorithm introduced above. Several values of J close to the critical value for the Ising model on the square lattice and the results obtained are summarized in Figg. 10, 11, 12 and 13.
The data suggests that, for q ≥ 2.5 the approximation provided by the shaken dynamics is quite good. On the other hand, we also estimated the time required to approach the equilibrium distribution by comparing the coalesce time of the shaken dynamics with those of the two other reference dynamics. Also in this case the number of steps required by the single spin flip dynamics is renormalized with the volume of the box Λ.
The result obtained are summarized in Fig. 14. It is apparent that, though more flexible, the shaken dynamics becomes slower than "specialized" algorithms as the accuracy of the approximation increases.
Parts (b) of Figg. 15, 16 and 17 show configurations drawn from the equilibrium distribution of the alternate parallel on the hexagonal whereas parts (a) show the corresponding sub-configurations on the sublattice Λ 1 . These sub-configurations are, therefore, drawn from the equilibrium distribution of the shaken dynamics. In Fig. 15 it is possible to observe that the spins linked by a q-edge have almost always the same value. This is in good accordance with the fact that stationary measure of the shaken dynamics is close to the Gibbs measure for the Ising model on the square lattice. On the other side, Fig. 17 is consistent with the fact that for q very small the equilibrium measure of the shaken dynamics tends to that of a colection of weakly dependent unidimensional Ising models.

Correlations
Theorem 2.4 in [1] establishes that, if q is sufficiently small, π(σ 0,0 , σ , ) < π(σ 0, , σ ,0 ) where π is the equilibrium measure of the shaken dynamics and σ is, therefore, We study the SW-NE and the NW-SE correlations as varies with Λ a 32 × 32 square box. The results are shown in Table 1.
All pairs (q, J) taken into account correspond to points of the q, J plane close to the critical curve J c (q). It is possible to observe that, as q decreases, the SW-NE correlations become, indeed, smaller than the NW-SE ones, whereas, for q large the two are quite similar. Further, if the pair (q, J) is below the critical curve the correlations decay quite rapidly. On the other hand, if (q, J) is above J c the correlations are significant also for larger values of .

Implementation details
To approximate numerically the critical curve J c (q), we take samples for different values of J and q. The code used for the simulationin written in Julia [9] and simulations are performed through 80 thread processors running, in parallel, the  Fig.6 shows that the chosen simulation parameter is good enough to approximate the critical curve.
The elementary step of the shaken dynamics described in the previous section has been simulated by the Algorithm 3. A spin configuration is updated via a sequence of two similar half steps. The computation of the vector of local fields h that drives the transition probabilities of each spin is alternatively carried out using the functions collectUR and collectDL which determine the up-right and down-left contribution as in eq. (14).
The algorithm 3 is the complete update in two steps of the shaken dynamics, which is more general than the one used in [10].
The choice of collecting the statistics over 300,000 time steps (after a warm up time of 300,000 additional stime stesps) turned out to be good enough, and the results show, unmistakably,the separation of the two phases (ordered and disordered).
We implemented the algorithm 3 in two parallel ways. A CUDA 2 implementation of a parallel heat bath for large dimension Lattice spin, and a Julia [9] implementation on a single CPU to be used on a multiprocessor systems (trivial parallel on a multi data input). Both have been optimized to handle our problem, and used to simulate the shaken dynamics of the PCA, a quasi-similar behavior was observed during our experiment.

Parallel single-GPU code
The general heat bath procedure has been implemented on SIMD (Single Instruction Multiple Data) system. To optimize the code exploiting the CUDA memory architecture we implemented three kernels for the functions collectUR, collectDL and for updating the configuration. We used the default random generator from curand library. The collect function computes the transition probabilities in the given direction. Each thread handles one spin on the lattice field. The principal use of the global memory is the four square spin lattice, the two configuration sigma (σ) and tau (τ ), the fields which handles the Hamiltonian computation and the random-unit contains random uniform variables.
In our implementation all the operation are performed on register cache memory. We did not use shared memory for the random-unit. The code was written to run on the Nvidia-GPU Tesla P100 using by 16GB video memory, using 4 matrices of dimension L×L, two for the lattice spin field (single byte), and two for the collected fields and for the random uniform (four byte). All the matrices are allocated on the global memory.
For the management of the memory, before allocating the memory of the 4 matrices, the code used approximately 303 MB, leaving 15973.250000 MB. We used 2 * 4 * L * L + 2 * L * L bytes, but we can not go beyond 10 5 for this GPU.

Benchmarking
To measure the performance of our GPU code it is not fair to compare it with the single-CPU implementation from Julia. We have implemented a serial version of the shaken dynamics in a lower level language, a captured sample for a square lattice spins of size 512 × 512 with J = 0.99 and q = 0.5 is given in Fig. (19). For our simple measurement, we set the parameters J = 0.44 and q = 0.66 and to have more significant value we measure it in milliseconds. We compare the two implementation for different dimension L, hence the size of the square lattice 22 Figure 18: GPU sample: L = 512, J = 0.99, q = 0.5, iteration= 60 th which is L 2 . For this benchmark (Fig. (20)), we used an Nvidia graphic card Tesla P-100 vs single core of the CPU Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz. We measure the time for one update execution. As we have mentioned the GPU memory is limited so that we did the experiment under this condition for the size of the square lattice spin.
We observe that the GPU is much faster than the CPU with a factor of 500 as far as the lattice size grows. We can see that the CPU time looks linear while the case for GPU is when the size is more than 2048 × 2048.

Summary
The present work is a numerical experiment on the 2D Ising model. In particular the tasks are mainly focused on the shaken dynamics, in which we used to approximate numerically the critical curve which relate the parameter J and q, it separates the two phases on the region of parameters (J, q). Also we are able to evaluate numerically the fact that the equilibrium distribution of this dynamics are close to Gibbs measure on the region when q is large [2]. We provide two different parallel implementation perspective of the algorithm in which we give a benchmark to identify a speed-up that we can gain on using GPU.
As a MCMC algorithm we also give a comparison on the convergence to the equilibrium state of the algorithm by means of coalescence time. In this purpose we compare the coalescence time between the alternate and shaken dynamics on the critical line (red line in Fig. 4), this is followed by further discussion on the numerical aspect of the algorithm.

Work in progress
Most of the implementations we have used in this project are in Julia [9]. As pointed out above, the paper is a numerical supports for the two papers [2] and [1]. The code is quite complete for an academic use on the simulation of 2D Ising model, especially it contains the class of all the dynamics we have studied in this project and their methods. Our attempt is to provide a Julia library that can be used as framework to study the dynamics of the planar Ising model.