# Parallelization and implementation of multi-spin Monte Carlo simulation of 2D square Ising model using MPI and C++


## Abstract

In this paper, we present a parallel algorithm for Monte Carlo simulation of the 2D Ising model that runs efficiently on a cluster computer using MPI. The algorithm is implemented in C++. In our algorithm, every process creates a sub-lattice, and the energy is calculated after each Monte Carlo iteration. During the run, each process communicates with its two neighbor processes, with which it exchanges the boundary spin variables. Finally, the total energy of the lattice is calculated as a function of temperature using a map-reduce method. We use the multi-spin coding technique to reduce inter-process communication. The algorithm is designed to provide appropriate load balancing and good scalability. It has been executed on the cluster computer of the Plasma Physics Research Center, which includes 9 nodes, each with two quad-core CPUs. Our results show that the algorithm is more efficient for larger lattices and more iterations.

## Keywords

Ising model, Monte Carlo method, Multi-spin coding, MPI

## Introduction

The Ising model [1] gives a microscopic description of ferromagnetism, which is caused by the interaction between the spins of electrons in a crystal. The particles are assumed to be fixed on the sites of a lattice, and each spin is treated as a scalar quantity that can take two values, \(+1\) and \(-\,1\). This simple statistical model exhibits a phase transition between a high-temperature paramagnetic phase and a low-temperature ferromagnetic phase at a specific temperature: below the critical temperature, the symmetry between up and down is spontaneously broken. The one-dimensional Ising model, which has been solved exactly, shows no phase transition. The two-dimensional Ising model was solved analytically in zero external field by Onsager [2]; exact results for it and related models are collected in [3]. Despite many attempts to solve the 3D Ising model, one might say that it has never been solved exactly, and all results for the three-dimensional Ising model have been obtained with approximation approaches and Monte Carlo methods.

Monte Carlo methods, or statistical simulation methods, are widely used in many fields of science, such as physics, chemistry, biology and computational finance, and even in newer fields such as econophysics [4, 5, 6, 7, 8, 9]. A simulation proceeds by generating uniformly distributed random numbers and using them to sample from the probability density function. Simulating the Ising model on large lattices is computationally expensive. One way to reduce this cost is to design faster algorithms; the Swendsen-Wang and Wolff algorithms [10, 11] and multi-spin coding methods [12, 13, 14] are examples of such methods. Another way is to parallelize the model and execute it on GPUs, GPU clusters and cluster computers [15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27].

In this paper, we present a parallel algorithm to simulate the 2D Ising model with the Monte Carlo method, and we run it on a cluster computer using C++ and MPI. The Message Passing Interface (MPI) is a widely used programming model in HPC systems [28, 29, 30, 31, 32, 33, 34]; it was designed for distributed-memory architectures, in which processes communicate by passing messages. MPI provides functionality that allows two specified processes to exchange data by sending and receiving messages. To achieve high efficiency, good load balancing and minimal communication between processes are necessary.

In our algorithm, each process creates its own sub-lattice, initializes it, performs all Monte Carlo iterations and calculates the energy of the sub-lattice for a specific temperature. During the run, each process communicates with its two neighbor processes, with which it exchanges the boundary spin variables. Finally, the total energy of the lattice is calculated by a map-reduce method. Since the multi-spin coding technique stores each spin in 3 bits, inter-process communication is reduced considerably. Because the computational load of each sub-lattice is assigned to one process and all sub-lattices have the same size, the load is well balanced. Since each process, regardless of the number of processes, communicates only with its two neighbor processes and the lattice is decomposed into sub-lattices, the algorithm scales well.

This paper is organized as follows. In the “Metropolis algorithm and Ising model” section, the Metropolis algorithm and the Ising model are briefly reviewed. In the “Multi-spin coding method” section, we explain how to use the multi-spin coding method to calculate the interaction energy between a specific spin and its nearest neighbors, and we also study the boundary conditions in the memory-word lattice. Details of the parallelization of the algorithm are discussed in the “Parallelization” section, and the implementation is described in the “Implementation” section. Finally, the results are given in the “Results” section.

## Metropolis algorithm and Ising model

The energy of the lattice is

\[E=-J\sum _{\langle m,n\rangle }s_m s_n \qquad (1)\]

where *J* is the coupling coefficient. The summation in Eq. (1) is taken over the nearest-neighbor pairs \(\langle m,n\rangle \). Periodic boundary conditions are used, meaning that spins on one edge of the lattice are neighbors of the spins on the opposite edge. In this paper, we focus on simulating the 2D square Ising model with the Metropolis Monte Carlo algorithm [35]. The lattice is initialized randomly and updated as follows:

1. Select a spin \(s_{i,j}\) at random and calculate the interaction energy *E* between this spin and its nearest neighbors.
2. Flip the spin \(s_{i,j}\) to \(s^{\prime }_{i,j}\) and calculate the interaction energy \(E^{\prime }\) again.
3. Compute \(\Delta E=E^{\prime }-E\). If \(\Delta E\le 0\), \(s^{\prime }_{i,j}\) is accepted; otherwise, \(s^{\prime }_{i,j}\) is accepted with probability \(e^{-\Delta E/KT}\), where *K* is the Boltzmann constant and *T* is the temperature.
4. Repeat steps 1–3 until we are sure that every spin has had a chance to flip.
5. Calculate the total energy of the lattice for the *i*th iteration, \(E_{\mathrm{total}}^i\).
6. Repeat steps 1–5 *N* times and finally average \(E_{\mathrm{total}}^i\) to obtain \(E_{\mathrm{total}}\):

\[E_{\mathrm{total}}=\frac{1}{N}\sum _{i=1}^{N}E_{\mathrm{total}}^i \qquad (2)\]
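As an illustration only, the six steps can be sketched in C++ on a plain lattice that stores one spin per `int`, before any multi-spin coding. The names `IsingLattice`, `metropolisIteration` and `averageEnergy`, the parameter `attemptsPerSpin`, and the use of units with \(K = 1\) are assumptions of this sketch, not the authors' code; the energy is taken as \(-J\sum _{\langle m,n\rangle }s_m s_n\) with periodic boundaries.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

// Hypothetical plain-lattice Metropolis sketch: E = -J * sum_<m,n> s_m s_n,
// periodic boundaries, spins stored as +1/-1, temperature in units with K = 1.
struct IsingLattice {
    int L;               // the lattice is L x L
    std::vector<int> s;  // row-major spins, +1 or -1
    // Periodic access; valid for indices in [-L, 2L).
    int& at(int i, int j) { return s[((i + L) % L) * L + (j + L) % L]; }
};

// Sum of the four nearest neighbours of site (i, j).
int neighbourSum(IsingLattice& lat, int i, int j) {
    return lat.at(i - 1, j) + lat.at(i + 1, j) + lat.at(i, j - 1) + lat.at(i, j + 1);
}

// Total energy; each bond is counted once through its right and down partner.
double totalEnergy(IsingLattice& lat, double J) {
    double e = 0.0;
    for (int i = 0; i < lat.L; ++i)
        for (int j = 0; j < lat.L; ++j)
            e += -J * lat.at(i, j) * (lat.at(i, j + 1) + lat.at(i + 1, j));
    return e;
}

// Steps 1-4: attemptsPerSpin * L * L single-spin flip attempts at random sites.
void metropolisIteration(IsingLattice& lat, double J, double T, int attemptsPerSpin,
                         std::mt19937& rng) {
    std::uniform_int_distribution<int> site(0, lat.L - 1);
    std::uniform_real_distribution<double> u(0.0, 1.0);
    const long attempts = static_cast<long>(attemptsPerSpin) * lat.L * lat.L;
    for (long a = 0; a < attempts; ++a) {
        const int i = site(rng), j = site(rng);
        // Flipping s -> -s changes the energy by dE = E' - E = 2 J s (neighbour sum).
        const double dE = 2.0 * J * lat.at(i, j) * neighbourSum(lat, i, j);
        if (dE <= 0.0 || u(rng) < std::exp(-dE / T)) lat.at(i, j) = -lat.at(i, j);
    }
}

// Steps 5-6: record E_total^i after every iteration and return the average (Eq. 2).
double averageEnergy(IsingLattice& lat, double J, double T, int nIter, int attemptsPerSpin,
                     std::mt19937& rng) {
    assert(nIter > 0);
    double sum = 0.0;
    for (int it = 0; it < nIter; ++it) {
        metropolisIteration(lat, J, T, attemptsPerSpin, rng);
        sum += totalEnergy(lat, J);
    }
    return sum / nIter;
}
```

Because sites are drawn at random with replacement, "every spin has had a chance to flip" holds only on average; `attemptsPerSpin` sets how many flip attempts per spin one iteration makes.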

## Multi-spin coding method

Multi-spin coding refers to all techniques that store and process multiple spins in one memory word. In this paper, we apply the multi-spin coding technique to the 2D Ising model. In general, multi-spin coding yields a faster algorithm because multiple spins are updated simultaneously. However, we employ this technique mainly to reduce inter-process communication.

We consider a spin lattice of size \(21N \times 21N\), where *N* is an integer greater than one. Since each spin is stored in 3 bits, one 64-bit memory word holds 21 spins. Now, we need to convert the spin lattice (Fig. 1a) into the lattice of memory words (Fig. 1b); the size of the memory-word lattice is therefore \(N \times 21N\). Each column of the spin lattice is coded into the same column of the memory-word lattice, so the 21*N* spins in one column of the spin lattice are arranged in the *N* memory words of that column as follows:

\[S(I,J)=\left( s_{I,J},\, s_{I+N,J},\, s_{I+2N,J},\, \ldots ,\, s_{I+20N,J}\right) \]

where *S*(*I*, *J*) represents the memory word at row *I* and column *J*, \(0 \le I \le N-1\), \(0 \le J \le 21N-1\), and \(s_{i,j}\) denotes the spin located at row *i* and column *j*, with \(j=J\). The advantage of this arrangement is that each spin is placed in the appropriate position relative to its neighbors. Consider the *k*th spin in a given memory word *S*(*I*, *J*). Its right/left/top/down neighbor in the spin lattice is exactly the *k*th spin in the right/left/top/down neighbor of the memory word *S*(*I*, *J*) in the memory-word lattice.
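The column arrangement can be sketched in C++ under a few assumptions of our own: group *k* of *S*(*I*, *J*) holds \(s_{I+kN,J}\) (the arrangement that keeps every spin aligned with its right, left, top and down neighbors in the adjacent words), a spin up is stored as `001` and a spin down as `000`, and group *k* occupies the bits starting at offset \(3(20-k)\), which leaves the 64th bit zero. `packColumn` and `spinAt` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr int kSpinsPerWord = 21;  // 21 groups x 3 bits = 63 bits; the 64th bit stays 0
constexpr int kBitsPerSpin = 3;

// Assumed layout: group k = 0 is the most significant of the 21 groups.
inline int groupShift(int k) { return kBitsPerSpin * (kSpinsPerWord - 1 - k); }

// Pack column J of a 21N x 21N spin lattice (0 = down, 1 = up) into the N memory
// words S(0, J) .. S(N-1, J); group k of S(I, J) receives s(I + k*N, J).
std::vector<std::uint64_t> packColumn(const std::vector<std::vector<int>>& spins, int J, int N) {
    assert(static_cast<int>(spins.size()) == kSpinsPerWord * N);
    std::vector<std::uint64_t> words(N, std::uint64_t{0});
    for (int I = 0; I < N; ++I)
        for (int k = 0; k < kSpinsPerWord; ++k) {
            const std::uint64_t bit = spins[I + k * N][J] ? 1u : 0u;
            words[I] |= bit << groupShift(k);
        }
    return words;
}

// Value (0 or 1) of the k-th spin stored in a packed word.
inline int spinAt(std::uint64_t word, int k) {
    return static_cast<int>((word >> groupShift(k)) & 0x7u);
}
```

With this layout the top neighbor of group *k* in *S*(*I*, *J*) is group *k* of *S*(*I* − 1, *J*) for every \(I \ge 1\), so whole words can be combined with their neighbors without realigning bits.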

However, this correspondence does not hold exactly at the top and bottom of the memory-word lattice: the up (down) neighbor of the memory word *S*(0, *J*) (\(S(N-1,J)\)) is not exactly \(S(N-1,J)\) (*S*(0, *J*)). For a memory word in the first (last) row, its up (down) neighbor, which is the memory word in the last (first) row of the same column, has to be shifted 3 bits to the right (left). These two cases are shown in diagrams (b) and (c) of Fig. 2. Recall that the 64th bit is always set to zero.
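The edge-row correction can be sketched with two small functions. The sketch assumes spins stored as `0`/`1`, group *k* of *S*(*I*, *J*) holding \(s_{I+kN,J}\) at bit offset \(3(20-k)\), and bit 63 always zero. The text describes the 3-bit shift; with periodic boundary conditions the group shifted out at one end must also re-enter at the other (the up neighbor of \(s_{0,J}\) is \(s_{21N-1,J}\)), so the sketch rotates within the 63 used bits. This wrap-around is our assumption, and all names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed layout: spins stored as 0 (down) or 1 (up); group k (k = 0..20) of S(I, J)
// holds s(I + k*N, J) at bit offset 3*(20 - k); bit 63 is always 0.
constexpr std::uint64_t kLow63 = (std::uint64_t{1} << 63) - 1;

inline int group(std::uint64_t w, int k) {
    assert(k >= 0 && k < 21);
    return static_cast<int>((w >> (3 * (20 - k))) & 0x7u);
}

// Build S(I, J) from one spin column of length 21N.
std::uint64_t packWord(const std::vector<int>& column, int I, int N) {
    std::uint64_t w = 0;
    for (int k = 0; k < 21; ++k)
        w |= static_cast<std::uint64_t>(column[I + k * N] & 1) << (3 * (20 - k));
    return w;
}

// Up neighbour word of S(0, J): S(N-1, J) shifted 3 bits right, with the spin shifted
// out of group 20 re-entering at group 0 (periodic boundary).
std::uint64_t upOfFirstRow(std::uint64_t lastRow) {
    return (lastRow >> 3) | ((lastRow & 0x7u) << 60);
}

// Down neighbour word of S(N-1, J): S(0, J) shifted 3 bits left with the 64th bit cleared,
// and group 0 re-entering at group 20.
std::uint64_t downOfLastRow(std::uint64_t firstRow) {
    return ((firstRow << 3) & kLow63) | ((firstRow >> 60) & 0x7u);
}
```

Masking with `kLow63` after the left shift is what keeps the 64th bit at zero, as the text requires.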

### Calculation of energy

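For a given memory word *S*(*I*, *J*), an expression combining the word with its four neighbor words yields, in each 3-bit group, a value that characterizes the configuration of the corresponding spin and its nearest neighbors (last column of the table below). For each configuration, the energy *E* and the energy \(E^\prime \) calculated after flipping the selected spin are presented in the fourth and fifth columns, respectively.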

Different configurations which might happen between a selected spin and its four nearest neighbors

Configuration | Selected spin | Nearest neighbors | \(E\) | \(E^\prime \) | \(\Delta E\) | Value of a 3-bit group |
---|---|---|---|---|---|---|
1 | Up | 4 Up–0 Down | \(-4J\) | \(4J\) | \(8J\) | 000 |
2 | Down | 0 Up–4 Down | \(-4J\) | \(4J\) | \(8J\) | 000 |
3 | Up | 3 Up–1 Down | \(-2J\) | \(2J\) | \(4J\) | 001 |
4 | Down | 1 Up–3 Down | \(-2J\) | \(2J\) | \(4J\) | 001 |
5 | Up | 2 Up–2 Down | 0 | 0 | 0 | 010 |
6 | Down | 2 Up–2 Down | 0 | 0 | 0 | 010 |
7 | Up | 1 Up–3 Down | \(2J\) | \(-2J\) | \(-4J\) | 011 |
8 | Down | 3 Up–1 Down | \(2J\) | \(-2J\) | \(-4J\) | 011 |
9 | Up | 0 Up–4 Down | \(4J\) | \(-4J\) | \(-8J\) | 100 |
10 | Down | 4 Up–0 Down | \(4J\) | \(-4J\) | \(-8J\) | 100 |
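One standard multi-spin-coding way to produce the last column of the table, assumed here rather than taken from the text, stores a spin up as `001` and a spin down as `000` and adds the bitwise XOR of a word with each of its four neighbor words. Every 3-bit group then holds the number *a* of antiparallel neighbors, and since \(a \le 4\) fits in 3 bits, no carry crosses into the next group. The energies in the table follow \(E = -(4-2a)J\) and \(\Delta E = -2E\). The function names are hypothetical; energies are in units of *J*.

```cpp
#include <cassert>
#include <cstdint>

// Assumed encoding: each of the 21 three-bit groups holds 001 (spin up) or 000 (spin down),
// with group k = 0 the most significant group and the 64th bit zero.
// XOR marks each antiparallel pair with 001, so adding the four XORs leaves in every group
// the number a (0..4) of antiparallel neighbours; 4 = 100 fits, so no carry reaches the next group.
std::uint64_t antiparallelCounts(std::uint64_t s, std::uint64_t up, std::uint64_t down,
                                 std::uint64_t left, std::uint64_t right) {
    return (s ^ up) + (s ^ down) + (s ^ left) + (s ^ right);
}

// Value of 3-bit group k of a word.
inline int groupValue(std::uint64_t w, int k) {
    assert(k >= 0 && k < 21);
    return static_cast<int>((w >> (3 * (20 - k))) & 0x7u);
}

// Table rows in units of J: a antiparallel neighbours give E = -(4 - 2a);
// flipping the spin negates E, so E' = -E and Delta E = E' - E = -2E.
inline int energyBefore(int a) { return -(4 - 2 * a); }
inline int deltaE(int a) { return -2 * energyBefore(a); }
```

Because *a* alone fixes \(\Delta E\), a Metropolis implementation only needs the acceptance probabilities for \(\Delta E = 4J\) and \(8J\), which can be precomputed once per temperature.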

## Parallelization

In a Monte Carlo Metropolis iteration, each memory word is updated at least once, and enough iterations must be performed to obtain an accurate energy. The lattice can be divided vertically into \(N_p\) sub-lattices of equal size, where \(N_p\) is the number of processes; the sub-lattices are assigned to processes 0 to \(N_p-1\) from left to right. Each process creates a sub-lattice of the specified size, initializes it, performs all Monte Carlo iterations and calculates the energy of the sub-lattice using Eq. (2). When all processes have calculated the energies of their own sub-lattices, these energies are added up through a map-reduce operation to obtain the total energy of the lattice. However, this approach raises two problems. First, as illustrated in Fig. 3, one of the horizontal neighbors of each memory word on the border lies in the sub-lattice of the neighbor process. Therefore, calculating the energy of the border memory words requires some memory words of the adjacent sub-lattice, which have to be obtained from the neighbor process when needed. Second, neighboring memory words must not be updated simultaneously by different processes.

Now, we turn to the first problem. As mentioned before, in each phase half of a sub-lattice is updated. Before a process starts updating one half of its sub-lattice, it must receive the corresponding border memory words of the neighbor process. Suppose that process p2 is going to update the left half of its sub-lattice in phase 1. It waits to receive the right border memory words of process p1. Process p1 sends its right border memory words to p2 asynchronously, just after it completes phase 2 of the previous iteration. After p2 receives the border memory words from p1 synchronously, it starts updating its memory words in phase 1. Just after finishing phase 1, p2 sends its updated left border memory words to p1 asynchronously and proceeds to phase 2. The same procedure applies to the other processes. Because we use periodic boundary conditions, the left neighbor of the first process is the last process, and likewise the right neighbor of the last process is the first process.
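The data movement of this schedule can be emulated in a single C++ program, which makes the periodic wrap easy to check. The sketch assumes, following the Implementation section, that each sub-lattice keeps one extra column at index \(N_c\) for the received border words; `SubLattice`, `exchangeBeforePhase1` and `exchangeBeforePhase2` are hypothetical names, and plain copies stand in for messages.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Column = std::vector<std::uint64_t>;  // the N memory words of one column

// One emulated process: Nc owned columns (indices 0..Nc-1) plus the reserved column at index Nc.
struct SubLattice {
    std::vector<Column> cols;  // size Nc + 1
};

// Periodic ring of ranks: rank 0's left neighbour is rank np - 1.
inline int leftOf(int rank, int np) { return (rank + np - 1) % np; }
inline int rightOf(int rank, int np) { return (rank + 1) % np; }

// Before phase 1: each process needs the right border column of its left neighbour.
void exchangeBeforePhase1(std::vector<SubLattice>& procs) {
    const int np = static_cast<int>(procs.size());
    for (int r = 0; r < np; ++r) {
        const SubLattice& src = procs[leftOf(r, np)];
        const int nc = static_cast<int>(src.cols.size()) - 1;
        assert(nc >= 1);
        procs[r].cols.back() = src.cols[nc - 1];  // only owned columns are read
    }
}

// Before phase 2: each process needs the left border column of its right neighbour.
void exchangeBeforePhase2(std::vector<SubLattice>& procs) {
    const int np = static_cast<int>(procs.size());
    for (int r = 0; r < np; ++r)
        procs[r].cols.back() = procs[rightOf(r, np)].cols.front();
}
```

In an MPI program, each copy would instead be an `MPI_Isend` posted by the owner of the border column and a blocking `MPI_Recv` on the receiving rank; the sender must not modify the send buffer until the request completes, for example via `MPI_Wait`.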

## Implementation

We consider a memory-word lattice of size \(N \times 21N\), where *N* is an arbitrary integer greater than one. We execute the algorithm on \(N_p\) processes, and each process is identified by an integer from 0 to \(N_p -1\), called its rank. Each process is responsible for \(N_c\) columns of the memory-word lattice, where \(N_c=\frac{21N}{N_p}\). Each process creates its own sub-lattice, initializes it, performs all Monte Carlo iterations and calculates the energy of the sub-lattice for a specific temperature. Within each Monte Carlo iteration, a sub-lattice is updated many times and the energy of the iteration is calculated. Finally, the total energy of the memory-word lattice for a specific temperature is obtained via a reduce operation, as illustrated in Fig. 4. We now study each step of the algorithm in detail.
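The column split can be written down directly. This minimal sketch assumes that \(N_p\) divides \(21N\), which is what makes \(N_c = 21N/N_p\) an integer and all sub-lattices equal in size; `ownedColumns` is a hypothetical helper, and ranks are mapped to column blocks from left to right as in the Parallelization section.

```cpp
#include <cassert>
#include <stdexcept>

// Block of memory-word columns owned by one rank (hypothetical helper).
struct ColumnRange {
    int first;  // index of the first owned column in the N x 21N memory-word lattice
    int count;  // N_c = 21N / N_p
};

// Ranks 0..Np-1 own equal, contiguous column blocks from left to right.
ColumnRange ownedColumns(int N, int np, int rank) {
    const int total = 21 * N;
    if (np <= 0 || total % np != 0) throw std::invalid_argument("N_p must divide 21N");
    assert(rank >= 0 && rank < np);
    const int nc = total / np;
    return {rank * nc, nc};
}
```

Rejecting a process count that does not divide \(21N\) keeps the equal-size property on which the load-balancing argument rests.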

### Initialization

### Updating

As mentioned before, updating is done in two phases; in each phase, one half of the sub-lattice is updated. When \(N_c\) is odd, \(\hbox {floor}(N_c/2)\) columns are updated in phase 1 and the remaining columns in phase 2. Before the update in each phase starts, some inter-process communication must be carried out.

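The left and right neighbors of a process are the processes with ranks \((\mathrm{rank}+N_p-1) \bmod N_p\) and \((\mathrm{rank}+1) \bmod N_p\), respectively, which means that the left neighbor of the process with rank 0 is the process \(N_p-1\).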

The received column of the memory words is stored in the last column of the S-lattice which has been reserved for the border memory words of the neighbor process (Fig. 5e).
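One way to read the single reserved column, offered as an interpretation rather than the authors' stated design, is to index the \(N_c + 1\) columns of the S-lattice modulo \(N_c + 1\): the left neighbor of column 0 and the right neighbor of column \(N_c - 1\) then both land on the reserved column, which holds the left neighbor's right border during phase 1 and the right neighbor's left border during phase 2. The sketch encodes that indexing and the floor split between the phases; all names are hypothetical.

```cpp
#include <cassert>

// Half-open range [begin, end) of owned column indices updated in one phase.
struct Phase {
    int begin;
    int end;
};

// Phase 1 takes floor(Nc / 2) columns, phase 2 the rest (integer division floors for Nc >= 0).
// Nc >= 2 keeps both halves non-empty.
inline Phase phase1(int nc) { assert(nc >= 2); return {0, nc / 2}; }
inline Phase phase2(int nc) { assert(nc >= 2); return {nc / 2, nc}; }

// Horizontal neighbours taken modulo Nc + 1, so that both the left neighbour of column 0
// and the right neighbour of column Nc - 1 resolve to the reserved column Nc.
inline int leftColumn(int c, int nc) { return (c + nc) % (nc + 1); }
inline int rightColumn(int c, int nc) { return (c + 1) % (nc + 1); }
```

With \(N_c \ge 2\), the columns that neighboring ranks update in phase 1 are separated by the right half of the left rank, so no two adjacent memory words are updated at the same time by different processes.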

### Calculating the Energy of a Monte Carlo Iteration

In each iteration of the loop over the 3-bit groups, the energy of one 3-bit group is extracted and added to rv, which accumulates the energies of the 3-bit groups. After 21 iterations, rv therefore contains the total energy of the 21 3-bit groups.
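The accumulation into `rv` can be sketched as a loop over the 21 groups. The sketch assumes that the input word already holds, in every 3-bit group, the number *a* of antiparallel neighbors (as in the last column of the configuration table), and converts each group with \(E = -(4-2a)J\) in units of *J*; `wordEnergy` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Energy (in units of J) of the 21 spins of one memory word, given a word whose
// 3-bit groups hold the antiparallel-neighbour counts a = 0..4 (assumed input format).
int wordEnergy(std::uint64_t counts) {
    int rv = 0;  // running sum of the group energies, as in the text
    for (int k = 0; k < 21; ++k) {
        const int a = static_cast<int>(counts & 0x7u);  // lowest 3-bit group
        assert(a <= 4);
        rv += -(4 - 2 * a);  // local energy of one spin with its four neighbours
        counts >>= 3;        // bring the next group into the lowest 3 bits
    }
    return rv;
}
```

Summed over all memory words, these local energies count every nearest-neighbor bond twice, so the energy of the nearest-neighbor sum is half of that total.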

## Results

Three different cases which have been examined in this paper

Test case | Number of Monte Carlo iterations | Lattice size | Average number of updates per spin in an iteration |
---|---|---|---|
1 | 5000 | 96 | 10 |
2 | 4500 | 96 | 10 |
3 | 4500 | 48 | 10 |

We can now inspect the impact of the lattice size and the number of Monte Carlo iterations on the performance of our algorithm. Comparing test cases 2 and 3, we infer that larger lattices yield better speedup and efficiency. In addition, comparing test cases 1 and 2, we find that increasing the number of Monte Carlo iterations yields better speedup and efficiency. Therefore, our algorithm performs better for larger lattices and more Monte Carlo iterations.
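The speedup and efficiency compared above are, we assume, the conventional ones. With \(T_1\) the run time on one process and \(T_{N_p}\) the run time on \(N_p\) processes for the same test case:

\[S(N_p)=\frac{T_1}{T_{N_p}},\qquad \eta (N_p)=\frac{S(N_p)}{N_p}\]

An efficiency \(\eta \) close to 1 means that adding processes reduces the run time almost proportionally.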

## References

1. Ising, E.: Beitrag zur Theorie des Ferromagnetismus. Z. Phys. **31**(1), 253–258 (1925)
2. Onsager, L.: Crystal statistics. I. A two-dimensional model with an order-disorder transition. Phys. Rev. **65**, 117–149 (1944)
3. Baxter, R.J.: Exactly Solved Models in Statistical Mechanics. Courier Corporation, North Chelmsford (2013)
4. Deskins, W.R., Brown, G., Thompson, S.H., Rikvold, P.A.: Kinetic Monte Carlo simulations of a model for heat-assisted magnetization reversal in ultrathin films. Phys. Rev. B **84**, 094431 (2011)
5. Kozubski, R., Kozlowski, M., Wrobel, J., Wejrzanowski, T., Kurzydlowski, K.J., Goyhenex, C., Pierron-Bohnes, V., Rennhofer, M., Malinov, S.: Atomic ordering in nano-layered FePt: multiscale Monte Carlo simulation. Comput. Mater. Sci. **49**(1), 80–84 (2010)
6. Lyberatos, A., Parker, G.J.: Cluster Monte Carlo methods for the FePt Hamiltonian. J. Magn. Magn. Mater. **400**, 266–270 (2016)
7. Masrour, R., Bahmad, L., Hamedoun, M., Benyoussef, A., Hlil, E.K.: The magnetic properties of a decorated Ising nanotube examined by the use of the Monte Carlo simulations. Solid State Commun. **162**, 53–56 (2013)
8. Müller, M., Albe, K.: Lattice Monte Carlo simulations of FePt nanoparticles: influence of size, composition, and surface segregation on order-disorder phenomena. Phys. Rev. B **72**, 094203 (2005)
9. Yang, B., Asta, M., Mryasov, O.N., Klemmer, T.J., Chantrell, R.W.: Equilibrium Monte Carlo simulations of A1-L10 ordering in FePt nanoparticles. Scr. Mater. **53**(4), 417–422 (2005)
10. Swendsen, R.H., Wang, J.-S.: Nonuniversal critical dynamics in Monte Carlo simulations. Phys. Rev. Lett. **58**, 86–88 (1987)
11. Wolff, U.: Collective Monte Carlo updating for spin systems. Phys. Rev. Lett. **62**, 361 (1989)
12. Jacobs, L., Rebbi, C.: Multi-spin coding: a very efficient technique for Monte Carlo simulations of spin systems. J. Comput. Phys. **41**(1), 203–210 (1981)
13. Williams, G.O., Kalos, M.H.: A new multispin coding algorithm for Monte Carlo simulation of the Ising model. J. Stat. Phys. **37**(3), 283–299 (1984)
14. Zorn, R., Herrmann, H.J., Rebbi, C.: Tests of the multi-spin-coding technique in Monte Carlo simulations of statistical systems. Comput. Phys. Commun. **23**(4), 337–342 (1981)
15. Block, B., Virnau, P., Preis, T.: Multi-GPU accelerated multi-spin Monte Carlo simulations of the 2D Ising model. Comput. Phys. Commun. **181**(9), 1549–1556 (2010)
16. Block, B.J., Preis, T.: Computer simulations of the Ising model on graphics processing units. Eur. Phys. J. Special Top. **210**(1), 133–145 (2012)
17. Hawick, K.A., Leist, A., Playne, D.P.: Regular lattice and small-world spin model simulations using CUDA and GPUs. Int. J. Parallel Program. **39**(2), 183–201 (2011)
18. Komura, Y., Okabe, Y.: GPU-based Swendsen-Wang multi-cluster algorithm for the simulation of two-dimensional classical spin systems. Comput. Phys. Commun. **183**(6), 1155–1161 (2012)
19. Komura, Y., Okabe, Y.: GPU-based single-cluster algorithm for the simulation of the Ising model. J. Comput. Phys. **231**(4), 1209–1215 (2012)
20. Preis, T., Virnau, P., Paul, W., Schneider, J.J.: GPU accelerated Monte Carlo simulation of the 2D and 3D Ising model. J. Comput. Phys. **228**(12), 4468–4477 (2009)
21. Komura, Y., Okabe, Y.: CUDA programs for the GPU computing of the Swendsen-Wang multi-cluster spin flip algorithm: 2D and 3D Ising, Potts, and XY models. Comput. Phys. Commun. **185**(3), 1038–1043 (2014)
22. Altevogt, P., Linke, A.: Parallelization of the two-dimensional Ising model on a cluster of IBM RISC system/6000 workstations. Parallel Comput. **19**(9), 1041–1052 (1993)
23. Ito, N.: Parallelization of the Ising simulation. Int. J. Mod. Phys. C **4**(6), 1131–1135 (1993)
24. Wansleben, S., Zabolitzky, J.G., Kalle, C.: Monte Carlo simulation of Ising models by multispin coding on a vector computer. J. Stat. Phys. **37**(3), 271–282 (1984)
25. Barkema, G.T., MacFarland, T.: Parallel simulation of the Ising model. Phys. Rev. E **50**, 1623–1628 (1994)
26. Kaupuzs, J., Rimsans, J., Melnik, R.V.N.: Parallelization of the Wolff single-cluster algorithm. Phys. Rev. E **81**, 026701 (2010)
27. Weigel, M.: Simulating spin models on GPU. Comput. Phys. Commun. **182**(9), 1833–1836 (2011)
28. Petrov, G.M., Davis, J.: Parallelization of an implicit algorithm for multi-dimensional particle-in-cell simulations. Commun. Comput. Phys. **16**(3), 599–611 (2014)
29. Geng, W.: Parallel higher-order boundary integral electrostatics computation on molecular surfaces with curved triangulation. J. Comput. Phys. **241**, 253–265 (2013)
30. Keppens, R., Meliani, Z., van Marle, A.J., Delmont, P., Vlasis, A., van der Holst, B.: Parallel, grid-adaptive approaches for relativistic hydro and magnetohydrodynamics. J. Comput. Phys. **231**(3), 718–744 (2012)
31. Oger, G., Le Touzé, D., Guibert, D., de Leffe, M., Biddiscombe, J., Soumagne, J., Piccinali, J.-G.: On distributed memory MPI-based parallelization of SPH codes in massive HPC context. Comput. Phys. Commun. **200**, 1–14 (2016)
32. Cheng, J., Liu, X., Liu, T., Luo, H.: A parallel, high-order direct discontinuous Galerkin method for the Navier–Stokes equations on 3D hybrid grids. Commun. Comput. Phys. **21**(5), 1231–1257 (2017)
33. Leboeuf, J.-N.G., Decyk, V.K., Newman, D.E., Sanchez, R.: Implementation of 2D domain decomposition in the UCAN gyrokinetic particle-in-cell code and resulting performance of UCAN2. Commun. Comput. Phys. **19**(1), 205–225 (2016)
34. Wang, K., Liu, H., Chen, Z.: A scalable parallel black oil simulator on distributed memory parallel computers. J. Comput. Phys. **301**, 19–34 (2015)
35. Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., Teller, E.: Equation of state calculations by fast computing machines. J. Chem. Phys. **21**(6), 1087–1092 (1953)

## Copyright information

**Open Access**This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.