Efficiently Implementing Monte Carlo Electrostatics Simulations on Multicore Accelerators

  • Marcus Holm
  • Sverker Holmgren
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7134)


The field of high-performance computing is highly dependent on increasingly complex computer architectures. Parallel computing has been the norm for decades, but hardware architectures like the Cell Broadband Engine (Cell/BE) and General Purpose GPUs (GPGPUs) introduce additional complexities and are difficult to program efficiently even for well-suited problems. Efficiency is taken here to include both maximizing the performance of the software and minimizing the programming effort required. With the goal of exposing the challenges facing a domain scientist using these types of hardware, in this paper we discuss the implementation of a Monte Carlo simulation of a system of charged particles on the Cell/BE and on GPUs. We focus on Coulomb interactions because their long-range nature prohibits using cut-offs to reduce the number of calculations, which makes simulations very expensive. The goal was to encapsulate the computationally expensive component of the program so that it is useful to domain researchers with legacy codes. Generality and flexibility were therefore just as important as performance. Using the GPU and Cell/BE library requires only small changes to the simulation codes we have examined and yields programs that run at or near the theoretical peak performance of the hardware.
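To make the cost argument concrete, the following is a minimal sketch of a single-particle Metropolis move with full pairwise Coulomb interactions. It is not the authors' library or method: the function names `particle_energy` and `metropolis_step`, the reduced units (Coulomb constant set to 1), the uniform trial displacement, and the absence of boundary conditions are all assumptions made for illustration. Because no cut-off can be applied, each trial move needs a loop over all other particles, and a loop of this kind is a natural candidate for offloading to an accelerator.

```python
import math
import random


def particle_energy(positions, charges, i, trial=None):
    """Coulomb energy between particle i and every other particle.

    Assumptions: reduced units (Coulomb constant = 1), no cut-off,
    no periodic boundaries, and no two particles at the same point.
    If `trial` is given, it is used as the position of particle i.
    """
    xi, yi, zi = positions[i] if trial is None else trial
    energy = 0.0
    for j, (xj, yj, zj) in enumerate(positions):
        if j == i:
            continue  # skip self-interaction
        r = math.sqrt((xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2)
        energy += charges[i] * charges[j] / r
    return energy


def metropolis_step(positions, charges, beta, max_disp, rng):
    """Attempt one single-particle move; return True if it was accepted.

    `beta` is the inverse temperature in the same reduced units and
    `max_disp` is a hypothetical maximum displacement per coordinate.
    """
    i = rng.randrange(len(positions))
    trial = tuple(c + rng.uniform(-max_disp, max_disp) for c in positions[i])
    # Only the terms involving particle i change, so the energy
    # difference costs O(N) rather than O(N^2) per trial move.
    delta = (particle_energy(positions, charges, i, trial)
             - particle_energy(positions, charges, i))
    if delta <= 0.0 or rng.random() < math.exp(-beta * delta):
        positions[i] = trial
        return True
    return False
```

A caller would create `rng = random.Random(seed)` and call `metropolis_step` repeatedly on a list of coordinate tuples. Keeping the energy evaluation in its own function mirrors the idea of isolating the expensive component behind a small interface.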


Monte Carlo, GPU, Cell, electrostatics





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Marcus Holm (1)
  • Sverker Holmgren (1)
  1. Department of Information Technology, Uppsala University, Sweden
