Colloquium: Large scale simulations on GPU clusters

  • Massimo Bernaschi
  • Mauro Bisson
  • Massimiliano Fatica

Abstract

Graphics processing units (GPUs) are currently used as a cost-effective platform for computer simulations and big-data processing. Large-scale applications require multiple GPUs to work together, but the efficiency obtained with clusters of GPUs is at times sub-optimal because the GPU features are not fully exploited. We describe how excellent efficiency can be achieved for applications in statistical mechanics, particle dynamics and network analysis by using suitable memory access patterns and mechanisms such as CUDA streams and profiling tools. Similar concepts and techniques may also be applied to other problems, such as the solution of partial differential equations.

Keywords

Statistical and Nonlinear Physics 

Copyright information

© EDP Sciences, SIF, Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  • Massimo Bernaschi (1)
  • Mauro Bisson (2)
  • Massimiliano Fatica (2)
  1. Istituto per le Applicazioni del Calcolo, National Research Council of Italy, Roma, Italy
  2. NVIDIA Corporation, 2701 San Tomas Expressway, Santa Clara, USA
