SIMD Monte-Carlo Numerical Simulations Accelerated on GPU and Xeon Phi

  • Bastien Plazolles
  • Didier El Baz
  • Martin Spel
  • Vincent Rivola
  • Pascal Gegout
Article
Part of the following topical collections:
  1. Special Issue on Programming Models and Algorithms for Data Analysis in HPC Systems

Abstract

The efficiency of a pleasingly parallel application is studied on several computing platforms. A real-world problem is considered: Monte-Carlo numerical simulation of the drift descent of a stratospheric balloon envelope. We detail the optimization of the SIMD parallel codes on the K40 and K80 GPUs as well as on the Intel Xeon Phi, with emphasis on loop and task parallelism, multi-threading and vectorization. The experiments show that GPUs and the MIC (Many Integrated Core) Xeon Phi reduce computing time by non-negligible factors compared with a parallel code running on a two-socket CPU (Intel Xeon E5-2680 v2), which finally allows these devices to be used in operational conditions.

Keywords

Parallel computing, Multi-core CPU, Xeon Phi, OpenMP, GPU, CUDA, Numerical integrator, Monte-Carlo simulations

Notes

Acknowledgements

Dr. Didier El Baz and Dr. Bastien Plazolles gratefully acknowledge the support of NVIDIA Corporation, which donated the Tesla K40 GPU used for this research work. The authors also wish to thank Dr. D. Gazen and Dr. J. Escobar of the Observatoire Midi-Pyrénées for their advice and for access to the cluster in Toulouse. The authors thank the DEDALE work group coordinated by CNES, France. Finally, the authors thank the reviewers for their useful suggestions for improving the manuscript.

Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  • Bastien Plazolles (1, 2)
  • Didier El Baz (1)
  • Martin Spel (2)
  • Vincent Rivola (2)
  • Pascal Gegout (3, 4)

  1. LAAS-CNRS, Université de Toulouse, CNRS, Toulouse, France
  2. R.Tech, Verniolle, France
  3. Géosciences Environnement Toulouse (CNRS UMR5563), Toulouse, France
  4. Université de Toulouse, Toulouse, France
