Performance Portability Analysis for Real-Time Simulations of Smoke Propagation Using OpenACC

  • Anne Küsters
  • Sandra Wienke
  • Lukas Arnold
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10524)

Abstract

Real-time simulations of smoke propagation during fires in complex geometries challenge engineers, physicists, mathematicians and computer scientists because of the complexity of fluid dynamics and the large number of physical and chemical processes involved. Recently, several application scenarios have emerged that require real-time predictions during an incident to support rescue teams. We therefore develop the CFD-based simulation software JuROr, which aims to run in real time by leveraging parallel computer architectures such as CPUs and GPUs. To this end, we parallelize the code with OpenACC directives, which promise maintenance of a single source base by delegating some architecture-specific optimizations to the compiler. We investigate the performance portability of JuROr using PGI's OpenACC implementation across four Intel CPUs and three NVIDIA GPUs. We present the achieved performance shares as part of a roofline model, focusing both on traditionally computed arithmetic code intensities and on a measurement approach based on performance counters.
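
For orientation, the roofline bound referred to in the abstract can be sketched in a few lines. This is only an illustration of the standard model: attainable performance is the minimum of peak compute performance and memory bandwidth times arithmetic intensity. The function names and all numbers are hypothetical and are not taken from the paper, and reading "performance share" as measured performance divided by the roofline bound is an assumption.

```python
def arithmetic_intensity(flops, bytes_moved):
    """Arithmetic intensity in FLOP per byte of memory traffic."""
    return flops / bytes_moved


def attainable_gflops(peak_gflops, bandwidth_gbs, intensity):
    """Roofline bound: min(peak compute, bandwidth * arithmetic intensity)."""
    return min(peak_gflops, bandwidth_gbs * intensity)


def performance_share(measured_gflops, peak_gflops, bandwidth_gbs, intensity):
    """Fraction of the roofline bound reached by a kernel (assumed definition)."""
    return measured_gflops / attainable_gflops(peak_gflops, bandwidth_gbs, intensity)


# Hypothetical kernel: 8 FLOP per 32 bytes moved, i.e. 0.25 FLOP/byte.
intensity = arithmetic_intensity(8.0, 32.0)

# Hypothetical device: 1000 GFLOP/s peak, 200 GB/s memory bandwidth.
bound = attainable_gflops(1000.0, 200.0, intensity)

# Hypothetical measurement of 40 GFLOP/s for that kernel.
share = performance_share(40.0, 1000.0, 200.0, intensity)
```

With these made-up values the kernel is memory-bound: the bandwidth term (200 GB/s times 0.25 FLOP/byte) lies far below the compute peak, so the share is measured against the bandwidth roof rather than the peak.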

Keywords

Parallel CFD applications; Fire safety engineering; GPU computing; OpenACC; Performance portability; Roofline model

Notes

Acknowledgements

This study was performed within the project ORPHEUS funded by the Federal Ministry of Education and Research (BMBF) Program on ‘Research for Civil Security - Protection and Rescue in complex Disaster Situations’ (funding code 13N13266). Some simulations were performed with computing resources granted by RWTH Aachen University under project rwth0207.

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. JSC, Forschungszentrum Jülich GmbH, Jülich, Germany
  2. IT Center, RWTH Aachen University, Aachen, Germany
  3. JARA-HPC, Aachen, Germany
