On the Benefits of Tasking with OpenMP

  • Alejandro Rico
  • Isaac Sánchez Barrera
  • Jose A. Joao
  • Joshua Randall
  • Marc Casas
  • Miquel Moretó
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11718)

Abstract

Tasking promises a model to program parallel applications that provides intuitive semantics. In the case of tasks with dependences, it also promises better load balancing by removing global synchronizations (barriers), and potential for improved locality. Still, the adoption of tasking in production HPC codes has been slow. Despite OpenMP supporting tasks, most codes rely on worksharing-loop constructs alongside MPI primitives. This paper provides insights on the benefits of tasking over the worksharing-loop model by reporting on the experience of taskifying an adaptive mesh refinement proxy application: miniAMR. The performance evaluation shows the taskified implementation being 15–30% faster than the loop-parallel one for certain thread counts across four systems, three architectures and four compilers thanks to better load balancing and system utilization. Dynamic scheduling of loops narrows the gap but still falls short of tasking due to serial sections between loops. Locality improvements are incidental due to the lack of locality-aware scheduling. Overall, the introduction of asynchrony with tasking lives up to its promises, provided that programmers parallelize beyond individual loops and across application phases.

Keywords

Tasking · OpenMP · Parallelism · Scaling

Notes

Acknowledgments

This work was carried out in collaboration with Cray and funded in part by the DOE ECP PathForward program. It has been partially supported by the Spanish Government through Programa Severo Ochoa (contract SEV-2015-0493), by the Spanish Ministry of Economy and Competitiveness (contract TIN2015-65316-P), by the Generalitat de Catalunya (contracts 2017-SGR-1414 and 2017-SGR-1328), by the European Union's Horizon 2020 Framework Programme under the Mont-Blanc project (grant agreement number 779877), and by the Arm-BSC Centre of Excellence initiative. I. Sánchez Barrera has been partially supported by the Spanish Ministry of Education, Culture and Sport under Formación del Profesorado Universitario fellowship number FPU15/03612. M. Casas has been partially supported by the Spanish Ministry of Economy, Industry and Competitiveness under Ramón y Cajal fellowship number RYC-2017-23269. M. Moretó has been partially supported by the Spanish Ministry of Economy, Industry and Competitiveness under Ramón y Cajal fellowship number RYC-2016-21104.

References

  1. Atkinson, P., McIntosh-Smith, S.: On the performance of parallel tasking runtimes for an irregular fast multipole method application. In: de Supinski, B.R., Olivier, S.L., Terboven, C., Chapman, B.M., Müller, M.S. (eds.) IWOMP 2017. LNCS, vol. 10468, pp. 92–106. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-65578-9_7
  2. Ayguadé, E., et al.: A proposal for task parallelism in OpenMP. In: Chapman, B., Zheng, W., Gao, G.R., Sato, M., Ayguadé, E., Wang, D. (eds.) IWOMP 2007. LNCS, vol. 4935, pp. 1–12. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-69303-1_1
  3. Berger, M.J., Colella, P.: Local adaptive mesh refinement for shock hydrodynamics. J. Comput. Phys. 82, 64–84 (1989). https://doi.org/10.1016/0021-9991(89)90035-1
  4. Berger, M.J., Oliger, J.: Adaptive mesh refinement for hyperbolic partial differential equations. J. Comput. Phys. 53, 484–512 (1984). https://doi.org/10.1016/0021-9991(84)90073-1
  5. Duran, A., Corbalán, J., Ayguadé, E.: Evaluation of OpenMP task scheduling strategies. In: Eigenmann, R., de Supinski, B.R. (eds.) IWOMP 2008. LNCS, vol. 5004, pp. 100–110. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-79561-2_9
  6. Duran, A., Perez, J.M., Ayguadé, E., Badia, R.M., Labarta, J.: Extending the OpenMP tasking model to allow dependent tasks. In: Eigenmann, R., de Supinski, B.R. (eds.) IWOMP 2008. LNCS, vol. 5004, pp. 111–122. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-79561-2_10
  7.
  8. Garcia-Gasulla, M., Mantovani, F., Josep-Fabrego, M., Eguzkitza, B., Houzeaux, G.: Runtime mechanisms to survive new HPC architectures: a use case in human respiratory simulations. Int. J. High Perform. Comput. Appl. (2019). https://doi.org/10.1177/1094342019842919
  9. Gautier, T., Perez, C., Richard, J.: On the impact of OpenMP task granularity. In: de Supinski, B.R., Valero-Lara, P., Martorell, X., Mateo Bellido, S., Labarta, J. (eds.) IWOMP 2018. LNCS, vol. 11128, pp. 205–221. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-98521-3_14
  10. Heroux, M.A., et al.: Improving performance via mini-applications. Technical report SAND2009-5574, Sandia National Laboratories (2009). http://www.mantevo.org/MantevoOverview.pdf
  11. Mantevo Project. https://mantevo.org/
  12. MiniAMR Adaptive Mesh Refinement (AMR) Mini-app. https://github.com/Mantevo/miniAMR
  13. Rico, A., Ramirez, A., Valero, M.: Available task-level parallelism on the cell BE. Sci. Program. 17(1–2), 59–76 (2009). https://doi.org/10.3233/SPR-2009-0269
  14. Sasidharan, A., Snir, M.: MiniAMR - a miniapp for adaptive mesh refinement. Technical report, University of Illinois Urbana-Champaign (2016). http://hdl.handle.net/2142/91046
  15. Teruel, X., Klemm, M., Li, K., Martorell, X., Olivier, S.L., Terboven, C.: A proposal for task-generating loops in OpenMP*. In: Rendell, A.P., Chapman, B.M., Müller, M.S. (eds.) IWOMP 2013. LNCS, vol. 8122, pp. 1–14. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40698-0_1
  16. Vaughan, C.T., Barrett, R.F.: Enabling tractable exploration of the performance of adaptive mesh refinement. In: 2015 IEEE International Conference on Cluster Computing, pp. 746–752 (2015). https://doi.org/10.1109/CLUSTER.2015.129
  17. Vidal, R., et al.: Evaluating the impact of OpenMP 4.0 extensions on relevant parallel workloads. In: Terboven, C., de Supinski, B.R., Reble, P., Chapman, B.M., Müller, M.S. (eds.) IWOMP 2015. LNCS, vol. 9342, pp. 60–72. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24595-9_5
  18. Virouleau, P., et al.: Evaluation of OpenMP dependent tasks with the KASTORS benchmark suite. In: DeRose, L., de Supinski, B.R., Olivier, S.L., Chapman, B.M., Müller, M.S. (eds.) IWOMP 2014. LNCS, vol. 8766, pp. 16–29. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11454-5_2

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Arm Research, Austin, USA
  2. Barcelona Supercomputing Center, Barcelona, Spain
  3. Universitat Politècnica de Catalunya, Barcelona, Spain