Nonintrusive AMR Asynchrony for Communication Optimization

  • Muhammad Nufail Farooqi
  • Didem Unat
  • Tan Nguyen
  • Weiqun Zhang
  • Ann Almgren
  • John Shalf
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10417)

Abstract

Adaptive Mesh Refinement (AMR) is a well-known method for efficiently solving partial differential equations. A straightforward AMR algorithm typically exhibits many synchronization points even during a single time step, where costly communication often degrades performance. This problem will be even more pronounced on future supercomputers with billion-way parallelism, which will raise the communication cost further. Redesigning AMR algorithms to avoid synchronization is not a viable solution because of their large code size and complex control structures. We present a nonintrusive asynchronous approach to hiding the effects of communication in an AMR application. Specifically, our approach reasons about data dependencies automatically using domain knowledge about AMR applications, allowing asynchrony to be discovered with only a modest amount of code modification. Using this approach, we optimize the synchronous AMR algorithm in the BoxLib software framework without severely affecting the productivity of the application programmer. We observe a performance improvement of around 27–31% for an advection solver on the Hazel Hen supercomputer using 12,288 cores.

Keywords

Asynchronous execution · Adaptive mesh refinement · AMR algorithm · Communication hiding

Acknowledgements

Authors from Koç University are supported by the Turkish Science and Technology Research Centre Grant No: 215E185. Dr. Unat is supported by the Marie Sklodowska Curie Reintegration Grant 655965 by the European Commission. We acknowledge PRACE for awarding us access to the Hazel Hen supercomputer in Germany. Authors from Lawrence Berkeley National Laboratory were supported by the Office of Advanced Scientific Computing Research in the Department of Energy Office of Science under contract number DE-AC02-05CH11231.

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Koç University, Istanbul, Turkey
  2. Lawrence Berkeley National Laboratory, Berkeley, USA
