Batch Solution of Small PDEs with the OPS DSL

  • Istvan Z. Reguly (corresponding author)
  • Branden Moore
  • Tim Schmielau
  • Jacques du Toit
  • Gihan R. Mudalige
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11887)

Abstract

In this paper we discuss the challenges and optimisation opportunities that arise when solving a large number of small, equally sized discretised PDEs on regular grids. We present an extension of the OPS (Oxford Parallel library for Structured meshes) embedded Domain Specific Language, show how support can be added for solving multiple systems, and show how OPS makes it easy to deploy a variety of transformations and optimisations. The new capabilities in OPS allow data structure transformations and execution schedule transformations to be applied automatically, delivering high performance on a variety of hardware platforms. We evaluate our work on an industrially representative finance simulation on Intel CPUs and NVIDIA GPUs.
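
As an illustration of the kind of data structure transformation the abstract refers to, the following minimal sketch contrasts two ways of laying out a batch of equally sized 2D grids in memory. It does not use the OPS API; the helper names (`idx_system_major`, `idx_batch_minor`, `smooth_x_batched`) and the 3-point averaging stencil are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helpers for illustration only; these are NOT part of OPS.

// Batch-outermost layout: each system's nx-by-ny grid is stored contiguously.
inline std::size_t idx_system_major(std::size_t b, std::size_t j, std::size_t i,
                                    std::size_t nx, std::size_t ny) {
    return (b * ny + j) * nx + i;
}

// Batch-innermost layout: the same grid point of all systems is contiguous,
// so the loop over systems has unit stride.
inline std::size_t idx_batch_minor(std::size_t b, std::size_t j, std::size_t i,
                                   std::size_t nx, std::size_t nbatch) {
    return (j * nx + i) * nbatch + b;
}

// Apply a 3-point average in x to every system, using the batch-innermost
// layout. Boundary points (i == 0 and i == nx - 1) are left untouched.
void smooth_x_batched(const std::vector<double>& in, std::vector<double>& out,
                      std::size_t nx, std::size_t ny, std::size_t nbatch) {
    assert(in.size() == nx * ny * nbatch && out.size() == in.size());
    for (std::size_t j = 0; j < ny; ++j) {
        for (std::size_t i = 1; i + 1 < nx; ++i) {
            // Innermost loop runs over systems: contiguous, easy to vectorise.
            for (std::size_t b = 0; b < nbatch; ++b) {
                out[idx_batch_minor(b, j, i, nx, nbatch)] =
                    (in[idx_batch_minor(b, j, i - 1, nx, nbatch)] +
                     in[idx_batch_minor(b, j, i, nx, nbatch)] +
                     in[idx_batch_minor(b, j, i + 1, nx, nbatch)]) / 3.0;
            }
        }
    }
}
```

In this sketch, switching between the two layouts only changes the index function, not the stencil body, which is the sort of separation that lets a DSL swap layouts and loop orders without touching user code.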

Keywords

Domain Specific Language, Stencil computations, Batching

Notes

Acknowledgements

István Reguly was supported by the János Bolyai Research Scholarship of the Hungarian Academy of Sciences. Project no. PD 124905 was implemented with support from the National Research, Development and Innovation Fund of Hungary, financed under the PD_17 funding scheme. This work was also supported by the ÚNKP-18-4-PPKE-18 New National Excellence Program of the Ministry of Human Capacities.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Istvan Z. Reguly (1, 2), corresponding author
  • Branden Moore (3)
  • Tim Schmielau (3)
  • Jacques du Toit (3)
  • Gihan R. Mudalige (2)

  1. Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Budapest, Hungary
  2. Department of Computer Science, University of Warwick, Coventry, UK
  3. Numerical Algorithms Group Ltd., Oxford, UK
