InKS, a Programming Model to Decouple Performance from Algorithm in HPC Codes

  • Ksander Ejjaaouani
  • Olivier Aumage
  • Julien Bigot
  • Michel Mehrenberger
  • Hitoshi Murai
  • Masahiro Nakao
  • Mitsuhisa Sato
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11339)


Existing programming models tend to tightly interleave algorithm and optimization in HPC simulation codes. This requires scientists to become experts in both the simulated domain and the optimization process, and makes the code difficult to maintain and to port to new architectures. This paper proposes the InKS programming model, which decouples these two concerns with a distinct language for each. The simulation algorithm is expressed in the InKS_pia language with no concern for machine-specific optimizations. Optimizations are expressed using both a family of dedicated optimization DSLs (InKS_O) and plain C++. InKS_O relies on the InKS_pia source to assist developers with common optimizations, while C++ is used for less common ones. Our evaluation demonstrates the soundness of the approach on synthetic benchmarks and the Vlasov-Poisson equation. It shows that InKS offers separation of concerns at no performance cost.


Keywords: Programming model · Separation of concerns · HPC · DSL



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Ksander Ejjaaouani (1, 2)
  • Olivier Aumage (3, 4)
  • Julien Bigot (1)
  • Michel Mehrenberger (5)
  • Hitoshi Murai (6)
  • Masahiro Nakao (6)
  • Mitsuhisa Sato (6)
  1. Maison de la Simulation, CEA, CNRS, Univ. Paris-Sud, UVSQ, Université Paris-Saclay, Gif-sur-Yvette, France
  2. Inria, Nancy, France
  3. Inria, Bordeaux, France
  4. LaBRI, Bordeaux, France
  5. IRMA, Université de Strasbourg, Strasbourg, France
  6. Riken AICS, Kobe, Japan
