TaskUniVerse: A Task-Based Unified Interface for Versatile Parallel Execution

  • Afshin Zafari
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10777)


Task-based parallel programming has shown competitive results in many aspects of parallel programming, such as efficiency, performance, productivity and scalability. Different software development frameworks use different approaches to deliver these results to the programmer while making the underlying hardware architecture transparent to her. However, since programs are not portable between these frameworks, the choice of framework remains a critical decision for a programmer who is concerned with the expandability, adaptivity, maintainability and interoperability of the programs. In this work, we propose a unified programming interface that a programmer can use to work with different task-based parallel frameworks transparently. In this approach, we abstract the common concepts of task-based parallel programming and provide them to the programmer uniformly, for all frameworks, in a single programming interface. We have tested the interface by running programs that implement matrix operations within frameworks optimized for shared memory architectures, distributed memory architectures and accelerators, while the cooperation between frameworks is configured externally, with no need to modify the programs. Possible further extensions of the interface and potential future research are also described.
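The idea of a single interface over several task-based frameworks, selected by external configuration, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the interface proposed in the paper: the names `Backend`, `SerialBackend`, `ThreadBackend`, `make_backend` and `block_matvec` are hypothetical, the two backends are simple stand-ins for real frameworks, and data dependencies between tasks are left out entirely.

```python
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor


class Backend(ABC):
    """Hypothetical wrapper around one task-based framework."""

    @abstractmethod
    def submit(self, fn, *args):
        """Register a task that calls fn(*args)."""

    @abstractmethod
    def wait_all(self):
        """Block until all tasks finish; return results in submission order."""


class SerialBackend(Backend):
    """Stand-in backend that runs each task immediately."""

    def __init__(self):
        self.results = []

    def submit(self, fn, *args):
        self.results.append(fn(*args))

    def wait_all(self):
        out, self.results = self.results, []
        return out


class ThreadBackend(Backend):
    """Stand-in backend that runs tasks on a thread pool."""

    def __init__(self, workers=4):
        self.pool = ThreadPoolExecutor(max_workers=workers)
        self.futures = []

    def submit(self, fn, *args):
        self.futures.append(self.pool.submit(fn, *args))

    def wait_all(self):
        out = [f.result() for f in self.futures]
        self.futures = []
        return out


# The backend is chosen from configuration, not from the program text.
BACKENDS = {"serial": SerialBackend, "threads": ThreadBackend}


def make_backend(config):
    return BACKENDS[config["backend"]]()


def block_matvec(backend, matrix, vector, block_rows=2):
    """Matrix-vector product written once against the Backend interface."""

    def rows_times_vector(rows):
        return [sum(a * x for a, x in zip(row, vector)) for row in rows]

    # One task per block of rows.
    for start in range(0, len(matrix), block_rows):
        backend.submit(rows_times_vector, matrix[start:start + block_rows])
    return [y for block in backend.wait_all() for y in block]
```

In this sketch, switching from `{"backend": "serial"}` to `{"backend": "threads"}` changes how the tasks are executed while `block_matvec` stays untouched, which mirrors, in a much simplified form, the external configuration described above.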


High Performance Computing; Task based programming; Parallel programming; Unified interface



Thanks to Assoc. Prof. Elisabeth Larsson for her valuable comments on improving the quality of this paper. The computations were performed on resources provided by SNIC through the High Performance Computing Center North (HPC2N) under project SNIC2016-7-34.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Division of Scientific Computing, Department of Information Technology, Uppsala University, Uppsala, Sweden
