Productive Cluster Programming with OmpSs

  • Javier Bueno
  • Luis Martinell
  • Alejandro Duran
  • Montse Farreras
  • Xavier Martorell
  • Rosa M. Badia
  • Eduard Ayguade
  • Jesús Labarta
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6852)

Abstract

Clusters of SMPs are ubiquitous. They have traditionally been programmed using MPI, but MPI programmers' productivity is low because of the complexity of expressing parallelism and communication and the difficulty of debugging. To ease the burden on the programmer, new programming models have tried to provide the illusion of a global shared address space (e.g., UPC, Co-array Fortran). Unfortunately, these models do not support the increasingly common irregular forms of parallelism that require asynchronous task parallelism. Other models, such as X10 or Chapel, provide this asynchronous parallelism, but they require the programmer to rewrite the application entirely.

We present the implementation of OmpSs for clusters, a variant of OpenMP extended to support asynchrony, heterogeneity, and data movement for task parallelism. Like OpenMP, it is based on decorating an existing serial version with compiler directives, which are translated into calls to a runtime system that manages parallelism extraction as well as data coherence and movement. Thus, the same OmpSs program can run on a regular SMP machine or on a cluster of SMPs, and the serial version can even be used for debugging. The runtime uses the information provided by the programmer to distribute work across the cluster while optimizing communications through affinity scheduling and data caching.
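
To make the directive-based style concrete, the sketch below annotates a serial blocked vector addition in the spirit of OmpSs. The in/out dependence clauses and the a[0;BS] array-section notation follow OmpSs conventions, but the block size, function names, and exact clause syntax are illustrative assumptions rather than code reproduced from the paper.

    /* A minimal sketch (illustrative, not from the paper): serial
       blocked vector addition annotated with OmpSs-style directives. */
    #include <stddef.h>

    #define BS 256  /* block size; illustrative, n assumed a multiple of BS */

    /* Annotating the function turns each invocation into a task.
       The in/out clauses tell the runtime what each task reads and
       writes, so it can track dependences, schedule for affinity,
       and move or cache data across cluster nodes. */
    #pragma omp task in(a[0;BS], b[0;BS]) out(c[0;BS])
    void add_block(const double *a, const double *b, double *c)
    {
        for (size_t i = 0; i < BS; i++)
            c[i] = a[i] + b[i];
    }

    void vadd(const double *a, const double *b, double *c, size_t n)
    {
        for (size_t i = 0; i < n; i += BS)
            add_block(&a[i], &b[i], &c[i]);  /* spawns one task per block */
    #pragma omp taskwait                     /* wait for all spawned tasks */
    }

A compiler that ignores the pragmas produces exactly the serial version, which is what enables the serial debugging path mentioned above.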

We have evaluated our proposal on a set of kernels; the OmpSs versions achieve performance comparable, and sometimes superior, to that of MPI versions of the same kernels.


References

  1. Ayguade, E., Badia, R., Cabrera, D., Duran, A., Gonzalez, M., Igual, F., Jimenez, D., Labarta, J., Martorell, X., Mayo, R., Perez, J.M., Quintana-Orti, E.S.: A Proposal to Extend the OpenMP Tasking Model for Heterogeneous Architectures. In: IWOMP: Evolving OpenMP in an Age of Extreme Parallelism, Dresden, Germany, pp. 154–167 (June 2009)
  2. Basumallik, A., Eigenmann, R.: Towards Automatic Translation of OpenMP to MPI. In: Proceedings of the 19th Annual International Conference on Supercomputing, ICS 2005, pp. 189–198. ACM, New York (2005)
  3. Blumofe, R.D., Joerg, C.F., Kuszmaul, B.C., Leiserson, C.E., Randall, K.H., Zhou, Y.: Cilk: An Efficient Multithreaded Runtime System. SIGPLAN Not. 30(8), 207–216 (1995)
  4. Bonachea, D.: GASNet Specification, v1.8. Technical report, U.C. Berkeley (2006)
  5. Chamberlain, B.L., Callahan, D., Zima, H.P.: Parallel Programmability and the Chapel Language. Int. J. High Perform. Comput. Appl. 21, 291–312 (2007)
  6. Charles, P., Grothoff, C., Saraswat, V., Donawa, C., Kielstra, A., Ebcioglu, K., von Praun, C., Sarkar, V.: X10: An Object-Oriented Approach to Non-Uniform Cluster Computing. In: Proceedings of the 20th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2005, New York, NY, USA (2005)
  7. UPC Consortium: UPC Language Specifications v1.2 (May 2005)
  8. Costa, J.J., Cortes, T., Martorell, X., Ayguade, E., Labarta, J.: Running OpenMP Applications Efficiently on an Everything-Shared SDSM. J. Parallel Distrib. Comput. (May 2006)
  9. Duran, A., Pérez, J.M., Ayguadé, E., Badia, R.M., Labarta, J.: Extending the OpenMP Tasking Model to Allow Dependent Tasks. In: OpenMP in a New Era of Parallelism, pp. 111–122. Springer, Heidelberg (2008)
  10. Ferrer, R., Planas, J., Bellens, P., Duran, A., Gonzalez, M., Martorell, X., Badia, R., Ayguade, E., Labarta, J.: Optimizing the Exploitation of Multicore Processors and GPUs with OpenMP and OpenCL. In: Proceedings of the 23rd International Workshop on Languages and Compilers for Parallel Computing (LCPC 2010) (October 2010)
  11. OpenMP ARB: OpenMP Application Program Interface, v3.0 (May 2008)
  12. Perez, J.M., Badia, R.M., Labarta, J.: A Dependency-Aware Task-Based Programming Environment for Multi-Core Architectures. In: IEEE Int. Conference on Cluster Computing, pp. 142–151 (September 2008)
  13. Perez, J.M., Bellens, P., Badia, R.M., Labarta, J.: CellSs: Making It Easier to Program the Cell Broadband Engine Processor. IBM Journal of Research and Development 51(5), 593–604 (2007)
  14. Rico, A., Duran, A., Cabarcas, F., Ramirez, A., Etsion, Y., Valero, M.: Trace-Driven Simulation of Multithreaded Applications. In: Proceedings of the 2011 ISPASS (to appear, 2011)
  15. Marjanovic, V., Labarta, J., Ayguadé, E., Valero, M.: Effective Communication and Computation Overlap with Hybrid MPI/SMPSs. In: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2010, pp. 337–338. ACM, New York (2010)

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Javier Bueno (1, 2)
  • Luis Martinell (1)
  • Alejandro Duran (1)
  • Montse Farreras (1, 2)
  • Xavier Martorell (1, 2)
  • Rosa M. Badia (1, 3)
  • Eduard Ayguade (1, 2)
  • Jesús Labarta (1, 2)
  1. Barcelona Supercomputing Center (BSC-CNS), Spain
  2. Universitat Politècnica de Catalunya (UPC), Spain
  3. Artificial Intelligence Research Institute (IIIA), Spanish National Research Council (CSIC), Spain
