Quantifying the Potential Task-Based Dataflow Parallelism in MPI Applications

  • Vladimir Subotic
  • Roger Ferrer
  • Jose Carlos Sancho
  • Jesús Labarta
  • Mateo Valero
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6852)

Abstract

Task-based parallel programming languages require the programmer to partition the traditional sequential code into smaller tasks in order to take advantage of the existing dataflow parallelism inherent in the applications. However, obtaining the partitioning that achieves optimal parallelism is not trivial because it depends on many parameters such as the underlying data dependencies and global problem partitioning. In order to help the process of finding a partitioning that achieves high parallelism, this paper introduces a framework that a programmer can use to: 1) estimate how much his application could benefit from dataflow parallelism; and 2) find the best strategy to expose dataflow parallelism in his application. Our framework automatically detects data dependencies among tasks in order to estimate the potential parallelism in the application. Furthermore, based on the framework, we develop an interactive approach to find the optimal partitioning of code. To illustrate this approach, we present a case study of porting High Performance Linpack from MPI to MPI/SMPSs. The presented approach requires only superficial knowledge of the studied code and iteratively leads to the optimal partitioning strategy. Finally, the environment provides visualization of the simulated MPI/SMPSs execution, thus allowing the developer to qualitatively inspect potential parallelization bottlenecks.
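
To make the partitioning idea concrete, the following sketch (not taken from the paper) shows how a sequential loop might be decorated with SMPSs-style task annotations. The #pragma css task syntax follows the SMPSs programming model referred to in the abstract; the function name block_update, the block size BS, and the data layout are illustrative assumptions only.

    #define BS 256  /* illustrative block size: granularity of one task */

    /* The input/inout clauses tell the SMPSs runtime which blocks a task
       reads and writes; the runtime derives the dataflow dependencies
       between task instances from these clauses. */
    #pragma css task input(a[BS]) inout(b[BS])
    void block_update(double a[BS], double b[BS])
    {
        for (int i = 0; i < BS; i++)
            b[i] += a[i];
    }

    /* The loop still looks sequential, but task instances that touch
       disjoint blocks of b can execute concurrently, while instances
       that reuse a block are serialized by the detected dependency. */
    void update_all(double *a, double *b, int nblocks)
    {
        for (int k = 0; k < nblocks; k++)
            block_update(&a[k * BS], &b[k * BS]);
    #pragma css barrier
    }

Choosing BS is exactly the kind of partitioning decision the framework is meant to guide: larger blocks mean fewer, coarser tasks and less runtime overhead, while smaller blocks expose more concurrent tasks.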

Keywords

Parallel Machine · High Parallelism · Target Machine · Input Code · Potential Parallelism

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Vladimir Subotic
  • Roger Ferrer
  • Jose Carlos Sancho
  • Jesús Labarta
  • Mateo Valero

  1. Barcelona Supercomputing Center, Universitat Politecnica de Catalunya, Spain
