
Static Approximation of MPI Communication Graphs for Optimized Process Placement

  • Andrew J. McPherson
  • Vijay Nagarajan
  • Marcelo Cintra
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8967)

Abstract

Message Passing Interface (MPI) is the de facto standard for programming large-scale parallel programs. Static understanding of MPI programs informs optimizations such as process placement and communication/computation overlap, as well as debugging. In this paper, we present a fully context- and flow-sensitive, interprocedural, best-effort analysis framework for statically analyzing MPI programs. We instantiate it to determine an approximation of the point-to-point communication graph of an MPI program. Our analysis is the first pragmatic approach to realizing the full point-to-point communication graph without profiling: our experiments show that we are able to resolve and understand 100% of the relevant MPI call sites across the NAS Parallel Benchmarks. In all but one case, this requires only specifying the number of processes.
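
The kind of call site such an analysis must resolve can be illustrated with a minimal MPI fragment (a hypothetical sketch, not taken from the paper or from the NAS benchmarks): the send and receive endpoints below are pure functions of the rank and the process count, so once the number of processes is fixed, every edge of the point-to-point communication graph can be enumerated without running the program.

    /* Hypothetical sketch: a ring exchange whose endpoints depend only
     * on the rank and the process count.  With P fixed (e.g., P = 64),
     * the edges i -> (i+1) mod P are statically known. */
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        double send = 0.0, recv = 0.0;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int dest = (rank + 1) % size;         /* edge: rank -> dest */
        int src  = (rank + size - 1) % size;  /* edge: src  -> rank */

        MPI_Sendrecv(&send, 1, MPI_DOUBLE, dest, 0,
                     &recv, 1, MPI_DOUBLE, src,  0,
                     MPI_COMM_WORLD, &status);

        MPI_Finalize();
        return 0;
    }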

To demonstrate an application, we use the analysis to determine process placement on a Chip MultiProcessor (CMP) based cluster. A CMP-based cluster creates a two-tier system in which inter-node communication can incur greater latencies than intra-node communication, so intelligent process placement can have a significant impact on execution time. Using the 64-process versions of the benchmarks and our analysis, we see an average improvement in communication localization over by-rank scheduling of 28% for 8-core and 7% for 12-core CMP-based clusters, representing the maximum possible improvement.
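
As a rough illustration of the placement objective (an assumed metric sketch, not necessarily the paper's exact formulation), the fragment below takes a point-to-point volume matrix and a rank-to-node mapping and reports the fraction of traffic that stays inside a node; by-rank scheduling simply maps rank r to node r / CORES, and a placement derived from the communication graph aims to raise this fraction.

    /* Hypothetical sketch: communication localization for a given
     * rank -> node mapping.  P and CORES are illustrative values. */
    #include <stdio.h>

    #define P     8   /* number of MPI processes */
    #define CORES 4   /* cores per CMP node      */

    static double intra_node_fraction(double vol[P][P], int node_of[P])
    {
        double local = 0.0, total = 0.0;
        for (int i = 0; i < P; i++)
            for (int j = 0; j < P; j++) {
                total += vol[i][j];
                if (node_of[i] == node_of[j])
                    local += vol[i][j];
            }
        return total > 0.0 ? local / total : 0.0;
    }

    int main(void)
    {
        double vol[P][P] = {{0}};
        int by_rank[P];

        /* Ring traffic: rank i sends one unit to rank (i+1) mod P. */
        for (int i = 0; i < P; i++)
            vol[i][(i + 1) % P] = 1.0;

        /* By-rank scheduling: consecutive ranks share a node. */
        for (int i = 0; i < P; i++)
            by_rank[i] = i / CORES;

        printf("intra-node fraction (by-rank): %.2f\n",
               intra_node_fraction(vol, by_rank));
        return 0;
    }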

Keywords

Message Passing Interface · Execution Path · Communication Graph · Flow Sensitivity · Process Placement

Acknowledgements

We thank Rajiv Gupta, Michael O’Boyle and the anonymous reviewers for their helpful comments for improving the paper. This research is supported by EPSRC grant EP/L000725/1 and an Intel early career faculty award to the University of Edinburgh.

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Andrew J. McPherson (1)
  • Vijay Nagarajan (1)
  • Marcelo Cintra (2)
  1. School of Informatics, University of Edinburgh, Edinburgh, Scotland
  2. Intel, Germany
