The Journal of Supercomputing

Volume 34, Issue 2, pp 201–217

Communication Benchmarking and Performance Modelling of MPI Programs on Cluster Computers


Abstract

This paper gives an overview of two related tools that we have developed to provide more accurate measurement and modelling of the performance of message-passing communication and application programs on distributed memory parallel computers. MPIBench uses a very precise, globally synchronised clock to measure the performance of MPI communication routines. It can generate probability distributions of communication times, not just the average values produced by other MPI benchmarks. This yields useful insights into the MPI communication performance of parallel computers, and in particular into how performance is affected by network contention. The Performance Evaluating Virtual Parallel Machine (PEVPM) provides a simple, fast and accurate technique for modelling and predicting the performance of message-passing parallel programs. It uses a virtual parallel machine to simulate the execution of the parallel program. The effects of network contention can be accurately modelled by sampling from the probability distributions generated by MPIBench. These tools are particularly useful on clusters with commodity Ethernet networks, where relatively high latencies, network congestion and TCP problems can significantly affect communication performance, which is difficult to model accurately using other tools. Experiments with example parallel programs demonstrate that PEVPM gives accurate performance predictions on commodity clusters. We also show that modelling communication performance using average times, rather than sampling from probability distributions, can give misleading results, particularly for programs running on a large number of processors.
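The abstract's final claim — that average-time models mislead while distribution sampling does not — can be illustrated with a minimal Monte Carlo sketch. This is not the PEVPM implementation; the distribution below is a hypothetical stand-in for MPIBench measurements (mostly fast messages with occasional contention-induced delays), and all numbers are illustrative assumptions. The key effect is that a synchronised step takes as long as its slowest process, so rare slow messages dominate once many processes are involved:

```python
import random

random.seed(42)

# Hypothetical per-message times (ms) such as MPIBench might measure on a
# commodity Ethernet cluster: mostly ~1 ms, with rare contention spikes.
# These values are illustrative assumptions, not measured data.
samples = [1.0] * 90 + [5.0] * 8 + [50.0] * 2
mean_time = sum(samples) / len(samples)

def simulate_step(num_procs, draw):
    """One synchronised communication step: every process must finish,
    so the step lasts as long as the slowest process's message."""
    return max(draw() for _ in range(num_procs))

def predict(num_procs, num_steps, draw, trials=200):
    """Mean predicted run time (ms) over many simulated executions."""
    total = 0.0
    for _ in range(trials):
        total += sum(simulate_step(num_procs, draw) for _ in range(num_steps))
    return total / trials

num_procs, num_steps = 64, 100

# Average-time model: every message takes exactly the mean time.
avg_model = predict(num_procs, num_steps, lambda: mean_time)

# Distribution model: each message time is sampled from the measurements.
dist_model = predict(num_procs, num_steps, lambda: random.choice(samples))

print(f"average-time model:  {avg_model:10.1f} ms")
print(f"distribution model:  {dist_model:10.1f} ms")
```

With 64 processes, the chance that at least one message in a step hits a contention spike is high, so the distribution-based prediction far exceeds the average-based one — the effect the paper reports growing with processor count.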

Keywords

parallel computing, cluster computing, performance modelling



Copyright information

© Springer Science + Business Media, Inc. 2005

Authors and Affiliations

  1. School of Computer Science, University of Adelaide, Adelaide, Australia
