Accurate performance prediction for massively parallel systems and its applications

  • Jens Simon
  • Jens-Michael Wierum
Workshop 19: Performance Evaluation
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1124)


A performance prediction method is presented that accurately predicts the expected program execution time on massively parallel systems. We consider distributed-memory architectures with SMP nodes and a fast communication network. The method is based on a relaxed task graph model, a queuing model, and a memory hierarchy model. The relaxed task graph is a compact representation of the communicating processes of an application mapped onto the target machine. Simultaneous accesses to the resources of a multi-processor node are modeled by a queuing network. The execution time of the application is computed by an evaluation algorithm. An example application implemented on a massively parallel computer demonstrates the high accuracy of our model. Furthermore, two applications of our accurate prediction method are presented.
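The abstract combines a task graph, a queuing model for contention inside a multi-processor node, and an evaluation algorithm. The Python sketch below illustrates one way such pieces can fit together under simplifying assumptions; it is not the authors' method. Contention is estimated with exact mean value analysis (MVA) of a closed queuing network that has one shared station (standing in for a node's memory bus) and one delay station (local computation). The task graph is then evaluated as a longest-path computation in which an edge costs a latency plus size/bandwidth term only when its two tasks sit on different nodes. All names and parameters (`mva_slowdown`, `predict_time`, `think`, `demand`, `latency`, `bandwidth`) are hypothetical. Each task is assumed to run on its own processor, and the memory hierarchy model is omitted.

```python
# Hypothetical sketch, NOT the authors' model: contention-adjusted task graph
# evaluation. All names and parameters are illustrative assumptions.
from graphlib import TopologicalSorter


def mva_slowdown(n_procs: int, think: float, demand: float) -> float:
    """Exact MVA for a closed network with one queuing station (assumed shared
    memory bus, service demand `demand`) and one delay station (local work,
    `think`), populated by `n_procs` processes. Returns the ratio of the
    contended cycle time to the contention-free cycle time."""
    queue = 0.0   # mean queue length at the shared station with n-1 customers
    resp = demand  # residence time at the shared station
    for n in range(1, n_procs + 1):
        resp = demand * (1.0 + queue)    # arrival theorem
        throughput = n / (think + resp)  # Little's law over the whole cycle
        queue = throughput * resp        # Little's law at the shared station
    return (think + resp) / (think + demand)


def predict_time(tasks, edges, placement, procs_per_node,
                 think, demand, latency, bandwidth):
    """tasks: {name: contention-free computation time}
    edges: {(src, dst): message size}, placement: {name: node id}.
    Assumes one task per processor; dependencies are the only serialization."""
    slowdown = mva_slowdown(procs_per_node, think, demand)
    preds = {t: [] for t in tasks}
    for src, dst in edges:
        preds[dst].append(src)
    finish = {}
    # TopologicalSorter expects a mapping node -> predecessors.
    for t in TopologicalSorter(preds).static_order():
        start = 0.0
        for p in preds[t]:
            # Communication is free inside a node, costly across nodes.
            comm = 0.0
            if placement[p] != placement[t]:
                comm = latency + edges[(p, t)] / bandwidth
            start = max(start, finish[p] + comm)
        finish[t] = start + tasks[t] * slowdown
    return max(finish.values())


if __name__ == "__main__":
    tasks = {"a": 1.0, "b": 2.0, "c": 2.0, "d": 1.0}
    edges = {("a", "b"): 1000, ("a", "c"): 1000,
             ("b", "d"): 500, ("c", "d"): 500}
    placement = {"a": 0, "b": 0, "c": 1, "d": 0}
    t = predict_time(tasks, edges, placement, procs_per_node=2,
                     think=3.0, demand=1.0, latency=0.1, bandwidth=1000.0)
    print(f"{t:.4f}")  # prints 5.9500
```

In this toy example two processes per node inflate every computation by a factor of 1.0625, and the cross-node edges a→c and c→d put task c on the critical path. A real model would also account for the memory hierarchy and for several processes sharing one processor.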





Copyright information

© Springer-Verlag Berlin Heidelberg 1996

Authors and Affiliations

  • Jens Simon, Paderborn Center for Parallel Computing (PC²), Paderborn, Germany
  • Jens-Michael Wierum, Paderborn Center for Parallel Computing (PC²), Paderborn, Germany
