Accurate performance prediction for massively parallel systems and its applications
We present a performance prediction method that accurately predicts the expected program execution time on massively parallel systems. We consider distributed-memory architectures with SMP nodes and a fast communication network. The method is based on a relaxed task graph model, a queuing model, and a memory hierarchy model. The relaxed task graph is a compact representation of the communicating processes of an application mapped onto the target machine. Simultaneous accesses to the resources of a multi-processor node are modeled by a queuing network. The execution time of the application is computed by an evaluation algorithm. An example application implemented on a massively parallel computer demonstrates the high accuracy of our model. Furthermore, two applications of our accurate prediction method are presented.
Keywords: Performance Prediction, Parallel Machine, Task Graph, Memory Hierarchy, Loop Edge
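To make the idea of evaluating a task graph concrete, the following is a minimal sketch and not the relaxed task graph or the evaluation algorithm of the paper, neither of which is specified in the abstract. It assumes a plain acyclic graph of tasks with fixed compute costs and a single hypothetical per-edge communication delay `comm`, and it takes the predicted execution time to be the longest path through the graph. The queuing and memory hierarchy models are left out entirely; the function name `predict_time` and the toy graph are illustrative assumptions.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def predict_time(costs, deps, comm=0.0):
    """Return the finish time of the last task in a task DAG.

    costs: dict mapping task -> compute time (assumed fixed)
    deps:  dict mapping task -> set of predecessor tasks (every task is a key)
    comm:  hypothetical fixed delay added on every dependency edge
    """
    finish = {}
    # static_order() yields each task only after all of its predecessors.
    for task in TopologicalSorter(deps).static_order():
        # A task may start once every predecessor has finished and its
        # message has arrived; tasks without predecessors start at time 0.
        ready = max((finish[p] + comm for p in deps.get(task, ())), default=0.0)
        finish[task] = ready + costs[task]
    return max(finish.values())


# Toy graph (assumed for illustration): a -> b, a -> c, then b and c -> d.
costs = {"a": 2.0, "b": 3.0, "c": 1.0, "d": 2.0}
deps = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
predicted = predict_time(costs, deps, comm=0.5)
```

In this toy graph the path a, b, d dominates, so the prediction is driven by the cost of `b` rather than `c`. A model in the spirit of the abstract would additionally stretch each task's cost according to contention for shared node resources, which this sketch does not attempt.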