Performance-oriented development of irregular, unstructured and unbalanced parallel applications in the N-MAP environment
Performance prediction methods and tools based on analytical models often fail to forecast the performance of real systems. The causes include inappropriate model assumptions, irregularities in the problem structure that cannot be described within the modeling formalism, and unstructured execution behavior that leads to unforeseen system states. Prediction accuracy and tractability are acceptable for systems with deterministic operational characteristics, for static, regularly structured problems, and for non-changing environments.
In this work we present a method, and the corresponding tools, that we have developed to support a performance-oriented development process for parallel software. The N-MAP environment incorporates tools for the specification and early evaluation of skeletal program designs from a performance viewpoint, allowing the application developer to investigate performance-critical design choices well before coding the program. Program skeletons are incrementally refined into the full implementation under N-MAP's performance supervision, i.e. the real code, rather than an (analytical) performance model, is “engineered”. We demonstrate the use of N-MAP for the development of a challenging application with extensive irregularities in its execution behavior, unstructured communication patterns and dynamically varying workload characteristics. Such an application resists automatic parallelization by a compiler and the respective runtime system, and is also prohibitive for classical “model based” performance prediction.
Keywords: Performance Prediction, Parallel Programming, Task Level Parallelism, Irregular Problems, Parallel Simulation, Time Warp, CM-5, Cluster Computing