Performance-oriented development of irregular, unstructured and unbalanced parallel applications in the N-MAP environment

  • Alois Ferscha
  • Allen D. Malony
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 977)


Performance prediction methods and tools based on analytical models often fail to forecast the performance of real systems, due to inappropriate model assumptions, irregularities in the problem structure that cannot be expressed in the modeling formalism, unstructured execution behavior that leads to unforeseen system states, and similar effects. Prediction accuracy and tractability are acceptable only for systems with deterministic operational characteristics, for static, regularly structured problems, and for non-changing environments.

In this work we present a method and the corresponding tools that we have developed to support a performance-oriented development process for parallel software. The N-MAP environment incorporates tools for the specification and early evaluation of skeletal program designs from a performance viewpoint, allowing the application developer to investigate performance-critical design choices long before coding the program. Program skeletons are incrementally refined to the full implementation under N-MAP's performance supervision, i.e. the real code, rather than an (analytical) performance model, is “engineered”. We demonstrate the use of N-MAP for the development of a challenging application with extensive irregularities in its execution behavior, unstructured communication patterns, and dynamically varying workload characteristics, which resists automatic parallelization by a compiler and the respective runtime system and also defies classical “model-based” performance prediction.
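The core idea of skeletal performance evaluation can be illustrated with a small sketch. N-MAP itself uses its own task-structure specification notation, which is not reproduced here; the Python code below is only a hypothetical, minimal analogue: tasks carry estimated durations and dependencies instead of real code, and a simple dependency-driven computation predicts the skeleton's completion time before any implementation exists. All task names and timing values are invented for illustration.

```python
# Hypothetical sketch of skeleton-level performance prediction
# (not N-MAP's actual notation). Each task is given an estimated
# duration and a list of predecessor tasks; the predicted makespan
# is computed under the assumption of unlimited virtual processors,
# i.e. each task starts as soon as all its dependencies finish.

def predict_makespan(tasks):
    """tasks: dict mapping task name -> (duration, [dependency names]).
    Returns the predicted completion time of the whole skeleton."""
    finish = {}

    def finish_time(name):
        if name in finish:
            return finish[name]
        duration, deps = tasks[name]
        start = max((finish_time(d) for d in deps), default=0.0)
        finish[name] = start + duration
        return finish[name]

    return max(finish_time(t) for t in tasks)

# Skeleton of a hypothetical irregular application: a scatter phase,
# three unbalanced worker tasks, and a gather phase.
skeleton = {
    "scatter": (2.0, []),
    "work0":   (5.0, ["scatter"]),
    "work1":   (9.0, ["scatter"]),   # load imbalance dominates here
    "work2":   (4.0, ["scatter"]),
    "gather":  (1.0, ["work0", "work1", "work2"]),
}

print(predict_makespan(skeleton))  # 2 + 9 + 1 = 12.0
```

Even this toy model exposes the performance-critical design issue the paper is concerned with: the imbalanced worker task determines the makespan, and a developer can experiment with alternative decompositions at the skeleton level, long before writing the real code.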


Keywords: Performance Prediction · Parallel Programming · Task-Level Parallelism · Irregular Problems · Parallel Simulation · Time Warp · CM-5 · Cluster Computing





Copyright information

© Springer-Verlag Berlin Heidelberg 1995

Authors and Affiliations

  • Alois Ferscha (1)
  • Allen D. Malony (2)
  1. Institut für Angewandte Informatik und Informationssysteme, Universität Wien, Vienna, Austria
  2. Computer Science Department, University of Oregon, Eugene, USA