Adaptively Increasing Performance and Scalability of Automatically Parallelized Programs

  • Jaejin Lee
  • H. D. K. Moonesinghe
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2481)

Abstract

This paper presents adaptive execution techniques that determine whether automatically parallelized loops are executed in parallel or sequentially in order to maximize performance and scalability. The adaptation and performance estimation algorithms are implemented in a compiler preprocessor. The preprocessor inserts code that automatically determines, at compile time or at run time, how the parallelized loops are executed. Using a set of standard numerical applications written in Fortran 77 and running them with our techniques on a distributed shared memory multiprocessor (SGI Origin2000), we find that programs using our techniques run, on average, 26%, 20%, 16%, and 10% faster than the original parallel program on 32, 16, 8, and 4 processors, respectively. One of the applications runs more than twice as fast as its original parallel version on 32 processors.
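
The abstract does not spell out the adaptation algorithms, so the following is only a generic sketch of the run-time idea: time a loop once sequentially and once in parallel, then keep using whichever mode was faster. The class `AdaptiveLoop`, its parameters, and the use of Python threads as stand-ins for parallel workers are all assumptions made for illustration; they are not the paper's preprocessor output, which targets Fortran 77.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class AdaptiveLoop:
    """Hypothetical helper: runs a loop body sequentially or in parallel,
    settling on whichever mode was faster during two trial invocations."""

    def __init__(self, body, workers=4, timer=time.perf_counter):
        self.body = body        # function applied to each iteration index
        self.workers = workers  # number of parallel workers (assumed value)
        self.timer = timer      # injectable clock, useful for testing
        self.timings = {}       # mode name -> measured elapsed time
        self.choice = None      # mode selected once both trials are done

    def _run_sequential(self, n):
        return [self.body(i) for i in range(n)]

    def _run_parallel(self, n):
        # Threads stand in for the parallel version of the loop.
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            return list(pool.map(self.body, range(n)))

    def run(self, n):
        modes = {
            "sequential": self._run_sequential,
            "parallel": self._run_parallel,
        }
        if self.choice is not None:
            # Adaptation is finished: reuse the faster mode.
            return modes[self.choice](n)

        # Trial phase: time the next mode that has not been measured yet.
        mode = next(m for m in modes if m not in self.timings)
        start = self.timer()
        result = modes[mode](n)
        self.timings[mode] = self.timer() - start

        if len(self.timings) == len(modes):
            self.choice = min(self.timings, key=self.timings.get)
        return result
```

A caller would create one `AdaptiveLoop` per parallelized loop and invoke `run(n)` each time the loop is reached. The first two invocations act as measurements, and later invocations use the selected mode. A compile-time variant would instead estimate the two costs from a performance model rather than measuring them.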

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Jaejin Lee
    • School of Computer Science and Engineering, Seoul National University, Seoul, Korea
  • H. D. K. Moonesinghe
    • Department of Computer Science and Engineering, Michigan State University, East Lansing, USA
