Abstract
This paper presents adaptive execution techniques that determine whether automatically parallelized loops are executed in parallel or sequentially, in order to maximize performance and scalability. The adaptation and performance-estimation algorithms are implemented in a compiler preprocessor, which inserts code that automatically determines, at compile time or at run time, how each parallelized loop is executed. Running a set of standard numerical applications written in Fortran 77 with our techniques on a distributed shared-memory multiprocessor (SGI Origin2000), our techniques are, on average, 26%, 20%, 16%, and 10% faster than the original parallel programs on 32, 16, 8, and 4 processors, respectively. One of the applications runs more than twice as fast as its original parallel version on 32 processors.
This work was supported in part by the National Science Foundation under grant EIA-0130724 and by the National Computational Science Alliance under grant ocn, and utilized the Silicon Graphics Origin2000. This work was also supported in part by the Korean Ministry of Education under the BK21 program and by the Korean Ministry of Science and Technology under the National Research Laboratory program.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lee, J., Moonesinghe, H.D.K. (2005). Adaptively Increasing Performance and Scalability of Automatically Parallelized Programs. In: Pugh, B., Tseng, CW. (eds) Languages and Compilers for Parallel Computing. LCPC 2002. Lecture Notes in Computer Science, vol 2481. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11596110_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30781-5
Online ISBN: 978-3-540-31612-1
eBook Packages: Computer Science, Computer Science (R0)