Abstract
High-performance computing capability is crucial for the advanced calculations of scientific applications. A parallelizing compiler can take a sequential program as input and automatically translate it into parallel form. However, for loops whose arrays have irregular (i.e., indirectly indexed), nonlinear, or dynamic access patterns, no state-of-the-art compiler can determine the parallelism at compile time. In this paper, we propose an efficient run-time scheme that computes a highly parallel execution schedule for such loops. The scheme first constructs a predecessor iteration table in the inspector phase and then schedules all loop iterations into wavefronts for parallel execution. Whereas the performance of inspector/executor methods usually degrades dramatically on non-uniform access patterns, our scheme does not suffer this degradation. Furthermore, its high scalability and low overhead make it especially suitable for multiprocessor systems.
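The inspector/wavefront idea sketched in the abstract can be illustrated in a few lines. The sketch below is an assumption-laden simplification, not the paper's exact algorithm: it supposes each iteration i performs one indirect read A[read_idx[i]] and one indirect write A[write_idx[i]], records for every array element the last iteration that accessed it (a predecessor table), and places each iteration one wavefront after its latest predecessor. Iterations in the same wavefront carry no cross-iteration dependence and may run in parallel.

```python
def build_wavefronts(read_idx, write_idx, n_elems):
    """Inspector sketch: assign each iteration to a wavefront.

    read_idx[i] / write_idx[i] are the (indirect) element indices that
    iteration i reads and writes; these names are illustrative assumptions.
    Returns a list of wavefronts, each a list of iteration numbers.
    """
    n_iters = len(read_idx)
    last_writer = [-1] * n_elems  # predecessor table: last iteration writing each element
    last_access = [-1] * n_elems  # last iteration reading or writing each element
    wave = [0] * n_iters

    for i in range(n_iters):
        pred_wave = -1
        # Flow dependence: follow the last writer of the element we read.
        w = last_writer[read_idx[i]]
        if w >= 0:
            pred_wave = max(pred_wave, wave[w])
        # Anti/output dependence: follow any prior access to the element we write.
        a = last_access[write_idx[i]]
        if a >= 0:
            pred_wave = max(pred_wave, wave[a])
        wave[i] = pred_wave + 1

        # Update the predecessor table with this iteration's accesses.
        last_writer[write_idx[i]] = i
        last_access[write_idx[i]] = i
        last_access[read_idx[i]] = i

    fronts = [[] for _ in range(max(wave) + 1)]
    for i, w in enumerate(wave):
        fronts[w].append(i)
    return fronts


# Example: iterations 0 and 1 touch disjoint data, iteration 2 reads what
# iteration 0 wrote, so the schedule is two wavefronts: [0, 1] then [2].
print(build_wavefronts([0, 0, 1], [1, 2, 2], n_elems=3))  # [[0, 1], [2]]
```

The executor would then run one wavefront at a time, with a barrier between wavefronts; a fully sequential loop (each iteration reading its predecessor's result) degenerates to one iteration per wavefront.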
© 2000 Springer-Verlag Berlin Heidelberg
Cite this paper
Huang, T.C., Hsu, P.H., Wu, C.F. (2000). An Efficient Run-Time Scheme for Exploiting Parallelism on Multiprocessor Systems. In: Valero, M., Prasanna, V.K., Vajapeyam, S. (eds) High Performance Computing — HiPC 2000. HiPC 2000. Lecture Notes in Computer Science, vol 1970. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44467-X_3
Print ISBN: 978-3-540-41429-2
Online ISBN: 978-3-540-44467-1