An Efficient Run-Time Scheme for Exploiting Parallelism on Multiprocessor Systems

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1970)

Abstract

High-performance computing capability is crucial for the advanced calculations of scientific applications. A parallelizing compiler can take a sequential program as input and automatically translate it into parallel form. However, for loops whose arrays have irregular (i.e., indirectly indexed), nonlinear, or dynamic access patterns, no state-of-the-art compiler can determine the parallelism at compile time. In this paper, we propose an efficient run-time scheme that computes a highly parallel execution schedule for such loops. The new scheme first constructs a predecessor iteration table in the inspector phase, and then schedules all loop iterations into wavefronts for parallel execution. The performance of inspector/executor methods usually degrades dramatically on non-uniform access patterns, but our scheme does not suffer from this problem. Furthermore, its high scalability and low overhead make the scheme especially suitable for multiprocessor systems.
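To make the inspector/executor idea in the abstract concrete, the C sketch below shows the generic run-time wavefront scheduling that such schemes build on: an inspector scans the index arrays (known only at run time) and assigns each iteration to a wavefront so that any two iterations touching the same array element land in different wavefronts. The loop shape, array sizes, and index arrays here are hypothetical, and the per-element bookkeeping table is the generic construction, not necessarily the paper's predecessor iteration table.

```c
/* Minimal sketch of run-time wavefront scheduling for an irregular loop
 * (a generic inspector/executor illustration, not the authors' exact
 * predecessor-table algorithm).  The target loop is assumed to be:
 *     for (i = 0; i < N; i++)  A[w[i]] += A[r[i]];
 * where the index arrays w[] and r[] are unknown until run time. */
#include <stdio.h>

#define N 8   /* number of loop iterations (assumed)   */
#define M 5   /* size of the indirectly indexed array A */

/* Inspector: assign each iteration to a wavefront so that iterations
 * accessing the same element of A fall into different wavefronts;
 * iterations sharing a wavefront number are independent. */
static void inspector(const int w[N], const int r[N], int wave[N])
{
    int last_wave[M];                 /* deepest wavefront that has   */
    for (int e = 0; e < M; e++)       /* touched each element so far  */
        last_wave[e] = 0;

    for (int i = 0; i < N; i++) {
        int wr = last_wave[w[i]];
        int rd = last_wave[r[i]];
        wave[i] = (wr > rd ? wr : rd) + 1;  /* one past the deeper predecessor */
        last_wave[w[i]] = wave[i];
        last_wave[r[i]] = wave[i];
    }
}

int main(void)
{
    /* Hypothetical irregular access pattern. */
    int w[N] = {0, 1, 0, 2, 3, 1, 4, 2};
    int r[N] = {1, 2, 3, 0, 4, 4, 0, 3};
    int wave[N];

    inspector(w, r, wave);

    for (int i = 0; i < N; i++)
        printf("iteration %d -> wavefront %d\n", i, wave[i]);
    return 0;
}
```

In an actual executor phase, iterations with the same wavefront number would be dispatched to different processors, with a barrier between consecutive wavefronts to enforce the cross-iteration dependences discovered by the inspector.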

Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Huang, T.C., Hsu, P.H., Wu, C.F. (2000). An Efficient Run-Time Scheme for Exploiting Parallelism on Multiprocessor Systems. In: Valero, M., Prasanna, V.K., Vajapeyam, S. (eds) High Performance Computing — HiPC 2000. Lecture Notes in Computer Science, vol 1970. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44467-X_3

  • DOI: https://doi.org/10.1007/3-540-44467-X_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-41429-2

  • Online ISBN: 978-3-540-44467-1

  • eBook Packages: Springer Book Archive
