Overcoming the limitations of the traditional loop parallelization

  • Ireneusz Karkowski
  • Henk Corporaal
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1225)


Previous research has shown that programs contain a huge potential for coarse-grain parallelism. This parallelism is, however, not always easy to exploit, especially when applying today's parallelizing compilers to typical applications from the embedded domain. This is mainly due to deficiencies in the static data dependency analysis these compilers rely on. This paper investigates the potential of loop parallelization based on dynamic loop analysis techniques. For a set of embedded benchmarks (including an MPEG-2 encoder), ∼4 times more loops could be parallelized than with a state-of-the-art compiler (SUIF [1]), leading to an average speedup of 2.85 on a 4-processor system. Dynamic analysis is, however, not foolproof; we intend to use it only in cases where static analysis fails to give an answer, and only if the user asserts its applicability.
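The dynamic analysis referred to above observes a loop's actual memory accesses at run time instead of reasoning about them symbolically. A minimal sketch of the idea, not the paper's actual implementation: record the set of addresses each iteration reads and writes, then declare the loop parallelizable only if no address written by one iteration is touched by another.

```python
def loop_is_parallelizable(trace):
    """trace: one (reads, writes) pair of address sets per iteration.

    Returns True only if no cross-iteration flow, anti, or output
    dependence is observed in the recorded trace.
    """
    for i, (r1, w1) in enumerate(trace):
        for r2, w2 in trace[i + 1:]:
            # w1 & r2: flow dependence, w1 & w2: output, r1 & w2: anti
            if w1 & (r2 | w2) or r1 & w2:
                return False
    return True

# a[i] = a[i] + 1  -> every iteration touches a distinct address
independent = [({i}, {i}) for i in range(8)]
# a[i] = a[i-1]    -> iteration i reads what iteration i-1 wrote
recurrence = [({i - 1}, {i}) for i in range(1, 8)]

print(loop_is_parallelizable(independent))  # True
print(loop_is_parallelizable(recurrence))   # False
```

Unlike static analysis, such a test is only valid for the inputs actually observed, which is why the abstract restricts its use to loops where static analysis is inconclusive and the user confirms applicability.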


Keywords: multiprocessing, loop parallelization techniques, data dependency analysis, high-performance embedded system design




  1. Saman P. Amarasinghe, Jennifer M. Anderson, Christopher S. Wilson, Shin-Wei Liao, Brian R. Murphy, Robert S. French, Monica S. Lam, and Mary W. Hall. Multiprocessors From a Software Perspective. IEEE Micro, pages 52–61, June 1996.
  2. Aart J.C. Bik. A Prototype Restructuring Compiler. Technical Report INF/SCR-92-11, Utrecht University, Utrecht, the Netherlands, November 1994.
  3. Henk Corporaal. Transport Triggered Architectures; Design and Evaluation. PhD thesis, Delft Univ. of Technology, September 1995. ISBN 90-9008662-5.
  4. Henk Corporaal and Hans Mulder. MOVE: A framework for high-performance processor design. In Supercomputing-91, pages 692–701, Albuquerque, November 1991.
  5. D. E. Maydan, J. L. Hennessy, and M. S. Lam. Effectiveness of Data Dependence Analysis. In Proceedings of the NSF-NCRD Workshop on Advanced Compilation Techniques for Novel Architectures, 1992.
  6. P.M. Embree. C Language Algorithms for Real-Time DSP. Prentice-Hall, 1995.
  7. Jan Hoogerbrugge. Code generation for Transport Triggered Architectures. PhD thesis, Delft Univ. of Technology, February 1996.
  8. Jeroen Hordijk and Henk Corporaal. The Impact of Data Communication and Control Synchronization on Coarse-Grain Task Parallelism. In Second Annual Conf. of ASCI, Lommel, Belgium, June 1996.
  9. I. Karkowski and R.H.J.M. Otten. An Automatic Hardware-Software Partitioner Based on the Possibilistic Programming. In Proceedings of the ED&TC Conference, Paris, March 1996.
  10. James R. Larus. Loop-Level Parallelism in Numeric and Symbolic Programs. IEEE Transactions on Parallel and Distributed Systems, 7:812–826, 1993.
  11. Dror E. Maydan, John L. Hennessy, and Monica S. Lam. Efficient and Exact Data Dependency Analysis. In Proceedings of the SIGPLAN '91 Conference on Programming Language Design and Implementation, pages 1–14, June 1991.
  12. MPEG Software Simulation Group. MPEG-2 Video Codec, 1996.
  13. Alexandru Nicolau. Run-Time Disambiguation: Coping with Statically Unpredictable Dependencies. IEEE Transactions on Computers, 38(5), May 1989.
  14. Michael Wolfe. High Performance Compilers for Parallel Computing. Addison-Wesley Publishing Company, 1996.

Copyright information

© Springer-Verlag Berlin Heidelberg 1997

Authors and Affiliations

  • Ireneusz Karkowski, Delft University of Technology, the Netherlands
  • Henk Corporaal, Delft University of Technology, the Netherlands
