Optimal reordering and mapping of a class of nested-loops for parallel execution

  • Chi-Chung Lam
  • P. Sadayappan
  • Rephael Wenger
Parallelizing Compilers
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1239)


This paper addresses the compile-time optimization of a class of nested-loop computations that arise in some computational physics applications. The computations involve summations over products of array terms in order to compute multi-dimensional surface and volume integrals. Reordering additions and multiplications and applying the distributive law can significantly reduce the number of operations required in evaluating these summations. In a multiprocessor environment, proper distribution of the arrays among processors will reduce the inter-processor communication time. We present a formal description of the operation minimization problem, a proof of its NP-completeness, and a pruning strategy for finding the optimal solution in small cases. We also give an algorithm for determining the optimal distribution of the arrays among processors in a multiprocessor environment.
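The operation savings described in the abstract can be seen on a toy case. The sketch below is a hypothetical illustration and is not taken from the paper: the function names `direct_sum` and `factored_sum` and the arrays `A` and `B` are assumptions introduced here. It evaluates the double summation of `A[i] * B[j]` over all `i` and `j` first as written, then after applying the distributive law, and counts the additions and multiplications performed by each version.

```python
# Hypothetical illustration (not the paper's algorithm): evaluate
#   S = sum_i sum_j A[i] * B[j]
# in two ways and count the arithmetic operations each one performs.

def direct_sum(A, B):
    """Evaluate the double summation as written; return (value, op_count)."""
    total = 0
    ops = 0
    for a in A:
        for b in B:
            total += a * b  # one multiplication and one addition
            ops += 2
    return total, ops


def factored_sum(A, B):
    """Use the distributive law: S = (sum_i A[i]) * (sum_j B[j])."""
    sum_a = 0
    sum_b = 0
    ops = 0
    for a in A:
        sum_a += a  # one addition per element of A
        ops += 1
    for b in B:
        sum_b += b  # one addition per element of B
        ops += 1
    ops += 1  # the single final multiplication
    return sum_a * sum_b, ops


A = [1, 2, 3, 4]
B = [5, 6, 7]
print(direct_sum(A, B))    # (180, 24)
print(factored_sum(A, B))  # (180, 8)
```

For arrays of lengths n and m, the direct form performs 2nm operations under this counting, while the factored form performs n + m + 1. The summations treated in the paper involve more array terms and more indices, which is where choosing the best reordering becomes hard.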





Copyright information

© Springer-Verlag Berlin Heidelberg 1997

Authors and Affiliations

  • Chi-Chung Lam (1)
  • P. Sadayappan (1)
  • Rephael Wenger (1)
  1. Department of Computer and Information Science, The Ohio State University, Columbus
