Optimal reordering and mapping of a class of nested-loops for parallel execution
This paper addresses the compile-time optimization of a class of nested-loop computations that arise in some computational physics applications. The computations involve summations over products of array terms in order to compute multi-dimensional surface and volume integrals. Reordering additions and multiplications and applying the distributive law can significantly reduce the number of operations required in evaluating these summations. In a multiprocessor environment, proper distribution of the arrays among processors will reduce the inter-processor communication time. We present a formal description of the operation minimization problem, a proof of its NP-completeness, and a pruning strategy for finding the optimal solution in small cases. We also give an algorithm for determining the optimal distribution of the arrays among processors in a multiprocessor environment.
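As a small illustration of how the distributive law reduces operation counts, consider the double summation $\sum_i \sum_j a_i b_j$, which can be rewritten as $(\sum_i a_i)(\sum_j b_j)$. The sketch below is not the paper's algorithm. It is a toy example with hypothetical function names (`naive_sum`, `factored_sum`) that counts the multiplications and additions performed by each form, treating every accumulation into a running total as one addition.

```python
def naive_sum(a, b):
    """Evaluate sum_i sum_j a[i] * b[j] term by term.

    Returns (value, multiplications, additions).
    """
    total = 0
    mults = 0
    adds = 0
    for x in a:
        for y in b:
            total += x * y   # one multiplication and one addition per term
            mults += 1
            adds += 1
    return total, mults, adds


def factored_sum(a, b):
    """Evaluate the same quantity as (sum_i a[i]) * (sum_j b[j]).

    Returns (value, multiplications, additions).
    """
    adds = 0
    sum_a = 0
    for x in a:
        sum_a += x           # one addition per element of a
        adds += 1
    sum_b = 0
    for y in b:
        sum_b += y           # one addition per element of b
        adds += 1
    return sum_a * sum_b, 1, adds  # a single multiplication at the end


if __name__ == "__main__":
    a = [1, 2, 3]
    b = [4, 5]
    print(naive_sum(a, b))
    print(factored_sum(a, b))
```

For arrays of lengths n and m, the term-by-term form uses n·m multiplications, while the factored form uses one. The summations considered in the paper involve products of multi-dimensional arrays, where choosing the best such factorization is the hard part.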