On the synthesis of parallel programs from tensor product formulas for block recursive algorithms
This paper presents a methodology for synthesizing parallel programs for block recursive algorithms such as fast Fourier transforms and Strassen's matrix multiplication algorithm. A block recursive algorithm is expressed as a tensor product formula which consists of matrix sums, matrix products, direct sums, tensor products, componentwise matrix operations, and stride permutations. These mathematical operations can be mapped to high-level programming language constructs such as iteration, sequential composition, parallel composition, vector operations, and index computation. Translation of a tensor product formula consisting of these primitives into a parallel program involves determination of the proper indexing schemes for the arrays. This paper gives an algorithm to determine the indexing scheme and the code required for the index computation. Various parallel programs can be synthesized by manipulating tensor product formulas to exploit different computational structures. In this paper, we discuss some issues involved in formula manipulation for a particular target machine, the Cray Y-MP.
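The mapping from tensor product operations to loops and index computation can be illustrated with a minimal sketch. It is not the paper's code generator: the function names (`matvec`, `apply_i_tensor_a`, `apply_a_tensor_i`, `stride_permutation`) are hypothetical, plain Python lists stand in for arrays, and the stride permutation follows the common convention in which L^{mn}_n gathers elements at stride n. Under those assumptions, I_m ⊗ A becomes a loop over contiguous blocks with independent iterations, A ⊗ I_n becomes the same operation on strided subvectors, and the stride permutation is a pure index computation.

```python
def matvec(a, x):
    """Dense matrix-vector product; `a` is a list of rows."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def apply_i_tensor_a(m, a, x):
    """Compute y = (I_m tensor A) x.

    A acts on m contiguous blocks of x. The iterations are independent,
    so this loop corresponds to a parallel composition.
    """
    cols = len(a[0])
    y = []
    for i in range(m):
        y.extend(matvec(a, x[i * cols:(i + 1) * cols]))
    return y


def apply_a_tensor_i(a, n, x):
    """Compute y = (A tensor I_n) x.

    A acts on the n subvectors x[j], x[j+n], x[j+2n], ... (stride n),
    which corresponds to a vector operation on interleaved data.
    """
    rows = len(a)
    y = [0] * (rows * n)
    for j in range(n):
        y[j::n] = matvec(a, x[j::n])
    return y


def stride_permutation(x, n):
    """Apply L^{mn}_n to x, where len(x) = m*n: y[j*m + i] = x[i*n + j]."""
    m = len(x) // n
    return [x[i * n + j] for j in range(n) for i in range(m)]


if __name__ == "__main__":
    f2 = [[1, 1], [1, -1]]  # 2-point Fourier transform matrix
    x = [1, 2, 3, 4]
    print(apply_i_tensor_a(2, f2, x))
    print(apply_a_tensor_i(f2, 2, x))
    print(stride_permutation(x, 2))
```

The two tensor product forms are related by the commutation identity A ⊗ I_n = L^{mn}_m (I_n ⊗ A) L^{mn}_n for an m × m matrix A, so rewriting a formula with this identity trades strided access for contiguous access plus explicit permutations. This is one example of the kind of formula manipulation that lets the same algorithm be matched to different machine characteristics.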