Optimizing the representation of local iteration sets and access sequences for block-cyclic distributions
In this paper we investigate the optimization of state-machine-based representations of access sequence and local iteration set (LIS) information. Two such representations are shown to have complementary strengths. We develop a third representation, the hybrid state machine, that combines the strengths of the other two. We also present a new optimization that allows a state machine to be reused across references, reducing the cost of state-machine-based accesses. Experimental data are presented in support of this approach.
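To make the terminology concrete, the following is a minimal sketch of what a state-machine representation of an access sequence can look like for a block-cyclic distribution. It is not the hybrid state machine of this paper, nor any of the cited algorithms: the table is built by brute-force enumeration over one period, purely for illustration. All names (`owner`, `local_addr`, `build_state_machine`, `walk`) and the parameters `p` (processors), `k` (block size), `s` (stride), `l` (lower bound), and `m` (processor id) are assumptions introduced here, and the sketch assumes `s > 0` and `l >= 0`.

```python
from math import gcd


def owner(i, p, k):
    """Processor that owns global index i under a block-cyclic (block size k) layout."""
    return (i // k) % p


def local_addr(i, p, k):
    """Local memory address of global index i on its owning processor."""
    return (i // (p * k)) * k + i % k


def build_state_machine(p, k, s, m, l=0):
    """Build an offset-indexed transition table for processor m (illustrative only).

    Returns (start, table). `start` is the first global index >= l, on stride s,
    owned by m (or None if there is none). `table` maps an offset within a block
    to (next_offset, local_address_gap).
    """
    period = p * k * s // gcd(s, p * k)  # lcm(s, p*k): the pattern repeats after this
    owned = [i for i in range(l, l + period, s) if owner(i, p, k) == m]
    if not owned:
        return None, {}
    cycle = owned + [owned[0] + period]  # close the cycle with the wrap-around step
    table = {}
    for a, b in zip(cycle, cycle[1:]):
        table[a % k] = (b % k, local_addr(b, p, k) - local_addr(a, p, k))
    return owned[0], table


def walk(start, table, p, k, count):
    """Generate the first `count` local addresses by following the table."""
    addrs = []
    if start is None:
        return addrs
    addr, state = local_addr(start, p, k), start % k
    for _ in range(count):
        addrs.append(addr)
        state, gap = table[state]
        addr += gap
    return addrs
```

Once the table exists, each access costs one table lookup and one addition, which is the property that makes such representations attractive; the cost of building and storing the table is what optimizations such as reuse across references aim to reduce.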