Abstract
Array redistribution is often required to execute a data-parallel program efficiently on distributed-memory multicomputers. When array redistribution is performed in synchronous communication mode, the data communications among processors must be arranged carefully to avoid unnecessary data transfer cost. Several efficient communication scheduling methods have been proposed for Block-Cyclic redistribution. Separately, the processor mapping technique can reduce the data transfer cost of redistribution. To preserve that reduction, optimal communication schedules must be constructed for redistributions to which the processor mapping technique is applied. In this paper, we present a unified approach to constructing optimal communication schedules for Block-Cyclic redistribution with the processor mapping technique applied. The proposed method is founded on the processor mapping technique and constructs the required communication schedules more efficiently than other optimal scheduling methods.
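To make the setting concrete, the following is a minimal sketch of what a Block-Cyclic redistribution has to communicate. It is not the method proposed in the paper. The function names (`owner`, `send_counts`, `round_robin_schedule`) and the example sizes are hypothetical, and the schedule shown is only a naive contention-free one, with no claim of optimality.

```python
from collections import Counter

def owner(i, block, nprocs):
    """Processor that owns global index i under a CYCLIC(block) distribution."""
    return (i // block) % nprocs

def send_counts(n, nprocs, src_block, dst_block):
    """Count the elements each (source, destination) processor pair exchanges
    when an n-element array goes from CYCLIC(src_block) to CYCLIC(dst_block)."""
    counts = Counter()
    for i in range(n):
        pair = (owner(i, src_block, nprocs), owner(i, dst_block, nprocs))
        counts[pair] += 1
    return counts

def round_robin_schedule(nprocs):
    """Naive contention-free schedule (an assumption, not the paper's):
    in step k, processor s sends to processor (s + k) % nprocs, so every
    processor sends one message and receives one message per step."""
    return [[(s, (s + k) % nprocs) for s in range(nprocs)]
            for k in range(1, nprocs)]

# Hypothetical example: 24 elements, 4 processors, CYCLIC(2) -> CYCLIC(3).
counts = send_counts(n=24, nprocs=4, src_block=2, dst_block=3)

# Elements whose source and destination processor coincide need no transfer.
local = sum(c for (src, dst), c in counts.items() if src == dst)
print("elements that stay local:", local)

for step, pairs in enumerate(round_robin_schedule(4), start=1):
    sizes = {pair: counts.get(pair, 0) for pair in pairs}
    print("step", step, sizes)
```

In this sketch, relabelling the destination processors changes which pairs fall on the diagonal of `counts`, which is the quantity a processor mapping aims to increase. A schedule then only has to cover the remaining off-diagonal pairs.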
Cite this article
Huang, JW., Chu, CP. An Efficient Communication Scheduling Method for the Processor Mapping Technique Applied Data Redistribution. J Supercomput 37, 297–318 (2006). https://doi.org/10.1007/s11227-006-6615-z
DOI: https://doi.org/10.1007/s11227-006-6615-z