Sparse Matrix Block-Cyclic Realignment on Distributed Memory Machines

Hsu, Ching-Hsien

doi:10.1007/s11227-005-0247-6

Published: September 2005

Volume 33, pages 175–196 (2005)

The Journal of Supercomputing

Abstract

Modifying the data distribution over the course of a program to adapt to variations in data access patterns may lead to significant computational benefits in many scientific applications. Therefore, dynamic realignment of data is used to enhance algorithm performance in parallel programs on distributed memory machines. This paper presents a new method that aims to improve the efficiency of block-cyclic data realignment of sparse matrices. The main idea of the proposed technique is first to develop closed forms for generating the Vector Index Set (VIS) of each source/destination processor. Based on the vector index set and the nonzero structure of the sparse matrix, two efficient algorithms, vector2message (v2m) and message2vector (m2v), can be derived. The proposed technique uses v2m to extract nonzero elements from the source compressed structures and pack them into messages in the source stage, and uses m2v to unpack each received message and construct the destination matrix in the destination stage. A significant improvement of this approach is that a processor does not need to determine the complicated sending or receiving data sets for dynamic data redistribution, which reduces the indexing cost. The second advantage of the technique is that the informative VIS tables make it possible to achieve optimal packing/unpacking stages. Another contribution of our methods is the ability to handle sparse matrix redistribution between two disjoint processor grids in the source and destination phases. A theoretical model for analyzing the performance of the proposed technique is also presented. To evaluate the performance of our methods, we implemented the algorithms on an IBM SP2 parallel machine along with the Histogram method and a dense redistribution strategy. The experimental results show that our technique provides significant improvement for runtime data redistribution of sparse matrices in most test samples.
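
To make the two-stage structure described in the abstract concrete, the following is a minimal sketch of a one-dimensional block-cyclic redistribution of sparse matrix columns, simulated in a single process. It is not the paper's v2m/m2v implementation: the abstract does not give the closed forms for the VIS, so the index set here is enumerated by brute force, and the names `owner`, `index_set`, `pack`, `unpack`, and `redistribute`, as well as the `(row, column, value)` triple format, are assumptions made for illustration.

```python
from collections import defaultdict

def owner(g, block, nprocs):
    """Processor that owns global index g under a cyclic(block) distribution."""
    return (g // block) % nprocs

def index_set(p, block, nprocs, n):
    """Global indices owned by processor p, in ascending order.
    Brute-force stand-in for a closed-form index set (assumption)."""
    return [g for g in range(n) if owner(g, block, nprocs) == p]

def pack(local_nonzeros, dst_block, dst_procs):
    """Source stage: group one source processor's nonzeros by destination.
    local_nonzeros is a list of (row, global_col, value) triples."""
    messages = defaultdict(list)
    for row, col, val in local_nonzeros:
        messages[owner(col, dst_block, dst_procs)].append((row, col, val))
    return dict(messages)

def unpack(received):
    """Destination stage: merge received messages, ordered by column then row."""
    merged = [triple for msg in received for triple in msg]
    merged.sort(key=lambda t: (t[1], t[0]))
    return merged

def redistribute(nonzeros, src_block, src_procs, dst_block, dst_procs):
    """Simulate cyclic(src_block) over src_procs -> cyclic(dst_block) over dst_procs."""
    # Place every nonzero on its source processor according to its column.
    source = defaultdict(list)
    for row, col, val in nonzeros:
        source[owner(col, src_block, src_procs)].append((row, col, val))
    # Each source processor packs its messages; an inbox stands in for the network.
    inbox = defaultdict(list)
    for p in range(src_procs):
        for q, msg in pack(source[p], dst_block, dst_procs).items():
            inbox[q].append(msg)
    # Each destination processor unpacks what it received.
    return {q: unpack(inbox[q]) for q in range(dst_procs)}

if __name__ == "__main__":
    nz = [(0, 0, 1.0), (1, 3, 2.0), (2, 6, 3.0), (0, 7, 4.0), (3, 10, 5.0)]
    print(index_set(0, 2, 3, 12))
    print(redistribute(nz, 2, 3, 3, 2))
```

In this sketch only the nonzero triples are moved, so the work in both stages is proportional to the number of nonzeros rather than to the full matrix size, which is the general motivation for treating sparse redistribution separately from a dense strategy.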

Author information

Authors and Affiliations

Department of Computer Science and Information Engineering, Chung Hua University, Hsinchu, Taiwan
Ching-Hsien Hsu

Corresponding author

Correspondence to Ching-Hsien Hsu.

About this article

Cite this article

Hsu, CH. Sparse Matrix Block-Cyclic Realignment on Distributed Memory Machines. J Supercomput 33, 175–196 (2005). https://doi.org/10.1007/s11227-005-0247-6

Issue Date: September 2005
DOI: https://doi.org/10.1007/s11227-005-0247-6
