Sparse Matrix Block-Cyclic Realignment on Distributed Memory Machines

The Journal of Supercomputing

Abstract

Modifying the data distribution over the course of a program to adapt to variations in data access patterns may lead to significant computational benefits in many scientific applications. Dynamic realignment of data is therefore used to enhance algorithm performance in parallel programs on distributed memory machines. This paper presents a new method that aims to improve the efficiency of block-cyclic data realignment of sparse matrices. The main idea of the proposed technique is first to develop closed forms for generating the Vector Index Set (VIS) of each source/destination processor. Based on the vector index set and the nonzero structure of the sparse matrix, two efficient algorithms, vector2message (v2m) and message2vector (m2v), can be derived. The proposed technique uses v2m to extract nonzero elements from the source compressed structures and pack them into messages in the source stage, and uses m2v to unpack each received message and construct the destination matrix in the destination stage. A significant improvement of this approach is that a processor does not need to determine the complicated sending or receiving data sets for dynamic data redistribution, which reduces the indexing cost. The second advantage of the present technique is the achievement of optimal packing/unpacking stages, made possible by the informative VIS tables. Another contribution of our method is the ability to handle sparse matrix redistribution when the source and destination phases use two disjoint processor grids. A theoretical model for analyzing the performance of the proposed technique is also presented in this work. To evaluate the performance of our method, we have implemented the present algorithms on an IBM SP2 parallel machine along with the Histogram method and a dense redistribution strategy. The experimental results show that our technique provides significant improvement for runtime data redistribution of sparse matrices in most test samples.
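To make the source-stage packing and destination-stage unpacking concrete, the following Python sketch may help. It is an illustration only and not the paper's v2m/m2v algorithms: it assumes a one-dimensional, row-wise block-cyclic distribution, assumes CSR as the compressed structure, and finds the destination of each row by a direct owner computation instead of the closed-form VIS tables, which are not given in the abstract. All function names (`owner`, `local_to_global`, `global_to_local`, `pack`, `unpack`) are hypothetical.

```python
from collections import defaultdict

def owner(g, block, nprocs):
    """Rank owning global row g under a block-cyclic(block) distribution."""
    return (g // block) % nprocs

def local_to_global(l, rank, block, nprocs):
    """Global row index of local row l stored on processor `rank`."""
    return (l // block) * block * nprocs + rank * block + l % block

def global_to_local(g, block, nprocs):
    """Local row index of global row g on the processor that owns it."""
    return (g // (block * nprocs)) * block + g % block

def pack(row_ptr, col_idx, values, rank, src, dst):
    """Source stage (assumed form): group local CSR nonzeros by destination.

    src and dst are (block, nprocs) pairs. Returns a dict mapping each
    destination rank to a list of (global_row, col, value) triples.
    """
    messages = defaultdict(list)
    for l in range(len(row_ptr) - 1):
        g = local_to_global(l, rank, *src)
        d = owner(g, *dst)
        for k in range(row_ptr[l], row_ptr[l + 1]):
            messages[d].append((g, col_idx[k], values[k]))
    return dict(messages)

def unpack(received, n_local_rows, dst):
    """Destination stage (assumed form): build a local CSR from messages."""
    rows = [[] for _ in range(n_local_rows)]
    for msg in received:
        for g, c, v in msg:
            rows[global_to_local(g, *dst)].append((c, v))
    row_ptr, col_idx, values = [0], [], []
    for r in rows:
        for c, v in sorted(r):  # keep column indices ordered within a row
            col_idx.append(c)
            values.append(v)
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx, values

# Source rank 0 of 2 with block size 2 holds global rows 0, 1, 4, 5.
msgs = pack([0, 1, 1, 3, 4], [1, 0, 3, 2], [10.0, 20.0, 30.0, 40.0],
            rank=0, src=(2, 2), dst=(1, 2))
# Destination rank 0 of 2 with block size 1 holds global rows 0, 2, 4, 6.
local_csr = unpack([msgs[0]], n_local_rows=4, dst=(1, 2))
```

In this sketch every source row requires its own owner computation. The abstract's point is that precomputed VIS tables remove that per-element indexing work, so the sketch is best read as the baseline behaviour that the proposed technique improves on.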

Author information

Corresponding author

Correspondence to Ching-Hsien Hsu.

About this article

Cite this article

Hsu, CH. Sparse Matrix Block-Cyclic Realignment on Distributed Memory Machines. J Supercomput 33, 175–196 (2005). https://doi.org/10.1007/s11227-005-0247-6

  • DOI: https://doi.org/10.1007/s11227-005-0247-6
