Compressing three-dimensional sparse arrays using inter- and intra-task parallelization strategies on Intel Xeon and Xeon Phi

Abstract

Array operations are used in many scientific codes. In recent years, several applications, such as geological analysis and medical image processing, have come to rely on array operations over three-dimensional (3D) sparse arrays. Because of the huge computation time involved, it is necessary to compress 3D sparse arrays and to use parallel computing technologies to speed up sparse array operations. Compressing sparse arrays efficiently is therefore an important task for practical applications. Hence, in this paper, two strategies, inter-task parallelization (ETP) and intra-task parallelization (RTP), are presented for compressing 3D sparse arrays. Each strategy was designed and implemented on both the Intel Xeon and the Intel Xeon Phi. Experimental results show that the ETP strategy achieves 17.5× and 18.2× speedup ratios on the Intel Xeon E5-2670 v2 and the Intel Xeon Phi SE10X, respectively, while the RTP strategy achieves a 4.5× speedup ratio on both environments.
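The paper's own ETP and RTP schemes are detailed in the full text; as a rough illustration of the general idea only, the sketch below (not the authors' actual algorithm) compresses each 2D slice of a 3D sparse array into a CRS-style triple of row pointers, column indices, and nonzero values, and treats the slices as independent tasks in the spirit of inter-task parallelization. The function names and the thread-pool choice are assumptions made for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def compress_slice(plane):
    """Compress one 2D slice into CRS form: (row_ptr, col_idx, values).

    row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i.
    """
    row_ptr, col_idx, values = [0], [], []
    for row in plane:
        for j, v in enumerate(row):
            if v != 0:
                col_idx.append(j)
                values.append(v)
        row_ptr.append(len(values))
    return row_ptr, col_idx, values

def compress_3d(array3d, workers=4):
    """Inter-task style: each 2D slice is an independent compression task,
    so slices can be compressed concurrently by a pool of workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map preserves slice order, so slice k's CRS triple is result[k]
        return list(pool.map(compress_slice, array3d))

# Example: a 2x2x2 array with four nonzeros
a = [[[0, 5], [0, 0]],
     [[1, 0], [0, 2]]]
compressed = compress_3d(a)
# slice 0 -> ([0, 1, 1], [1], [5]); slice 1 -> ([0, 1, 2], [0, 1], [1, 2])
```

In a C/OpenMP setting, as used on the Xeon and Xeon Phi, the per-slice tasks would correspond to iterations of a parallel loop rather than Python threads; the data layout (pointers, indices, values) is the part that carries over.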

Acknowledgments

Part of this work was supported by the Ministry of Science and Technology under Grants MOST104-2221-E-182-050, MOST104-2221-E-182-051, and MOST103-2221-E-126-013. The authors would like to thank Professor Che-Rung Lee of the Department of Computer Science at National Tsing Hua University for providing hardware support. The authors would also like to thank the other experts who discussed this work with us.

Author information

Corresponding author

Correspondence to Che-Lun Hung.

About this article

Cite this article

Lin, CY., Yen, H.T. & Hung, CL. Compressing three-dimensional sparse arrays using inter- and intra-task parallelization strategies on Intel Xeon and Xeon Phi. J Supercomput 73, 3391–3410 (2017). https://doi.org/10.1007/s11227-016-1820-x

Keywords

  • Sparse array operation
  • Data compression method
  • Parallel processing
  • Multiprocessor
  • Multicomputer
  • Accelerator