Knowledge and Information Systems, Volume 51, Issue 2, pp 459–497

A distributed approach for accelerating sparse matrix arithmetic operations for high-dimensional feature selection

  • Antonela Tommasel (Email author)
  • Daniela Godoy
  • Alejandro Zunino
  • Cristian Mateos
Regular Paper


Matrix computations are fundamental and ubiquitous in computational science and are therefore used across many disciplines of scientific computing and engineering. Because the high computational complexity of matrix operations makes them critical to the performance of a large number of applications, executing them efficiently in distributed environments is crucial. This work proposes a novel approach for distributing sparse matrix arithmetic operations on computer clusters, aiming to speed up the processing of high-dimensional matrices. The approach focuses on how to split such operations into independent parallel tasks by considering the intrinsic characteristics that distinguish each type of operation and the particular matrices involved. It was applied to the most commonly used arithmetic operations between matrices. Its performance was evaluated on a high-dimensional text feature selection technique using two real-world datasets. The experimental evaluation showed that the proposed approach significantly reduced the computing times of large-scale matrix operations compared with serial and multi-threaded implementations, as well as with several linear algebra software libraries.
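To make the idea of splitting a sparse operation into independent parallel tasks concrete, the following is a minimal sketch and not the authors' implementation. It assumes a simple row-wise partitioning of a sparse matrix product, a dictionary-of-rows sparse format, and a local thread pool standing in for cluster nodes. The function names `split_rows`, `multiply_row_block` and `distributed_multiply` are hypothetical and chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def multiply_row_block(a_rows, b):
    """Multiply a block of sparse rows of A by the sparse matrix B.

    Both operands use a {row: {col: value}} layout (an assumed format),
    so only stored non-zero entries are ever visited.
    """
    out = {}
    for i, row in a_rows.items():
        acc = {}
        for k, a_ik in row.items():
            # Row k of B contributes a_ik * b_kj to entry (i, j) of the product.
            for j, b_kj in b.get(k, {}).items():
                acc[j] = acc.get(j, 0.0) + a_ik * b_kj
        if acc:
            out[i] = acc
    return out


def split_rows(a, n_tasks):
    """Partition the rows of A round-robin into at most n_tasks blocks."""
    blocks = [{} for _ in range(n_tasks)]
    for idx, (i, row) in enumerate(sorted(a.items())):
        blocks[idx % n_tasks][i] = row
    return [block for block in blocks if block]


def distributed_multiply(a, b, n_tasks=4):
    """Compute A * B as independent row-block tasks and merge the results.

    Threads are a stand-in for cluster nodes (an assumption of this sketch).
    Each task owns a disjoint set of output rows, so merging is a plain update.
    """
    result = {}
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        task = partial(multiply_row_block, b=b)
        for partial_result in pool.map(task, split_rows(a, n_tasks)):
            result.update(partial_result)
    return result


A = {0: {0: 1.0, 2: 2.0}, 1: {1: 3.0}, 2: {0: 4.0}}
B = {0: {1: 5.0}, 1: {0: 6.0}, 2: {1: 7.0}}
C = distributed_multiply(A, B, n_tasks=2)
print(C)
```

Row-wise splitting is only one possible strategy. The abstract states that the proposed approach chooses how to split according to the type of operation and the matrices involved, which this sketch does not attempt to reproduce.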


Sparse matrix, Matrix arithmetic operation, Feature selection, Distributed computing



This work has been partially funded by CONICET (Argentina) under Grant PIP No. 112-201201-00185.



Copyright information

© Springer-Verlag London 2016

Authors and Affiliations

  • Antonela Tommasel (1), Email author
  • Daniela Godoy (1)
  • Alejandro Zunino (1)
  • Cristian Mateos (1)
  1. ISISTAN, UNICEN-CONICET, Tandil, Argentina
