A distributed approach for accelerating sparse matrix arithmetic operations for high-dimensional feature selection

Tommasel, Antonela; Godoy, Daniela; Zunino, Alejandro; Mateos, Cristian

doi:10.1007/s10115-016-0981-5

Regular Paper
Published: 26 August 2016

Volume 51, pages 459–497 (2017)

Knowledge and Information Systems

Abstract

Matrix computations are fundamental and ubiquitous in computational science and are therefore used across many disciplines of scientific computing and engineering. Because matrix operations are computationally expensive and critical to the performance of many applications, executing them efficiently in distributed environments is a crucial issue. This work proposes a novel approach for distributing sparse matrix arithmetic operations on computer clusters, aiming to speed up the processing of high-dimensional matrices. The approach focuses on how to split such operations into independent parallel tasks by considering the intrinsic characteristics that distinguish each type of operation and the particular matrices involved. It was applied to the most commonly used arithmetic operations between matrices. Its performance was evaluated on a high-dimensional text feature selection approach using two real-world datasets. The experimental evaluation showed that the proposed approach significantly reduced the computing times of large-scale matrix operations compared with serial and multi-threaded implementations as well as several linear algebra software libraries.
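To make the idea of splitting a sparse operation into independent tasks concrete, the following is a minimal sketch and not the method of the paper, whose actual splitting strategy is not described in this abstract. It assumes a dictionary-of-rows sparse format, a row-block partition of the left operand balanced by nonzero count, and a local thread pool standing in for cluster nodes. The names `multiply_block`, `split_rows`, and `distributed_multiply` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def multiply_block(a_rows, b):
    """Multiply a block of sparse rows of A by the sparse matrix B.

    Both operands are dicts of the form {row: {col: value}}.
    Each block can be processed without data from any other block.
    """
    out = {}
    for i, row in a_rows.items():
        acc = {}
        for k, a_ik in row.items():
            # Only nonzero entries of row k of B contribute.
            for j, b_kj in b.get(k, {}).items():
                acc[j] = acc.get(j, 0.0) + a_ik * b_kj
        if acc:
            out[i] = acc
    return out


def split_rows(a, n_tasks):
    """Assign rows of A to at most n_tasks blocks, balancing nonzero counts.

    Assumption: a greedy heuristic (densest rows first, each to the
    currently lightest block); other partitioning schemes are possible.
    """
    blocks = [dict() for _ in range(n_tasks)]
    loads = [0] * n_tasks
    for i, row in sorted(a.items(), key=lambda kv: -len(kv[1])):
        t = loads.index(min(loads))
        blocks[t][i] = row
        loads[t] += len(row)
    return [blk for blk in blocks if blk]


def distributed_multiply(a, b, n_tasks=4):
    """Compute A * B by running one independent task per row block."""
    blocks = split_rows(a, n_tasks)
    result = {}
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        # Threads stand in for cluster workers in this sketch.
        for partial in pool.map(multiply_block, blocks, [b] * len(blocks)):
            result.update(partial)  # row blocks are disjoint, so no conflicts
    return result


A = {0: {0: 1.0, 2: 2.0}, 1: {1: 3.0}, 3: {0: 4.0, 3: 5.0}}
B = {0: {1: 1.0}, 1: {0: 2.0}, 2: {1: 3.0}, 3: {2: 1.0}}
C = distributed_multiply(A, B, n_tasks=2)
print(C)
```

Because each task writes a disjoint set of output rows, the partial results can be merged without synchronization. A multiplication partitioned this way still needs all of B in every task, whereas an element-wise operation such as addition would need only the matching rows of both operands, which illustrates why the type of operation affects how the work is best divided.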

Acknowledgments

This work has been partially funded by CONICET (Argentina) under Grant PIP No. 112-201201-00185.

Author information

Authors and Affiliations

ISISTAN, UNICEN-CONICET, Paraje Arroyo Seco, Campus Universitario, Tandil, Buenos Aires, Argentina
Antonela Tommasel, Daniela Godoy, Alejandro Zunino & Cristian Mateos

Corresponding author

Correspondence to Antonela Tommasel.

About this article

Cite this article

Tommasel, A., Godoy, D., Zunino, A. et al. A distributed approach for accelerating sparse matrix arithmetic operations for high-dimensional feature selection. Knowl Inf Syst 51, 459–497 (2017). https://doi.org/10.1007/s10115-016-0981-5

Received: 13 December 2014
Revised: 30 June 2016
Accepted: 04 August 2016
Published: 26 August 2016
Issue Date: May 2017
DOI: https://doi.org/10.1007/s10115-016-0981-5
