ParCube: Sparse Parallelizable Tensor Decompositions
Abstract
How can we efficiently decompose a tensor into sparse factors when the data does not fit in memory? Tensor decompositions have become steadily more popular in data mining applications; however, current state-of-the-art decomposition algorithms operate in main memory and do not scale to truly large datasets. In this work, we propose ParCube, a new and highly parallelizable method for speeding up tensor decompositions that is well suited to producing sparse approximations. Experiments with even moderately large data indicate over 90% sparser outputs and 14 times faster execution, with approximation error close to that of the current state of the art irrespective of computation and memory requirements. We provide theoretical guarantees for the algorithm’s correctness and validate our claims through extensive experiments, including four real-world datasets (Enron, Lbnl, Facebook and Nell), demonstrating its effectiveness for data mining practitioners. In particular, we are the first to analyze the very large Nell dataset using a sparse tensor decomposition, demonstrating that ParCube enables us to handle very large datasets effectively and efficiently.
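To make the quantities mentioned in the abstract concrete, the following minimal sketch shows how a three-way tensor is rebuilt from PARAFAC factor matrices, and how approximation error and factor sparsity might be measured. It is not the ParCube algorithm, which the abstract does not spell out. The function names (`cp_reconstruct`, `relative_error`, `sparsity`) and the choice of the Frobenius norm are assumptions made for illustration.

```python
from itertools import product


def cp_reconstruct(A, B, C):
    """Rebuild a 3-way tensor from PARAFAC factors A (I x R), B (J x R), C (K x R).

    Entry (i, j, k) is the sum over components r of A[i][r] * B[j][r] * C[k][r].
    """
    R = len(A[0])
    return [[[sum(A[i][r] * B[j][r] * C[k][r] for r in range(R))
              for k in range(len(C))]
             for j in range(len(B))]
            for i in range(len(A))]


def relative_error(X, X_hat):
    """Frobenius norm of (X - X_hat) divided by the Frobenius norm of X."""
    I, J, K = len(X), len(X[0]), len(X[0][0])
    idx = list(product(range(I), range(J), range(K)))
    num = sum((X[i][j][k] - X_hat[i][j][k]) ** 2 for i, j, k in idx)
    den = sum(X[i][j][k] ** 2 for i, j, k in idx)
    return (num / den) ** 0.5


def sparsity(M):
    """Fraction of exactly-zero entries in a factor matrix."""
    entries = [v for row in M for v in row]
    return sum(1 for v in entries if v == 0) / len(entries)


# Toy rank-2 example: sparse factors for a 3 x 2 x 2 tensor.
A = [[1, 0], [0, 2], [0, 0]]
B = [[1, 0], [0, 1]]
C = [[1, 0], [0, 3]]
X = cp_reconstruct(A, B, C)

# A rank-1 approximation that keeps only the first component.
X_hat = cp_reconstruct([[1], [0], [0]], [[1], [0]], [[1], [0]])

err = relative_error(X, X_hat)
frac_zero = sparsity(A)
```

In this toy case the dropped component carries most of the tensor's energy, so `err` is large. A useful sparse decomposition would aim for a high `sparsity` value in each factor while keeping `relative_error` close to that of a dense baseline.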
Keywords
Tensors, PARAFAC decomposition, sparsity, sampling, randomized algorithms, parallel algorithms