
ParCube: Sparse Parallelizable Tensor Decompositions

  • Evangelos E. Papalexakis
  • Christos Faloutsos
  • Nicholas D. Sidiropoulos
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7523)

Abstract

How can we efficiently decompose a tensor into sparse factors when the data does not fit in memory? Tensor decompositions have become steadily more popular in data mining applications; however, current state-of-the-art decomposition algorithms operate in main memory and do not scale to truly large datasets. In this work, we propose ParCube, a new and highly parallelizable method for speeding up tensor decompositions that is well suited to producing sparse approximations. Experiments with even moderately large data indicate over 90% sparser outputs and 14 times faster execution, with approximation error close to that of the current state of the art, irrespective of computation and memory requirements. We provide theoretical guarantees for the algorithm’s correctness and validate our claims through extensive experiments, including four real-world datasets (Enron, Lbnl, Facebook and Nell), demonstrating its effectiveness for data mining practitioners. In particular, we are the first to analyze the very large Nell dataset using a sparse tensor decomposition, demonstrating that ParCube enables us to handle very large datasets effectively and efficiently.
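To make the quantities in the abstract concrete, the following minimal sketch shows the PARAFAC model for a three-way tensor, together with one way to measure approximation error and factor sparsity. It is not the ParCube algorithm itself. The function names (`parafac_reconstruct`, `relative_error`, `factor_sparsity`) and the toy factors are assumptions made for illustration and do not come from the paper.

```python
import numpy as np

def parafac_reconstruct(A, B, C):
    """Rebuild a 3-way tensor from rank-R PARAFAC factors.

    X[i, j, k] = sum over r of A[i, r] * B[j, r] * C[k, r]
    """
    return np.einsum("ir,jr,kr->ijk", A, B, C)

def relative_error(X, X_hat):
    """Frobenius-norm error of X_hat relative to X (illustrative metric)."""
    return np.linalg.norm(X - X_hat) / np.linalg.norm(X)

def factor_sparsity(*factors):
    """Fraction of exactly-zero entries across all factor matrices."""
    total = sum(F.size for F in factors)
    zeros = sum(int(np.count_nonzero(F == 0)) for F in factors)
    return zeros / total

# Hypothetical toy factors with R = 2 components and many zeros.
A = np.array([[1.0, 0.0], [0.0, 2.0], [0.0, 0.0]])
B = np.array([[1.0, 0.0], [0.0, 1.0]])
C = np.array([[1.0, 0.0], [0.0, 3.0]])

X = parafac_reconstruct(A, B, C)      # shape (3, 2, 2)
print(X.shape, relative_error(X, X), factor_sparsity(A, B, C))
```

Under this reading, a claim such as "over 90% sparser outputs" would concern the fraction of zeros in the factor matrices, and "approximation error" would concern how closely the reconstructed tensor matches the data; the exact definitions used in the paper may differ.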

Keywords

Tensors, PARAFAC decomposition, sparsity, sampling, randomized algorithms, parallel algorithms


Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Evangelos E. Papalexakis (1)
  • Christos Faloutsos (1)
  • Nicholas D. Sidiropoulos (2)
  1. School of Computer Science, Carnegie Mellon University, Pittsburgh, USA
  2. Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, USA
