Effective Tensor-Based Data Clustering Through Sub-Tensor Impact Graphs

Part of the book series: Unsupervised and Semi-Supervised Learning ((UNSESUL))

Abstract

Tensors are commonly used to represent multi-modal data, such as Web graphs, sensor streams, and social networks. Consequently, tensor-based algorithms, most notably tensor decomposition, are becoming core tools for data analysis and knowledge discovery, including clustering. Intuitively, tensor decomposition generalizes matrix decomposition to high-dimensional arrays (known as tensors) and rewrites the given tensor as a set of factor matrices (one for each mode of the input tensor) and a core tensor (which, intuitively, describes the spectral structure of the given tensor). These factor matrices and the core tensor can then be used to obtain multi-modal clusters of the input data. One key problem with tensor decomposition, however, is its computational complexity. One way to address this challenge is to partition the tensor and obtain the overall decomposition by leveraging the decompositions of these smaller partitions. This solution, however, leaves an important open question: how to most effectively combine the results from these partitions. In this chapter, we introduce the notion of sub-tensor impact graphs (SIGs), which quantify how the decompositions of these sub-partitions impact each other and the overall tensor decomposition accuracy, and we present several complementary algorithms that leverage this novel concept to address key challenges in tensor decomposition: (a) the personalized tensor decomposition (PTD) algorithm leverages sub-tensor impact graphs to focus the accuracy of the tensor decomposition process on the parts of the data tensor that are most relevant to a particular clustering task, whereas (b) the noise-profile adaptive tensor decomposition (nTD) method leverages limited a priori information about the noise distribution in the data to improve tensor decomposition accuracy.
Finally, (c) a two-phase block-incremental tensor decomposition technique, BICP, efficiently and effectively maintains tensor decomposition results in the presence of incrementally evolving tensor data. We also present experimental results, on diverse data sets, showing that, if properly constructed, sub-tensor impact graphs can indeed help overcome various density and noise challenges in the clustering of multi-modal data sets.
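As a rough illustration of the partition-then-decompose scheme sketched in the abstract, the snippet below grid-partitions a 3-mode tensor and runs a minimal CP-ALS on each sub-tensor independently (with empty sub-tensors receiving zero factor matrices, as in note 1). Everything here — the function names, the pinv-based ALS routine, and the 2-way grid — is an illustrative assumption, not the chapter's implementation, and it deliberately omits the SIG-based combination step that is the chapter's actual contribution.

```python
import numpy as np

def khatri_rao(U, V):
    """Column-wise Khatri-Rao product: row (i * V.shape[0] + j) equals U[i, :] * V[j, :]."""
    return (U[:, None, :] * V[None, :, :]).reshape(-1, U.shape[1])

def cp_als(X, rank, n_iter=200, seed=0):
    """Minimal CP-ALS for a 3-mode tensor: X ~ sum_r a_r (outer) b_r (outer) c_r."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    if not X.any():  # empty sub-tensor: zero factor matrices of the appropriate size
        return [np.zeros((n, rank)) for n in (I, J, K)]
    A, B, C = (rng.standard_normal((n, rank)) for n in (I, J, K))
    X1 = X.reshape(I, J * K)                       # mode-1 unfolding (C order: k fastest)
    X2 = np.moveaxis(X, 1, 0).reshape(J, I * K)    # mode-2 unfolding
    X3 = np.moveaxis(X, 2, 0).reshape(K, I * J)    # mode-3 unfolding
    for _ in range(n_iter):
        A = X1 @ np.linalg.pinv(khatri_rao(B, C).T)
        B = X2 @ np.linalg.pinv(khatri_rao(A, C).T)
        C = X3 @ np.linalg.pinv(khatri_rao(A, B).T)
    return [A, B, C]

def reconstruct(A, B, C):
    """Re-assemble the dense tensor implied by the three factor matrices."""
    return np.einsum('ir,jr,kr->ijk', A, B, C)

def decompose_by_partitions(X, splits, rank):
    """Grid-partition X along each mode and CP-decompose each sub-tensor independently."""
    results = {}
    for si in np.split(np.arange(X.shape[0]), splits):
        for sj in np.split(np.arange(X.shape[1]), splits):
            for sk in np.split(np.arange(X.shape[2]), splits):
                block = X[np.ix_(si, sj, sk)]
                results[(si[0], sj[0], sk[0])] = cp_als(block, rank)
    return results
```

Each block of an exactly low-rank tensor is itself low-rank, so the per-partition decompositions fit their blocks well; the open question the chapter addresses is how to stitch these independent results back into an accurate decomposition of the whole tensor.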

Notes

  1. If the sub-tensor is empty, then the factors are 0 matrices of the appropriate size.

  2. cl(A) is similarly constructed from the sub-tensor \( {\mathcal {X}}_{\mathbf {l}}\).

  3. Note that, since the number of partitions is, in general, small and independent of the size of the input tensor, the cost of the PPR computation used to obtain the ranks is negligible next to the cost of the tensor decomposition.

  4. It is trivial to modify this equation such that the smallest rank corresponds to a user-provided lower bound, \(F_{\min }\), when such a lower bound is provided by the user.

  5. Here we report the complexity of Phase 2-I; the complexities of the other refinement methods can be derived similarly.

  6. While this minimality criterion is not strictly required, the fewer partitions there are, the faster and potentially more effective the personalization process will be.

  7. It is trivial to modify this equation such that the smallest rank corresponds to a user-provided lower bound, \(F_{\min }\), when such a lower bound is provided by the user.
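Notes 3 and 4 refer to a personalized PageRank (PPR) computation over the sub-tensor impact graph, whose scores are then mapped to per-partition decomposition ranks with a user-provided lower bound \(F_{\min }\). The sketch below shows one plausible shape of that pipeline; the impact-weight matrix encoding, the proportional rank-assignment rule, and all names are hypothetical assumptions, since the chapter's exact equations are not reproduced on this page.

```python
import numpy as np

def personalized_pagerank(W, seed_mask, beta=0.85, n_iter=100):
    """PPR over a sub-tensor impact graph with edge-weight matrix W,
    where W[i, j] is the (hypothetical) impact of sub-tensor j on sub-tensor i.
    Restarts are concentrated on the partitions flagged in seed_mask."""
    col_sums = W.sum(axis=0)
    P = W / np.where(col_sums > 0, col_sums, 1)   # column-stochastic transition matrix
    s = seed_mask / seed_mask.sum()               # restart (personalization) vector
    r = np.full(len(W), 1.0 / len(W))
    for _ in range(n_iter):                       # power iteration
        r = beta * (P @ r) + (1 - beta) * s
    return r

def assign_ranks(scores, F, F_min):
    """Give the highest-impact partition the full target rank F and scale the
    others proportionally, never dropping below the lower bound F_min (cf. note 4)."""
    return np.maximum(F_min, np.round(F * scores / scores.max())).astype(int)
```

Because the number of partitions is small, this power iteration runs over a tiny matrix, which is consistent with note 3's observation that the PPR cost is negligible next to the decomposition itself.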

Acknowledgements

Research is supported by NSF#1318788 “Data Management for Real-Time Data Driven Epidemic Spread Simulations,” NSF#1339835 “E-SDMS: Energy Simulation Data Management System Software,” NSF#1610282 “DataStorm: A Data Enabled System for End-to-End Disaster Planning and Response,” NSF#1633381 “BIGDATA: Discovering Context-Sensitive Impact in Complex Systems,” and “FourCmodeling”: EU H2020 Marie Skłodowska-Curie grant agreement No. 690817.

Author information

Corresponding author

Correspondence to K. Selçuk Candan.

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Candan, K.S., Huang, S., Li, X., Sapino, M.L. (2019). Effective Tensor-Based Data Clustering Through Sub-Tensor Impact Graphs. In: Nasraoui, O., Ben N'Cir, CE. (eds) Clustering Methods for Big Data Analytics. Unsupervised and Semi-Supervised Learning. Springer, Cham. https://doi.org/10.1007/978-3-319-97864-2_7

  • DOI: https://doi.org/10.1007/978-3-319-97864-2_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-97863-5

  • Online ISBN: 978-3-319-97864-2

  • eBook Packages: Engineering; Engineering (R0)
