Effective Tensor-Based Data Clustering Through Sub-Tensor Impact Graphs

Part of the book series: Unsupervised and Semi-Supervised Learning ((UNSESUL))

Abstract

Tensors are commonly used to represent multi-modal data, such as Web graphs, sensor streams, and social networks. Consequently, tensor-based algorithms, most notably tensor decomposition, are becoming core tools for data analysis and knowledge discovery, including clustering. Intuitively, tensor decomposition generalizes matrix decomposition to high-dimensional arrays (known as tensors) and rewrites the given tensor as a set of factor matrices (one for each mode of the input tensor) and a core tensor (which, intuitively, describes the spectral structure of the given tensor). These factor matrices and the core tensor can then be used to obtain multi-modal clusters of the input data. One key problem with tensor decomposition, however, is its computational complexity. One way to address this challenge is to partition the tensor and obtain the overall decomposition by leveraging the decompositions of these smaller partitions. This solution, however, leaves an important open question: how to most effectively combine the results from these partitions. In this chapter, we introduce the notion of sub-tensor impact graphs (SIGs), which quantify how the decompositions of these sub-partitions impact each other and the overall tensor decomposition accuracy, and we present several complementary algorithms that leverage this novel concept to address key challenges in tensor decomposition: (a) the personalized tensor decomposition (PTD) algorithm leverages sub-tensor impact graphs to focus the accuracy of the tensor decomposition process on the parts of the data tensor that are most relevant to a particular clustering task, whereas (b) the noise-profile adaptive tensor decomposition (nTD) method leverages limited a priori information about the noise distribution in the data to improve tensor decomposition accuracy.
Finally, (c) a two-phase block-incremental tensor decomposition technique, BICP, efficiently and effectively maintains tensor decomposition results in the presence of incrementally evolving tensor data. We also present experimental results, on diverse data sets, showing that, if properly constructed, sub-tensor impact graphs can indeed help overcome various density and noise challenges in the clustering of multi-modal data sets.
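As a rough illustration of the partition-then-decompose scheme sketched in the abstract, the snippet below grid-partitions a 3-mode tensor and runs a minimal CP-ALS on each sub-tensor independently (with empty sub-tensors receiving zero factor matrices, as in note 1). Everything here — the function names, the pinv-based ALS routine, and the 2-way grid — is an illustrative assumption, not the chapter's implementation, and it deliberately omits the SIG-based combination step that is the chapter's actual contribution.

```python
import numpy as np

def khatri_rao(U, V):
    """Column-wise Khatri-Rao product: row (i * V.shape[0] + j) equals U[i, :] * V[j, :]."""
    return (U[:, None, :] * V[None, :, :]).reshape(-1, U.shape[1])

def cp_als(X, rank, n_iter=200, seed=0):
    """Minimal CP-ALS for a 3-mode tensor: X ~ sum_r a_r (outer) b_r (outer) c_r."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    if not X.any():  # empty sub-tensor: zero factor matrices of the appropriate size
        return [np.zeros((n, rank)) for n in (I, J, K)]
    A, B, C = (rng.standard_normal((n, rank)) for n in (I, J, K))
    X1 = X.reshape(I, J * K)                       # mode-1 unfolding (C order: k fastest)
    X2 = np.moveaxis(X, 1, 0).reshape(J, I * K)    # mode-2 unfolding
    X3 = np.moveaxis(X, 2, 0).reshape(K, I * J)    # mode-3 unfolding
    for _ in range(n_iter):
        A = X1 @ np.linalg.pinv(khatri_rao(B, C).T)
        B = X2 @ np.linalg.pinv(khatri_rao(A, C).T)
        C = X3 @ np.linalg.pinv(khatri_rao(A, B).T)
    return [A, B, C]

def reconstruct(A, B, C):
    """Re-assemble the dense tensor implied by the three factor matrices."""
    return np.einsum('ir,jr,kr->ijk', A, B, C)

def decompose_by_partitions(X, splits, rank):
    """Grid-partition X along each mode and CP-decompose each sub-tensor independently."""
    results = {}
    for si in np.split(np.arange(X.shape[0]), splits):
        for sj in np.split(np.arange(X.shape[1]), splits):
            for sk in np.split(np.arange(X.shape[2]), splits):
                block = X[np.ix_(si, sj, sk)]
                results[(si[0], sj[0], sk[0])] = cp_als(block, rank)
    return results
```

Each block of an exactly low-rank tensor is itself low-rank, so the per-partition decompositions fit their blocks well; the open question the chapter addresses is how to stitch these independent results back into an accurate decomposition of the whole tensor.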

Notes

  1. If the sub-tensor is empty, then the factors are 0 matrices of the appropriate size.

  2. cl(A) is similarly constructed from the sub-tensor \( {\mathcal {X}}_{\mathbf {l}}\).

  3. Note that, since the number of partitions is, in general, small and independent of the size of the input tensor, the cost of the PPR computation used to obtain the ranks is negligible next to the cost of the tensor decomposition.

  4. It is trivial to modify this equation such that the smallest rank corresponds to a user-provided lower bound, \(F_{\min }\), when such a lower bound is provided by the user.

  5. Here we report the complexity of Phase 2-I; the complexities of the other refinement methods can be derived similarly.

  6. While this minimality criterion is not strictly required, the fewer partitions there are, the faster and potentially more effective the personalization process will be.

  7. It is trivial to modify this equation such that the smallest rank corresponds to a user-provided lower bound, \(F_{\min }\), when such a lower bound is provided by the user.
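Notes 3 and 4 refer to a personalized PageRank (PPR) computation over the sub-tensor impact graph, whose scores are then mapped to per-partition decomposition ranks with a user-provided lower bound \(F_{\min }\). The sketch below shows one plausible shape of that pipeline; the impact-weight matrix encoding, the proportional rank-assignment rule, and all names are hypothetical assumptions, since the chapter's exact equations are not reproduced on this page.

```python
import numpy as np

def personalized_pagerank(W, seed_mask, beta=0.85, n_iter=100):
    """PPR over a sub-tensor impact graph with edge-weight matrix W,
    where W[i, j] is the (hypothetical) impact of sub-tensor j on sub-tensor i.
    Restarts are concentrated on the partitions flagged in seed_mask."""
    col_sums = W.sum(axis=0)
    P = W / np.where(col_sums > 0, col_sums, 1)   # column-stochastic transition matrix
    s = seed_mask / seed_mask.sum()               # restart (personalization) vector
    r = np.full(len(W), 1.0 / len(W))
    for _ in range(n_iter):                       # power iteration
        r = beta * (P @ r) + (1 - beta) * s
    return r

def assign_ranks(scores, F, F_min):
    """Give the highest-impact partition the full target rank F and scale the
    others proportionally, never dropping below the lower bound F_min (cf. note 4)."""
    return np.maximum(F_min, np.round(F * scores / scores.max())).astype(int)
```

Because the number of partitions is small, this power iteration runs over a tiny matrix, which is consistent with note 3's observation that the PPR cost is negligible next to the decomposition itself.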

Acknowledgements

Research is supported by NSF#1318788 “Data Management for Real-Time Data Driven Epidemic Spread Simulations,” NSF#1339835 “E-SDMS: Energy Simulation Data Management System Software,” NSF#1610282 “DataStorm: A Data Enabled System for End-to-End Disaster Planning and Response,” NSF#1633381 “BIGDATA: Discovering Context-Sensitive Impact in Complex Systems,” and “FourCmodeling”: EU H2020 Marie Skłodowska-Curie grant agreement No. 690817.

Author information

Corresponding author

Correspondence to K. Selçuk Candan.

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Candan, K.S., Huang, S., Li, X., Sapino, M.L. (2019). Effective Tensor-Based Data Clustering Through Sub-Tensor Impact Graphs. In: Nasraoui, O., Ben N'Cir, CE. (eds) Clustering Methods for Big Data Analytics. Unsupervised and Semi-Supervised Learning. Springer, Cham. https://doi.org/10.1007/978-3-319-97864-2_7

  • DOI: https://doi.org/10.1007/978-3-319-97864-2_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-97863-5

  • Online ISBN: 978-3-319-97864-2

  • eBook Packages: Engineering; Engineering (R0)
