Abstract
Tensors are commonly used to represent multi-modal data, such as Web graphs, sensor streams, and social networks. As a consequence, tensor-based algorithms, most notably tensor decomposition, are becoming core tools for data analysis and knowledge discovery, including clustering. Intuitively, tensor decomposition generalizes matrix decomposition to high-dimensional arrays (known as tensors) and rewrites the given tensor as a set of factor matrices (one for each mode of the input tensor) and a core tensor (which, intuitively, describes the spectral structure of the given tensor). These factor matrices and the core tensor can then be used to obtain multi-modal clusters of the input data. One key problem with tensor decomposition, however, is its computational complexity. One way to address this challenge is to partition the tensor and compute the decomposition leveraging these smaller partitions. This solution, however, leaves an important open question: how to most effectively combine the results obtained from these partitions. In this chapter, we introduce the notion of sub-tensor impact graphs (SIGs), which quantify how the decompositions of these sub-partitions impact each other and the overall decomposition accuracy, and we present several complementary algorithms that leverage this novel concept to address key challenges in tensor decomposition: (a) the Personalized Tensor Decomposition (PTD) algorithm leverages sub-tensor impact graphs to focus the accuracy of the decomposition process on those parts of the data tensor that are most relevant to a particular clustering task; (b) the noise-profile adaptive tensor decomposition (nTD) method leverages limited a priori information about the noise distribution in the data to improve decomposition accuracy; and (c) the two-phase block-incremental tensor decomposition technique, BICP, efficiently and effectively maintains tensor decomposition results in the presence of incrementally evolving tensor data. We also present experimental results, on diverse data sets, which show that, if properly constructed, sub-tensor impact graphs can indeed help overcome various density and noise challenges in the clustering of multi-modal data sets.
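To make the block-based setting concrete, the following Python sketch partitions a 3-mode tensor into a grid of sub-tensors and builds a toy sub-tensor impact graph: sub-tensors aligned along at least one mode share rows of that mode's factor matrix, so their decompositions influence one another. The grid shape, the function names, and the use of block Frobenius norms as edge weights are illustrative assumptions; the chapter's actual impact scores are derived from the sub-tensor decompositions themselves.

```python
import numpy as np
from itertools import product

def grid_partition(X, splits):
    """Split tensor X into a grid of sub-tensors.

    splits gives the number of partitions per mode, e.g. (2, 2, 2).
    Returns a dict mapping a block index such as (0, 1, 0) to the
    corresponding sub-tensor.  (An all-zero block corresponds to Note 1
    below: its factors are 0 matrices of the appropriate size.)
    """
    ranges = [np.array_split(np.arange(n), s) for n, s in zip(X.shape, splits)]
    blocks = {}
    for idx in product(*(range(s) for s in splits)):
        blocks[idx] = X[np.ix_(*(ranges[m][i] for m, i in enumerate(idx)))]
    return blocks

def sub_tensor_impact_graph(blocks):
    """Build an adjacency map {block: {neighbor: weight}}.

    Two sub-tensors are linked if they are aligned along at least one
    mode (i.e., share a block index in some mode): aligned blocks share
    rows of that mode's factor matrix, so their decompositions impact
    one another.  The Frobenius norm of the neighboring block serves as
    a stand-in edge weight here.
    """
    adj = {b: {} for b in blocks}
    for a, b in product(blocks, repeat=2):
        if a != b and any(ai == bi for ai, bi in zip(a, b)):
            adj[a][b] = np.linalg.norm(blocks[b])
    return adj

X = np.random.rand(30, 40, 20)          # a toy 3-mode tensor
sig = sub_tensor_impact_graph(grid_partition(X, (2, 2, 2)))
```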
Notes
1. If the sub-tensor is empty, then the factors are 0 matrices of the appropriate size.
2. \(cl(A)\) is similarly constructed from the sub-tensor \(\mathcal{X}_{\mathbf{l}}\).
3. Note that, since the number of partitions is in general small and independent of the size of the input tensor, the cost of the PPR computation used to obtain the ranks is negligible next to the cost of the tensor decomposition (a minimal sketch of this step follows these notes).
4. It is trivial to modify this equation so that the smallest rank corresponds to a user-provided lower bound, \(F_{\min}\), when such a bound is given.
5. Here we report the complexity of Phase 2-I; the complexities of the other refinement methods can be derived similarly.
6. While this minimality criterion is not strictly required, the fewer partitions there are, the faster and potentially more effective the personalization process will be.
7. It is trivial to modify this equation so that the smallest rank corresponds to a user-provided lower bound, \(F_{\min}\), when such a bound is given.
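Notes 3, 4, and 7 refer to a personalized PageRank (PPR) computation over the sub-tensor impact graph and to a rank-assignment equation with lower bound \(F_{\min}\). The Python sketch below illustrates both steps; the restart probability, iteration count, and the linear score-to-rank mapping are assumptions, not the chapter's exact formulation.

```python
import numpy as np

def personalized_pagerank(W, seed, alpha=0.85, iters=100):
    """Power-iteration PPR over the SIG's weighted adjacency matrix W.

    seed is the restart distribution, placing its mass on the
    partitions most relevant to the user's focus.
    """
    col = W.sum(axis=0)
    col[col == 0] = 1.0                  # avoid division by zero at sinks
    P = W / col                          # column-stochastic transition matrix
    s = seed / seed.sum()
    r = s.copy()
    for _ in range(iters):
        r = alpha * (P @ r) + (1 - alpha) * s
    return r

def assign_ranks(scores, F_max, F_min=1):
    """Map PPR scores to per-partition target decomposition ranks.

    Linear interpolation between F_min and F_max is an assumption: the
    smallest score maps to the user-provided lower bound F_min (Notes 4
    and 7) and the largest to F_max.
    """
    lo, hi = scores.min(), scores.max()
    t = (scores - lo) / (hi - lo) if hi > lo else np.ones_like(scores)
    return np.rint(F_min + t * (F_max - F_min)).astype(int)
```

Because the SIG has one node per partition and the number of partitions is small and independent of the tensor size, this power iteration costs negligibly little next to the decomposition itself, which is precisely the point of Note 3.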
References
B.W. Bader, T.G. Kolda et al., MATLAB Tensor Toolbox Version 2.5. Available online (January 2012)
A. Balmin, V. Hristidis, Y. Papakonstantinou, ObjectRank: authority-based keyword search in databases, in Proceedings of the 30th International Conference on Very Large Data Bases (VLDB) (2004)
X. Cao, X. Wei, Y. Han, D. Lin, Robust face clustering via tensor decomposition. IEEE Trans. Cybern. 45(11), 2546–2557 (2015)
S. Chakrabarti, Dynamic personalized PageRank in entity-relation graphs, in Proceedings of the 16th International Conference on World Wide Web (WWW '07) (2007)
X. Chen, K.S. Candan, LWI-SVD: low-rank, windowed, incremental singular value decompositions on time-evolving data sets, in Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '14) (2014)
I. Davidson, S. Gilpin, O. Carmichael, P. Walker, Network discovery via constrained tensor analysis of fMRI data, in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 194–202 (2013)
C. Ding, X. He, K-means clustering via principal component analysis, in Proceedings of the Twenty-First International Conference on Machine Learning (ICML '04) (2004)
P. Drineas, A. Frieze, R. Kannan, S. Vempala, V. Vinay, Clustering large graphs via the singular value decomposition. Mach. Learn. 56, 9–33 (2004)
F.M. Harper, J.A. Konstan, The MovieLens datasets: history and context. Trans. Interact. Intell. Syst. 5, 19:1–19:19 (2015)
R.A. Harshman, Foundations of the PARAFAC procedure: model and conditions for an explanatory multi-mode factor analysis. UCLA Working Papers in Phonetics, vol. 16 (1970), pp. 1–84
S. Huang, K.S. Candan, M.L. Sapino, BICP: block-incremental CP decomposition with update sensitive refinement, in Proceedings of the 25th ACM International Conference on Information and Knowledge Management (CIKM '16) (2016)
I. Jeon, E. Papalexakis, U. Kang, C. Faloutsos, HaTen2: billion-scale tensor decompositions, in Proceedings of the IEEE International Conference on Data Engineering (ICDE) (2015)
B. Jeon, I. Jeon, L. Sael, U. Kang, SCouT: scalable coupled matrix-tensor factorization - algorithm and discoveries, in IEEE 32nd International Conference on Data Engineering (ICDE) (2016)
U. Kang, E.E. Papalexakis, A. Harpale, C. Faloutsos, GigaTensor: scaling tensor analysis up by 100 times - algorithms and discoveries, in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 316–324 (2012)
M. Kim, K.S. Candan, Decomposition by normalization (DBN): leveraging approximate functional dependencies for efficient CP and tucker decompositions. Data Min. Knowl. Disc. 30(1), 1–46 (2016)
T.G. Kolda, B.W. Bader, Tensor decompositions and applications. SIAM Rev. 51(3), 455–500 (2009)
T.G. Kolda, J. Sun, Scalable tensor decompositions for multi-aspect data mining, in Eighth IEEE International Conference on Data Mining (ICDM) (2008)
X. Li, S.Y. Huang, K.S. Candan, M.L. Sapino, Focusing decomposition accuracy by personalizing tensor decomposition (PTD), in Proceedings of the 23rd ACM International Conference on Information and Knowledge Management (CIKM '14) (2014)
X. Li, K.S. Candan, M.L. Sapino, nTD: noise-profile adaptive tensor decomposition, in Proceedings of the 26th International Conference on World Wide Web (WWW '17) (2017)
S. Papadimitriou, J. Sun, C. Faloutsos, Streaming pattern discovery in multiple time-series, in Proceedings of the 31st International Conference on Very Large Data Bases (VLDB '05) (2005)
E. Papalexakis, C. Faloutsos, N. Sidiropoulos, ParCube: sparse parallelizable tensor decompositions, in Joint European Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD), pp. 521–536 (2012)
I. Perros, E.E. Papalexakis, F. Wang, R. Vuduc, E. Searles, M. Thompson, J. Sun, SPARTan: scalable PARAFAC2 for large & sparse data (2017). arXiv preprint arXiv:1703.04219
A.H. Phan, A. Cichocki, PARAFAC algorithms for large-scale problems. Neurocomputing 74(11), 1970–1984 (2011)
C.E. Priebe et al., Enron data set (2006). http://cis.jhu.edu/parky/Enron/enron.html
R. Salakhutdinov, A. Mnih, Probabilistic matrix factorization, in Proceedings of the 20th International Conference on Neural Information Processing Systems (NIPS '07) (2007)
J. Sun, S. Papadimitriou, P.S. Yu, Window based tensor analysis on high dimensional and multi aspect streams, in Sixth International Conference on Data Mining (ICDM’06), pp. 1076–1080 (2006)
J. Sun, D. Tao, S. Papadimitriou, P.S. Yu, C. Faloutsos, Incremental tensor analysis: theory and applications. ACM Trans. Knowl. Discov. Data 2(3), Article No. 11 (2008)
Y. Sun, J. Gao, X. Hong, B. Mishra, B. Yin, Heterogeneous tensor decomposition for clustering via manifold optimization. IEEE Trans. Pattern Anal. Mach. Intell. 38(3), 476–489 (2016)
J. Tang et al., Trust & distrust computing dataset (2011). https://www.cse.msu.edu/~tangjili/trust.html
C.E. Tsourakakis, MACH: fast randomized tensor decompositions (2009). arXiv preprint arXiv:0909.4969
L. Tucker, Some mathematical notes on three-mode factor analysis. Psychometrika 31, 279–311 (1966)
J. Wu, Z. Wang, Y. Wu, L. Liu, S. Deng, H. Huang, Tensor CP decomposition method for clustering heterogeneous information networks via stochastic gradient descent algorithms. Sci. Program. 2017, 13 (2017), Article ID 2803091
L. Xiong et al., Temporal collaborative filtering with Bayesian probabilistic tensor factorization, in Proceedings of the 2010 SIAM International Conference on Data Mining (2010)
Acknowledgements
Research is supported by NSF #1318788 "Data Management for Real-Time Data Driven Epidemic Spread Simulations," NSF #1339835 "E-SDMS: Energy Simulation Data Management System Software," NSF #1610282 "DataStorm: A Data Enabled System for End-to-End Disaster Planning and Response," NSF #1633381 "BIGDATA: Discovering Context-Sensitive Impact in Complex Systems," and "FourCmodeling": EU H2020 Marie Sklodowska-Curie grant agreement No. 690817.
© 2019 Springer Nature Switzerland AG
Cite this chapter
Candan, K.S., Huang, S., Li, X., Sapino, M.L. (2019). Effective Tensor-Based Data Clustering Through Sub-Tensor Impact Graphs. In: Nasraoui, O., Ben N'Cir, CE. (eds) Clustering Methods for Big Data Analytics. Unsupervised and Semi-Supervised Learning. Springer, Cham. https://doi.org/10.1007/978-3-319-97864-2_7
DOI: https://doi.org/10.1007/978-3-319-97864-2_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-97863-5
Online ISBN: 978-3-319-97864-2