Abstract
High-dimensional data suffers from the curse of dimensionality and from sparsity. Because samples tend to appear nearly equidistant from one another in high-dimensional space, clusters must be sought in low-dimensional structures instead. This paper proposes a top-down approach to subspace clustering, called projective clustering, that identifies clusters in low-dimensional subspaces using singular value decomposition (SVD), which yields the best low-rank approximation of a matrix. The approach has two advantages. First, it obtains multiple low-dimensional substructures from the best low-rank approximation, thereby reducing storage requirements. Second, the resulting projective clusters can be used to retrieve approximate results for a given query in a time-efficient manner. Experiments on six real-world datasets demonstrate the feasibility of our model for approximate information retrieval.
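The two advantages named in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it fits a single rank-k subspace to synthetic data with NumPy's `np.linalg.svd`, whereas the paper obtains multiple substructures. The helper names `truncated_svd` and `nearest_in_subspace`, the synthetic data, and the choice k = 3 are all assumptions made for illustration.

```python
import numpy as np

def truncated_svd(X, k):
    """Return the rank-k SVD factors (U_k, s_k, Vt_k) of X (hypothetical helper)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def nearest_in_subspace(query, U_k, s_k, Vt_k):
    """Index of the stored row closest to `query`, compared in k-dim coordinates."""
    coords = U_k * s_k      # each row of X expressed in the subspace, shape (m, k)
    q = Vt_k @ query        # query projected onto the same subspace, shape (k,)
    return int(np.argmin(np.linalg.norm(coords - q, axis=1)))

# Synthetic data (an assumption): 200 samples in 50 dimensions lying near a
# 3-dimensional subspace, plus a small amount of noise.
rng = np.random.default_rng(0)
basis = rng.normal(size=(3, 50))
X = rng.normal(size=(200, 3)) @ basis + 0.01 * rng.normal(size=(200, 50))

U_k, s_k, Vt_k = truncated_svd(X, k=3)

# Storage: keep the three factors instead of the full matrix.
stored_full = X.size
stored_factors = U_k.size + s_k.size + Vt_k.size

# Quality of the rank-3 approximation, as a relative Frobenius-norm error.
X_hat = (U_k * s_k) @ Vt_k
rel_error = np.linalg.norm(X - X_hat) / np.linalg.norm(X)

# Approximate retrieval: look up a stored sample using only k-dim coordinates.
idx = nearest_in_subspace(X[17], U_k, s_k, Vt_k)

print(stored_full, stored_factors)
print(rel_error)
print(idx)
```

In this sketch the query is compared against k-dimensional coordinates rather than the original 50-dimensional rows, which is where the time saving would come from. Results are approximate whenever the data does not lie exactly in the fitted subspace.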
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Harikumar, S., Joseph, S. (2021). Subspace Clustering Using Matrix Factorization. In: Thampi, S.M., Gelenbe, E., Atiquzzaman, M., Chaudhary, V., Li, KC. (eds) Advances in Computing and Network Communications. Lecture Notes in Electrical Engineering, vol 735. Springer, Singapore. https://doi.org/10.1007/978-981-33-6977-1_17
DOI: https://doi.org/10.1007/978-981-33-6977-1_17
Publisher Name: Springer, Singapore
Print ISBN: 978-981-33-6976-4
Online ISBN: 978-981-33-6977-1
eBook Packages: Engineering, Engineering (R0)