Subspace Clustering Using Matrix Factorization

Harikumar, Sandhya; Joseph, Shilpa

doi:10.1007/978-981-33-6977-1_17

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 735)

Abstract

High-dimensional data suffers from the curse of dimensionality and from sparsity. Because all samples appear nearly equidistant from one another in high-dimensional space, low-dimensional structures must be found before clusters can be formed. This paper proposes a top-down approach to subspace clustering, called projective clustering, that identifies clusters in low-dimensional subspaces using the best low-rank matrix factorization strategy, singular value decomposition. The advantages of this approach are twofold. First, it obtains multiple low-dimensional substructures using the best low-rank approximation, thereby reducing storage requirements. Second, the resulting projective clusters can be used to retrieve approximate results for a given query in a time-efficient manner. Experiments on six real-world datasets demonstrate the feasibility of our model for approximate information retrieval.
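The two ideas named in the abstract, best low-rank approximation by singular value decomposition and approximate query retrieval in the reduced space, can be illustrated with a minimal NumPy sketch. This is not the chapter's algorithm, which is not described on this page. The function names (`best_rank_k`, `storage_ratio`, `nearest_rows`), the single global subspace, and the synthetic data are assumptions made purely for illustration.

```python
import numpy as np


def best_rank_k(X, k):
    """Factors of the best rank-k approximation of X in the Frobenius norm
    (Eckart-Young theorem), obtained by truncating the SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]


def storage_ratio(n, d, k):
    """Numbers stored for the rank-k factors relative to the full n x d matrix."""
    return (n * k + k + k * d) / (n * d)


def nearest_rows(factors, q, top=3):
    """Indices of the rows closest to query q, with distances measured in the
    k-dimensional subspace rather than in the original d dimensions."""
    U_k, s_k, Vt_k = factors
    coords = U_k * s_k      # each row of X expressed in subspace coordinates
    q_low = Vt_k @ q        # project the query onto the same subspace
    dist = np.linalg.norm(coords - q_low, axis=1)
    return np.argsort(dist)[:top]


if __name__ == "__main__":
    # Hypothetical data: 200 samples in 50 dimensions lying near a 5-dim subspace.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 50))
    X += 0.01 * rng.normal(size=X.shape)

    factors = best_rank_k(X, k=5)
    U_k, s_k, Vt_k = factors
    rel_err = np.linalg.norm(X - (U_k * s_k) @ Vt_k) / np.linalg.norm(X)

    print("relative reconstruction error:", rel_err)
    print("storage ratio:", storage_ratio(200, 50, 5))
    print("rows nearest to X[0]:", nearest_rows(factors, X[0]))
```

A projective clustering method would typically fit a separate subspace per cluster; the sketch uses one subspace for all rows only to keep the storage and retrieval arithmetic visible.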



Author information

Authors and Affiliations

Department of Computer Science and Engineering, Amrita Vishwa Vidyapeetham, Amritapuri, India
Sandhya Harikumar & Shilpa Joseph

Corresponding author

Correspondence to Sandhya Harikumar.

Editor information

Editors and Affiliations

School of Computer Science & Engineering, Indian Institute of Information Technology and Management-Kerala (IIITM-K), Trivandrum, Kerala, India
Sabu M. Thampi
Institute of Theoretical and Applied Informatics, Polish Academy of Sciences, Gliwice, Poland
Erol Gelenbe
School of Computer Science, University of Oklahoma, Norman, OK, USA
Mohammed Atiquzzaman
Department of Computer Science, University at Buffalo, State University, Buffalo, NY, USA
Vipin Chaudhary
Department of Computer Science and Information Engineering, Providence University, Taichung, Taiwan
Kuan-Ching Li

About this paper

Cite this paper

Harikumar, S., Joseph, S. (2021). Subspace Clustering Using Matrix Factorization. In: Thampi, S.M., Gelenbe, E., Atiquzzaman, M., Chaudhary, V., Li, KC. (eds) Advances in Computing and Network Communications. Lecture Notes in Electrical Engineering, vol 735. Springer, Singapore. https://doi.org/10.1007/978-981-33-6977-1_17

DOI: https://doi.org/10.1007/978-981-33-6977-1_17
Published: 21 April 2021
Publisher Name: Springer, Singapore
Print ISBN: 978-981-33-6976-4
Online ISBN: 978-981-33-6977-1
eBook Packages: Engineering, Engineering (R0)
