Subspace Clustering Using Matrix Factorization

  • Conference paper
Advances in Computing and Network Communications

Part of the book series: Lecture Notes in Electrical Engineering ((LNEE,volume 735))

Abstract

High-dimensional data suffers from the curse of dimensionality and from sparsity. Because all samples appear nearly equidistant from one another in a high-dimensional space, clusters must be sought in low-dimensional structures instead. This paper proposes a top-down approach to subspace clustering, called projective clustering, that identifies clusters in low-dimensional subspaces using singular value decomposition (SVD), the matrix factorization that yields the best low-rank approximation. The approach has two advantages. First, it obtains multiple low-dimensional substructures from the best low-rank approximation, which reduces storage requirements. Second, the resulting projective clusters can be used to retrieve approximate results for a given query in a time-efficient manner. Experiments on six real-world datasets demonstrate the feasibility of our model for approximate information retrieval.
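The two ideas in the abstract, low-rank approximation by SVD and approximate query answering in the reduced space, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes NumPy, uses synthetic data, applies a single global SVD rather than one subspace per projective cluster, and the helper names `truncated_svd`, `low_rank_approx`, and `approx_nearest` are hypothetical.

```python
import numpy as np

def truncated_svd(X, k):
    """Return the rank-k SVD factors (U_k, s_k, Vt_k) of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def low_rank_approx(X, k):
    """Best rank-k approximation of X in the Frobenius norm."""
    U, s, Vt = truncated_svd(X, k)
    return (U * s) @ Vt

def approx_nearest(X, query, k):
    """Index of the row of X closest to `query`, measured in the k-dim subspace."""
    U, s, Vt = truncated_svd(X, k)
    coords = U * s        # rows of X expressed in subspace coordinates (n x k)
    q = Vt @ query        # the query projected onto the same subspace (k,)
    return int(np.argmin(np.linalg.norm(coords - q, axis=1)))

# Synthetic data (an assumption for illustration): 200 samples in 50 dimensions
# that lie close to a 3-dimensional subspace.
rng = np.random.default_rng(0)
basis = rng.standard_normal((3, 50))
X = rng.standard_normal((200, 3)) @ basis + 0.01 * rng.standard_normal((200, 50))

k = 3
X_k = low_rank_approx(X, k)
rel_err = np.linalg.norm(X - X_k) / np.linalg.norm(X)

# Storing the factors needs k * (n + d + 1) numbers instead of n * d.
n, d = X.shape
stored = k * (n + d + 1)

print(f"relative error: {rel_err:.4f}")
print(f"stored values: {stored} instead of {n * d}")
print("nearest row to X[17]:", approx_nearest(X, X[17], k))
```

Distances are computed on k coordinates per sample rather than on all d, which is where the time saving for approximate retrieval would come from in this simplified setting.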

Author information

Corresponding author

Correspondence to Sandhya Harikumar.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Harikumar, S., Joseph, S. (2021). Subspace Clustering Using Matrix Factorization. In: Thampi, S.M., Gelenbe, E., Atiquzzaman, M., Chaudhary, V., Li, KC. (eds) Advances in Computing and Network Communications. Lecture Notes in Electrical Engineering, vol 735. Springer, Singapore. https://doi.org/10.1007/978-981-33-6977-1_17

  • DOI: https://doi.org/10.1007/978-981-33-6977-1_17

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-33-6976-4

  • Online ISBN: 978-981-33-6977-1

  • eBook Packages: Engineering, Engineering (R0)
