
Adapting K-Means Algorithm for Discovering Clusters in Subspaces

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3841)

Abstract

Subspace clustering is a challenging task in data mining. In very high-dimensional data spaces, traditional distance measures fail to differentiate the furthest point from the nearest point. To tackle this problem, we design the minimal subspace distance, which measures the similarity between two points in the subspace where they are nearest to each other. It can discover subspace clusters implicitly while measuring the similarities between points. We use the new similarity measure to adapt the traditional k-means algorithm for discovering clusters in subspaces. Clustering first with a low-dimensional minimal subspace distance detects the clusters in low-dimensional subspaces. Gradually increasing the dimension of the minimal subspace distance then refines the clusters in higher-dimensional subspaces. Our experiments on both synthetic and real data show the effectiveness of the proposed similarity measure and algorithm.
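The abstract does not give the formal definition of the measure or the full algorithm, so the following is only an illustrative sketch of the idea, not the authors' implementation. It assumes that the l-dimensional minimal subspace distance is the Euclidean distance restricted to the l dimensions in which two points differ least, and that the adapted k-means simply reruns the assignment and update steps while l grows. The names `minimal_subspace_distance`, `subspace_kmeans`, and the parameters `dims`, `n_iter`, and `seed` are hypothetical.

```python
import numpy as np


def minimal_subspace_distance(x, y, l):
    """Assumed definition: Euclidean distance over the l dimensions
    in which x and y are closest to each other."""
    sq_diffs = (np.asarray(x, dtype=float) - np.asarray(y, dtype=float)) ** 2
    # np.partition places the l smallest values in the first l positions.
    smallest = np.partition(sq_diffs, l - 1)[:l]
    return float(np.sqrt(smallest.sum()))


def subspace_kmeans(X, k, dims, n_iter=20, seed=0):
    """Sketch of k-means driven by the assumed distance.

    dims is an increasing sequence of subspace dimensions, e.g. [1, 2, 3]:
    clusters are found with a small l first and refined as l grows.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise centres from k distinct data points (an assumption).
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for l in dims:
        for _ in range(n_iter):
            # Squared per-dimension differences, shape (n_points, k, n_dims).
            sq = (X[:, None, :] - centers[None, :, :]) ** 2
            # Sum of the l smallest squared differences for each pair.
            dist = np.sort(sq, axis=2)[:, :, :l].sum(axis=2)
            new_labels = dist.argmin(axis=1)
            # Standard k-means update: mean of the assigned points.
            for j in range(k):
                members = X[new_labels == j]
                if len(members) > 0:
                    centers[j] = members.mean(axis=0)
            if np.array_equal(new_labels, labels):
                break
            labels = new_labels
    return labels, centers
```

For example, under this assumed definition the points (0, 0, 0) and (3, 0, 4) are at distance 0 for l = 1, 3 for l = 2, and 5 for l = 3, which shows how the measure approaches the ordinary Euclidean distance as the subspace dimension increases.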




Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhao, Y., Zhang, C., Zhang, S., Zhao, L. (2006). Adapting K-Means Algorithm for Discovering Clusters in Subspaces. In: Zhou, X., Li, J., Shen, H.T., Kitsuregawa, M., Zhang, Y. (eds) Frontiers of WWW Research and Development - APWeb 2006. APWeb 2006. Lecture Notes in Computer Science, vol 3841. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11610113_6


  • DOI: https://doi.org/10.1007/11610113_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-31142-3

  • Online ISBN: 978-3-540-32437-9

  • eBook Packages: Computer Science, Computer Science (R0)
