Adapting K-Means Algorithm for Discovering Clusters in Subspaces

Zhao, Yanchang; Zhang, Chengqi; Zhang, Shichao; Zhao, Lianwei

doi:10.1007/11610113_6

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3841)

Abstract

Subspace clustering is a challenging task in data mining. In very high-dimensional data spaces, traditional distance measures fail to differentiate the furthest point from the nearest point. To tackle this problem, we design the minimal subspace distance, which measures the similarity between two points in the subspace where they are nearest to each other. It can discover subspace clusters implicitly while measuring the similarities between points. We use the new similarity measure to adapt the traditional k-means algorithm for discovering clusters in subspaces. Clustering first with a low-dimensional minimal subspace distance detects the clusters in low-dimensional subspaces. Then, by gradually increasing the dimension of the minimal subspace distance, the clusters are refined in higher-dimensional subspaces. Our experiments on both synthetic and real data show the effectiveness of the proposed similarity measure and algorithm.
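The abstract does not give the formal definition of the minimal subspace distance, so the following is only a sketch of one plausible reading: the l-dimensional distance between two points is taken as the Euclidean distance over the l axes on which they differ least. The function names, the centroid update by plain averaging, the fixed iteration cap, and the caller-supplied initial centroids are all assumptions made for illustration, not the authors' implementation.

```python
from math import sqrt
from typing import List, Sequence, Tuple

Point = Sequence[float]


def minimal_subspace_distance(x: Point, y: Point, l: int) -> float:
    """Euclidean distance over the l axes where x and y are closest.

    Assumed interpretation of the abstract, not the paper's definition.
    """
    if len(x) != len(y):
        raise ValueError("points must have the same dimensionality")
    if not 1 <= l <= len(x):
        raise ValueError("l must be between 1 and the number of dimensions")
    # Sort the per-axis squared differences and keep the l smallest.
    squared_diffs = sorted((a - b) ** 2 for a, b in zip(x, y))
    return sqrt(sum(squared_diffs[:l]))


def subspace_kmeans(
    points: List[Point],
    init_centroids: List[Point],
    l_start: int,
    l_end: int,
    iters_per_level: int = 10,
) -> Tuple[List[int], List[List[float]]]:
    """k-means-style loop that raises the subspace dimension l step by step.

    Hypothetical sketch: assignment uses minimal_subspace_distance, and the
    centroid update is the ordinary per-axis mean of the assigned points.
    """
    centroids = [list(c) for c in init_centroids]
    labels = [0] * len(points)
    for l in range(l_start, l_end + 1):
        for _ in range(iters_per_level):
            # Assign each point to its nearest centroid under the current l.
            new_labels = [
                min(
                    range(len(centroids)),
                    key=lambda j: minimal_subspace_distance(p, centroids[j], l),
                )
                for p in points
            ]
            # Recompute each non-empty cluster's centroid as the mean.
            for j in range(len(centroids)):
                members = [p for p, lab in zip(points, new_labels) if lab == j]
                if members:
                    centroids[j] = [sum(col) / len(members) for col in zip(*members)]
            if new_labels == labels:
                break  # assignments are stable at this level
            labels = new_labels
    return labels, centroids


# Two clusters that agree only on the first two axes; the last two are noise.
data = [
    (0.0, 0.0, 9.0, 1.0), (0.1, 0.0, 1.0, 8.0), (0.0, 0.1, 5.0, 5.0),
    (10.0, 10.0, 2.0, 7.0), (10.1, 10.0, 8.0, 3.0), (10.0, 10.1, 4.0, 4.0),
]
labels, centroids = subspace_kmeans(data, [data[0], data[3]], l_start=1, l_end=2)
print(labels)
```

Under this reading, starting with a small l lets points match on whichever few axes they share, and raising l then checks that the grouping still holds in a larger subspace.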

Author information

Authors and Affiliations

Faculty of Information Technology, University of Technology, Sydney, Australia
Yanchang Zhao, Chengqi Zhang & Shichao Zhang
Dept. of Computer Science, Beijing Jiaotong University, Beijing, 100044, China
Lianwei Zhao

Editor information

Editors and Affiliations

School of ITEE, The University of Queensland, Australia
Xiaofang Zhou
School of Computer Science and Technology, Heilongjiang University, China
Jianzhong Li
School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, QLD, Australia
Heng Tao Shen
Institute of Industrial Science, The University of Tokyo, 4-6-1 Komaba, Meguro-ku, 153-8505, Tokyo, Japan
Masaru Kitsuregawa
Victoria University, Australia
Yanchun Zhang

About this paper

Cite this paper

Zhao, Y., Zhang, C., Zhang, S., Zhao, L. (2006). Adapting K-Means Algorithm for Discovering Clusters in Subspaces. In: Zhou, X., Li, J., Shen, H.T., Kitsuregawa, M., Zhang, Y. (eds) Frontiers of WWW Research and Development - APWeb 2006. APWeb 2006. Lecture Notes in Computer Science, vol 3841. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11610113_6

DOI: https://doi.org/10.1007/11610113_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31142-3
Online ISBN: 978-3-540-32437-9
eBook Packages: Computer Science, Computer Science (R0)
