A Density-Based k-Means++ Algorithm for Imbalanced Datasets Clustering

Fan, Linchuan; Chai, Yi; Li, Yanxia

doi:10.1007/978-981-32-9698-5_5

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 594)

Included in the following conference series:

Chinese Intelligent Systems Conference

Abstract

The k-means algorithm is widely used as an effective clustering method. However, existing k-means algorithms usually perform poorly on imbalanced datasets. To address this problem, this paper proposes density-kmeans++, an algorithm based on density distance. The proposed method incorporates density distance into the traditional Euclidean-distance-based k-means algorithm when clustering imbalanced datasets. Experimental results on UCI datasets and the Case Western Reserve University Bearing Data indicate that density-kmeans++ handles imbalanced datasets better than k-means++.
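The abstract evaluates density-kmeans++ against k-means++. Because the paper's density-distance formula is not reproduced on this page, the sketch below shows only that baseline: standard k-means++ seeding, where each new center is drawn with probability proportional to D(x)², the squared distance to the nearest center already chosen (Arthur and Vassilvitskii, 2007), followed by plain Lloyd iterations with Euclidean distance. It is not the authors' density-kmeans++. The function names `kmeanspp_seeds` and `lloyd` are hypothetical, chosen for illustration.

```python
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_seeds(points: List[Point], k: int, rng: random.Random) -> List[Point]:
    """Standard k-means++ seeding (baseline only, not density-kmeans++).

    The first center is drawn uniformly; every further center is drawn with
    probability proportional to D(x)^2, the squared distance from x to its
    nearest already-chosen center.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centers = [points[rng.randrange(len(points))]]
    # d2[i] is the squared distance from points[i] to its nearest center so far.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0.0:
            # Every point coincides with a center; fall back to a uniform draw.
            idx = rng.randrange(len(points))
        else:
            idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
        centers.append(points[idx])
        d2 = [min(d, sq_dist(p, points[idx])) for d, p in zip(d2, points)]
    return centers


def lloyd(points: List[Point], centers: List[Point],
          iters: int = 100) -> Tuple[List[int], List[Tuple[float, ...]]]:
    """Plain Lloyd iterations: assign to nearest center, then recompute means."""
    current = [tuple(c) for c in centers]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: nearest center under Euclidean distance.
        labels = [min(range(len(current)), key=lambda j: sq_dist(p, current[j]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        updated = []
        for j in range(len(current)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(current[j])  # keep an empty cluster's center in place
        if updated == current:
            break
        current = updated
    return labels, current
```

Per the abstract, density-kmeans++ departs from this procedure by incorporating a density distance into the Euclidean distance used for clustering; the exact form is given in the full paper.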

Author information

Authors and Affiliations

Chongqing University, Chongqing, 400044, China
Linchuan Fan, Yi Chai & Yanxia Li

Corresponding author

Correspondence to Yi Chai.

Editor information

Editors and Affiliations

Beihang University, Beijing, China
Yingmin Jia
Beijing University of Posts and Telecommunications, Beijing, China
Junping Du
University of Science and Technology Beijing, Beijing, China
Weicun Zhang

About this paper

Cite this paper

Fan, L., Chai, Y., Li, Y. (2020). A Density-Based k-Means++ Algorithm for Imbalanced Datasets Clustering. In: Jia, Y., Du, J., Zhang, W. (eds) Proceedings of 2019 Chinese Intelligent Systems Conference. CISC 2019. Lecture Notes in Electrical Engineering, vol 594. Springer, Singapore. https://doi.org/10.1007/978-981-32-9698-5_5

DOI: https://doi.org/10.1007/978-981-32-9698-5_5
Published: 08 September 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-32-9697-8
Online ISBN: 978-981-32-9698-5
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)