Abstract
Clustering algorithms are widely used in many fields of science, engineering and technology. k-means is an unsupervised clustering algorithm used in applications such as medical image clustering and gene data clustering. A large body of research has aimed to enhance the basic k-means algorithm, but that work has addressed only some of its limitations. This paper surveys the literature on improved k-means algorithms, summarizes their shortcomings and identifies scope for further enhancement to make k-means more scalable and efficient for large data. From this literature, the paper examines distance, validity and stability measures, algorithms for selecting the initial centroids and algorithms for deciding the value of k. It then proposes objectives and guidelines for an enhanced, scalable clustering algorithm, and suggests a method for avoiding outliers based on semantic analysis and AI.
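For readers less familiar with the baseline that the surveyed improvements start from, the following is a minimal sketch of basic (Lloyd-style) k-means. It assumes Euclidean distance, random selection of the initial centroids and a fixed iteration cap, and it reports the within-cluster sum of squared errors (SSE) as a simple quality measure. The function names `kmeans` and `squared_distance` and the sample points are hypothetical illustrations, not code from this paper or from any of the surveyed variants. The initial-centroid selection and k-selection algorithms discussed in the survey would replace the `rng.sample` step and the fixed `k` argument, respectively.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, max_iter=100, seed=0):
    """Basic Lloyd-style k-means (illustrative sketch).

    Returns (centroids, labels, sse), where sse is the within-cluster
    sum of squared distances to the assigned centroids.
    """
    rng = random.Random(seed)
    # Initialization (assumed): k distinct input points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    sse = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, sse

# Usage on two well-separated groups of 2-D points (hypothetical data).
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, labels, sse = kmeans(points, k=2)
print(labels, round(sse, 4))
```

Because the result depends on the random initial centroids, basic k-means can converge to different local optima on less well-separated data; this sensitivity is the motivation for the initial-centroid selection and stability measures the survey reviews.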
Copyright information
© 2016 Springer India
About this paper
Cite this paper
Khandare, A., Alvi, A.S. (2016). Survey of Improved k-means Clustering Algorithms: Improvements, Shortcomings and Scope for Further Enhancement and Scalability. In: Satapathy, S.C., Mandal, J.K., Udgata, S.K., Bhateja, V. (eds) Information Systems Design and Intelligent Applications. Advances in Intelligent Systems and Computing, vol 434. Springer, New Delhi. https://doi.org/10.1007/978-81-322-2752-6_48
DOI: https://doi.org/10.1007/978-81-322-2752-6_48
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-2750-2
Online ISBN: 978-81-322-2752-6
eBook Packages: Engineering, Engineering (R0)