DenClust: A Density Based Seed Selection Approach for K-Means

Rahman, Md Anisur; Islam, Md Zahidul; Bossomaier, Terry

doi:10.1007/978-3-319-07176-3_68


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8468)

Included in the following conference series:

International Conference on Artificial Intelligence and Soft Computing


Abstract

In this paper we present a clustering technique called DenClust that produces high quality initial seeds through a deterministic process without requiring user input on the number of clusters k or the radius of the clusters r. The high quality seeds are passed to K-Means as its set of initial seeds to produce the final clusters. DenClust uses a density based approach for initial seed selection. It calculates the density of each record, where the density of a record is the number of records that have their minimum distance with that record. This approach is expected to produce high quality initial seeds for K-Means, resulting in high quality clusters from a dataset. The performance of DenClust is compared with five existing techniques, namely CRUDAW, AGCUK, Simple K-Means (SK), Basic Farthest Point Heuristic (BFPH) and New Farthest Point Heuristic (NFPH), in terms of three external cluster evaluation criteria, namely F-Measure, Entropy and Purity, and two internal cluster evaluation criteria, namely Xie-Beni Index (XB) and Sum of Square Error (SSE). We use three natural datasets that we obtain from the UCI machine learning repository. DenClust performs better than all five existing techniques in terms of all five evaluation criteria for all three datasets used in this study.
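The density definition in the abstract can be illustrated with a small sketch. It assumes numerical records and Euclidean distance, and it reads "minimum distance" as "nearest neighbour": the density of a record is the number of other records whose nearest neighbour it is. The function name `nearest_neighbour_density` and the toy data are hypothetical, and the abstract does not say how seeds are chosen from the densities, so that step is not shown.

```python
from math import dist  # Euclidean distance, Python 3.8+


def nearest_neighbour_density(records):
    """Count, for each record, how many other records have it as nearest neighbour.

    Assumption: numerical records compared with Euclidean distance.
    Ties are broken in favour of the lowest index.
    """
    n = len(records)
    density = [0] * n
    for i in range(n):
        # Index of the record closest to record i, excluding i itself.
        nearest = min(
            (j for j in range(n) if j != i),
            key=lambda j: dist(records[i], records[j]),
        )
        density[nearest] += 1
    return density


# Toy data (hypothetical): two small groups and one isolated record.
records = [(0.0, 0.0), (0.0, 1.0), (0.2, 0.4), (5.0, 5.0), (5.0, 6.0), (9.0, 9.0)]
density = nearest_neighbour_density(records)
print(density)
```

Because every record contributes exactly one count to its nearest neighbour, the densities always sum to the number of records. This brute-force version makes O(n²) distance computations and is meant only to make the definition concrete.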



Author information

Authors and Affiliations

Centre for Research in Complex Systems, School of Computing and Mathematics, Charles Sturt University, Panorama Avenue, Bathurst, NSW, 2795, Australia
Md Anisur Rahman, Md Zahidul Islam & Terry Bossomaier


Editor information

Editors and Affiliations

Częstochowa University of Technology, Armii Krajowej 36, 42-200, Częstochowa, Poland
Leszek Rutkowski & Rafał Scherer
Częstochowa University of Technology, 42-200, Częstochowa, Poland
Marcin Korytkowski
AGH University of Science and Technology, Mickiewicza 30, 30-059, Kraków, Poland
Ryszard Tadeusiewicz
Computer Science Division, Department of Electrical Engineering and Computer Sciences, University of California Berkeley, 94720-1776, Berkeley, CA, USA
Lotfi A. Zadeh
Computational Intelligence Laboratory, Electrical and Computer Engineering, University of Louisville, 405 Lutz Hall, 40292, Louisville, KY, USA
Jacek M. Zurada


About this paper

Cite this paper

Rahman, M.A., Islam, M.Z., Bossomaier, T. (2014). DenClust: A Density Based Seed Selection Approach for K-Means. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2014. Lecture Notes in Computer Science, vol 8468. Springer, Cham. https://doi.org/10.1007/978-3-319-07176-3_68


DOI: https://doi.org/10.1007/978-3-319-07176-3_68
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07175-6
Online ISBN: 978-3-319-07176-3
eBook Packages: Computer Science (R0)
