DenClust: A Density Based Seed Selection Approach for K-Means

  • Conference paper
Artificial Intelligence and Soft Computing (ICAISC 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8468)

Abstract

In this paper we present a clustering technique called DenClust that produces high quality initial seeds through a deterministic process, without requiring user input on the number of clusters k or the radius of the clusters r. The seeds are then given to K-Means as its set of initial seeds to produce the final clusters. DenClust uses a density based approach for initial seed selection. It calculates the density of each record, where the density of a record is the number of records that have their minimum distance with that record. This approach is expected to produce high quality initial seeds for K-Means, resulting in high quality clusters from a dataset. The performance of DenClust is compared with five existing techniques, namely CRUDAW, AGCUK, Simple K-Means (SK), Basic Farthest Point Heuristic (BFPH) and New Farthest Point Heuristic (NFPH), in terms of three external cluster evaluation criteria, namely F-Measure, Entropy and Purity, and two internal cluster evaluation criteria, namely Xie-Beni Index (XB) and Sum of Square Error (SSE). We use three natural datasets that we obtain from the UCI machine learning repository. DenClust performs better than all five existing techniques in terms of all five evaluation criteria for all three datasets used in this study.
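The density idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how seeds are drawn from the densities, so the `min_density` threshold, the Euclidean distance, the index-based tie-breaking and all function names below are assumptions made only for illustration. Density is read here as the number of records whose nearest neighbour is the given record. The sketch also computes SSE, one of the internal criteria named above.

```python
from math import dist  # Euclidean distance, Python 3.8+


def nearest_neighbor_density(records):
    """density[i] = number of records whose nearest neighbour is record i."""
    n = len(records)
    density = [0] * n
    for j in range(n):
        # Nearest other record to j; ties go to the lowest index so the
        # result is deterministic (tie rule is an assumption).
        nearest = min((i for i in range(n) if i != j),
                      key=lambda i: (dist(records[i], records[j]), i))
        density[nearest] += 1
    return density


def select_seeds(records, min_density=2):
    """Hypothetical rule: keep every record with density >= min_density."""
    density = nearest_neighbor_density(records)
    order = sorted(range(len(records)), key=lambda i: (-density[i], i))
    return [tuple(records[i]) for i in order if density[i] >= min_density]


def k_means(records, seeds, max_iter=100):
    """Plain Lloyd iterations started from the given seeds."""
    centers = [tuple(s) for s in seeds]
    labels = []
    for _ in range(max_iter):
        # Assign each record to its closest center.
        labels = [min(range(len(centers)), key=lambda c: dist(r, centers[c]))
                  for r in records]
        new_centers = []
        for c in range(len(centers)):
            members = [r for r, lab in zip(records, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def sse(records, labels, centers):
    """Sum of squared distances from each record to its cluster center."""
    return sum(sum((a - b) ** 2 for a, b in zip(r, centers[lab]))
               for r, lab in zip(records, labels))


# Toy data: two well separated groups of four records each.
records = [(0, 0), (1, 0), (0, 1), (-1, 0),
           (10, 10), (11, 10), (10, 11), (9, 10)]
seeds = select_seeds(records)
labels, centers = k_means(records, seeds)
print(seeds, labels, centers, sse(records, labels, centers))
```

On this toy data the number of clusters falls out of the seed selection step rather than being passed in, which mirrors the claim that k is not supplied by the user; a real dataset would need the paper's actual selection rule rather than the fixed threshold assumed here.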

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Rahman, M.A., Islam, M.Z., Bossomaier, T. (2014). DenClust: A Density Based Seed Selection Approach for K-Means. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2014. Lecture Notes in Computer Science, vol 8468. Springer, Cham. https://doi.org/10.1007/978-3-319-07176-3_68

  • DOI: https://doi.org/10.1007/978-3-319-07176-3_68

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-07175-6

  • Online ISBN: 978-3-319-07176-3

  • eBook Packages: Computer Science, Computer Science (R0)
