
Cluster Computing, Volume 22, Supplement 3, pp 6171–6179

Attribute weights-based clustering centres algorithm for initialising K-modes clustering

  • Liwen Peng
  • Yongguo Liu

Abstract

The K-modes algorithm, a partitional clustering technique, is a popular and effective clustering method that handles categorical data. However, its performance depends strongly on the initial clustering centres. Random selection of the initial clustering centres commonly leads to non-repeatable clustering results. Hence, a suitable choice of the initial clustering centres is crucial to realising high-performance K-modes clustering. The present article develops an initialisation algorithm for K-modes. At initialisation, the distance between two instances is calculated after weighting the attributes of the instances. Many studies have shown that if clustering is based only on the distance or only on the density between instances, the clustering tends to revolve around a single centre or around outliers. Therefore, based on the attribute weights, we combine the distance and density measures to select the clustering centres. In experiments on several benchmark datasets from the UCI Machine Learning Repository, the new initialisation method outperformed existing K-modes clustering methods.
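
The abstract does not give the formulas, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a weighted simple-matching distance, a density defined as the mean weighted similarity of an instance to all others, and a greedy rule that picks the instance maximising density times distance to the nearest centre already chosen. All function names and the uniform default weights are hypothetical.

```python
def weighted_distance(x, y, weights):
    """Weighted simple-matching dissimilarity: sum the weights of the
    attributes on which the two categorical instances differ (assumed form)."""
    return sum(w for a, b, w in zip(x, y, weights) if a != b)


def density(x, data, weights):
    """Assumed density: mean weighted similarity of x to every instance,
    scaled to [0, 1]."""
    total = sum(weights)
    sims = (total - weighted_distance(x, y, weights) for y in data)
    return sum(sims) / (len(data) * total)


def init_centres(data, k, weights=None):
    """Greedy selection of k initial centres combining density and distance.

    The first centre is the densest instance. Each later centre maximises
    density * (distance to the nearest centre already chosen), so that
    neither outliers (low density) nor near-duplicates of an existing
    centre (low distance) are favoured.
    """
    m = len(data[0])
    if weights is None:
        weights = [1.0 / m] * m  # assumption: uniform attribute weights
    dens = [density(x, data, weights) for x in data]

    chosen = [max(range(len(data)), key=lambda i: dens[i])]
    while len(chosen) < k:
        def score(i):
            nearest = min(weighted_distance(data[i], data[c], weights)
                          for c in chosen)
            return dens[i] * nearest

        candidates = [i for i in range(len(data)) if i not in chosen]
        chosen.append(max(candidates, key=score))
    return [data[i] for i in chosen]


if __name__ == "__main__":
    toy = [("a", "x"), ("a", "x"), ("a", "y"),
           ("b", "z"), ("b", "z"), ("b", "w")]
    print(init_centres(toy, 2))
```

Because the selection is deterministic (ties go to the earliest instance), repeated runs on the same data return the same centres, which is the repeatability property the abstract contrasts with random initialisation.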

Keywords

Clustering centers · Weight · Density · Distance

Notes

Acknowledgements

This research is supported in part by the National Science and Technology Major Project of the Ministry of Science and Technology of China under Grant 2017ZX10105003-002, the National Key Research and Development Program of China under Grant 2017YFC1703900, and the Sichuan Science and Technology Program under Grant 2018PTDJ0084.


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Knowledge and Data Engineering Laboratory of Chinese Medicine, School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, China
