
An Attributes Similarity-Based K-Medoids Clustering Technique in Data Mining

  • Research Article - Special Issue - Computer Engineering and Computer Science
Arabian Journal for Science and Engineering

Abstract

Mining information and knowledge from large databases has recently become a demanding task. Finding the similarity between different attributes in a synthetic dataset is a challenging problem in data retrieval applications. Existing works propose clustering techniques for this purpose, such as k-means, fuzzy c-means, and fuzzy k-means, but these have drawbacks that include high overhead, less effective results, computational complexity, high time consumption, and high memory utilization. To overcome these drawbacks, a similarity-based categorical data clustering technique is proposed. Here, the inter-attribute and intra-attribute similarities are calculated simultaneously and integrated to improve performance. The dataset is loaded as input and preprocessed to remove noise. Once the data are noise free, the similarity between the elements is computed; then, the most relevant attributes are selected and the insignificant attributes are discarded. The support and confidence measures are estimated by applying association rule mining for resource planning. The similarity-based K-medoids clustering technique is then used to cluster the attributes based on the Euclidean distance, which reduces the overhead. Finally, the bee colony (BC) optimization technique is used to select the optimal features for further use. In the experiments, the proposed clustering system is evaluated with respect to clustering accuracy, execution time (s), error rate, convergence time (s), and adjusted Rand index (ARI). The results show that the proposed technique outperforms the other techniques on these measures.
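To make the clustering step concrete, the following is a minimal sketch of plain K-medoids with Euclidean distance. It is not the authors' similarity-based variant: the attribute similarity computation, association rule mining, and bee colony optimization described above are omitted, and the function name `k_medoids`, its parameters, and the toy data are assumptions introduced only for illustration.

```python
import numpy as np


def k_medoids(X, k, max_iter=100, seed=0):
    """Plain alternating K-medoids with Euclidean distance (illustrative only)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise Euclidean distance matrix, shape (n, n).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    rng = np.random.default_rng(seed)
    # Start from k distinct points chosen at random as medoids.
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest medoid.
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size == 0:
                continue  # keep the old medoid if a cluster is empty
            # Update step: the new medoid is the member with the smallest
            # total distance to the other members of its cluster.
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[j] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break  # medoids stopped changing
        medoids = new_medoids
    labels = np.argmin(dist[:, medoids], axis=1)
    return medoids, labels


# Hypothetical toy data: two well-separated groups of three points.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
medoid_idx, cluster_labels = k_medoids(points, k=2)
```

Unlike k-means, each cluster center here is an actual data point, so only a pairwise distance (or dissimilarity) matrix is needed. That property is what would allow a similarity measure for categorical attributes to be substituted for the Euclidean distance used in this sketch.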



Author information

Corresponding author

Correspondence to G. Surya Narayana.

About this article


Cite this article

Surya Narayana, G., Vasumathi, D. An Attributes Similarity-Based K-Medoids Clustering Technique in Data Mining. Arab J Sci Eng 43, 3979–3992 (2018). https://doi.org/10.1007/s13369-017-2761-2


  • DOI: https://doi.org/10.1007/s13369-017-2761-2
