Parallel K-prototypes for Clustering Big Data

  • Mohamed Aymen Ben HajKacem
  • Chiheb-Eddine Ben N’cir
  • Nadia Essoussi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9330)


Big data clustering has become an important challenge in data mining. Big data are often characterized by a huge volume and by a variety of attribute types, namely numerical and categorical. To deal with these challenges, we propose the parallel k-prototypes method, which is based on the MapReduce model. This method performs efficient grouping of large-scale, mixed-type data. Experiments on huge data sets show the performance of the proposed method in clustering large-scale mixed data.
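The abstract does not spell out how k-prototypes is split into MapReduce phases. The sketch below shows one common way to do it, under stated assumptions. It uses Huang's original k-prototypes dissimilarity: squared Euclidean distance on the numeric attributes plus a weight `gamma` times the number of categorical mismatches. A map step assigns each record of a data split to its nearest prototype and emits per-cluster partial statistics. A reduce step merges those statistics and rebuilds each prototype from numeric means and categorical modes. The names `map_split`, `reduce_stats`, `parallel_kprototypes`, `Stats` and the parameter `gamma` are hypothetical. The phases run sequentially in plain Python rather than on Hadoop, so this is not the authors' implementation.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

# Sketch only: assumes Huang's mixed dissimilarity; map/reduce are simulated in-process.
# A mixed-type record: (numeric attributes, categorical attributes).
Record = Tuple[Tuple[float, ...], Tuple[str, ...]]


def dissimilarity(x: Record, proto: Record, gamma: float) -> float:
    """Squared Euclidean distance on numeric attributes plus gamma * categorical mismatches."""
    num = sum((a - b) ** 2 for a, b in zip(x[0], proto[0]))
    cat = sum(1 for a, b in zip(x[1], proto[1]) if a != b)
    return num + gamma * cat


@dataclass
class Stats:
    """Partial per-cluster statistics emitted by one map task (hypothetical helper)."""
    sums: List[float]       # running sum of each numeric attribute
    counts: List[Counter]   # value frequencies, one Counter per categorical attribute
    n: int = 0              # number of records assigned to the cluster

    def add(self, rec: Record) -> None:
        for i, v in enumerate(rec[0]):
            self.sums[i] += v
        for i, c in enumerate(rec[1]):
            self.counts[i][c] += 1
        self.n += 1

    def merge(self, other: "Stats") -> None:
        self.sums = [a + b for a, b in zip(self.sums, other.sums)]
        for mine, theirs in zip(self.counts, other.counts):
            mine.update(theirs)  # adds the other split's value counts
        self.n += other.n


def map_split(split: Sequence[Record], protos: List[Record], gamma: float) -> Dict[int, Stats]:
    """Map phase for one data split: assign each record to its nearest prototype and
    aggregate locally (combiner-style), so only small summaries leave the split."""
    out: Dict[int, Stats] = {}
    for rec in split:
        k = min(range(len(protos)), key=lambda j: dissimilarity(rec, protos[j], gamma))
        if k not in out:
            out[k] = Stats([0.0] * len(rec[0]), [Counter() for _ in rec[1]])
        out[k].add(rec)
    return out


def reduce_stats(partials: List[Dict[int, Stats]], old: List[Record]) -> List[Record]:
    """Reduce phase: merge partial statistics per cluster, then rebuild each
    prototype as (means of numeric attributes, modes of categorical attributes)."""
    merged: Dict[int, Stats] = {}
    for part in partials:
        for k, s in part.items():
            if k in merged:
                merged[k].merge(s)
            else:
                merged[k] = s
    new: List[Record] = []
    for k, proto in enumerate(old):
        s = merged.get(k)
        if s is None or s.n == 0:
            new.append(proto)  # empty cluster: keep its previous prototype
            continue
        means = tuple(v / s.n for v in s.sums)
        modes = tuple(c.most_common(1)[0][0] for c in s.counts)
        new.append((means, modes))
    return new


def parallel_kprototypes(splits: List[Sequence[Record]], protos: List[Record],
                         gamma: float = 1.0, max_iter: int = 20) -> List[Record]:
    """Driver: one MapReduce round per iteration until prototypes stop changing.
    The map_split calls are independent and could run on separate workers."""
    for _ in range(max_iter):
        partials = [map_split(split, protos, gamma) for split in splits]
        new = reduce_stats(partials, protos)
        if new == protos:
            break
        protos = new
    return protos


if __name__ == "__main__":
    splits = [
        [((1.0, 2.0), ("red", "small")), ((1.2, 1.8), ("red", "small"))],
        [((8.0, 9.0), ("blue", "large")), ((8.4, 9.2), ("blue", "large")),
         ((0.8, 2.2), ("red", "large"))],
    ]
    init = [((0.0, 0.0), ("red", "small")), ((10.0, 10.0), ("blue", "large"))]
    print(parallel_kprototypes(splits, init, gamma=0.5))
```

Aggregating inside each map task keeps the data shuffled to the reducer proportional to the number of clusters rather than the number of records. That is the usual motivation for a combiner in MapReduce k-means-style methods. The choice of `gamma` balances numeric and categorical contributions and is left to the caller here.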


Keywords: Big data · K-prototypes · MapReduce · Mixed data





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Mohamed Aymen Ben HajKacem (1)
  • Chiheb-Eddine Ben N’cir (1)
  • Nadia Essoussi (1)
  1. LARODEC, Université de Tunis, Institut Supérieur de Gestion de Tunis, Le Bardo, Tunisia
