An Accelerated MapReduce-Based K-prototypes for Big Data

HajKacem, Mohamed Aymen Ben; N’cir, Chiheb-Eddine Ben; Essoussi, Nadia

doi:10.1007/978-3-319-50230-4_2


Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 9946)

Included in the following conference series:

Federation of International Conferences on Software Technologies: Applications and Foundations


Abstract

Big data are often characterized by a huge volume and by a variety of attribute types, namely numerical and categorical. To cluster such mixed data efficiently, this paper proposes an accelerated MapReduce-based k-prototypes method. The proposed method relies on a pruning strategy that accelerates the clustering process by reducing unnecessary distance computations between cluster centers and data points. Experiments performed on large synthetic and real data sets show that the proposed method is scalable and improves the efficiency of the existing MapReduce-based k-prototypes method.
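The abstract does not specify the pruning rule, so the sketch below is only an illustration under stated assumptions, not the authors' method. It uses Huang's k-prototypes dissimilarity (squared Euclidean distance on the numeric attributes plus a weight `gamma` times the number of categorical mismatches) and simulates one MapReduce iteration in plain Python: the map phase assigns each point of an input split to its nearest prototype, and the reduce phase recomputes each prototype as the mean of its numeric attributes and the mode of its categorical ones. As a stand-in for the paper's pruning strategy, it uses simple early abandoning (partial distance search). All function names (`pruned_dissimilarity`, `map_assign`, `reduce_update`, `one_iteration`) are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple

# Hypothetical representation: (numeric attributes, categorical attributes).
Point = Tuple[Tuple[float, ...], Tuple[str, ...]]


def pruned_dissimilarity(x: Point, c: Point, gamma: float, best: float) -> float:
    """Huang-style k-prototypes dissimilarity with early abandoning.

    Accumulates squared Euclidean distance over numeric attributes, then adds
    gamma for each categorical mismatch. Returns inf as soon as the running
    sum reaches `best`, because this center can no longer be the nearest.
    """
    total = 0.0
    for a, b in zip(x[0], c[0]):
        total += (a - b) ** 2
        if total >= best:
            return float("inf")  # pruned: remaining terms are non-negative
    for a, b in zip(x[1], c[1]):
        if a != b:
            total += gamma
            if total >= best:
                return float("inf")
    return total


def map_assign(split: Sequence[Point], centers: Sequence[Point],
               gamma: float) -> Iterator[Tuple[int, Point]]:
    """Map phase (simulated): emit (cluster id, point) for one input split."""
    for x in split:
        best, best_j = float("inf"), -1
        for j, c in enumerate(centers):
            d = pruned_dissimilarity(x, c, gamma, best)
            if d < best:  # strict: ties keep the earlier center
                best, best_j = d, j
        yield best_j, x


def reduce_update(cluster_points: List[Point]) -> Point:
    """Reduce phase (simulated): numeric means and categorical modes."""
    n = len(cluster_points)
    n_num = len(cluster_points[0][0])
    n_cat = len(cluster_points[0][1])
    means = tuple(sum(p[0][i] for p in cluster_points) / n for i in range(n_num))
    modes = tuple(Counter(p[1][i] for p in cluster_points).most_common(1)[0][0]
                  for i in range(n_cat))
    return means, modes


def one_iteration(splits: Sequence[Sequence[Point]], centers: Sequence[Point],
                  gamma: float) -> List[Point]:
    """One k-prototypes iteration: map each split, group by id, reduce."""
    groups: Dict[int, List[Point]] = defaultdict(list)
    for split in splits:  # each split would go to a separate mapper
        for j, x in map_assign(split, centers, gamma):
            groups[j].append(x)  # shuffle: group points by cluster id
    new_centers = list(centers)  # empty clusters keep their old prototype
    for j, pts in groups.items():
        new_centers[j] = reduce_update(pts)
    return new_centers


if __name__ == "__main__":
    data = [
        ((1.0, 1.0), ("red",)), ((1.5, 2.0), ("red",)),
        ((8.0, 8.0), ("blue",)), ((9.0, 8.5), ("blue",)),
    ]
    print(one_iteration([data[:2], data[2:]], [data[0], data[2]], gamma=0.5))
    # prints [((1.25, 1.5), ('red',)), ((8.5, 8.25), ('blue',))]
```

Early abandoning is exact: because every term of the dissimilarity is non-negative, a center whose partial sum already reaches the best distance found so far cannot win, so the assignments match those of an unpruned search. It saves arithmetic inside each comparison but does not skip comparisons entirely, which bound-based schemes aim to do.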



Author information

Authors and Affiliations

LARODEC, Université de Tunis, Institut Supérieur de Gestion de Tunis, 41 Avenue de la Liberté, Cité Bouchoucha, 2000, Le Bardo, Tunisia
Mohamed Aymen Ben HajKacem, Chiheb-Eddine Ben N’cir & Nadia Essoussi


Corresponding author

Correspondence to Mohamed Aymen Ben HajKacem.

Editor information

Editors and Affiliations

Dipartimento di Informatica, Università di Pisa, Pisa, Italy
Paolo Milazzo
Budapest University of Technology and Economics, Budapest, Hungary
Dániel Varró
Vienna University of Technology, Vienna, Austria
Manuel Wimmer


About this paper

Cite this paper

HajKacem, M.A.B., N’cir, CE.B., Essoussi, N. (2016). An Accelerated MapReduce-Based K-prototypes for Big Data. In: Milazzo, P., Varró, D., Wimmer, M. (eds) Software Technologies: Applications and Foundations. STAF 2016. Lecture Notes in Computer Science, vol 9946. Springer, Cham. https://doi.org/10.1007/978-3-319-50230-4_2


DOI: https://doi.org/10.1007/978-3-319-50230-4_2
Published: 01 December 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-50229-8
Online ISBN: 978-3-319-50230-4
eBook Packages: Computer Science, Computer Science (R0)
