Strategies for Parallelizing KMeans Data Clustering Algorithm

Mohanavalli, S.; Jaisakthi, S. M.; Aravindan, C.

doi:10.1007/978-3-642-20573-6_76

Part of the book series: Communications in Computer and Information Science (CCIS, volume 147)

Included in the following conference series:

International Conference on Advances in Information Technology and Mobile Communication

Abstract

Data clustering is a descriptive data mining task of finding groups of objects such that the objects in a group are similar (or related) to one another and different from (or unrelated to) the objects in other groups [5]. The motivation behind this paper is to explore the KMeans partitioning algorithm on currently available parallel architectures using parallel programming models. Parallel KMeans algorithms have been implemented for a shared memory model using OpenMP and for a distributed memory model using MPI. A hybrid version using OpenMP within MPI has also been tried. The performance of the parallel algorithms was analysed to compare the speedup obtained and to study Amdahl's effect. The computational time of the hybrid method was 50% lower than that of MPI, and the hybrid method was also more efficient, with a balanced load.
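
The data-parallel decomposition that parallel KMeans implementations commonly rely on can be illustrated with a minimal sketch. This is not the authors' code: it is written in Python rather than OpenMP or MPI, the threads only stand in for OpenMP threads or MPI ranks, and all function names (`nearest`, `partial_sums`, `kmeans_step`, `kmeans`) are hypothetical. Each worker assigns its own share of the points to the nearest centroid and accumulates per-cluster sums and counts; the partial results are then combined, which is the role a reduction such as `MPI_Allreduce` would typically play in a distributed memory version.

```python
from concurrent.futures import ThreadPoolExecutor


def nearest(point, centroids):
    """Return the index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])),
    )


def partial_sums(chunk, centroids):
    """Work of one worker: per-cluster coordinate sums and counts for its chunk."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for point in chunk:
        j = nearest(point, centroids)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += point[d]
    return sums, counts


def kmeans_step(points, centroids, workers=2):
    """One KMeans iteration: partition the points, then reduce the partial results."""
    # Round-robin partition (an assumption; block partitioning works equally well).
    chunks = [points[w::workers] for w in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda ch: partial_sums(ch, centroids), chunks))

    # Reduction: add up the sums and counts produced by all workers.
    k, dim = len(centroids), len(centroids[0])
    new_centroids = []
    for j in range(k):
        count = sum(r[1][j] for r in results)
        if count == 0:
            # An empty cluster keeps its previous centroid.
            new_centroids.append(list(centroids[j]))
        else:
            new_centroids.append(
                [sum(r[0][j][d] for r in results) / count for d in range(dim)]
            )
    return new_centroids


def kmeans(points, centroids, workers=2, max_iter=100):
    """Repeat kmeans_step until the centroids stop changing or max_iter is reached."""
    for _ in range(max_iter):
        updated = kmeans_step(points, centroids, workers)
        if updated == centroids:
            break
        centroids = updated
    return centroids
```

Because only the small per-cluster sums and counts are exchanged, the communication cost per iteration depends on the number of clusters and dimensions rather than on the number of points. Python threads do not give a real speedup for this CPU-bound loop; the sketch shows the structure of the computation only.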

References

1. Wu, C.-C., Lai, L.-F., Yang, C.-T., Chiu, P.-H.: Using hybrid MPI and OpenMP programming to optimize communications in parallel loop self-scheduling schemes for multicore PC clusters. Journal of Supercomputing (2009), doi: 10.1007
2. Skillicorn, D.B.: Strategies for parallel data mining. IEEE Concurrency 7, 26–35 (1999)
3. Datta, S., Gianella, C.R., Kargupta, H.: Approximate distributed k-means clustering over a peer-to-peer network. IEEE Transactions on Knowledge and Data Engineering 21(10), 1372–1388 (2009)
4. Dhillon, I., Modha, D.: A Data Clustering algorithm on distributed memory multiprocessors. IEEE Transactions on Knowledge and Data Engineering (KDD 1999), 47–56 (1999)
5. Han, J., Kamber, M.: Data Mining: Concepts and Techniques, 2nd edn. Morgan Kaufmann, San Francisco (2006)
6. http://archive.ics.uci.edu/mll UC Irvine Machine Learning Repository
7. http://www.openmp.org
8. http://www-unix.mcs.anl.gov/mpi
9. Jin, R., Goswami, A., Agrawal, G.: Fast and Exact out of core and distributed KMeans clustering. Knowledge and Information Systems, 17–40 (2006)
10. Quinn, M.J.: Parallel Programming in C with MPI and OpenMP. Tata McGraw-Hill (2003)
11. Rao, S.N.T., Prasad, E.V., Venkatehwarulu, A.: Critical Performance Study of Memory Mapping on Multi-Core Processors: An Experiment with k-means Algorithm with Large Data Mining Data Sets. International Journal of Computer Applications (0975 8887) 1(9) (2010)
12. Zhang, X.Z., Mao, J., Ling Ou, L.: The Study of Parallel KMeans Algorithm. In: Proceedings of the 6th WCAIAC, pp. 5868–5871. IEEE, Los Alamitos (2006)
13. Zhou, J., Liu, Z.: Distributed Clustering Based on K-means and CPGA. In: Proceedings of FSKD(2), pp. 444–447. IEEE, Los Alamitos (2008), doi:10.1109

Author information

Authors and Affiliations

SSN College of Engineering, Chennai, India
S. Mohanavalli, S. M. Jaisakthi & C. Aravindan

Editor information

Editors and Affiliations

Engineers Network, Trivandrum, Kerala, India
Vinu V Das
MES College of Engineering, Kuttippuram, Kerala, India
Gylson Thomas
Binus University, Jakarta, Indonesia
Ford Lumban Gaol

About this paper

Cite this paper

Mohanavalli, S., Jaisakthi, S.M., Aravindan, C. (2011). Strategies for Parallelizing KMeans Data Clustering Algorithm. In: Das, V.V., Thomas, G., Lumban Gaol, F. (eds) Information Technology and Mobile Communication. AIM 2011. Communications in Computer and Information Science, vol 147. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20573-6_76

DOI: https://doi.org/10.1007/978-3-642-20573-6_76
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20572-9
Online ISBN: 978-3-642-20573-6
eBook Packages: Computer Science, Computer Science (R0)
