Strategies for Parallelizing KMeans Data Clustering Algorithm

  • Conference paper
Information Technology and Mobile Communication (AIM 2011)

Abstract

Data clustering is a descriptive data mining task of finding groups of objects such that the objects in a group are similar (or related) to one another and different from (or unrelated to) the objects in other groups [5]. The motivation behind this paper is to explore the KMeans partitioning algorithm on currently available parallel architectures using parallel programming models. Parallel KMeans algorithms were implemented for a shared memory model using OpenMP and for a distributed memory model using MPI. A hybrid version that uses OpenMP within MPI was also tested. The performance of the parallel algorithms was analysed to compare the speedup obtained and to study Amdahl's effect. The computational time of the hybrid method was 50% lower than that of the MPI version, and the hybrid method was also more efficient, with balanced load.
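The full text is not available here, so the authors' implementation is not shown. As a rough illustration of the data-parallel structure that OpenMP and MPI versions of KMeans commonly share, the following Python sketch splits the points among workers, lets each worker compute per-cluster partial sums and counts, and then combines them in a reduction step to update the centroids. All function names and parameters (`nearest`, `partial_stats`, `kmeans_step`, `kmeans`, `workers`, `tol`) are hypothetical, and the thread pool only mimics the scatter/reduce pattern; it is not expected to reproduce the speedups reported in the paper.

```python
from concurrent.futures import ThreadPoolExecutor


def nearest(point, centroids):
    """Return the index of the centroid closest to point (squared Euclidean)."""
    best, best_dist = 0, float("inf")
    for j, centre in enumerate(centroids):
        dist = sum((p - q) ** 2 for p, q in zip(point, centre))
        if dist < best_dist:
            best, best_dist = j, dist
    return best


def partial_stats(chunk, centroids):
    """Work of one worker: per-cluster coordinate sums and point counts."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for point in chunk:
        j = nearest(point, centroids)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += point[d]
    return sums, counts


def kmeans_step(points, centroids, workers=2):
    """One iteration: split the data, compute partial stats, reduce them."""
    chunks = [points[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda ch: partial_stats(ch, centroids), chunks))
    k, dim = len(centroids), len(centroids[0])
    updated = []
    for j in range(k):
        # Reduction step (an MPI version might use a collective sum here).
        n = sum(counts[j] for _, counts in results)
        if n == 0:
            updated.append(list(centroids[j]))  # empty cluster: keep old centroid
        else:
            updated.append(
                [sum(sums[j][d] for sums, _ in results) / n for d in range(dim)]
            )
    return updated


def kmeans(points, centroids, workers=2, max_iter=100, tol=1e-9):
    """Repeat kmeans_step until no centroid moves by more than tol (squared)."""
    for _ in range(max_iter):
        updated = kmeans_step(points, centroids, workers)
        shift = max(
            sum((a - b) ** 2 for a, b in zip(u, c))
            for u, c in zip(updated, centroids)
        )
        centroids = updated
        if shift <= tol:
            break
    return centroids
```

Because every worker returns only k sums and k counts, the amount of data exchanged per iteration depends on the number of clusters rather than on the number of points, which is one reason this decomposition is a common starting point for parallel KMeans.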

References

  1. Wu, C.-C., Lai, L.-F., Yang, C.-T., Chiu, P.-H.: Using hybrid MPI and OpenMP programming to optimize communications in parallel loop selfscheduling schemes for multicore PC clusters. Journal of Supercomputing (2009), doi: 10.1007

  2. Skillicorn, D.B.: Strategies for parallel data mining. IEEE Concurrency 7, 26–35 (1999)

  3. Datta, S., Gianella, C.R., Kargupta, H.: Approximate distributed k-means clustering over a peer-to-peer network. IEEE Transactions on Knowledge and Data Engineering 21(10), 1372–1388 (2009)

  4. Dhillon, I., Modha, D.: A Data Clustering algorithm on distributed memory multiprocessors. IEEE Transactions on Knowledge and Data Engineering (KDD 1999), 47–56 (1999)

  5. Han, J., Kamber, M.: Data Mining: Concepts and Techniques, 2nd edn. Morgan Kaufmann, San Francisco (2006)

  6. http://archive.ics.uci.edu/mll UC Irvine Machine Learning Repository

  7. http://www.openmp.org

  8. http://www-unix.mcs.anl.gov/mpi

  9. Jin, R., Goswami, A., Agrawal, G.: Fast and Exact out of core and distributed KMeans clustering. Knowledge and Information Systems, 17–40 (2006)

  10. Quinn, M.J.: Parallel Programming in C with MPI and OpenMP. Tata McGraw-Hill (2003)

  11. Rao, S.N.T., Prasad, E.V., Venkatehwarulu, A.: Critical Performance Study of Memory Mapping on Multi-Core Processors: An Experiment with k-means Algorithm with Large Data Mining Data Sets. International Journal of Computer Applications (0975-8887) 1(9) (2010)

  12. Zhang, X.Z., Mao, J., Ling Ou, L.: The Study of Parallel KMeans Algorithm. In: Proceedings of the 6th WCAIAC, pp. 5868–5871. IEEE, Los Alamitos (2006)

  13. Zhou, J., Liu, Z.: Distributed Clustering Based on K-means and CPGA. In: Proceedings of FSKD(2), pp. 444–447. IEEE, Los Alamitos (2008), doi:10.1109

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mohanavalli, S., Jaisakthi, S.M., Aravindan, C. (2011). Strategies for Parallelizing KMeans Data Clustering Algorithm. In: Das, V.V., Thomas, G., Lumban Gaol, F. (eds) Information Technology and Mobile Communication. AIM 2011. Communications in Computer and Information Science, vol 147. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20573-6_76

  • DOI: https://doi.org/10.1007/978-3-642-20573-6_76

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-20572-9

  • Online ISBN: 978-3-642-20573-6

  • eBook Packages: Computer Science, Computer Science (R0)
