Abstract
Data clustering is a descriptive data mining task of finding groups of objects such that the objects in a group are similar (or related) to one another and different from (or unrelated to) the objects in other groups [5]. This paper explores the KMeans partitioning algorithm on currently available parallel architectures using parallel programming models. Parallel KMeans algorithms were implemented for a shared memory model using OpenMP and for a distributed memory model using MPI. A hybrid version that uses OpenMP within MPI was also tested. The performance of the parallel algorithms was analysed to compare the speedup obtained and to study the Amdahl effect. The hybrid method reduced computational time by 50% compared to MPI and was also more efficient, with a balanced load.
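The data-parallel pattern that such implementations typically share can be sketched as follows. This is a minimal illustration, not the authors' code: each worker assigns its own chunk of points to the nearest centroid and accumulates per-cluster sums and counts, and a reduction step then combines the partial results into new centroids. The function names (`partial_sums`, `kmeans_step`, `amdahl_speedup`) and the sample points are hypothetical, and Python threads stand in for OpenMP threads or MPI ranks only to show the structure, not to demonstrate real speedup. The last function is the textbook Amdahl's law bound, included as a related reference point rather than as the paper's analysis.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sums(chunk, centroids):
    """Assign each point in one chunk to its nearest centroid.

    Returns per-cluster coordinate sums and counts for this chunk only.
    """
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for point in chunk:
        # Squared Euclidean distance is enough to pick the nearest centroid.
        nearest = min(
            range(k),
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        )
        counts[nearest] += 1
        for d in range(dim):
            sums[nearest][d] += point[d]
    return sums, counts


def kmeans_step(points, centroids, workers=2):
    """One KMeans iteration: map chunks to workers, then reduce."""
    # Interleaved split so every worker gets a similar number of points.
    chunks = [points[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(partial_sums, chunks, [centroids] * workers))
    k, dim = len(centroids), len(centroids[0])
    new_centroids = []
    for c in range(k):
        total = sum(counts[c] for _, counts in results)
        if total == 0:
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(list(centroids[c]))
            continue
        new_centroids.append(
            [sum(sums[c][d] for sums, _ in results) / total for d in range(dim)]
        )
    return new_centroids


def amdahl_speedup(parallel_fraction, workers):
    """Upper bound on speedup when only part of the work runs in parallel."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / workers)


# Hypothetical sample data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.0, 3.0), (9.0, 9.0), (11.0, 9.0)]
centroids = kmeans_step(points, [(0.0, 0.0), (10.0, 10.0)], workers=2)
```

In an MPI version the reduction over `results` would correspond to a collective sum across ranks, and in an OpenMP version to a reduction over thread-private accumulators. The sketch keeps both steps explicit so the serial part (the reduction and centroid update) is easy to distinguish from the parallel part (the assignment loop).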
References
Wu, C.-C., Lai, L.-F., Yang, C.-T., Chiu, P.-H.: Using hybrid MPI and OpenMP programming to optimize communications in parallel loop self-scheduling schemes for multicore PC clusters. Journal of Supercomputing (2009), doi: 10.1007
Skillicorn, D.B.: Strategies for parallel data mining. IEEE Concurrency 7, 26–35 (1999)
Datta, S., Gianella, C.R., Kargupta, H.: Approximate distributed k-means clustering over a peer-to-peer network. IEEE Transactions on Knowledge and Data Engineering 21(10), 1372–1388 (2009)
Dhillon, I., Modha, D.: A Data Clustering algorithm on distributed memory multiprocessors. IEEE Transactions on Knowledge and Data Engineering (KDD 1999), 47–56 (1999)
Han, J., Kamber, M.: Data Mining: Concepts and Techniques, 2nd edn. Morgan Kaufmann, San Francisco (2006)
http://archive.ics.uci.edu/mll UC Irvine Machine Learning Repository
Jin, R., Goswami, A., Agrawal, G.: Fast and Exact out of core and distributed KMeans clustering. Knowledge and Information Systems, 17–40 (2006)
Quinn, M.J.: Parallel Programming in C with MPI and OpenMP. Tata McGraw-Hill (2003)
Rao, S.N.T., Prasad, E.V., Venkatehwarulu, A.: Critical Performance Study of Memory Mapping on Multi-Core Processors: An Experiment with k-means Algorithm with Large Data Mining Data Sets. International Journal of Computer Applications (0975-8887) 1(9) (2010)
Zhang, X.Z., Mao, J., Ling Ou, L.: The Study of Parallel KMeans Algorithm. In: Proceedings of the 6th WCAIAC, pp. 5868–5871. IEEE, Los Alamitos (2006)
Zhou, J., Liu, Z.: Distributed Clustering Based on K-means and CPGA. In: Proceedings of FSKD(2), pp. 444–447. IEEE, Los Alamitos (2008), doi:10.1109
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mohanavalli, S., Jaisakthi, S.M., Aravindan, C. (2011). Strategies for Parallelizing KMeans Data Clustering Algorithm. In: Das, V.V., Thomas, G., Lumban Gaol, F. (eds) Information Technology and Mobile Communication. AIM 2011. Communications in Computer and Information Science, vol 147. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20573-6_76
DOI: https://doi.org/10.1007/978-3-642-20573-6_76
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20572-9
Online ISBN: 978-3-642-20573-6
eBook Packages: Computer Science, Computer Science (R0)