Parallel K-Means Clustering Based on MapReduce

  • Weizhong Zhao
  • Huifang Ma
  • Qing He
Conference paper

DOI: 10.1007/978-3-642-10665-1_71

Part of the Lecture Notes in Computer Science book series (LNCS, volume 5931)
Cite this paper as:
Zhao W., Ma H., He Q. (2009) Parallel K-Means Clustering Based on MapReduce. In: Jaatun M.G., Zhao G., Rong C. (eds) Cloud Computing. CloudCom 2009. Lecture Notes in Computer Science, vol 5931. Springer, Berlin, Heidelberg

Abstract

Data clustering has received considerable attention in many applications, such as data mining, document retrieval, image segmentation, and pattern classification. The growing volume of information produced by technological progress makes clustering very large datasets a challenging task. To address this problem, many researchers have tried to design efficient parallel clustering algorithms. In this paper, we propose a parallel k-means clustering algorithm based on MapReduce, a simple yet powerful parallel programming technique. The experimental results demonstrate that the proposed algorithm scales well and efficiently processes large datasets on commodity hardware.
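
The abstract does not spell out the map and reduce functions, so the following is only a minimal single-process sketch of how one k-means iteration is commonly split into a map step (assign each point to its nearest centroid) and a reduce step (average the points assigned to each centroid). It is not the authors' implementation and does not use Hadoop; the function names `map_point`, `combine`, and `kmeans_iteration` are hypothetical and chosen for illustration.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_point(point, centroids):
    """Map step (assumed form): emit (index of nearest centroid, (point, 1))."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, (point, 1)


def combine(values):
    """Sum the coordinate vectors and the counts of (vector, count) pairs."""
    total = None
    count = 0
    for vec, n in values:
        total = list(vec) if total is None else [a + b for a, b in zip(total, vec)]
        count += n
    return tuple(total), count


def kmeans_iteration(points, centroids):
    """One iteration: map every point, group by key, reduce to new centroids."""
    groups = defaultdict(list)
    for p in points:
        key, value = map_point(p, centroids)
        groups[key].append(value)  # stands in for the framework's shuffle phase
    new_centroids = list(centroids)  # a centroid with no assigned points is kept
    for key, values in groups.items():
        total, count = combine(values)
        new_centroids[key] = tuple(x / count for x in total)
    return new_centroids
```

In a real MapReduce job the grouping by key would be done by the framework's shuffle, and `kmeans_iteration` would be repeated, feeding each result back in as the next set of centroids, until the centroids stop changing.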

Keywords

Data mining, Parallel clustering, K-means, Hadoop, MapReduce

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Weizhong Zhao (1, 2)
  • Huifang Ma (1, 2)
  • Qing He (1)
  1. The Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences
  2. Graduate University of Chinese Academy of Sciences
