Research on large data set clustering method based on MapReduce
This paper describes in detail the similarities and differences between the MapReduce implementations of the K-means algorithm and the Canopy algorithm, and analyzes whether the two can be combined into a better algorithm for clustering analysis of large data sets. Unlike the K-means improvements proposed in previous literature, it proposes a new approach to sampling and analyzes how the relevant thresholds should be selected. It then introduces a MapReduce implementation framework for a K-means algorithm based on Canopy partitioning and filtering, and analyzes part of its pseudocode. Finally, it briefly analyzes the time complexity of the algorithm.
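The abstract does not reproduce the paper's pseudocode, so the following is only a rough illustration of the general pattern it names (Canopy partitioning used to seed and filter K-means on MapReduce), not the authors' framework. It simulates the jobs in plain Python: in a first job each mapper runs Canopy on its own split and a single reducer re-runs Canopy on the local centers; in the iterative K-means jobs a point is compared only with centers that share a canopy with it. The function names, the loose/tight thresholds `t1 > t2`, the in-mapper partial sums, and the fallback to all centers when no canopy is shared are assumptions made for this sketch; "map" and "reduce" are ordinary function calls rather than Hadoop tasks.

```python
import math

def dist(a, b):
    return math.dist(a, b)  # Euclidean distance (Python 3.8+)

def canopy(points, t1, t2):
    """Sequential Canopy: points within t1 join a canopy; points within t2 stop being candidate centers."""
    assert t1 > t2
    remaining, canopies = list(points), []
    while remaining:
        center = remaining.pop(0)
        members, keep = [center], []
        for p in remaining:
            d = dist(center, p)
            if d < t1:
                members.append(p)
            if d >= t2:
                keep.append(p)
        remaining = keep
        canopies.append((center, members))
    return canopies

def canopy_map(split, t1, t2):
    # Job 1 mapper (assumed): local canopy centers of one input split.
    return [c for c, _ in canopy(split, t1, t2)]

def canopy_reduce(local_centers, t1, t2):
    # Job 1 single reducer (assumed): merge local centers into global canopy centers.
    return [c for c, _ in canopy(local_centers, t1, t2)]

def kmeans_map(split, centers, canopy_centers, t1):
    # K-means mapper: only centers sharing a canopy with the point are compared.
    center_canopies = [{j for j, cc in enumerate(canopy_centers) if dist(c, cc) < t1}
                       for c in centers]
    partial = {}  # in-mapper combining: cluster id -> (vector sum, count)
    for p in split:
        pc = {j for j, cc in enumerate(canopy_centers) if dist(p, cc) < t1}
        cands = [i for i, cs in enumerate(center_canopies) if cs & pc] or range(len(centers))
        best = min(cands, key=lambda i: dist(p, centers[i]))
        s, n = partial.get(best, ([0.0] * len(p), 0))
        partial[best] = ([a + b for a, b in zip(s, p)], n + 1)
    return list(partial.items())

def kmeans_reduce(mapped, old_centers):
    # K-means reducer: add up partial sums and recompute each center.
    totals = {}
    for i, (s, n) in mapped:
        ts, tn = totals.get(i, ([0.0] * len(s), 0))
        totals[i] = ([a + b for a, b in zip(ts, s)], tn + n)
    new = []
    for i, c in enumerate(old_centers):
        if i in totals:
            s, n = totals[i]
            new.append(tuple(x / n for x in s))
        else:
            new.append(c)  # empty cluster keeps its previous center
    return new

def canopy_kmeans(splits, t1, t2, max_iter=20, tol=1e-6):
    local = [c for split in splits for c in canopy_map(split, t1, t2)]
    canopy_centers = canopy_reduce(local, t1, t2)
    centers = list(canopy_centers)  # k and the initial centers come from the canopies
    for _ in range(max_iter):
        mapped = [kv for split in splits
                  for kv in kmeans_map(split, centers, canopy_centers, t1)]
        new_centers = kmeans_reduce(mapped, centers)
        shift = max(dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return canopy_centers, centers
```

For example, `canopy_kmeans([pts[0::2], pts[1::2]], t1=3.0, t2=2.0)` on two well-separated groups of four points, spread across two splits, finds two canopies and returns the two group means. Using the canopies both to pick `k` and to restrict distance computations is what reduces the per-point cost in this sketch; how the paper actually chooses the thresholds is not stated in the abstract.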
Keywords: MapReduce · Large data · Set clustering method
This work was supported by the Chongqing Big Data Engineering Laboratory for Children, the Chongqing Electronics Engineering Technology Research Center for Interactive Learning, the Science and Technology Research Project of Chongqing Municipal Education Commission of China (No. KJ1601401), the Science and Technology Research Project of Chongqing University of Education (No. KY201725C), the Basic Research and Frontier Exploration program of Chongqing Science and Technology Commission (No. CSTC2014jcyjA40019), and the Project of Science and Technology Research Program of Chongqing Education Commission of China (No. KJZD-K201801601).