Advances in Big Data and Cloud Computing, pp. 219-225
SBKMEDA: Sorting-Based K-Median Clustering Algorithm Using Multi-Machine Technique for Big Data
Abstract
Big Data describes any voluminous body of structured, semi-structured and unstructured data that has the potential to be mined for information. Clustering is an essential tool for analysing Big Data, and multi-machine clustering is an efficient way to mine such data for insights. K-Means, a partition-based algorithm, is one of the clustering algorithms used for Big Data. One of its main disadvantages is that the number of clusters K and the initial centroids are chosen randomly, which leads to more iterations and longer execution times before the optimal centroids are reached. The sorting-based K-Means clustering algorithm (SBKMA) using the multi-machine technique is another method for analysing Big Data: the data is first sorted using Hadoop MapReduce, and the mean is taken as the centroid. This paper proposes a new algorithm, SBKMEDA (Sorting-Based K-Median clustering algorithm using the multi-machine technique for Big Data), which sorts the data and replaces the mean with the median as the centroid for better accuracy and speed in forming clusters.
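The abstract gives only the outline of the approach (sort first, then use the median rather than the mean as centroid), so the following is a minimal single-machine sketch under stated assumptions, not the authors' implementation. It assumes one-dimensional numeric data, splits the sorted values into K contiguous chunks of near-equal size, and seeds each centroid with its chunk's median. The function names `sorted_median_centroids`, `assign` and `k_median` are hypothetical, and the Hadoop MapReduce sorting stage is replaced here by Python's built-in `sorted`.

```python
from statistics import median


def sorted_median_centroids(values, k):
    """Sort 1-D values, split them into k contiguous chunks, return each chunk's median.

    Assumption: equal-sized contiguous chunks; the abstract does not specify the split.
    """
    data = sorted(values)  # stand-in for the distributed sort
    n = len(data)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(values)")
    bounds = [i * n // k for i in range(k + 1)]
    return [median(data[bounds[i]:bounds[i + 1]]) for i in range(k)]


def assign(values, centroids):
    """Label each value with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: abs(v - centroids[j]))
        for v in values
    ]


def k_median(values, k, max_iter=100):
    """Iterate assignment and median update until the centroids stop changing."""
    centroids = sorted_median_centroids(values, k)
    for _ in range(max_iter):
        labels = assign(values, centroids)
        updated = []
        for j in range(k):
            members = [v for v, label in zip(values, labels) if label == j]
            # Keep the old centroid if a cluster ends up empty.
            updated.append(median(members) if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(values, centroids)


centroids, labels = k_median([1, 2, 3, 100, 101, 102, 1000], k=2)
```

In this toy input the second cluster contains the outlier 1000. Its median is 101.5, whereas its mean would be 325.75, which illustrates why a median centroid can be less sensitive to extreme values.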
Keywords
Big Data, Clustering, K-Means algorithm, Hadoop MapReduce, SBKMA