Abstract
Hadoop is an open-source framework that lets users process massive volumes of data by distributing both storage and computation across a cluster. It plays a major role in load balancing, allowing users to configure a network of nodes organized as master and slave nodes. Hadoop's typical architecture assumes by default that all machines are homogeneous, yet many real-time applications run on clusters whose nodes have heterogeneous configurations. This paper therefore takes the heterogeneity of the nodes in a cluster into account and builds an algorithm that balances load more efficiently than Hadoop's default balancer, which works well only on homogeneous nodes.
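The abstract does not spell out the SPCA algorithm, so the sketch below does not reproduce it. It only illustrates why a homogeneity assumption can hide imbalance. It contrasts a simplified form of the HDFS balancer's threshold rule with a hypothetical heterogeneity-aware variant. Under the threshold rule, a DataNode counts as balanced when its disk utilization is within `threshold` percentage points of the cluster average (10 by default). The variant instead gives each node a share of the stored data proportional to an assumed `speed` score. The `DataNode` class, the `speed` field, and the functions `classify_default`, `weighted_targets` and `plan_moves` are illustrative names chosen for this example, not Hadoop APIs.

```python
from dataclasses import dataclass


@dataclass
class DataNode:
    name: str
    capacity_gb: float  # raw disk capacity available to HDFS
    used_gb: float      # space currently occupied by HDFS blocks
    speed: float        # HYPOTHETICAL relative processing-speed score (not an HDFS metric)


def classify_default(nodes, threshold=10.0):
    """Simplified HDFS-balancer-style rule: compare each node's utilization
    (percent of its own capacity) with the cluster-wide average."""
    avg = 100.0 * sum(n.used_gb for n in nodes) / sum(n.capacity_gb for n in nodes)
    labels = {}
    for n in nodes:
        util = 100.0 * n.used_gb / n.capacity_gb
        if util > avg + threshold:
            labels[n.name] = "over"
        elif util < avg - threshold:
            labels[n.name] = "under"
        else:
            labels[n.name] = "balanced"
    return labels


def weighted_targets(nodes):
    """HYPOTHETICAL heterogeneity-aware target: split the stored data in
    proportion to each node's speed score (capacity limits are ignored)."""
    total_used = sum(n.used_gb for n in nodes)
    total_speed = sum(n.speed for n in nodes)
    return {n.name: total_used * n.speed / total_speed for n in nodes}


def plan_moves(nodes, targets, eps=1e-9):
    """Greedily pair nodes holding more than their target with nodes holding
    less; returns a list of (source, destination, gigabytes) transfers."""
    surplus = [[n.name, n.used_gb - targets[n.name]] for n in nodes if n.used_gb > targets[n.name]]
    deficit = [[n.name, targets[n.name] - n.used_gb] for n in nodes if n.used_gb < targets[n.name]]
    moves, i, j = [], 0, 0
    while i < len(surplus) and j < len(deficit):
        amount = min(surplus[i][1], deficit[j][1])
        moves.append((surplus[i][0], deficit[j][0], amount))
        surplus[i][1] -= amount
        deficit[j][1] -= amount
        if surplus[i][1] <= eps:  # source has reached its target
            i += 1
        if deficit[j][1] <= eps:  # destination has reached its target
            j += 1
    return moves


if __name__ == "__main__":
    cluster = [
        DataNode("fast", capacity_gb=1000, used_gb=300, speed=3),
        DataNode("mid", capacity_gb=1000, used_gb=300, speed=2),
        DataNode("slow", capacity_gb=1000, used_gb=300, speed=1),
    ]
    print(classify_default(cluster))  # all "balanced": every node is at 30% utilization
    print(plan_moves(cluster, weighted_targets(cluster)))  # [('slow', 'fast', 150.0)]
```

In this example all three nodes sit at 30% utilization, so the threshold rule finds nothing to move. The speed-weighted targets, by contrast, shift 150 GB from the slow node to the fast one. The sketch deliberately ignores per-node capacity limits, block granularity and replica-placement constraints, all of which a real balancer must respect.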
Change history
28 September 2023
A Correction to this paper has been published: https://doi.org/10.1007/s42979-023-02168-3
Additional information
This article is part of the topical collection “Advances in Computational Intelligence Paradigms and Applications” guest edited by Young Lee and S. Meenakshi Sundaram.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Manjula, K., Meenakshi Sundaram, S. Optimized Approach (SPCA) for Load Balancing in Distributed HDFS Cluster. SN COMPUT. SCI. 1, 102 (2020). https://doi.org/10.1007/s42979-020-0107-8
Published:
DOI: https://doi.org/10.1007/s42979-020-0107-8