Abstract
Hadoop is regarded as the de facto standard for data-intensive distributed applications, with the Hadoop Distributed File System (HDFS) as its storage layer and MapReduce as its processing engine. Hadoop inherently assumes that the cluster is homogeneous, and this assumption is a major cause of performance deterioration because of the large shuffle required to move data between the map and reduce phases. This chapter addresses the problem by proposing a counter placement scheme (CPS) whose main contributions are: (i) profiling of nodes based on the number of maps they complete; (ii) movement of high-performance nodes into a single rack for tracking higher computation; (iii) a data placement strategy that places at least one block of each file in the rack with the highest computation; and (iv) assignment of reducers to the rack and node with the highest computation. Experiments on several benchmarks show that CPS reduces the average completion time, reduce time, and off-local shuffle by about 1.9–22.83%, 2.1–21.5%, and 4.25–24%, respectively.
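The four steps listed in the abstract can be illustrated with a minimal Python sketch. This is not the chapter's implementation: the abstract does not say how the counters are maintained or how nodes are regrouped, so every function name, the node labels, and the round-robin choices below are assumptions made purely for illustration.

```python
from collections import Counter


def profile_nodes(completed_map_hosts):
    """Step (i): count completed map tasks per node (hypothetical counter)."""
    return Counter(completed_map_hosts)


def pick_fast_rack(counts, rack_size):
    """Step (ii): group the nodes with the most completed maps into one rack.

    Ties are broken by node name so the result is deterministic.
    """
    ranked = sorted(counts, key=lambda node: (-counts[node], node))
    return ranked[:rack_size]


def place_blocks(blocks, fast_rack, other_nodes):
    """Step (iii): put at least one block of the file in the fast rack.

    Assumption: the first block goes to the fastest node and the remaining
    blocks are spread round-robin over the other nodes.
    """
    pool = other_nodes or fast_rack  # fall back if there are no other nodes
    placement = {}
    for i, block in enumerate(blocks):
        if i == 0:
            placement[block] = fast_rack[0]
        else:
            placement[block] = pool[(i - 1) % len(pool)]
    return placement


def place_reducers(fast_rack, num_reducers):
    """Step (iv): assign reducers to fast-rack nodes, fastest node first."""
    return [fast_rack[i % len(fast_rack)] for i in range(num_reducers)]


# Hypothetical log of which node finished each map task.
completions = ["n1", "n2", "n1", "n3", "n1", "n2", "n4"]
counts = profile_nodes(completions)
fast_rack = pick_fast_rack(counts, rack_size=2)
others = [n for n in sorted(counts) if n not in fast_rack]
block_map = place_blocks(["b0", "b1", "b2"], fast_rack, others)
reducers = place_reducers(fast_rack, num_reducers=3)
```

In this toy run, the two nodes that completed the most maps form the fast rack, the file keeps one block there, and all reducers land on that rack, which is the locality effect the abstract attributes to CPS.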
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Hussain, M.W., Roy, D.S. (2022). A Counter-Based Profiling Scheme for Improving Locality Through Data and Reducer Placement. In: Dehuri, S., Chen, YW. (eds) Advances in Machine Learning for Big Data Analysis. Intelligent Systems Reference Library, vol 218. Springer, Singapore. https://doi.org/10.1007/978-981-16-8930-7_4
DOI: https://doi.org/10.1007/978-981-16-8930-7_4
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-8929-1
Online ISBN: 978-981-16-8930-7
eBook Packages: Intelligent Technologies and Robotics (R0)