A Counter-Based Profiling Scheme for Improving Locality Through Data and Reducer Placement

Hussain, Mir Wajahat; Roy, Diptendu Sinha

doi:10.1007/978-981-16-8930-7_4

Part of the book series: Intelligent Systems Reference Library (ISRL, volume 218)

Abstract

Hadoop has been regarded as the de facto standard for handling data-intensive distributed applications, with its popular storage and processing engines, the Hadoop Distributed File System (HDFS) and MapReduce. Hadoop's inherent assumption that the cluster is homogeneous is a major cause of performance deterioration, because of the large shuffle required to move data between the map and reduce phases. This chapter addresses this deterioration by proposing a counter placement scheme (CPS), whose main contributions are as follows: (i) profiling of nodes based on their completed map tasks; (ii) movement of high-performance nodes into a single rack to track higher computation; (iii) a data placement strategy that places at least one block of each file in the rack with the highest computation; and (iv) assignment of reducers to the rack and node with the highest computation. Experiments running several benchmarks show that CPS reduces the average completion time, reduce time, and off-local shuffle by about 1.9–22.83%, 2.1–21.5%, and 4.25–24%, respectively.
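The four CPS steps listed in the abstract can be illustrated with a small, self-contained sketch. It is a hypothetical illustration under stated assumptions, not the authors' implementation: the function names, the `/fast` rack label, the choice of k = 2 fast nodes, and the rule of adding one extra replica of a file's first block are all assumptions made here for clarity. In a real Hadoop cluster, rack membership comes from the configured rack-awareness topology mapping and block placement is performed by HDFS itself.

```python
from collections import Counter, defaultdict

# Hypothetical sketch of the four CPS steps described in the abstract.
# All names (profile_nodes, regroup_racks, place_blocks, pick_reducer_node)
# and the "/fast" rack label are illustrative assumptions, not the
# authors' code and not a Hadoop API.


def profile_nodes(completed_map_events):
    """Step (i): count completed map tasks per node (the per-node counter)."""
    return Counter(completed_map_events)


def regroup_racks(rack_of, counts, fast_rack="/fast", k=2):
    """Step (ii): remap the k nodes with the most map completions into one rack.

    Returns a new node -> rack mapping; all other nodes keep their rack.
    """
    top = [node for node, _ in counts.most_common(k)]
    new_map = dict(rack_of)
    for node in top:
        new_map[node] = fast_rack
    return new_map


def place_blocks(files, rack_of, counts, fast_rack="/fast"):
    """Step (iii): ensure every file has at least one block in fast_rack.

    files maps a file name to a non-empty list of (block_id, node) replicas.
    Assumes fast_rack has at least one node. If none of a file's replicas is
    in fast_rack, an extra replica of its first block is added on the node
    in fast_rack with the highest counter. Returns a new dict.
    """
    fast_nodes = [n for n, r in rack_of.items() if r == fast_rack]
    best = max(fast_nodes, key=lambda n: counts[n])
    placed = {name: list(blocks) for name, blocks in files.items()}
    for blocks in placed.values():
        if not any(rack_of[node] == fast_rack for _, node in blocks):
            blocks.append((blocks[0][0], best))
    return placed


def pick_reducer_node(rack_of, counts):
    """Step (iv): pick the rack with the highest total count, then its top node."""
    rack_totals = defaultdict(int)
    for node, rack in rack_of.items():
        rack_totals[rack] += counts[node]
    best_rack = max(rack_totals, key=rack_totals.get)
    members = [n for n, r in rack_of.items() if r == best_rack]
    return best_rack, max(members, key=lambda n: counts[n])


if __name__ == "__main__":
    # Toy profiling window: n3 finishes 4 maps, n4 finishes 2, n1 and n2 one each.
    counts = profile_nodes(["n3", "n1", "n3", "n4", "n3", "n2", "n4", "n3"])
    racks = regroup_racks(
        {"n1": "/rack1", "n2": "/rack2", "n3": "/rack1", "n4": "/rack2"}, counts
    )
    files = place_blocks(
        {"a.txt": [("a_b0", "n1"), ("a_b1", "n3")], "b.txt": [("b_b0", "n2")]},
        racks,
        counts,
    )
    print(pick_reducer_node(racks, counts))  # ('/fast', 'n3')
```

Summing counters per rack before choosing a node mirrors the abstract's two-level rule (rack first, then node). Ties are broken by dictionary insertion order here, which is an arbitrary choice of this sketch.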

Author information

Authors and Affiliations

Department of Computer Science & Engineering, Alliance College of Engineering & Design, Alliance University, Anekal, Karnataka, 562106, India
Mir Wajahat Hussain
Department of Computer Science & Engineering, National Institute of Technology Meghalaya, Shillong, Meghalaya, 793003, India
Diptendu Sinha Roy

Corresponding author

Correspondence to Diptendu Sinha Roy.

Editor information

Editors and Affiliations

Department of Information and Communication Technology, Fakir Mohan University, Balasore, India
Satchidananda Dehuri
College of Information Science and Engineering, Ritsumeikan University, Shiga, Japan
Yen-Wei Chen

About this chapter

Cite this chapter

Hussain, M.W., Roy, D.S. (2022). A Counter-Based Profiling Scheme for Improving Locality Through Data and Reducer Placement. In: Dehuri, S., Chen, YW. (eds) Advances in Machine Learning for Big Data Analysis. Intelligent Systems Reference Library, vol 218. Springer, Singapore. https://doi.org/10.1007/978-981-16-8930-7_4

DOI: https://doi.org/10.1007/978-981-16-8930-7_4
Published: 24 February 2022
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-8929-1
Online ISBN: 978-981-16-8930-7
eBook Packages: Intelligent Technologies and Robotics (R0)