A Counter-Based Profiling Scheme for Improving Locality Through Data and Reducer Placement

Advances in Machine Learning for Big Data Analysis

Part of the book series: Intelligent Systems Reference Library ((ISRL,volume 218))

Abstract

Hadoop is regarded as the de facto standard for data-intensive distributed applications, built on its storage layer, the Hadoop Distributed File System (HDFS), and its processing engine, MapReduce. Hadoop’s inherent assumption of a homogeneous cluster is a major cause of performance deterioration, because of the large shuffle required to process data across the map and reduce phases. This chapter addresses this deterioration by proposing a counter placement scheme (CPS) with four main contributions: (i) profiling of nodes based on the completion of maps, (ii) movement of high-performance nodes into a single rack for tracking higher computation, (iii) a data replacement strategy that places at least one block of a file in the rack with the highest computation, and (iv) assignment of reducers to the rack and node with the highest computation. Experiments on several benchmarks show that CPS reduces the average completion time, reduce time, and off-local shuffle by about 1.9–22.83%, 2.1–21.5%, and 4.25–24%, respectively.
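As a rough illustration of steps (i) and (iv), the following minimal Python sketch counts completed map tasks per node and then picks the rack and node with the highest counts as the reducer target. It is not the chapter's implementation and does not use any Hadoop API: the function names `profile_nodes` and `pick_reducer_target`, the node and rack labels, and the use of a raw completed-map count as the measure of "computation" are all assumptions made for this example.

```python
from collections import Counter


def profile_nodes(completed_maps):
    """Count completed map tasks per node (the 'counter' in this sketch).

    completed_maps: iterable of node names, one entry per finished map task.
    """
    return Counter(completed_maps)


def pick_reducer_target(node_counts, node_to_rack):
    """Return the (rack, node) pair with the highest completed-map counts.

    Assumption: 'highest computation' is approximated by the number of
    completed maps, summed per rack and then compared per node.
    """
    rack_counts = Counter()
    for node, count in node_counts.items():
        rack_counts[node_to_rack[node]] += count

    # Sorting first makes ties resolve to the alphabetically first name.
    best_rack = max(sorted(rack_counts), key=lambda r: rack_counts[r])
    nodes_in_rack = sorted(n for n in node_counts if node_to_rack[n] == best_rack)
    best_node = max(nodes_in_rack, key=lambda n: node_counts[n])
    return best_rack, best_node


# Hypothetical cluster: four nodes spread over two racks.
racks = {"n1": "rackA", "n2": "rackA", "n3": "rackB", "n4": "rackB"}
completed = ["n1", "n2", "n1", "n3", "n3", "n3", "n4"]

counts = profile_nodes(completed)
target = pick_reducer_target(counts, racks)
print(target)
```

In this toy setup a reducer would be assigned to the node that finished the most maps inside the busiest rack. Steps (ii) and (iii), regrouping nodes into racks and placing data blocks, are left out because the abstract does not give enough detail to sketch them.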



Author information


Corresponding author

Correspondence to Diptendu Sinha Roy.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter


Cite this chapter

Hussain, M.W., Roy, D.S. (2022). A Counter-Based Profiling Scheme for Improving Locality Through Data and Reducer Placement. In: Dehuri, S., Chen, YW. (eds) Advances in Machine Learning for Big Data Analysis. Intelligent Systems Reference Library, vol 218. Springer, Singapore. https://doi.org/10.1007/978-981-16-8930-7_4
