Enhancing Dataset Processing in Hadoop YARN Performance for Big Data Applications

  • Conference paper
Advanced Multimedia and Ubiquitous Engineering

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 354)

Abstract

In Hadoop MapReduce, input dataset files are loaded from the distributed file system and split across the workers, and each worker then performs the required computation according to the user's logic. This processing runs in parallel on all nodes in the cluster to compute the output results. However, contention for resources between the map and reduce stages causes significant delays in execution time, especially because of memory I/O overheads. This is undesirable because task execution in Hadoop MapReduce also incurs the overhead of processing redundant data in imprecise applications, which further increases execution time. In this paper we therefore present an approach that optimizes the local worker memory management mechanism to reduce the presence of null schedule slots. Efficient utilization of slots leads to reduced execution times. The adopted local memory management mechanism enables efficient parallel execution and reduces memory overheads. The approach effectively reduced MapReduce computation time, which reduces the budget for application execution in the cloud.
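
The split, map, and reduce flow described in the abstract can be illustrated with a minimal single-machine sketch. It is not the paper's memory management mechanism and does not use Hadoop; the word-count logic, the thread pool standing in for cluster workers, and all function names (`split_input`, `map_task`, `reduce_task`, `run_job`) are assumptions chosen for illustration only.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_input(lines, num_workers):
    """Divide the input lines into one split per (hypothetical) worker."""
    return [lines[i::num_workers] for i in range(num_workers)]


def map_task(split):
    """Illustrative user logic for the map stage: count words in one split."""
    counts = Counter()
    for line in split:
        counts.update(line.lower().split())
    return counts


def reduce_task(left, right):
    """Illustrative user logic for the reduce stage: merge partial counts."""
    return left + right


def run_job(lines, num_workers=3):
    """Run the map tasks in parallel, then reduce their partial results."""
    splits = split_input(lines, num_workers)
    # A thread pool stands in for the cluster nodes (an assumption).
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(map_task, splits))
    return reduce(reduce_task, partials, Counter())


if __name__ == "__main__":
    data = ["big data on hadoop", "hadoop yarn", "big clusters"]
    print(run_job(data, num_workers=2))
```

In this toy version the reduce stage starts only after every map task has finished, so there is no contention between the two stages; in a real cluster the stages overlap and compete for memory and slots, which is the overhead the abstract refers to.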

Acknowledgment

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (No. NRF-2013R1A1A2013401).

Corresponding author

Correspondence to Ahmed Abdulhakim Al-Absi.

Copyright information

© 2016 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Al-Absi, A.A., Kang, DK., Kim, MJ. (2016). Enhancing Dataset Processing in Hadoop YARN Performance for Big Data Applications. In: Park, J., Chao, HC., Arabnia, H., Yen, N. (eds) Advanced Multimedia and Ubiquitous Engineering. Lecture Notes in Electrical Engineering, vol 354. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-47895-0_2

  • DOI: https://doi.org/10.1007/978-3-662-47895-0_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-47894-3

  • Online ISBN: 978-3-662-47895-0

  • eBook Packages: Engineering, Engineering (R0)
