
A New Speculative Execution Algorithm Based on C4.5 Decision Tree for Hadoop

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 503)

Abstract

As a distributed computing platform, Hadoop provides an effective way to handle big data. In Hadoop, a job's completion time can be delayed by stragglers, i.e., tasks that run much more slowly than their peers. Although the root cause of a straggler is hard to determine, speculative execution is commonly used to mitigate the problem by launching backup copies of stragglers on alternative nodes. In this paper, we design SECDT, a new Speculative Execution algorithm based on the C4.5 Decision Tree, for Hadoop. SECDT uses a C4.5 decision tree to predict the completion times of both the stragglers and their backup tasks. It then computes, for each straggler, the difference between its predicted completion time and that of its backup task, and starts a backup task for the straggler with the largest difference. Experimental results show that SECDT predicts execution time more accurately than other speculative execution methods and hence reduces job completion time.
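The two ingredients named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify SECDT's input features, how the tree's outputs are mapped to completion times, or how ties are broken, so the predicted times are simply taken as inputs. The names `gain_ratio`, `Candidate`, and `select_backup` are hypothetical. `gain_ratio` shows the standard C4.5 splitting criterion (information gain divided by split information) for a categorical attribute. `select_backup` picks the straggler with the largest predicted difference; skipping candidates whose backup is not predicted to finish earlier is an added assumption.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Hashable, List, Optional, Sequence

# Hypothetical names for illustration; not taken from the SECDT paper.


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(rows: Sequence[Dict[str, Hashable]],
               labels: Sequence[Hashable], attr: str) -> float:
    """C4.5 splitting criterion for a categorical attribute:
    information gain divided by split information."""
    total = len(rows)
    groups: Dict[Hashable, List[Hashable]] = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    weights = [len(g) / total for g in groups.values()]
    conditional = sum(w * entropy(g) for w, g in zip(weights, groups.values()))
    gain = entropy(labels) - conditional
    split_info = -sum(w * math.log2(w) for w in weights)
    # An attribute with a single value cannot split the data.
    return gain / split_info if split_info > 0 else 0.0


@dataclass
class Candidate:
    task_id: str
    straggler_time: float  # predicted remaining time if the straggler keeps running
    backup_time: float     # predicted time for a backup copy on an alternative node


def select_backup(candidates: Sequence[Candidate]) -> Optional[Candidate]:
    """Return the candidate with the largest (straggler_time - backup_time).

    Assumption: a backup is only worthwhile if it is predicted to finish
    earlier, so non-positive differences are ignored and None is returned.
    """
    best: Optional[Candidate] = None
    best_diff = 0.0
    for c in candidates:
        diff = c.straggler_time - c.backup_time
        if diff > best_diff:
            best, best_diff = c, diff
    return best
```

For example, given candidates with predicted (straggler, backup) times of (100, 40), (80, 10), and (30, 50), `select_backup` returns the second one, whose difference of 70 is the largest; the third is never chosen because its backup would finish later than the original task.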

Copyright information

© 2015 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Li, Y., Yang, Q., Lai, S., Li, B. (2015). A New Speculative Execution Algorithm Based on C4.5 Decision Tree for Hadoop. In: Wang, H., et al. Intelligent Computation in Big Data Era. ICYCSEE 2015. Communications in Computer and Information Science, vol 503. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-46248-5_35

  • DOI: https://doi.org/10.1007/978-3-662-46248-5_35

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-46247-8

  • Online ISBN: 978-3-662-46248-5

  • eBook Packages: Computer Science, Computer Science (R0)
