HAT: history-based auto-tuning MapReduce in heterogeneous environments

Chen, Quan; Guo, Minyi; Deng, Qianni; Zheng, Long; Guo, Song; Shen, Yao

doi:10.1007/s11227-011-0682-5

Published: 23 September 2011

Volume 64, pages 1038–1054, (2013)

The Journal of Supercomputing

Quan Chen¹, Minyi Guo¹, Qianni Deng¹, Long Zheng²,³, Song Guo³ & Yao Shen¹


Abstract

In the MapReduce model, a job is divided into a series of map tasks and reduce tasks. A few slow tasks can seriously prolong the execution time of the job, especially in heterogeneous environments. To finish the slow tasks as soon as possible, current MapReduce schedulers launch a backup task on another node for each slow task. However, traditional MapReduce schedulers cannot detect slow tasks correctly because they cannot estimate task progress accurately (Hadoop home page http://hadoop.apache.org/, 2011; Zaharia et al. in 8th USENIX symposium on operating systems design and implementation, ACM, New York, pp. 29–42, 2008). To solve this problem, this paper proposes a History-based Auto-Tuning (HAT) MapReduce scheduler, which calculates task progress accurately and adapts automatically to a continuously varying environment. HAT tunes the weight of each phase of a map task and of a reduce task according to the values observed in historical tasks, and uses these tuned weights to calculate the progress of current tasks. From the accurately calculated progress, HAT estimates the remaining time of each task and launches backup tasks for the tasks with the longest remaining time. Experimental results show that HAT improves the performance of MapReduce applications by up to 37% compared with Hadoop and by up to 16% compared with the LATE scheduler.
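
The abstract gives no formulas, so the following is only a minimal sketch of the idea under stated assumptions, not the authors' implementation. It assumes that a phase weight is the average share of total task time that the phase took in finished tasks, that progress is the sum of the weights of completed phases plus the weighted fraction of the current phase, and that remaining time is estimated LATE-style as (1 - progress) / (progress / elapsed). All names (`tune_weights`, `progress`, `remaining_time`, `pick_backup_candidate`, `RunningTask`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Sequence

# Reduce-task phases in Hadoop; the fixed equal split is what a
# history-based scheduler would replace with tuned weights.
REDUCE_PHASES = ("copy", "sort", "reduce")


def tune_weights(history: Sequence[Dict[str, float]],
                 phases: Sequence[str]) -> Dict[str, float]:
    """Assumed rule: average each phase's share of task time over history."""
    if not history:
        # No finished tasks yet: fall back to equal weights.
        return {p: 1.0 / len(phases) for p in phases}
    shares = {p: 0.0 for p in phases}
    for durations in history:
        total = sum(durations[p] for p in phases)
        for p in phases:
            shares[p] += durations[p] / total
    return {p: shares[p] / len(history) for p in phases}


def progress(weights: Dict[str, float], phases: Sequence[str],
             current_phase: str, phase_fraction: float) -> float:
    """Weights of completed phases plus the weighted part of the current one."""
    done = 0.0
    for p in phases:
        if p == current_phase:
            return done + weights[p] * phase_fraction
        done += weights[p]
    raise ValueError(f"unknown phase: {current_phase}")


def remaining_time(progress_score: float, elapsed: float) -> float:
    """Assumed LATE-style estimate: (1 - progress) / progress rate."""
    if progress_score <= 0.0:
        return float("inf")  # no progress observed yet
    rate = progress_score / elapsed
    return (1.0 - progress_score) / rate


@dataclass
class RunningTask:
    task_id: str
    current_phase: str
    phase_fraction: float  # fraction of the current phase completed, 0..1
    elapsed: float         # seconds since the task started


def pick_backup_candidate(tasks: Sequence[RunningTask],
                          weights: Dict[str, float],
                          phases: Sequence[str]) -> RunningTask:
    """Return the running task with the longest estimated remaining time."""
    return max(
        tasks,
        key=lambda t: remaining_time(
            progress(weights, phases, t.current_phase, t.phase_fraction),
            t.elapsed,
        ),
    )


# Illustrative (made-up) phase durations in seconds for two finished tasks.
history = [
    {"copy": 60.0, "sort": 10.0, "reduce": 30.0},
    {"copy": 80.0, "sort": 5.0, "reduce": 15.0},
]
weights = tune_weights(history, REDUCE_PHASES)
tasks = [
    RunningTask("A", "copy", 0.5, 35.0),
    RunningTask("B", "reduce", 0.0, 100.0),
]
candidate = pick_backup_candidate(tasks, weights, REDUCE_PHASES)
```

With this made-up history the copy phase receives most of the weight, so a task halfway through copying is credited with more progress than a fixed one-third split would give it. That difference in the progress score is what changes the remaining-time ranking used to choose backup tasks.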


References

Aboulnaga A, Wang Z, Zhang ZY (2009) Packing the most onto your cloud. In: Proceeding of the first international workshop on Cloud data management. ACM, New York, pp 25–28
Barroso LA, Dean J, Hölzle U (2003) Web search for a planet: the Google cluster architecture. IEEE MICRO 23(2):22–28
Buyya R, Yeo CS, Venugopal S, Broberg J, Brandic I (2009) Cloud computing and emerging IT platforms: vision, hype, and reality for delivering computing as the 5th utility. Future Gener Comput Syst 25(6):599–616
Chang F, Dean J, Ghemawat S, Hsieh WC, Wallach DA, Burrows M, Chandra T, Fikes A, Gruber RE (2006) Bigtable: a distributed storage system for structured data. In: Proceedings of the 7th USENIX symposium on operating systems design and implementation (OSDI 2006)
Chen R, Chen H, Zang B (2010) Tiled-MapReduce: optimizing resource usages of data-parallel applications on multicore with tiling. In: Proceedings of the 19th international conference on parallel architectures and compilation techniques. ACM, New York, pp 523–534
De Kruijf M, Sankaralingam K (2010) MapReduce for the cell broadband engine architecture. IBM J Res Dev 53(5):10
Dean J, Ghemawat S (2010) MapReduce: a flexible data processing tool. Commun ACM 53(1):72–77
Dean J, Ghemawat S (2004) MapReduce: simplified data processing on large clusters. In: OSDI 2004: proceedings of 6th symposium on operating system design and implementation. ACM Press, New York, pp 137–150
Elespuru P, Shakya S, Mishra S (2009) Mapreduce system over heterogeneous mobile devices. In: Software technologies for embedded and ubiquitous systems, pp 168–179
Fang W, He B, Luo Q, Govindaraju NK (2010) Mars: accelerating MapReduce with graphics processors. IEEE Trans Parallel Distrib Syst
Fischer MJ, Su X, Yin Y (2010) Assigning tasks for efficiency in Hadoop. In: Proceedings of the 22nd ACM symposium on parallelism in algorithms and architectures. ACM, New York, pp 30–39
Ghemawat S, Gobioff H, Leung S-T (2003) The Google file system. In: SOSP 2003: proceedings of the 9th ACM symposium on operating systems principles. ACM, New York, pp 29–43
Hadoop (2011) Hadoop home page. http://hadoop.apache.org/
Jiang W, Ravi VT, Agrawal G (2010) A map-reduce system with an alternate API for multi-core environments. In: 2010 10th IEEE/ACM international conference on cluster, cloud and grid computing. IEEE Press, New York, pp 84–93
Morton K, Balazinska M, Grossman D (2010) ParaTimer: a progress indicator for MapReduce DAGs. In: Proceedings of the 2010 international conference on management of data. ACM, New York, pp 507–518
Polo J, Carrera D, Becerra Y, Torres J, Ayguadé E, Steinder M, Whalley I (2010) Performance management of accelerated MapReduce workloads in heterogeneous clusters. In: 39th international conference on parallel processing (ICPP2010). San Diego, CA, USA
Rafique MM, Rose B, Butt AR, Nikolopoulos DS (2009) CellMR: a framework for supporting mapreduce on asymmetric cell-based clusters. In: IEEE international symposium on parallel & distributed processing. IPDPS 2009. IEEE Press, New York, pp 1–12
Ranger C, Raghuraman R, Penmetsa A, Bradski G, Kozyrakis C (2007) Evaluating mapreduce for multi-core and multiprocessor systems. In: HPCA 2007: proceedings of the 2007 IEEE 13th international symposium on high performance computer architecture. IEEE Computer Society, Washington, DC, pp 13–24
Sandholm T, Lai K (2010) Dynamic proportional share scheduling in hadoop. In: Job scheduling strategies for parallel processing. Springer, Berlin, pp 110–131
Schatz MC (2009) CloudBurst: highly sensitive read mapping with MapReduce. Bioinformatics 25(11):1363
Shan Y, Wang B, Yan J, Wang Y, Xu N, Yang H (2010) FPMR: MapReduce framework on FPGA. In: Proceedings of the 18th annual ACM/SIGDA international symposium on Field programmable gate arrays. ACM, New York, pp 93–102
Tian C, Zhou H, He Y, Zha L (2009) A dynamic MapReduce scheduler for heterogeneous workloads. In: Proceedings of the 2009 eighth international conference on grid and cooperative computing. IEEE Computer Society, Los Alamitos, pp 218–224
Vaquero LM, Rodero-Merino L, Caceres J, Lindner M (2008) A break in the clouds: towards a cloud definition. Comput Commun Rev 39(1):50–55
Varia J (2008) Cloud architectures. White paper of Amazon. jineshvaria.s3.amazonaws.com/public/cloudarchitectures-varia.pdf
Yahoo (2011) Yahoo! hadoop tutorial. http://developer.yahoo.com/hadoop/tutorial/
Yoo RM, Romano A, Kozyrakis C (2009) Phoenix rebirth: scalable MapReduce on a large-scale shared-memory system. In: IEEE international symposium on workload characterization. IISWC 2009. IEEE Press, New York, pp 198–207
Zaharia M, Borthakur D, Sarma JS, Elmeleegy K, Shenker S, Stoica I (2009) Job scheduling for multi-user mapreduce clusters. Technical report, UCB/EECS-2009-55, University of California at Berkeley
Zaharia M, Konwinski A, Joseph AD, Katz R, Stoica I (2008) Improving mapreduce performance in heterogeneous environments. In: 8th USENIX symposium on operating systems design and implementation. ACM, New York, pp 29–42


Author information

Authors and Affiliations

Department of Computer Science, Shanghai Jiao Tong University, Shanghai, China
Quan Chen, Minyi Guo, Qianni Deng & Yao Shen
Huazhong University of Science and Technology, Wuhan, China
Long Zheng
School of Computer Science and Engineering, The University of Aizu, Aizuwakamatsu, Japan
Long Zheng & Song Guo


Corresponding author

Correspondence to Minyi Guo.


About this article

Cite this article

Chen, Q., Guo, M., Deng, Q. et al. HAT: history-based auto-tuning MapReduce in heterogeneous environments. J Supercomput 64, 1038–1054 (2013). https://doi.org/10.1007/s11227-011-0682-5


Published: 23 September 2011
Issue Date: June 2013
DOI: https://doi.org/10.1007/s11227-011-0682-5
