Performance Improvement of MapReduce for Heterogeneous Clusters Based on Efficient Locality and Replica Aware Scheduling (ELRAS) Strategy

Bibal Benifa, J. V.; Dejey

doi:10.1007/s11277-017-3953-5

Published: 13 January 2017

Volume 95, pages 2709–2733, (2017)

Wireless Personal Communications

J. V. Bibal Benifa¹ & Dejey¹

Abstract

MapReduce is a parallel programming model for processing data-intensive applications in a cloud environment. The scheduler greatly influences the performance of the MapReduce model when it is used in a heterogeneous cluster environment. The dynamic nature of the cluster environment and of the computing workloads affects execution time and computational resource usage in the scheduling process. Further, data locality is essential for reducing total job execution time and cross-rack communication, and for improving throughput. In the present work, a scheduling strategy named efficient locality and replica aware scheduling (ELRAS), integrated with an autonomous replication scheme (ARS), is proposed to enhance data locality and to perform consistently in a heterogeneous environment. ARS autonomously decides which data object to replicate by considering its popularity, and removes the replica once it is idle. The proposed approach is validated in a heterogeneous cluster environment with various realistic applications covering IO-bound, CPU-bound, and mixed workloads. ELRAS improves throughput by a factor of about 2 compared with the existing FIFO scheduler, and it also yields near-optimal data locality, reduced execution time, and effective utilization of resources. The simplicity of the ELRAS algorithm makes it feasible to adopt for a wide range of applications.
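
As a rough illustration of the replication idea described in the abstract, the sketch below copies a block to a reader node once its read count reaches a threshold and discards the extra copies after an idle period. It is not the ARS or ELRAS algorithm from the paper, whose details are not given in this abstract; the class and method names, the thresholds (hot_threshold, idle_seconds), and the node identifiers are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class BlockStats:
    """Bookkeeping for one data block (all fields are illustrative)."""
    home: str                                  # node holding the original copy
    extra: set = field(default_factory=set)    # nodes holding added replicas
    accesses: int = 0                          # reads since the last eviction
    last_access: float = 0.0                   # time of the latest read


class PopularityReplicator:
    """Toy popularity-driven replication; thresholds are assumptions."""

    def __init__(self, hot_threshold=3, idle_seconds=60.0):
        self.hot_threshold = hot_threshold
        self.idle_seconds = idle_seconds
        self.blocks = {}

    def add_block(self, block_id, node):
        self.blocks[block_id] = BlockStats(home=node)

    def locations(self, block_id):
        """Return every node that currently holds a copy of the block."""
        stats = self.blocks[block_id]
        return {stats.home} | stats.extra

    def record_access(self, block_id, reader_node, now):
        """Count a read; replicate to the reader once the block is popular."""
        stats = self.blocks[block_id]
        stats.accesses += 1
        stats.last_access = now
        if stats.accesses >= self.hot_threshold and reader_node != stats.home:
            stats.extra.add(reader_node)

    def evict_idle(self, now):
        """Drop added replicas of blocks not read for idle_seconds."""
        for stats in self.blocks.values():
            if now - stats.last_access >= self.idle_seconds:
                stats.extra.clear()
                stats.accesses = 0

    def pick_node(self, block_id, free_nodes):
        """Prefer a free node that already holds the block (data-local)."""
        local = sorted(self.locations(block_id) & set(free_nodes))
        return local[0] if local else sorted(free_nodes)[0]
```

In this toy model a scheduler would call pick_node when a slot frees up, so added replicas raise the chance that a task runs where its input block already resides; a real scheduler would also need to account for rack topology and node speed.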

Acknowledgements

The authors gratefully acknowledge the support of the Department of Computer Science and Engineering, Anna University Regional Campus, Tirunelveli, India, for providing the computing facilities to complete this research work successfully.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Anna University Regional Campus, Tirunelveli, Tamil Nadu, 627007, India
J. V. Bibal Benifa & Dejey

Corresponding author

Correspondence to J. V. Bibal Benifa.

About this article

Cite this article

Bibal Benifa, J.V., Dejey. Performance Improvement of MapReduce for Heterogeneous Clusters Based on Efficient Locality and Replica Aware Scheduling (ELRAS) Strategy. Wireless Pers Commun 95, 2709–2733 (2017). https://doi.org/10.1007/s11277-017-3953-5

Published: 13 January 2017
Issue Date: August 2017
DOI: https://doi.org/10.1007/s11277-017-3953-5
