Optimized Capacity Scheduler for MapReduce Applications in Cloud Environments
Most current-day applications are data centric and involve a great deal of data processing. Technologies like Hadoop enable data processing with automatic parallelism. Applications that are both data intensive and compute intensive can take advantage of this automatic parallelism and of the methodology of moving computation to the data. In addition, cloud computing enables users to establish clusters with the required number of nodes instantly, and has made it easy to execute large data applications without having to establish or maintain the infrastructure. Because the cloud provides readily installed infrastructure, running Hadoop on the cloud has become common. The existing schedulers are very effective in static cluster environments but perform less well in virtual environments. The purpose of this work is to design an effective capacity scheduler for MapReduce applications in virtualized environments such as public clouds by making scheduling decisions more intelligent, using the characteristics of jobs and virtual machines.
Keywords: Big data, Cloud computing, CloudSim, Hadoop, MapReduce, Virtual machine