Efficient Resource Scheduling for Big Data Processing in Cloud Platform
Big data processing in the cloud is becoming an inevitable trend, and it requires a specially designed cloud resource allocation approach. The challenge is to allocate resources efficiently and dynamically according to the QoS demands of Big data applications while saving energy and cost by optimizing the number of servers in use. To address this problem, this paper establishes a general problem formulation. Under certain assumptions, we prove that reducing resource waste is directly related to cost minimization. Based on that result, we develop efficient heuristic algorithms with tuning parameters to find cost-minimized dynamic resource allocation solutions to this problem. We study and test the Big data workload by running a group of typical Big data jobs, namely video surveillance services, on Amazon Cloud EC2. We then create a large simulation scenario and compare our proposed method with other approaches.
Keywords: Big data, resource allocation, cloud computing, optimization