Cluster Computing, Volume 22, Supplement 3, pp 6963–6976

Joint deadline-constrained and influence-aware design for allocating MapReduce jobs in cloud computing systems

  • Jenn-Wei Lin
  • Joseph M. Arul
  • Chi-Yi Lin


MapReduce can speed up the execution of jobs that operate on big data. A MapReduce job is divided into a number of map and reduce tasks by partitioning its input data in a well-defined manner. In a cloud computing system, multiple MapReduce jobs may be submitted together and compete for the system's computing resources. When a job has a particular performance requirement (e.g., an execution deadline), appropriate computing resources must be reserved for executing the job's map and reduce tasks; otherwise, the requirement cannot be satisfied. Several deadline-constrained MapReduce schedulers have been proposed, but most of them are not aware of the performance influence on existing tasks. We propose a deadline-constrained and influence-aware MapReduce scheduler that combines three factors: (1) relaxed data locality, (2) performance influence on existing tasks, and (3) coordination of allocation contention. We first apply the data-locality criterion to make a tentative allocation plan. We then verify this plan: if some new tasks severely affect existing tasks, or if the deadline requirements of some new tasks are not satisfied, the plan is modified by re-allocating some of the new tasks. To optimize computing resource usage, the solution of a well-known network graph problem, minimum-cost maximum-flow (MCMF), is applied to modify the data-locality allocation plan. A heuristic algorithm is also presented to reduce the complexity of solving the MCMF problem. In addition to meeting the deadline requirements of new jobs, the final allocation plan also considers the performance influence on existing jobs. Finally, we conduct a performance analysis to evaluate the proposed MapReduce scheduler using various performance metrics.
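To make the MCMF idea concrete, the following is a minimal sketch, not the authors' formulation. It assumes a hypothetical flow network in which a source feeds each new task with capacity 1, each task connects to the nodes it may run on, and each node connects to the sink with capacity equal to its free slots. The task names, node names, slot counts, and edge costs (0 for a data-local placement, higher values for remote placements or placements that would disturb existing tasks) are invented for illustration. The solver is a plain successive-shortest-path routine, chosen for brevity rather than speed.

```python
from collections import deque


class MinCostMaxFlow:
    """Successive-shortest-path MCMF on a small directed graph."""

    def __init__(self, n):
        self.n = n
        self.graph = [[] for _ in range(n)]  # adjacency lists of edge records

    def add_edge(self, u, v, cap, cost):
        # Edge record: [to, residual capacity, cost, index of reverse edge].
        self.graph[u].append([v, cap, cost, len(self.graph[v])])
        self.graph[v].append([u, 0, -cost, len(self.graph[u]) - 1])

    def solve(self, s, t):
        flow = cost = 0
        while True:
            # Queue-based Bellman-Ford: cheapest augmenting path in the residual graph.
            dist = [float("inf")] * self.n
            prev = [None] * self.n
            in_queue = [False] * self.n
            dist[s] = 0
            queue = deque([s])
            while queue:
                u = queue.popleft()
                in_queue[u] = False
                for i, (v, cap, c, _) in enumerate(self.graph[u]):
                    if cap > 0 and dist[u] + c < dist[v]:
                        dist[v] = dist[u] + c
                        prev[v] = (u, i)
                        if not in_queue[v]:
                            in_queue[v] = True
                            queue.append(v)
            if prev[t] is None:  # sink unreachable: the flow is maximum
                return flow, cost
            # Bottleneck capacity along the path found.
            push, v = float("inf"), t
            while v != s:
                u, i = prev[v]
                push = min(push, self.graph[u][i][1])
                v = u
            # Augment along the path and update reverse edges.
            v = t
            while v != s:
                u, i = prev[v]
                self.graph[u][i][1] -= push
                self.graph[v][self.graph[u][i][3]][1] += push
                v = u
            flow += push
            cost += push * dist[t]


def allocate(tasks, slots, cost):
    """Assign tasks to nodes. `cost[(task, node)]` is a hypothetical placement cost."""
    nodes = list(slots)
    first_node = 1 + len(tasks)          # vertex ids: 0 = source, then tasks, then nodes
    sink = first_node + len(nodes)
    net = MinCostMaxFlow(sink + 1)
    for ti, task in enumerate(tasks):
        net.add_edge(0, 1 + ti, 1, 0)    # each task is placed at most once
        for ni, node in enumerate(nodes):
            if (task, node) in cost:
                net.add_edge(1 + ti, first_node + ni, 1, cost[(task, node)])
    for ni, node in enumerate(nodes):
        net.add_edge(first_node + ni, sink, slots[node], 0)  # free slots per node
    flow, total = net.solve(0, sink)
    plan = {}
    for ti, task in enumerate(tasks):
        for v, cap, _, _ in net.graph[1 + ti]:
            if v >= first_node and cap == 0:  # saturated task-to-node edge
                plan[task] = nodes[v - first_node]
    return flow, total, plan


# Illustrative input: t1 and t2 both hold their data on node A, which has one free slot.
example_cost = {("t1", "A"): 0, ("t1", "B"): 2,
                ("t2", "A"): 0, ("t2", "B"): 3,
                ("t3", "B"): 0}
placed, total_cost, plan = allocate(["t1", "t2", "t3"], {"A": 1, "B": 2}, example_cost)
print(placed, total_cost, plan)
```

In this toy input the contention for node A is resolved by moving whichever task is cheaper to relocate, which is the kind of trade-off a cost-based re-allocation is meant to capture. How the real scheduler derives its costs and capacities from deadlines and performance influence is not stated in the abstract and is not modeled here.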


MapReduce, Big data, Cloud computing, Scheduler, Task allocation



This research was supported by the Ministry of Science and Technology, Taiwan, R.O.C., under Grant MOST 105-2221-E-030-004-MY3.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Dept. of Computer Science and Information Engineering, Fu Jen Catholic University Institute, New Taipei City, Taiwan
  2. Department of Computer Science and Information Engineering, Tamkang University, New Taipei City, Taiwan
