Scheduling MapReduce Jobs and Data Shuffle on Unrelated Processors
We propose a constant-factor approximation algorithm for generalizations of the Flexible Flow Shop (FFS) problem that form a realistic model for non-preemptive scheduling in MapReduce systems. Our results concern the minimization of the total weighted completion time of a set of MapReduce jobs on unrelated processors and substantially improve on the model proposed by Moseley et al. (SPAA 2011) in two directions: (i) we consider jobs consisting of multiple Map and Reduce tasks, which is the key idea behind MapReduce computations, and (ii) we introduce into our model the crucial cost of the data shuffle phase, i.e., the cost of transmitting intermediate data from Map to Reduce tasks. Moreover, we experimentally evaluate our algorithm against a lower bound on the optimal cost of our problem, as well as against a fast algorithm that combines a simple online assignment of tasks to processors with a standard scheduling policy. We observe that, for random instances that capture data locality issues, our algorithm achieves better performance.
Keywords: Completion Time; Precedence Constraint; Total Weighted Completion Time; Reduce Task; Flexible Flow Shop
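To make the objective in the abstract concrete, the following is a minimal sketch of a simplified version of the model. It is not the approximation algorithm of the paper, and it is not the fast baseline used in the experiments. Everything in it is an assumption made for illustration: the `Job` class, the single per-job `shuffle` delay, and the greedy rule that places each task on the processor where it would finish earliest. Processing times depend on the processor (unrelated processors), tasks run non-preemptively, and the Reduce tasks of a job may start only after all its Map tasks have finished and the shuffle delay has elapsed.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Job:
    name: str
    weight: float
    # map_times[t][p] is the time of Map task t on processor p.
    map_times: List[List[float]]
    # reduce_times[t][p] is the time of Reduce task t on processor p.
    reduce_times: List[List[float]]
    # Hypothetical simplification: one shuffle delay per job.
    shuffle: float = 0.0


def place(times: List[float], free_at: List[float], release: float) -> float:
    """Append one task to the processor where it finishes earliest."""
    best = min(range(len(free_at)),
               key=lambda p: max(free_at[p], release) + times[p])
    finish = max(free_at[best], release) + times[best]
    free_at[best] = finish  # the processor is busy until the task ends
    return finish


def greedy_schedule(jobs: List[Job], num_procs: int) -> Dict[str, float]:
    """Schedule jobs in the given order; return each job's completion time."""
    free_at = [0.0] * num_procs
    completion: Dict[str, float] = {}
    for job in jobs:
        # Map phase: tasks have no release constraint in this sketch.
        maps_done = max((place(t, free_at, 0.0) for t in job.map_times),
                        default=0.0)
        # Reduce tasks wait for every Map task plus the shuffle delay.
        release = maps_done + job.shuffle
        completion[job.name] = max(
            (place(t, free_at, release) for t in job.reduce_times),
            default=release)
    return completion


def total_weighted_completion_time(jobs: List[Job],
                                   completion: Dict[str, float]) -> float:
    """Objective value: sum over jobs of weight times completion time."""
    return sum(job.weight * completion[job.name] for job in jobs)


if __name__ == "__main__":
    jobs = [
        Job("A", 2.0, [[1, 3], [4, 2]], [[2, 2]], shuffle=1.0),
        Job("B", 1.0, [[2, 2]], [[1, 5]]),
    ]
    done = greedy_schedule(jobs, num_procs=2)
    print(done, total_weighted_completion_time(jobs, done))
```

In this toy instance, job A completes at time 5 and job B at time 6, so the objective value is 2·5 + 1·6 = 16. The order in which jobs are passed to `greedy_schedule` acts as the scheduling policy. A priority rule could be applied by sorting the list first.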
- 3. Chang, H., Kodialam, M.S., Kompella, R.R., Lakshman, T.V., Lee, M., Mukherjee, S.: Scheduling in MapReduce-like systems for fast completion time. In: IEEE Proceedings of the 30th International Conference on Computer Communications, pp. 3074–3082 (2011)
- 4. Chen, F., Kodialam, M.S., Lakshman, T.V.: Joint scheduling of processing and shuffle phases in MapReduce systems. In: IEEE Proceedings of the 31st International Conference on Computer Communications, pp. 1143–1151 (2012)
- 6. Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. In: Proceedings of the 6th Symposium on Operating System Design and Implementation, pp. 137–150 (2004)
- 12. Moseley, B., Dasgupta, A., Kumar, R., Sarlós, T.: On scheduling in map-reduce and flow-shops. In: Proc. of the 23rd ACM Symposium on Parallel Algorithms and Architectures (SPAA), pp. 289–298 (2011)
- 15. Yoo, D.-J., Sim, K.M.: A comparative review of job scheduling for MapReduce. In: IEEE Proc. of the International Symposium on Cloud Computing and Intelligence Systems, pp. 353–358 (2011)