Scheduling MapReduce Jobs and Data Shuffle on Unrelated Processors

  • Conference paper
Experimental Algorithms (SEA 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9125)

Abstract

We propose a constant-factor approximation algorithm for generalizations of the Flexible Flow Shop (FFS) problem that form a realistic model for non-preemptive scheduling in MapReduce systems. Our results concern the minimization of the total weighted completion time of a set of MapReduce jobs on unrelated processors and improve substantially on the model proposed by Moseley et al. (SPAA 2011) in two directions: (i) we consider jobs consisting of multiple Map and Reduce tasks, which is the key idea behind MapReduce computations, and (ii) we introduce into our model the crucial cost of the data shuffle phase, i.e., the cost of transmitting intermediate data from Map to Reduce tasks. Moreover, we experimentally evaluate our algorithm against a lower bound on the optimal cost of our problem as well as against a fast algorithm that combines a simple online assignment of tasks to processors with a standard scheduling policy. We observe that, for random instances that capture data locality issues, our algorithm achieves better performance.
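To make the objective and the flavor of the fast baseline concrete, the following is a minimal Python sketch, not the paper's implementation. The abstract does not say which online assignment rule or scheduling policy the baseline uses, so the choices here are assumptions: a greedy rule that places each arriving task on the processor where it would finish earliest, followed by Smith's rule (non-decreasing processing time over weight) on each processor. The sketch treats tasks as independent and ignores Map/Reduce precedence and shuffle costs. The names `Task`, `greedy_assign`, and `weighted_completion_time` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    weight: float            # assumed strictly positive
    proc_times: list[float]  # proc_times[i] = processing time on processor i


def greedy_assign(tasks, num_procs):
    """Assumed online rule: put each arriving task on the processor where
    the current load plus the task's own processing time is smallest."""
    loads = [0.0] * num_procs
    buckets = [[] for _ in range(num_procs)]
    for t in tasks:
        i = min(range(num_procs), key=lambda k: loads[k] + t.proc_times[k])
        loads[i] += t.proc_times[i]
        buckets[i].append(t)
    return buckets


def weighted_completion_time(buckets):
    """Sequence each processor by Smith's rule (non-decreasing p/w) and
    return the sum of weight * completion time over all tasks."""
    total = 0.0
    for i, bucket in enumerate(buckets):
        clock = 0.0
        for t in sorted(bucket, key=lambda t: t.proc_times[i] / t.weight):
            clock += t.proc_times[i]
            total += t.weight * clock
    return total
```

Because the processors are unrelated, each task carries its own vector of processing times, which is how data locality could be expressed in such a model: a task is cheap on the processor that already holds its input and expensive elsewhere.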

This research was supported by the projects “Handling uncertainty in data intensive applications on a distributed computing environment (cloud computing) (DELUGE)” (D. Fotakis, I. Milis and E. Zampetakis) and “Energy Efficiency of Road Networks and Vehicles: Measurement, Pricing, Regional and Environmental Effects (EERNV)” (G. Zois), co-financed by the European Union (European Social Fund - ESF) and Greek national funds, through the Operational Program “Education and Lifelong Learning” of the National Strategic Reference Framework (NSRF) - Research Funding Program: THALES, investing in knowledge society through the European Social Fund. A short extended abstract of this work, including partial results, appeared in EDBT/ICDT 2014 Workshop on Algorithms for MapReduce and Beyond.

References

  1. Afrati, F.N., Ullman, J.D.: Optimizing multiway joins in a map-reduce environment. IEEE Transactions on Knowledge and Data Engineering 23(9), 1282–1298 (2011)

  2. Aspnes, J., Azar, Y., Fiat, A., Plotkin, S., Waarts, O.: On-line Routing of Virtual Circuits with Applications to Load Balancing and Machine Scheduling. Journal of the ACM 44(3), 486–504 (1997)

  3. Chang, H., Kodialam, M.S., Kompella, R.R., Lakshman, T.V., Lee, M., Mukherjee, S.: Scheduling in MapReduce-like systems for fast completion time. In: IEEE Proceedings of the 30th International Conference on Computer Communications, pp. 3074–3082 (2011)

  4. Chen, F., Kodialam, M.S., Lakshman, T.V.: Joint scheduling of processing and shuffle phases in MapReduce systems. In: IEEE Proceedings of the 31st International Conference on Computer Communications, pp. 1143–1151 (2012)

  5. Correa, J.R., Skutella, M., Verschae, J.: The power of preemption on unrelated machines and applications to scheduling orders. Mathematics of Operations Research 37(2), 379–398 (2012)

  6. Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. In: Proceedings of the 6th Symposium on Operating System Design and Implementation, pp. 137–150 (2004)

  7. Garey, M.R., Johnson, D.S., Sethi, R.: The complexity of flowshop and jobshop scheduling. Mathematics of Operations Research 1(2), 117–129 (1976)

  8. Gonzalez, T., Sahni, S.: Flowshop and jobshop schedules: complexity and approximation. Operations Research 26(1), 36–52 (1978)

  9. Hall, L.A., Schulz, A.S., Shmoys, D.B., Wein, J.: Scheduling to minimize average completion time: Off-line and on-line approximation algorithms. Mathematics of Operations Research 22, 513–544 (1997)

  10. Hariri, A.M., Potts, C.N.: Heuristics for scheduling unrelated parallel machines. Computers and Operations Research 18(3), 323–331 (1991)

  11. Mastrolilli, M., Queyranne, M., Schulz, A.S., Svensson, O., Uhan, N.A.: Minimizing the sum of weighted completion times in a concurrent open shop. Operations Research Letters 38(5), 390–395 (2010)

  12. Moseley, B., Dasgupta, A., Kumar, R., Sarlós, T.: On scheduling in map-reduce and flow-shops. In: Proc. of the 23rd ACM Symposium on Parallel Algorithms and Architectures (SPAA), pp. 289–298 (2011)

  13. Schuurman, P., Woeginger, G.J.: A polynomial time approximation scheme for the two-stage multiprocessor flow shop problem. Theoretical Computer Science 237(1), 105–122 (2000)

  14. Shmoys, D.B., Tardos, É.: An approximation algorithm for the generalized assignment problem. Mathematical Programming 62, 461–474 (1993)

  15. Yoo, D.-J., Sim, K.M.: A comparative review of job scheduling for MapReduce. In: IEEE Proc. of the International Symposium on Cloud Computing and Intelligence Systems, pp. 353–358 (2011)

Author information

Corresponding author

Correspondence to Georgios Zois.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Fotakis, D., Milis, I., Papadigenopoulos, O., Zampetakis, E., Zois, G. (2015). Scheduling MapReduce Jobs and Data Shuffle on Unrelated Processors. In: Bampis, E. (eds) Experimental Algorithms. SEA 2015. Lecture Notes in Computer Science, vol 9125. Springer, Cham. https://doi.org/10.1007/978-3-319-20086-6_11

  • DOI: https://doi.org/10.1007/978-3-319-20086-6_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-20085-9

  • Online ISBN: 978-3-319-20086-6

  • eBook Packages: Computer Science, Computer Science (R0)
