On Task Assignment in Data Intensive Scalable Computing

  • Giovanni Agosta
  • Gerardo Pelosi (Email author)
  • Ettore Speziale
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8429)


MapReduce and other Data-Intensive Scalable Computing paradigms have emerged as the most popular solution for processing massive data sets, a crucial task in surviving the “Data Deluge”. Recent works have shown that maintaining data locality is paramount to achieving high performance in such paradigms. To this end, suitable task assignment algorithms are needed. Current solutions use round-robin task assignment policies, which have been shown to yield suboptimal results. In this paper, we propose and evaluate new algorithms for task assignment on a model of the Hadoop framework, comparing them with state-of-the-art solutions proposed in theoretical works as well as with the current Hadoop policies.
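To make the contrast between round-robin and locality-aware assignment concrete, the following is a minimal sketch under stated assumptions. It is not one of the algorithms proposed in the paper, and it does not reproduce Hadoop's scheduler. The function names, the `replicas` mapping (task to the set of servers holding a replica of its input), and the greedy least-loaded rule are all hypothetical choices made for illustration.

```python
def round_robin_assign(tasks, servers):
    """Assign the i-th task to server i mod n, ignoring data placement."""
    return {task: servers[i % len(servers)] for i, task in enumerate(tasks)}


def greedy_local_assign(tasks, servers, replicas):
    """Illustrative greedy policy (an assumption, not the paper's method).

    Each task goes to the least-loaded server that holds a replica of its
    input; if no server holds one, it goes to the least-loaded server overall.
    """
    load = {s: 0 for s in servers}
    assignment = {}
    for task in tasks:
        local = [s for s in servers if s in replicas.get(task, ())]
        candidates = local or servers
        chosen = min(candidates, key=lambda s: load[s])
        assignment[task] = chosen
        load[chosen] += 1
    return assignment


def count_local(assignment, replicas):
    """Count tasks placed on a server that stores a replica of their input."""
    return sum(1 for t, s in assignment.items() if s in replicas.get(t, ()))
```

With a hypothetical placement such as `replicas = {"t0": {"s1"}, "t1": {"s2"}, "t2": {"s0"}, "t3": {"s1", "s2"}}` over servers `["s0", "s1", "s2"]`, comparing `count_local` for the two policies shows how a placement-oblivious policy can miss every local replica while a placement-aware one need not.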


Task Assignment · Replication Factor · Data Placement · Resource Accounting · Master Server



Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Giovanni Agosta (1)
  • Gerardo Pelosi (1), Email author
  • Ettore Speziale (1)
  1. Dipartimento di Elettronica, Informazione e Bioingegneria – DEIB, Politecnico di Milano, Milan, Italy
