A Study on Workload Imbalance Issues in Data Intensive Distributed Computing

  • Sven Groot
  • Kazuo Goda
  • Masaru Kitsuregawa
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5999)

Abstract

In recent years, several frameworks have been developed for processing very large quantities of data on large clusters of commodity PCs. These frameworks have focused on fault tolerance and scalability. However, in heterogeneous environments these systems do not offer optimal workload balancing. In this paper we present Jumbo, a distributed computation platform designed to explore possible solutions to this issue.

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Sven Groot (1)
  • Kazuo Goda (1)
  • Masaru Kitsuregawa (1)
  1. University of Tokyo, Tokyo, Japan
