A Study on Workload Imbalance Issues in Data Intensive Distributed Computing
In recent years, several frameworks have been developed for processing very large quantities of data on large clusters of commodity PCs. These frameworks have focused primarily on fault tolerance and scalability. However, in heterogeneous environments they do not balance workloads effectively. In this paper we present Jumbo, a distributed computation platform designed to explore possible solutions to this issue.
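The imbalance the abstract refers to can be illustrated with a small back-of-the-envelope calculation (this sketch is not from the paper; the node speeds and task counts are hypothetical): splitting equal-sized tasks uniformly across nodes of differing speeds makes the slowest node dominate the job's completion time, whereas a speed-proportional split equalizes finish times.

```python
# Hypothetical cluster: two fast nodes and one node at half speed.
node_speeds = [1.0, 1.0, 0.5]  # relative processing rates (assumed values)
tasks = 60                     # number of equal-sized tasks

# Uniform assignment: every node gets the same number of tasks.
uniform = [tasks // len(node_speeds)] * len(node_speeds)
makespan_uniform = max(n / s for n, s in zip(uniform, node_speeds))
# The slow node takes 20 / 0.5 = 40 time units; the fast nodes idle after 20.

# Speed-proportional assignment: tasks divided in proportion to node speed.
total_speed = sum(node_speeds)
proportional = [tasks * s / total_speed for s in node_speeds]
makespan_prop = max(n / s for n, s in zip(proportional, node_speeds))
# All nodes finish together at 24 time units.

print(makespan_uniform, makespan_prop)  # → 40.0 24.0
```

This gap (40 vs. 24 time units for the same work) is the kind of inefficiency that static, homogeneous-cluster assumptions produce, and that a platform like Jumbo is positioned to investigate.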
Keywords: Execution Time · Intermediate Data · Workload Balance · Speculative Execution · MapReduce Model