H-WorD: Supporting Job Scheduling in Hadoop with Workload-Driven Data Redistribution

  • Petar Jovanovic
  • Oscar Romero
  • Toon Calders
  • Alberto Abelló
Conference paper

DOI: 10.1007/978-3-319-44039-2_21

Part of the Lecture Notes in Computer Science book series (LNCS, volume 9809)
Cite this paper as:
Jovanovic P., Romero O., Calders T., Abelló A. (2016) H-WorD: Supporting Job Scheduling in Hadoop with Workload-Driven Data Redistribution. In: Pokorný J., Ivanović M., Thalheim B., Šaloun P. (eds) Advances in Databases and Information Systems. ADBIS 2016. Lecture Notes in Computer Science, vol 9809. Springer, Cham

Abstract

Today’s distributed data processing systems typically follow a query shipping approach and exploit data locality to reduce network traffic. In such systems, the distribution of data over the cluster resources plays a significant role, and when skewed, it can harm the performance of executing applications. In this paper, we address the challenge of automatically adapting the distribution of data in a cluster to the workload imposed by the input applications. We propose a generic algorithm, named H-WorD, which, based on the estimated workload over resources, suggests alternative execution scenarios for tasks and thus identifies required transfers of input data a priori, bringing data close to the execution in a timely manner. We exemplify our algorithm in the context of MapReduce jobs in a Hadoop ecosystem. Finally, we evaluate our approach and demonstrate the performance gains of automatic data redistribution.
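To illustrate the general idea behind workload-driven redistribution (not the paper's actual H-WorD algorithm, whose cost model and scheduling details are given in the paper itself), the following hypothetical sketch estimates per-node workload from where each task's input split resides, and greedily plans a priori transfers of inputs away from overloaded nodes so that tasks can still execute with data locality. All names (`plan_redistribution`, `task_inputs`) are illustrative assumptions, not from the source.

```python
# Simplified, hypothetical sketch of workload-driven data redistribution.
# Idea: estimate each node's workload from the tasks whose input data it
# holds, then pre-transfer some inputs to less-loaded nodes before
# execution, so scheduling can remain locality-aware yet balanced.

from collections import defaultdict

def plan_redistribution(task_inputs, nodes):
    """task_inputs: dict mapping task -> node currently holding its input split.
    nodes: list of node names.
    Returns (assignment, transfers): final task->node placement and the
    list of (task, src, dst) input transfers to perform a priori."""
    load = defaultdict(int)
    for node in task_inputs.values():
        load[node] += 1
    target = len(task_inputs) / len(nodes)  # balanced load per node
    assignment, transfers = {}, []
    for task, src in sorted(task_inputs.items()):
        if load[src] > target:
            # Source node is overloaded: consider moving this task's input
            # to the currently least-loaded node, before execution starts.
            dst = min(nodes, key=lambda n: load[n])
            if load[dst] + 1 < load[src]:  # only if it actually helps
                load[src] -= 1
                load[dst] += 1
                transfers.append((task, src, dst))
                assignment[task] = dst
                continue
        assignment[task] = src  # keep existing locality
    return assignment, transfers
```

For example, if four tasks all have their input on node `n1` in a two-node cluster, the sketch plans two transfers to `n2`, yielding a balanced two-tasks-per-node placement.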

Keywords

Data-intensive flows · Task scheduling · Data locality

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Petar Jovanovic (1)
  • Oscar Romero (1)
  • Toon Calders (2, 3)
  • Alberto Abelló (1)
  1. Universitat Politècnica de Catalunya, BarcelonaTech, Barcelona, Spain
  2. Université Libre de Bruxelles, Brussels, Belgium
  3. University of Antwerp, Antwerp, Belgium
