Ensemble Learning of Run-Time Prediction Models for Data-Intensive Scientific Workflows

  • David A. Monge
  • Matěj Holec
  • Filip Železný
  • Carlos García Garino
Conference paper

DOI: 10.1007/978-3-662-45483-1_7

Volume 485 of the book series Communications in Computer and Information Science (CCIS)
Cite this paper as:
Monge D.A., Holec M., Železný F., García Garino C. (2014) Ensemble Learning of Run-Time Prediction Models for Data-Intensive Scientific Workflows. In: Hernández G. et al. (eds) High Performance Computing. CARLA 2014. Communications in Computer and Information Science, vol 485. Springer, Berlin, Heidelberg

Abstract

Workflow applications for in-silico experimentation involve the processing of large amounts of data. One of the core issues for the efficient management of such applications is the prediction of task performance. This paper proposes a novel approach that enables the construction of models for predicting the running times of tasks in data-intensive scientific workflows. Ensemble Machine Learning techniques are used to produce robust combined models with high predictive accuracy. Information derived from workflow systems, together with the characteristics and provenance of the data, is exploited to guarantee the accuracy of the models. The proposed approach has been tested on Bioinformatics workflows for Gene Expression Analysis over homogeneous and heterogeneous computing environments. The results highlight the benefit of using ensemble models over single (standalone) prediction models: ensemble learning techniques reduced the prediction error by up to 24.9% in comparison with single-model strategies.
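To make the idea of a combined run-time model concrete, the following is a minimal sketch of one generic ensemble technique, bagging: several simple regressors are fitted on bootstrap resamples of past task executions and their predictions are averaged. It is not the method or the feature set used in the paper. The synthetic data, the single input-size feature, and all function names (`fit_line`, `fit_bagged_ensemble`, `make_synthetic_tasks`, and so on) are assumptions introduced here for illustration only.

```python
import random
import statistics


def fit_line(points):
    """Least-squares fit of y = a + b*x over (x, y) pairs; returns (a, b)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:  # degenerate resample: fall back to a constant model
        return mean_y, 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_bagged_ensemble(points, n_models=25, seed=0):
    """Fit n_models base regressors, each on a bootstrap resample of points."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        sample = [rng.choice(points) for _ in points]  # sample with replacement
        models.append(fit_line(sample))
    return models


def predict(models, x):
    """Combined prediction: the mean of the base models' predictions."""
    return statistics.fmean(a + b * x for a, b in models)


def mean_absolute_error(models, points):
    return statistics.fmean(abs(predict(models, x) - y) for x, y in points)


def make_synthetic_tasks(n, seed):
    """Hypothetical data: run time (s) grows with input size (MB) plus noise."""
    rng = random.Random(seed)
    tasks = []
    for _ in range(n):
        size_mb = rng.uniform(10, 500)
        runtime_s = 5.0 + 0.2 * size_mb + rng.gauss(0, 3.0)
        tasks.append((size_mb, runtime_s))
    return tasks


train = make_synthetic_tasks(200, seed=1)
test = make_synthetic_tasks(50, seed=2)

single = [fit_line(train)]             # one standalone model
ensemble = fit_bagged_ensemble(train)  # 25 bootstrap models, averaged

print("single-model MAE:", mean_absolute_error(single, test))
print("ensemble MAE:    ", mean_absolute_error(ensemble, test))
```

On data this simple, the standalone and the combined model are likely to perform similarly. The sketch shows only the mechanics of building and querying an ensemble, and says nothing about the error reductions reported in the abstract.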

Keywords

Performance prediction; Scientific workflows; Ensemble Learning; Data Provenance; Data-intensive computing

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • David A. Monge (1, 2)
  • Matěj Holec (3)
  • Filip Železný (3)
  • Carlos García Garino (1, 4)

  1. ITIC Research Institute, National University of Cuyo (UNCuyo), Argentina
  2. Faculty of Exact and Natural Sciences, UNCuyo, Argentina
  3. IDA Research Group, Czech Technical University, Czech Republic
  4. Faculty of Engineering, UNCuyo, Argentina