Journal of Grid Computing, Volume 11, Issue 3, pp 407–428

Managing and Optimizing Bioinformatics Workflows for Data Analysis in Clouds

  • Vincent C. Emeakaroha
  • Michael Maurer
  • Patrick Stern
  • Paweł P. Łabaj
  • Ivona Brandic
  • David P. Kreil

DOI: 10.1007/s10723-013-9260-9

Cite this article as:
Emeakaroha, V.C., Maurer, M., Stern, P. et al. J Grid Computing (2013) 11: 407. doi:10.1007/s10723-013-9260-9

Abstract

The rapid advancement of high-throughput technologies in the life sciences in recent years is facilitating the generation and storage of huge amounts of data in different databases. Despite significant developments in computing capacity and performance, analysing these large-scale data in search of biomedically relevant patterns remains a challenging task. Scientific workflow applications are deemed to support data mining in more complex scenarios that include many data sources and computational tools, as commonly found in bioinformatics. A scientific workflow application is a holistic unit that defines, executes, and manages scientific applications using different software tools. Existing workflow applications are process- or data-oriented rather than resource-oriented. Thus, they lack efficient computational resource management capabilities, such as those provided by Cloud computing environments. Insufficient computational resources disrupt the execution of workflow applications, wasting time and money. To address this issue, advanced resource monitoring and management strategies are required to determine the resource consumption behaviours of workflow applications and thereby enable dynamic allocation and deallocation of resources. In this paper, we present a novel Cloud management infrastructure consisting of resource-level and application-level monitoring techniques and a knowledge management strategy. The infrastructure manages computational resources to support workflow application executions in order to guarantee their performance goals and their successful completion. We present the design of these techniques, demonstrate how they can be applied to scientific workflow applications, and present detailed evaluation results as a proof of concept.

Keywords

Workflow execution; Resource level monitoring; Application level monitoring; Workflow management; Knowledge database; Cloud computing

Copyright information

© Springer Science+Business Media Dordrecht 2013

Authors and Affiliations

  • Vincent C. Emeakaroha (1)
  • Michael Maurer (1)
  • Patrick Stern (1)
  • Paweł P. Łabaj (2)
  • Ivona Brandic (1)
  • David P. Kreil (2)
  1. Information Systems Institute, Vienna University of Technology, Vienna, Austria
  2. Chair of Bioinformatics, Boku University Vienna, Vienna, Austria