Cluster Computing, Volume 13, Issue 3, pp 315–333

Parameterized specification, configuration and execution of data-intensive scientific workflows

Authors

  • V. S. Kumar
    • Dept. of Computer Science and Engineering, Ohio State University
  • Tahsin Kurc
    • Center for Comprehensive Informatics, Emory University
  • Varun Ratnakar
    • Information Sciences Institute, University of Southern California
  • Jihie Kim
    • Information Sciences Institute, University of Southern California
  • Gaurang Mehta
    • Information Sciences Institute, University of Southern California
  • Karan Vahi
    • Information Sciences Institute, University of Southern California
  • Yoonju Lee Nelson
    • Information Sciences Institute, University of Southern California
  • P. Sadayappan
    • Dept. of Computer Science and Engineering, Ohio State University
  • Ewa Deelman
    • Information Sciences Institute, University of Southern California
  • Yolanda Gil
    • Information Sciences Institute, University of Southern California
  • Mary Hall
    • School of Computing, University of Utah
  • Joel Saltz
    • Center for Comprehensive Informatics, Emory University

DOI: 10.1007/s10586-010-0133-8

Cite this article as:
Kumar, V.S., Kurc, T., Ratnakar, V. et al. Cluster Comput (2010) 13: 315. doi:10.1007/s10586-010-0133-8

Abstract

Data analysis processes in scientific applications can be expressed as coarse-grain workflows of complex data processing operations with data flow dependencies between them. Performance optimization of these workflows can be viewed as a search for a set of optimal values in a multidimensional parameter space consisting of input performance parameters to the applications that are known to affect their execution times. While some performance parameters such as grouping of workflow components and their mapping to machines do not affect the accuracy of the analysis, others may dictate trading the output quality of individual components (and of the whole workflow) for performance. This paper describes an integrated framework which is capable of supporting performance optimizations along multiple such parameters. Using two real-world applications in the spatial, multidimensional data analysis domain, we present an experimental evaluation of the proposed framework.

Keywords

Scientific workflow, Performance parameters, Semantic representations, Grid, Application QoS

Copyright information

© Springer Science+Business Media, LLC 2010