Mario: Interactive Tuning of Biological Analysis Pipelines Using Iterative Processing

  • Martin Ernstsen
  • Erik Kjærner-Semb
  • Nils Peder Willassen
  • Lars Ailo Bongo
Conference paper

DOI: 10.1007/978-3-319-14325-5_23

Part of the Lecture Notes in Computer Science book series (LNCS, volume 8805)
Cite this paper as:
Ernstsen M., Kjærner-Semb E., Willassen N.P., Bongo L.A. (2014) Mario: Interactive Tuning of Biological Analysis Pipelines Using Iterative Processing. In: Lopes L. et al. (eds) Euro-Par 2014: Parallel Processing Workshops. Euro-Par 2014. Lecture Notes in Computer Science, vol 8805. Springer, Cham

Abstract

Biological data analysis relies on complex pipelines for cleaning, integrating, and summarizing data before presenting the results to a user. Specifically, biological data analysis is usually implemented as a pipeline that combines many independent tools. During development, it is necessary to tune the pipeline to find the tools and parameters that work well with a particular dataset. However, as the dataset size increases, the pipeline execution time also increases and parameter tuning becomes impractical. No current biological data analysis framework enables analysts to interactively tune the parameters of biological analysis pipelines for large-scale datasets. We present Mario, a system that quickly updates pipeline output data when pipeline parameters are changed. It combines reservoir sampling, fine-grained caching of derived datasets, and an iterative data-parallel processing model. We demonstrate the usability of our approach through a biological use case, and experimentally evaluate the latency, throughput, and resource usage of the Mario system. Mario is open-sourced at bdps.cs.uit.no/code/Mario.
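
As an illustration of the first technique named in the abstract, reservoir sampling can be sketched with the classic Algorithm R, which keeps a uniform random sample of fixed size from a stream whose length is not known in advance. The sketch below is generic and is not taken from the Mario code base; the function name `reservoir_sample` and its parameters are assumptions made for this example.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return up to k items chosen uniformly at random from a stream.

    Generic Algorithm R sketch (hypothetical helper, not Mario's implementation).
    The stream is read once and only k items are held in memory.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive at both ends
            if j < k:
                reservoir[j] = item
    return reservoir


# Usage: draw 5 records from a stream of made-up read identifiers.
sample = reservoir_sample((f"read_{n}" for n in range(10_000)), k=5, seed=42)
print(sample)
```

Under these assumptions, a sample of this kind would let an analyst try out a parameter change on a small, representative subset of a large dataset before the full dataset is processed.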

Keywords

Iterative processing, interactive processing, biological data analysis, parameter tuning, provenance

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Martin Ernstsen (1)
  • Erik Kjærner-Semb (2)
  • Nils Peder Willassen (2)
  • Lars Ailo Bongo (1)
  1. Dept. of Computer Science and Center for Bioinformatics, University of Tromsø, Norway
  2. Dept. of Chemistry and Center for Bioinformatics, University of Tromsø, Norway
