StarFlow: A Script-Centric Data Analysis Environment

  • Elaine Angelino
  • Daniel Yamins
  • Margo Seltzer
Conference paper

DOI: 10.1007/978-3-642-17819-1_27

Part of the Lecture Notes in Computer Science book series (LNCS, volume 6378)
Cite this paper as:
Angelino E., Yamins D., Seltzer M. (2010) StarFlow: A Script-Centric Data Analysis Environment. In: McGuinness D.L., Michaelis J.R., Moreau L. (eds) Provenance and Annotation of Data and Processes. IPAW 2010. Lecture Notes in Computer Science, vol 6378. Springer, Berlin, Heidelberg

Abstract

We introduce StarFlow, a script-centric environment for data analysis. StarFlow has four main features: (1) extraction of control and data-flow dependencies through a novel combination of static analysis, dynamic runtime analysis, and user annotations, (2) command-line tools for exploring and propagating changes through the resulting dependency network, (3) support for workflow abstractions enabling robust parallel executions of complex analysis pipelines, and (4) a seamless interface with the Python scripting language. We describe real applications of StarFlow, including automatic parallelization of complex workflows in the cloud.
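As a rough illustration of features (1) and (2), the sketch below extracts call dependencies between the functions of a small script by static analysis alone and then lists everything downstream of a changed function. It is not StarFlow's implementation or API: it uses only Python's standard `ast` module, omits the dynamic analysis and user annotations the abstract mentions, and the names `SCRIPT`, `call_dependencies`, and `downstream` are hypothetical.

```python
import ast
from collections import defaultdict, deque

# Hypothetical analysis script; the function names are illustrative only.
SCRIPT = '''
def load():
    return [1, 2, 3]

def clean():
    return [x for x in load() if x > 1]

def summarize():
    return sum(clean())

def plot():
    return len(clean())
'''


def call_dependencies(source):
    """Map each top-level function to the top-level functions it calls."""
    tree = ast.parse(source)
    defined = {n.name for n in tree.body if isinstance(n, ast.FunctionDef)}
    deps = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            # Collect the names of plain-name calls made inside this function.
            called = {
                c.func.id
                for c in ast.walk(node)
                if isinstance(c, ast.Call) and isinstance(c.func, ast.Name)
            }
            deps[node.name] = (called & defined) - {node.name}
    return deps


def downstream(deps, changed):
    """Return the functions that transitively depend on `changed` (BFS order)."""
    # Invert the edges: callee -> functions that call it.
    dependents = defaultdict(set)
    for fn, callees in deps.items():
        for callee in callees:
            dependents[callee].add(fn)

    seen, order, queue = {changed}, [], deque([changed])
    while queue:
        current = queue.popleft()
        for fn in sorted(dependents[current]):
            if fn not in seen:
                seen.add(fn)
                order.append(fn)
                queue.append(fn)
    return order


deps = call_dependencies(SCRIPT)
stale = downstream(deps, "load")  # functions to re-run if load() changes
```

In this toy script, a change to `load` marks `clean`, `plot`, and `summarize` as stale. Because `plot` and `summarize` do not depend on each other, they are the kind of steps that could be re-run in parallel.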

Keywords

automatic parallelization; automatic updating; computational workflows; control flow; data-flow; data analysis; dependency tracking; provenance; Python; workflow abstraction

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Elaine Angelino (1)
  • Daniel Yamins (1)
  • Margo Seltzer (1)
  1. School of Engineering and Applied Sciences, Harvard University, Cambridge, USA
