StarFlow: A Script-Centric Data Analysis Environment
- Cite this paper as:
- Angelino E., Yamins D., Seltzer M. (2010) StarFlow: A Script-Centric Data Analysis Environment. In: McGuinness D.L., Michaelis J.R., Moreau L. (eds) Provenance and Annotation of Data and Processes. IPAW 2010. Lecture Notes in Computer Science, vol 6378. Springer, Berlin, Heidelberg
We introduce StarFlow, a script-centric environment for data analysis. StarFlow has four main features: (1) extraction of control and data-flow dependencies through a novel combination of static analysis, dynamic runtime analysis, and user annotations, (2) command-line tools for exploring and propagating changes through the resulting dependency network, (3) support for workflow abstractions enabling robust parallel executions of complex analysis pipelines, and (4) a seamless interface with the Python scripting language. We describe real applications of StarFlow, including automatic parallelization of complex workflows in the cloud.
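To make features (1) and (2) concrete, the following is a minimal sketch of static dependency extraction followed by change propagation through the resulting network. It is not StarFlow's implementation or API: the names `SOURCE`, `extract_dependencies`, and `downstream` are hypothetical, the toy script is invented for illustration, and only the static-analysis part is shown (no runtime analysis or user annotations). It uses Python's standard `ast` and `graphlib` modules, so it assumes Python 3.9 or later.

```python
import ast
from collections import deque
from graphlib import TopologicalSorter

# Hypothetical analysis script: each function stands for one pipeline step.
SOURCE = '''
def load():
    return [1, 2, 3]

def clean():
    return [x for x in load() if x > 1]

def plot():
    return len(clean())

def report():
    return plot() + len(load())
'''


def extract_dependencies(source):
    """Map each top-level function to the top-level functions it calls.

    Purely static: only direct calls by bare name are detected.
    """
    tree = ast.parse(source)
    defined = {n.name for n in tree.body if isinstance(n, ast.FunctionDef)}
    deps = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            called = {
                c.func.id
                for c in ast.walk(node)
                if isinstance(c, ast.Call) and isinstance(c.func, ast.Name)
            }
            # Keep only calls to steps defined in this script, minus self-calls.
            deps[node.name] = (called & defined) - {node.name}
    return deps


def downstream(deps, changed):
    """List the steps to re-run after `changed` is edited, dependencies first."""
    # Reverse the edges: step -> steps that depend on it.
    dependents = {name: set() for name in deps}
    for name, needs in deps.items():
        for need in needs:
            dependents[need].add(name)

    # Breadth-first search collects everything reachable from the change.
    affected, queue = set(), deque([changed])
    while queue:
        for nxt in dependents[queue.popleft()]:
            if nxt not in affected:
                affected.add(nxt)
                queue.append(nxt)

    # A topological order of the full graph, filtered to the affected steps.
    order = TopologicalSorter(deps).static_order()
    return [name for name in order if name in affected]


deps = extract_dependencies(SOURCE)
stale = downstream(deps, "clean")
```

In this sketch, editing `clean` marks `plot` and `report` as stale while leaving `load` untouched. Steps in the stale set that do not depend on one another could in principle be dispatched in parallel, which is the kind of scheduling decision a dependency network makes possible.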