Modeling and Evaluating MID1 ICAL Pipeline on Spark
The Square Kilometre Array (SKA) project generates one of the largest data volumes in the world. SKA data flow pipelines require near-real-time processing, which poses major challenges for execution frameworks (EFs). We propose a cost model for a typical SKA data flow pipeline, the MID1 ICAL pipeline, on Spark. By simulating the I/O of the MID1 ICAL pipeline with a reduced SKA data set, we evaluate several implementations of the pipeline and identify the optimized approach for running it on Spark.