Measuring the Effectiveness of Throttled Data Transfers on Data-Intensive Workflows
In data-intensive workflows, which often operate on files, data is typically transferred between tasks as fast as the network links allow, and once transferred, the files are buffered or stored at their destination. A task that requires multiple input files from different preceding tasks must remain idle until all of them are available. Hence, network bandwidth and buffer/storage capacity within a workflow are often used ineffectively. In this paper, we quantitatively measure the impact that an intelligent data movement policy can have on buffer/storage use, in comparison with existing approaches. Our main objective is to propose a metric that takes into account the workflow structure, expressed as a Directed Acyclic Graph (DAG), and performance information collected from past executions of the workflow. The metric is intended for use at the design stage, to compare alternative DAG structures and evaluate their potential for optimisation of network bandwidth and buffer use.
Keywords: Directed Acyclic Graph, Network Bandwidth, Performance Information, Input Place, Synchronisation Point
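The buffering argument in the abstract can be illustrated with a small worked sketch. It is not the metric proposed in the paper. It is a simplified model of a single synchronisation point, assuming fixed link bandwidth, a buffer that fills linearly during a transfer, and a task that starts as soon as its last input arrives. The names `Transfer`, `sync_start`, `throttled_rate`, and the two cost functions are hypothetical, introduced only for this illustration. Buffer cost is expressed in MB·s (megabytes held, integrated over time).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    """One input file of a synchronisation task (hypothetical model)."""
    size_mb: float         # file size in MB
    ready_s: float         # time at which the producing task finishes
    bandwidth_mb_s: float  # maximum link rate in MB/s


def arrival(t: Transfer) -> float:
    """Arrival time when the file is sent at full link speed."""
    return t.ready_s + t.size_mb / t.bandwidth_mb_s


def sync_start(transfers: list[Transfer]) -> float:
    """The task can start only when its slowest input has arrived."""
    return max(arrival(t) for t in transfers)


def buffer_cost_unthrottled(transfers: list[Transfer]) -> float:
    """Buffer use (MB*s) when every file is sent as fast as possible."""
    start = sync_start(transfers)
    total = 0.0
    for t in transfers:
        duration = t.size_mb / t.bandwidth_mb_s
        filling = 0.5 * t.size_mb * duration        # linear fill during transfer
        waiting = t.size_mb * (start - arrival(t))  # whole file sits idle
        total += filling + waiting
    return total


def throttled_rate(t: Transfer, start: float) -> float:
    """Rate (MB/s) at which the file arrives exactly at the start time."""
    return t.size_mb / (start - t.ready_s)


def buffer_cost_throttled(transfers: list[Transfer]) -> float:
    """Buffer use (MB*s) when each file is slowed to arrive just in time."""
    start = sync_start(transfers)
    return sum(0.5 * t.size_mb * (start - t.ready_s) for t in transfers)


# Two 100 MB inputs: one over a fast link, one over a slow link.
fast = Transfer(size_mb=100, ready_s=0, bandwidth_mb_s=10)
slow = Transfer(size_mb=100, ready_s=0, bandwidth_mb_s=2)
inputs = [fast, slow]

start = sync_start(inputs)
print(start)                            # 50.0
print(buffer_cost_unthrottled(inputs))  # 7000.0
print(buffer_cost_throttled(inputs))    # 5000.0
print(throttled_rate(fast, start))      # 2.0
```

In this toy case the task cannot start before 50 s either way, so slowing the fast transfer from 10 MB/s to 2 MB/s does not delay the task. It lowers buffer use from 7000 to 5000 MB·s and frees 8 MB/s of link capacity over that period.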