Scientific Workflow Partitioning in Multisite Cloud
Abstract
Scientific workflows allow scientists to conduct experiments that manipulate data through multiple computational activities, using Scientific Workflow Management Systems (SWfMSs). As the scale of the data increases, SWfMSs need to support workflow execution in High Performance Computing (HPC) environments. Because of its various benefits, the cloud has emerged as an appropriate infrastructure for workflow execution. However, it is difficult to execute some scientific workflows in a single cloud site because of the geographical distribution of scientists, data, and computing resources. Therefore, a scientific workflow often needs to be partitioned and executed in a multisite environment. Moreover, SWfMSs generally execute a scientific workflow in parallel within only one site. This paper proposes a non-intrusive approach to executing scientific workflows in a multisite cloud with three workflow partitioning techniques. We describe an experimental validation using an adaptation of the Chiron SWfMS for the Microsoft Azure multisite cloud. The experimental results reveal the efficiency of our partitioning techniques and their superiority in different environments.
Keywords
scientific workflow, scientific workflow management system, workflow partitioning, parallel execution, multisite cloud