Scientific Workflow Partitioning in Multisite Cloud

  • Ji Liu
  • Vítor Silva
  • Esther Pacitti
  • Patrick Valduriez
  • Marta Mattoso
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8805)

Abstract

Scientific workflows allow scientists to conduct experiments that manipulate data with multiple computational activities using Scientific Workflow Management Systems (SWfMSs). As the scale of the data increases, SWfMSs need to support workflow execution in High Performance Computing (HPC) environments. Because of its various benefits, the cloud has emerged as an appropriate infrastructure for workflow execution. However, it is difficult to execute some scientific workflows in one cloud site because of the geographical distribution of scientists, data and computing resources. Therefore, a scientific workflow often needs to be partitioned and executed in a multisite environment. Yet SWfMSs generally execute a scientific workflow in parallel within a single site only. This paper proposes a non-intrusive approach to execute scientific workflows in a multisite cloud with three workflow partitioning techniques. We describe an experimental validation using an adaptation of the Chiron SWfMS for the Microsoft Azure multisite cloud. The experimental results show the efficiency of our partitioning techniques and their superiority in different environments.

Keywords

scientific workflow; scientific workflow management system; workflow partitioning; parallel execution; multisite cloud

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Ji Liu (1, 2, 3, 5)
  • Vítor Silva (4)
  • Esther Pacitti (2, 3, 5)
  • Patrick Valduriez (1, 2, 5)
  • Marta Mattoso (4)

  1. Microsoft-Inria Joint Centre, Paris, France
  2. LIRMM, Montpellier, France
  3. University Montpellier 2, France
  4. COPPE/UFRJ, Rio de Janeiro, Brazil
  5. Inria, Montpellier, France
