Scientific Workflow Scheduling with Provenance Support in Multisite Cloud

  • Ji Liu
  • Esther Pacitti
  • Patrick Valduriez
  • Marta Mattoso
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10150)


Recently, some Scientific Workflow Management Systems (SWfMSs) with provenance support (e.g., Chiron) have been deployed in the cloud. However, they typically use a single cloud site. In this paper, we consider a multisite cloud, where the data and computing resources are distributed across different sites (possibly in different regions). Based on a multisite SWfMS architecture, i.e., multisite Chiron, we propose a multisite task scheduling algorithm that takes into account the time to generate provenance data. We performed an extensive experimental evaluation of our algorithm using a Microsoft Azure multisite cloud and two real-life scientific workflows (Buzz and Montage). The results show that our scheduling algorithm is up to 49.6% better than baseline algorithms in terms of total execution time.


Scientific workflow · Scientific workflow management system · Scheduling · Parallel execution · Multisite cloud



Work partially funded by the EU H2020 Programme and MCTI/RNP-Brazil (HPC4E grant agreement number 689772), CNPq, FAPERJ, INRIA (MUSIC project), and Microsoft (ZcloudFlow project), and performed in the context of the Computational Biology Institute. We would like to thank Weiwei Chen and the Pegasus project for their help in modeling and executing the Montage SWf.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Ji Liu
    • 1
  • Esther Pacitti
    • 1
  • Patrick Valduriez
    • 1
  • Marta Mattoso
    • 2
  1. Inria, Microsoft-Inria Joint Centre, LIRMM and University of Montpellier, Montpellier, France
  2. COPPE, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
