Journal of Grid Computing, Volume 13, Issue 4, pp 457–493

A Survey of Data-Intensive Scientific Workflow Management

  • Ji Liu
  • Esther Pacitti
  • Patrick Valduriez
  • Marta Mattoso

Abstract

Nowadays, more and more computer-based scientific experiments need to handle massive amounts of data. Their data processing consists of multiple computational steps and the dependencies among them. A data-intensive scientific workflow is useful for modeling such a process. Since the sequential execution of data-intensive scientific workflows may take a long time, Scientific Workflow Management Systems (SWfMSs) should enable the parallel execution of data-intensive scientific workflows and exploit the resources distributed across different infrastructures such as grid and cloud. This paper provides a survey of data-intensive scientific workflow management in SWfMSs and their parallelization techniques. Based on an SWfMS functional architecture, we give a comparative analysis of the existing solutions. Finally, we identify research issues for improving the execution of data-intensive scientific workflows in a multisite cloud.

Keywords

Scientific workflow; Scientific workflow management system; Grid; Cloud; Multisite cloud; Distributed and parallel data management; Scheduling; Parallelization

Copyright information

© Springer Science+Business Media Dordrecht 2015

Authors and Affiliations

  • Ji Liu (1)
  • Esther Pacitti (2)
  • Patrick Valduriez (1)
  • Marta Mattoso (3)
  1. MSR-Inria Joint Centre, Inria and LIRMM and University of Montpellier, Montpellier, France
  2. Inria and LIRMM, University of Montpellier, Montpellier, France
  3. COPPE/Federal University of Rio de Janeiro, Rio de Janeiro, Brazil