An Experimental Study of Data Transfer Strategies for Execution of Scientific Workflows

  • Oleg Sukhoroslov
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11657)

Abstract

The paper studies the impact of data transfer strategies on the execution of scientific workflows. Five strategies are described, which define when and in what order data transfers are performed during workflow execution. The strategies are evaluated experimentally by simulation using a realistic network model. The results demonstrate that the execution time of data-intensive workflows depends significantly on the strategy used. In particular, the Eager and Lazy strategies, often used in the theory and practice of workflow scheduling, produce poor results in most cases. The alternative strategies provide up to 36% makespan improvement by overlapping communications and computations, prioritizing data transfers, and reducing network contention.
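
To illustrate why the order of data transfers can matter, the following is a minimal sketch under a toy model that is an assumption of this example, not the paper's simulator or any of its five strategies: a single shared link whose bandwidth is split equally among concurrent transfers. The function names and the numbers are hypothetical. The sketch compares starting all transfers at once with sending them one at a time in priority order.

```python
def fair_share_finish_times(sizes, bandwidth):
    """Finish times when all transfers start at t=0 and share the link equally.

    Toy contention model (an assumption of this sketch): at any moment each
    active transfer receives bandwidth / number_of_active_transfers.
    """
    remaining = dict(enumerate(sizes))
    finish = [0.0] * len(sizes)
    now = 0.0
    while remaining:
        rate = bandwidth / len(remaining)
        # The transfer with the least data left completes next.
        idx = min(remaining, key=remaining.get)
        dt = remaining[idx] / rate
        now += dt
        for i in remaining:
            remaining[i] -= rate * dt
        finish[idx] = now
        del remaining[idx]
    return finish


def sequential_finish_times(sizes, bandwidth):
    """Finish times when transfers run one at a time, in the given order."""
    finish = []
    now = 0.0
    for size in sizes:
        now += size / bandwidth
        finish.append(now)
    return finish


# Hypothetical example: the first transfer feeds a task that is ready to run,
# the second feeds a task that cannot start until later anyway.
sizes = [100.0, 300.0]   # data volumes, arbitrary units
bandwidth = 10.0         # units per second

concurrent = fair_share_finish_times(sizes, bandwidth)
prioritized = sequential_finish_times(sizes, bandwidth)
print("concurrent :", concurrent)
print("prioritized:", prioritized)
```

In this toy model the urgent transfer completes in half the time when it is sent first, while the last transfer finishes at the same moment in both cases, so the dependent task can start earlier at no cost to the other. Real networks and the paper's simulated network model are more complex, so this only conveys the intuition.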

Keywords

Scientific workflows, Data-intensive computing, Task scheduling, Data management, Simulation

Notes

Acknowledgments

This work is supported by the Russian Science Foundation (project 16-11-10352).

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Institute for Information Transmission Problems of the Russian Academy of Sciences, Moscow, Russia