Abstract
We present a solution for cloud infrastructure automation for scientific workflows. Unlike existing approaches, our solution is based on widely adopted tools, such as Terraform, and achieves a strict separation of two concerns: infrastructure description and provisioning vs. workflow description. At the same time, it enables comprehensive integration with a given cloud infrastructure, i.e., integration in which workflow execution can be managed by the cloud. The solution is integrated with our HyperFlow workflow management system and evaluated by demonstrating its use in experiments related to auto-scaling of scientific workflows in two types of cloud infrastructures: containerized Infrastructure-as-a-Service (IaaS) and Function-as-a-Service (FaaS). The experimental evaluation involves deployment and execution of a test workflow in an Amazon ECS/Docker cluster and on a hybrid of Amazon ECS and AWS Lambda. The results show that our solution not only helps create repeatable infrastructures for scientific computing but also greatly facilitates the automation of research experiments related to the execution of scientific workflows on advanced computing infrastructures.
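The separation between infrastructure provisioning and workflow description can be illustrated with a small sketch. It assumes, as one possible design and not necessarily the mechanism used by the authors, that Terraform exposes identifiers of the provisioned resources as outputs and that a thin launcher hands them to the workflow engine as environment variables, so the workflow description itself never mentions cloud resources. `terraform output -json` is a real Terraform CLI command; the output names, the `HF_` prefix and the sample values are hypothetical.

```python
import json
import subprocess

# Hypothetical sample of what `terraform output -json` prints: each output
# name maps to an object holding its sensitivity flag, type and value.
SAMPLE_OUTPUTS = """
{
  "ecs_cluster_name": {"sensitive": false, "type": "string", "value": "hyperflow-cluster"},
  "lambda_function_name": {"sensitive": false, "type": "string", "value": "hyperflow-executor"}
}
"""


def read_terraform_outputs(workdir: str) -> str:
    """Run `terraform output -json` in an already-applied Terraform directory."""
    result = subprocess.run(
        ["terraform", "output", "-json"],
        cwd=workdir, capture_output=True, text=True, check=True,
    )
    return result.stdout


def outputs_to_env(outputs_json: str, prefix: str = "HF_") -> dict:
    """Map Terraform outputs to environment variables for the workflow engine.

    Only this runtime mapping knows about cloud resources; the workflow
    description stays infrastructure-agnostic (hypothetical naming scheme).
    """
    outputs = json.loads(outputs_json)
    return {prefix + name.upper(): str(entry["value"]) for name, entry in outputs.items()}


if __name__ == "__main__":
    # Uses the sample above; in practice pass read_terraform_outputs("path/to/infra").
    env = outputs_to_env(SAMPLE_OUTPUTS)
    for key, value in sorted(env.items()):
        print(f"{key}={value}")
```

Keeping the mapping outside the workflow definition means the same workflow can be rerun on a differently provisioned infrastructure simply by re-applying the Terraform configuration.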
Notes
- 1.
- 2.
- 3.
Infrastructure descriptions are stored in configuration files with the .tf extension (.tf.json for the JSON variant) and can be expressed either in Terraform's native configuration syntax or in JSON.
- 4.
The dashboards show a different run in which t2.micro instances were used. Apart from the longer execution time, however, the execution patterns were the same.
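To illustrate the JSON variant mentioned in note 3, the following sketch generates a minimal Terraform configuration from Python. It assumes the AWS provider and its `aws_ecs_cluster` resource; the region, the resource label `workflow_cluster` and the cluster name are hypothetical, and a real deployment for the experiments would need many more resources (task definitions, Lambda functions, IAM roles). Terraform only loads JSON-syntax files whose names end in `.tf.json`.

```python
import json
from pathlib import Path


def build_config(region: str, cluster_name: str) -> dict:
    """Build a minimal Terraform configuration in JSON syntax.

    Top-level keys mirror the blocks of the native syntax:
    `provider`, `resource` and `output` (names here are hypothetical).
    """
    return {
        "provider": {"aws": {"region": region}},
        "resource": {
            "aws_ecs_cluster": {
                "workflow_cluster": {"name": cluster_name}
            }
        },
        # Expose the cluster name so other tools can consume it after `terraform apply`.
        "output": {
            "ecs_cluster_name": {"value": "${aws_ecs_cluster.workflow_cluster.name}"}
        },
    }


def write_config(config: dict, directory: str = ".") -> Path:
    """Write the configuration where `terraform init`/`plan` will pick it up."""
    path = Path(directory) / "main.tf.json"  # the .tf.json suffix is required
    path.write_text(json.dumps(config, indent=2))
    return path


if __name__ == "__main__":
    cfg = build_config("eu-west-1", "hyperflow-cluster")
    print(json.dumps(cfg, indent=2))
```

Generating JSON programmatically can be convenient when experiment parameters (e.g. cluster size or instance type) are swept automatically; for hand-written descriptions the native syntax is usually more readable.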
Acknowledgment
This work was supported by the National Science Centre, Poland, grant 2016/21/B/ST6/01497.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Balis, B., Orzechowski, M., Pawlik, K., Pawlik, M., Malawski, M. (2020). Cloud Infrastructure Automation for Scientific Workflows. In: Wyrzykowski, R., Deelman, E., Dongarra, J., Karczewski, K. (eds) Parallel Processing and Applied Mathematics. PPAM 2019. Lecture Notes in Computer Science, vol 12043. Springer, Cham. https://doi.org/10.1007/978-3-030-43229-4_25
DOI: https://doi.org/10.1007/978-3-030-43229-4_25
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-43228-7
Online ISBN: 978-3-030-43229-4
eBook Packages: Computer Science, Computer Science (R0)