Cloud Infrastructure Automation for Scientific Workflows

  • Conference paper
Parallel Processing and Applied Mathematics (PPAM 2019)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12043)

Abstract

We present a solution for cloud infrastructure automation for scientific workflows. Unlike existing approaches, our solution is based on widely adopted tools, such as Terraform, and achieves a strict separation of two concerns: infrastructure description and provisioning on the one hand, and workflow description on the other. At the same time, it enables comprehensive integration with a given cloud infrastructure, i.e., one in which workflow execution can be managed by the cloud. The solution is integrated with our HyperFlow workflow management system and is evaluated by demonstrating its use in experiments on auto-scaling of scientific workflows in two types of cloud infrastructure: containerized Infrastructure-as-a-Service (IaaS) and Function-as-a-Service (FaaS). The experimental evaluation involves the deployment and execution of a test workflow on an Amazon ECS/Docker cluster and on a hybrid of Amazon ECS and AWS Lambda. The results show that our solution not only helps create repeatable infrastructures for scientific computing but also greatly facilitates the automation of research experiments on the execution of scientific workflows on advanced computing infrastructures.
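
As a rough illustration of the infrastructure-description side of this separation, the sketch below builds a Terraform configuration in JSON syntax for a single Amazon ECS cluster and prints it. It is not the authors' configuration: the function name `ecs_cluster_config`, the resource label `workflow_cluster`, and the cluster name and region are hypothetical, chosen only for the example. The `aws_ecs_cluster` resource type and its `name` argument belong to Terraform's AWS provider; Terraform reads JSON-syntax configurations from files named with a `.tf.json` suffix. The workflow description is assumed to live in a separate file and does not appear here.

```python
import json


def ecs_cluster_config(cluster_name: str, region: str) -> dict:
    """Return a Terraform JSON-syntax configuration for one ECS cluster.

    Only infrastructure is described here; nothing about the workflow
    (tasks, dependencies) is part of this document.
    """
    return {
        # Provider block: which cloud and region Terraform should target.
        "provider": {"aws": {"region": region}},
        "resource": {
            "aws_ecs_cluster": {
                # "workflow_cluster" is a hypothetical local label used
                # only inside this Terraform configuration.
                "workflow_cluster": {"name": cluster_name}
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical cluster name and region, for illustration only.
    config = ecs_cluster_config("hyperflow-demo", "eu-west-1")
    # Saving this output as e.g. main.tf.json would let Terraform read it.
    print(json.dumps(config, indent=2))
```

Because the configuration is plain data, the same infrastructure description can be versioned and re-applied independently of whatever workflow is later executed on the cluster.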

Notes

  1. http://occopus.lpds.sztaki.hu

  2. https://terraform.io

  3. Infrastructure descriptions are stored in configuration files with the .tf extension and can be expressed either in JSON or in a Terraform-specific format.

  4. The dashboards show a different run, in which t2.micro instances were used. Apart from the longer execution time, however, the execution patterns were the same.

Acknowledgment

This work was supported by the National Science Centre, Poland, grant 2016/21/B/ST6/01497.

Author information

Corresponding author

Correspondence to Bartosz Balis.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Balis, B., Orzechowski, M., Pawlik, K., Pawlik, M., Malawski, M. (2020). Cloud Infrastructure Automation for Scientific Workflows. In: Wyrzykowski, R., Deelman, E., Dongarra, J., Karczewski, K. (eds) Parallel Processing and Applied Mathematics. PPAM 2019. Lecture Notes in Computer Science, vol 12043. Springer, Cham. https://doi.org/10.1007/978-3-030-43229-4_25

  • DOI: https://doi.org/10.1007/978-3-030-43229-4_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-43228-7

  • Online ISBN: 978-3-030-43229-4

  • eBook Packages: Computer Science, Computer Science (R0)
