Designing Cloud-Friendly HPC Applications

High Performance Computing in Clouds

Abstract

Cloud computing is no longer limited to transaction-based enterprise applications such as e-commerce and database-centric systems. Several cloud providers now work around performance bottlenecks through optimized software and hardware infrastructures. As a result, researchers are increasingly migrating their applications from on-premise to cloud resources, achieving competitive performance at lower financial cost. In this context, this chapter presents how to design cloud-friendly HPC applications. The key objective is first to detail the main features of cloud computing and then to show how existing HPC application models can address each feature. The chapter closes with insights for developing HPC applications to run in the cloud and a discussion of the hardware aspects that influence application performance. From these, we provide a short best-practices guide to follow when planning to use, or migrate to, cloud testbeds to run parallel applications.

Notes

  1. https://aws.amazon.com/pt/fsx/lustre/

  2. https://taskflow.github.io/taskflow/ParallelPipeline.html

Acknowledgements

The authors would like to thank the following Brazilian funding entities: FAPERGS (process 21/2551-0000118-6), CAPES (process 88881.310440/2018-01) and CNPq (process 305263/2021-8).

Corresponding author

Correspondence to Rodrigo da Rosa Righi.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

da Rosa Righi, R. et al. (2023). Designing Cloud-Friendly HPC Applications. In: Borin, E., Drummond, L.M.A., Gaudiot, JL., Melo, A., Melo Alves, M., Navaux, P.O.A. (eds) High Performance Computing in Clouds. Springer, Cham. https://doi.org/10.1007/978-3-031-29769-4_6

  • DOI: https://doi.org/10.1007/978-3-031-29769-4_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-29768-7

  • Online ISBN: 978-3-031-29769-4

  • eBook Packages: Computer Science, Computer Science (R0)
