Fair sharing of network resources among workflow ensembles

Abstract

Computational science depends on complex, data-intensive applications that operate on datasets produced by a variety of scientific instruments. A major challenge is integrating these data into the scientist's workflow. Recent advances in dynamic, networked cloud resources provide the building blocks to construct reconfigurable, end-to-end infrastructure that can increase scientific productivity, but applications are not taking advantage of them. In our previous work we introduced DyNamo, a system that enabled CASA scientists to improve the efficiency of their operations and to leverage cloud resources that had previously remained underutilized. However, the workflow automation DyNamo provided did not satisfy all of CASA's operational requirements: custom scripts were still used in production to manage workflow triggering, and multiple Layer 2 connections had to be allocated to maintain network QoS. To address these issues, we enhance the DyNamo system with advanced network manipulation mechanisms, end-to-end infrastructure monitoring, and ensemble workflow management capabilities. DyNamo's Virtual Software Defined Exchange (vSDX) capabilities have been extended with link adaptation, flow prioritization, and traffic control between endpoints. These new features allow us to enforce network QoS requirements for each workflow ensemble and can lead to fairer network sharing. Additionally, to accommodate CASA's operational needs, we have extended the newly integrated Pegasus Ensemble Manager with event-based triggering functionality, which improves the management of CASA's workflow ensembles. Beyond managing the workflow ensembles, the Pegasus Ensemble Manager can also promote fairer resource usage by employing throttling techniques that reduce compute and network resource contention. We evaluate the effects of DyNamo's vSDX policies using two CASA workflow ensembles that compete for network resources, and we show that traffic shaping of the ensembles can lead to fairer sharing of the network links. Finally, we study how changing the Pegasus Ensemble Manager's throttling limits for each of the two workflow ensembles affects their performance while they compete for the same network resources, and we assess whether this approach is a viable alternative to the vSDX policies.
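
To make the traffic-shaping idea concrete, the sketch below is a minimal illustration, not DyNamo's actual vSDX implementation: assuming an Open vSwitch data plane, it caps a port's aggregate rate and divides it between two workflow ensembles using per-ensemble linux-htb queues. The port name, the 1 Gb/s link rate, and the 60/40 split are hypothetical values chosen for the example.

    #!/usr/bin/env python3
    """Illustrative sketch (not the DyNamo/vSDX code): cap the aggregate
    rate of an Open vSwitch port and divide it into per-ensemble HTB
    queues so two workflow ensembles share a link at fixed ratios."""

    import subprocess

    def shape_port(port, link_rate, ensemble_rates):
        """Attach a linux-htb QoS record to `port`.

        `link_rate` and the values of `ensemble_rates` are in bits per
        second; the keys of `ensemble_rates` become OVS queue numbers.
        """
        cmd = [
            "ovs-vsctl", "set", "port", port, "qos=@newqos", "--",
            "--id=@newqos", "create", "qos", "type=linux-htb",
            f"other-config:max-rate={link_rate}",
        ]
        # Reference one queue per ensemble from the QoS record ...
        cmd += [f"queues:{q}=@q{q}" for q in sorted(ensemble_rates)]
        # ... and create each queue record with its own rate cap.
        for q, rate in sorted(ensemble_rates.items()):
            cmd += ["--", f"--id=@q{q}", "create", "queue",
                    f"other-config:max-rate={rate}"]
        subprocess.run(cmd, check=True)

    if __name__ == "__main__":
        # Hypothetical example: a 1 Gb/s link split 60/40 between
        # two competing workflow ensembles.
        shape_port("vsdx-port0", 1_000_000_000,
                   {0: 600_000_000, 1: 400_000_000})

On its own this only caps per-queue rates; enforcing the split also requires steering each ensemble's flows into its queue, for example with OpenFlow set_queue rules installed by an SDN controller. The ensemble-throttling alternative studied in the paper operates at the workflow layer instead, limiting how many jobs each ensemble may run concurrently.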

Acknowledgements

This work is funded by NSF award #1826997. We thank Mert Cevik (RENCI) and the engineers from UNT and LEARN for the UNT stitchport setup. The results in this paper were obtained using the Chameleon and ExoGENI testbeds, both supported by NSF.

Author information

Corresponding author

Correspondence to George Papadimitriou.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Papadimitriou, G., Lyons, E., Wang, C. et al. Fair sharing of network resources among workflow ensembles. Cluster Comput 25, 2873–2891 (2022). https://doi.org/10.1007/s10586-021-03457-3
