
Performance benchmarking and auto-tuning for scientific applications on virtual cluster

Abstract

Virtualization can provide many benefits for resource management, including higher resource utilization, lower energy cost, faster fault recovery, and more flexible resource provisioning. However, provisioning resources for applications in the cloud environment remains challenging, especially for scientific applications with complex runtime behavior and high performance demands. In this work, we use real scientific applications and performance benchmarking tools to analyze application performance on our in-house virtualized cluster. We demonstrate that the performance degradation from virtualization can be kept below 10% with proper virtual machine configuration and the support of hardware-virtualized InfiniBand. Our study of four real scientific applications also shows that application performance is difficult to model or predict. We therefore developed an auto-tuning tool that finds the best resource provisioning setting, in terms of both time and cost, for any given application. We evaluated our design on an in-house KVM-based virtualized cluster with an InfiniBand interconnect. Compared against the optimal result from an exhaustive search, our auto-tuning tool achieves over 90% accuracy relative to the best deployment while requiring far less tuning time and far fewer execution runs.
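The paper's tool itself is not reproduced on this page. As a loose, hypothetical sketch of the idea the abstract describes — sampling a subset of deployment settings instead of exhaustively searching them, then keeping the settings that are Pareto-optimal in runtime and cost — one could write something like the following. All names, the search space, the pricing constant, and the `run_benchmark` stand-in are assumptions for illustration, not the authors' implementation.

```python
import itertools
import random

# Hypothetical search space of VM provisioning settings.
VM_COUNTS = [1, 2, 4, 8]
CORES_PER_VM = [1, 2, 4, 8]
COST_PER_CORE_HOUR = 0.05  # assumed pricing, for illustration only


def run_benchmark(vms, cores):
    """Stand-in for one timed run of the application on a deployment.

    A real tuner would launch the virtual cluster and measure wall-clock
    time; here we fake a runtime with diminishing parallel returns plus
    a per-VM overhead.
    """
    total_cores = vms * cores
    return 3600.0 / (total_cores ** 0.8) + 5.0 * vms  # seconds


def evaluate(vms, cores):
    """Return (runtime in seconds, dollar cost) for one setting."""
    runtime_s = run_benchmark(vms, cores)
    cost = vms * cores * COST_PER_CORE_HOUR * (runtime_s / 3600.0)
    return runtime_s, cost


def pareto_front(results):
    """Keep settings not dominated in both runtime and cost."""
    front = []
    for cfg, (t, c) in results.items():
        dominated = any(t2 <= t and c2 <= c and (t2, c2) != (t, c)
                        for t2, c2 in results.values())
        if not dominated:
            front.append((cfg, (t, c)))
    return front


# Sample half of the space rather than running every configuration.
space = list(itertools.product(VM_COUNTS, CORES_PER_VM))
random.seed(0)
sampled = random.sample(space, k=len(space) // 2)
results = {cfg: evaluate(*cfg) for cfg in sampled}

for cfg, (t, c) in sorted(pareto_front(results)):
    print(f"{cfg[0]} VMs x {cfg[1]} cores: {t:.0f}s, ${c:.2f}")
```

In practice the sampling step would be replaced by a guided strategy (the paper cites genetic algorithms and Pareto optimality), so that each additional benchmark run is spent near the current front rather than uniformly at random.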


Notes

  1. MCX353A-FCBT Mellanox FDR InfiniBand card: http://www.mellanox.com/page/infiniband_cards_overview.



Author information


Corresponding author

Correspondence to Jerry Chou.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Hsu, KJ., Chou, J. Performance benchmarking and auto-tuning for scientific applications on virtual cluster. J Supercomput (2021). https://doi.org/10.1007/s11227-021-04103-w


Keywords

  • Scientific application
  • Virtualization
  • Performance auto-tuning