An SLA-Based Advisor for Placement of HPC Jobs on Hybrid Clouds

  • Kiran Mantripragada
  • Leonardo P. Tizzei
  • Alecio P. D. Binotto
  • Marco A. S. Netto
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9435)

Abstract

Several scientific and industrial applications require High Performance Computing (HPC) resources to process and/or simulate complex models. Not long ago, companies, research institutes, and universities acquired and maintained on-premise computer clusters; recently, however, cloud computing has emerged as an alternative for a subset of HPC applications. This poses a challenge to end users, who have to decide where to run their jobs: on local clusters or, by bursting, on a remote cloud service provider. While current research on HPC cloud has focused on comparing the performance of on-premise clusters against cloud resources, we build on existing efforts and introduce an advisory service that helps users make this decision by considering the trade-offs among resource costs, performance, and availability on hybrid clouds. We evaluated our service on a real test-bed with a seismic processing application based on Full Waveform Inversion, a technique used by geophysicists in the oil & gas industry and in earthquake prediction. We also discuss how the advisor can be used for other applications and highlight the main lessons learned while constructing this service to reduce costs and turnaround times.

Notes

Acknowledgment

We thank Eduardo Rodrigues and Nicole Sultanum for their comments on this paper. This work has been partially supported by FINEP/MCTI under grant no. 03.14.0062.00.

Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  • Kiran Mantripragada (1)
  • Leonardo P. Tizzei (1)
  • Alecio P. D. Binotto (1)
  • Marco A. S. Netto (1)

  1. IBM Research, São Paulo, Brazil