
How good are machine learning clouds? Benchmarking two snapshots over 5 years

  • Regular Paper
  • Published in The VLDB Journal

Abstract

We conduct an empirical study of the machine learning functionalities provided by major cloud service providers, which we call machine learning clouds. Machine learning clouds hold the promise of hiding all the sophistication of running large-scale machine learning: instead of specifying how to run a machine learning task, users only specify what task to run, and the cloud figures out the rest. Raising the level of abstraction, however, rarely comes for free: a performance penalty is possible. How good, then, are current machine learning clouds on real-world machine learning workloads? We study this question by benchmarking the mainstream machine learning clouds. Since these platforms continue to innovate, our benchmark aims to reflect their evolution. Concretely, this paper consists of two sub-benchmarks, mlbench and automlbench. When we started this work in 2016, only two cloud platforms provided machine learning services, and both limited themselves to model training and simple hyper-parameter tuning. We therefore focused on binary classification problems and present mlbench, a novel benchmark constructed by harvesting datasets from Kaggle competitions. We compare the performance of the top winning code available on Kaggle with that of the machine learning clouds from Azure and Amazon on mlbench. In recent years, more cloud providers have begun to support machine learning and have added automatic machine learning (AutoML) techniques to their machine learning clouds. These AutoML services can ease manual tuning of the whole machine learning pipeline, including but not limited to data preprocessing, feature selection, model selection, hyper-parameter tuning, and model ensembling. To reflect these advancements, we design automlbench to assess the AutoML performance of four machine learning clouds on different kinds of workloads. Our comparative study reveals the strengths and weaknesses of existing machine learning clouds and points out potential directions for future improvement.
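To make the comparison concrete, the sketch below illustrates the kind of evaluation protocol an mlbench-style study relies on: train a candidate model on a Kaggle-style training split and compare its test AUC against the AUC achieved by the top winning Kaggle code. This is a minimal sketch, not the benchmark's actual code; the scikit-learn gradient-boosting model is only a stand-in for a cloud or AutoML service, and the file names, the load_kaggle_split helper, the label column, and the reference AUC value are hypothetical placeholders.

# Illustrative sketch of an mlbench-style evaluation (hypothetical inputs):
# compare a candidate model's test AUC against the AUC of the top winning
# Kaggle solution on the same split.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

def load_kaggle_split(train_csv, test_csv, label):
    """Hypothetical helper: load a Kaggle-style train/test split."""
    train, test = pd.read_csv(train_csv), pd.read_csv(test_csv)
    X_train, y_train = train.drop(columns=[label]), train[label]
    X_test, y_test = test.drop(columns=[label]), test[label]
    return X_train, y_train, X_test, y_test

# Hypothetical file names and label column for one harvested competition.
X_train, y_train, X_test, y_test = load_kaggle_split(
    "train.csv", "test.csv", label="target"
)

# Stand-in for a machine learning cloud / AutoML service: any model that
# produces probability scores for the positive class on a binary task.
model = GradientBoostingClassifier(random_state=0)
model.fit(X_train, y_train)
cloud_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

# AUC achieved by the top winning Kaggle code on the same test split
# (hypothetical value; in mlbench it is obtained from the competition).
top_solution_auc = 0.92

# Relative quality gap: how far the candidate lags behind the human experts.
gap = (top_solution_auc - cloud_auc) / top_solution_auc
print(f"cloud AUC = {cloud_auc:.4f}, gap to top Kaggle solution = {gap:.2%}")

The same protocol extends naturally to automlbench: the stand-in model is replaced by the prediction endpoint of each cloud AutoML service, and the metric is chosen to match the workload (e.g., AUC or accuracy for classification).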



Notes

  1. Refer to https://dl.acm.org/doi/pdf/10.14778/3231751.3231770.

  2. https://drive.google.com/file/d/1lXC47nBjDyfqrNyUIC9xv-SEIuks44Gg/view.

  3. That is, when users can only be satisfied by winning the Kaggle competition or being ranked among the top 1%.

  4. https://www.kaggle.com/c/kdd-cup-2014-predicting-excitement-at-donors-choose.

  5. https://archive.ics.uci.edu/ml/datasets/EEG+Eye+State.

  6. http://www.tpc.org/information/benchmarks.asp.


Acknowledgements

This work was sponsored by National Science and Technology Major Project (No. 2022ZD0116315) and Key R&D Program of Hubei Province (No. 2023BAB077). CZ and the DS3Lab gratefully acknowledge the support from the Swiss National Science Foundation (Project Number 200021_184628), Innosuisse/SNF BRIDGE Discovery (Project Number 40B2-0_187132), European Union Horizon 2020 Research and Innovation Programme (DAPHNE, 957407), Botnar Research Centre for Child Health, Swiss Data Science Center, Alibaba, Cisco, eBay, Google Focused Research Awards, Microsoft Swiss Joint Research Center, Oracle Labs, Swisscom, Zurich Insurance, Chinese Scholarship Council and the Department of Computer Science at ETH Zurich.

Author information


Corresponding author

Correspondence to Jiawei Jiang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Jiang, J., Wei, Y., Liu, Y. et al. How good are machine learning clouds? Benchmarking two snapshots over 5 years. The VLDB Journal 33, 833–857 (2024). https://doi.org/10.1007/s00778-024-00842-3

