
AI Benchmarking for Science: Efforts from the MLCommons Science Working Group

Conference paper
High Performance Computing. ISC High Performance 2022 International Workshops (ISC High Performance 2022)


With machine learning (ML) becoming a transformative tool for science, the scientific community needs a clear catalogue of ML techniques and their relative benefits on various scientific problems if it is to make significant advances in science using AI. Although this falls under the purview of benchmarking, conventional benchmarking initiatives focus on performance, and as such, science often becomes a secondary criterion.

In this paper, we describe a community effort by the MLCommons Science Working Group to develop science-specific AI benchmarks for the international scientific community. Since its inception in 2020, the working group has collaborated closely with national laboratories, academic institutions and industry partners across the world, and has developed four science-specific AI benchmarks. We describe the overall process and the resulting benchmarks, along with some initial results. We foresee that this initiative is likely to be transformative for both the AI for Science and the performance-focused communities.



We would like to thank Samuel Jackson from the Scientific Machine Learning Group at the Rutherford Appleton Laboratory (RAL) of the Science and Technology Facilities Council (STFC), UK, for his contributions to the Cloud Masking benchmark. This work was supported by Wave 1 of the UKRI Strategic Priorities Fund under EPSRC grant EP/T001569/1, particularly the ‘AI for Science’ theme within that grant; by the Alan Turing Institute; by the Benchmarking for AI for Science at Exascale (BASE) project under EPSRC grant EP/V001310/1; by Facilities Funding from the Science and Technology Facilities Council (STFC) of UKRI; by NSF Grants 2204115 and 2204115; and by DOE Award DE-SC0021418. This manuscript has been authored in part by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The publisher acknowledges the US government license to provide public access under the DOE Public Access Plan. This research also used resources of the Oak Ridge and Argonne Leadership Computing Facilities, which are DOE Office of Science user facilities supported under contracts DE-AC05-00OR22725 and DE-AC02-06CH11357, respectively, and the PEARL AI resource at RAL, STFC. This work would not have been possible without the continued support of MLCommons and MLCommons Research; in particular, we thank Peter Mattson, David Kanter and Vijay Janapa Reddi for their leadership and help.

Author information

Corresponding authors

Correspondence to Jeyan Thiyagalingam or Geoffrey Fox.


Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Thiyagalingam, J. et al. (2022). AI Benchmarking for Science: Efforts from the MLCommons Science Working Group. In: Anzt, H., Bienz, A., Luszczek, P., Baboulin, M. (eds) High Performance Computing. ISC High Performance 2022 International Workshops. ISC High Performance 2022. Lecture Notes in Computer Science, vol 13387. Springer, Cham.

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-23219-0

  • Online ISBN: 978-3-031-23220-6

  • eBook Packages: Computer Science, Computer Science (R0)
