
Integrating Deep Learning in Domain Sciences at Exascale

Part of the Communications in Computer and Information Science book series (CCIS, volume 1315)

Abstract

This paper presents some of the current challenges in designing deep learning artificial intelligence (AI) and integrating it with traditional high-performance computing (HPC) simulations. We evaluate existing packages for their ability to run deep learning models and applications on large-scale HPC systems efficiently, identify challenges, and propose new asynchronous parallelization and optimization techniques for current large-scale heterogeneous systems and upcoming exascale systems. These developments, along with existing HPC AI software capabilities, have been integrated into MagmaDNN, an open-source HPC deep learning framework. Many deep learning frameworks are targeted at data scientists and fall short in providing quality integration into existing HPC workflows. This paper discusses the requirements of an HPC deep learning framework and how those requirements can be met (e.g., as in MagmaDNN) through deep integration with existing HPC libraries, such as MAGMA and its modular memory management, MPI, cuBLAS, cuDNN, MKL, and HIP. Advances are also illustrated through algorithmic enhancements in reduced- and mixed-precision arithmetic, as well as asynchronous optimization methods. Finally, we present illustrations and potential solutions for enhancing traditional compute- and data-intensive applications at ORNL and UTK with AI. The approaches and future challenges are illustrated in materials science, imaging, and climate applications.
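
The asynchronous optimization methods mentioned above are in the spirit of lock-free, HOGWILD!-style stochastic gradient descent, in which worker threads read and update a shared parameter vector without synchronization, tolerating stale reads and racing writes. The C++ sketch below is a minimal illustration of that idea on a toy linear regression problem; it is not the MagmaDNN API, and the model, names, and constants here are our own assumptions for the example.

```cpp
// Minimal HOGWILD!-style asynchronous SGD sketch (illustrative only, not the
// MagmaDNN API). Several threads apply SGD updates to one shared parameter
// vector with no locks; races are tolerated rather than prevented.
#include <cstdio>
#include <random>
#include <thread>
#include <vector>

int main() {
    const int kDim = 4;                // model size (hypothetical)
    const int kWorkers = 4;            // concurrent SGD workers
    const int kStepsPerWorker = 50000;
    const double kLearningRate = 1e-3;

    // Shared model parameters. Deliberately unsynchronized: formally a data
    // race, but the HOGWILD! line of work shows such races are statistically
    // benign for sparse or low-conflict updates.
    std::vector<double> w(kDim, 0.0);

    // Ground truth for a synthetic regression task: y = x . w_true.
    const std::vector<double> w_true = {1.0, -2.0, 0.5, 3.0};

    auto worker = [&](unsigned seed) {
        std::mt19937 gen(seed);
        std::normal_distribution<double> dist(0.0, 1.0);
        std::vector<double> x(kDim);
        for (int step = 0; step < kStepsPerWorker; ++step) {
            // Draw one random sample (x, y).
            double y = 0.0;
            for (int i = 0; i < kDim; ++i) {
                x[i] = dist(gen);
                y += x[i] * w_true[i];
            }
            // Predict with the current (possibly stale) shared parameters.
            double pred = 0.0;
            for (int i = 0; i < kDim; ++i) pred += x[i] * w[i];
            const double err = pred - y;  // gradient of 0.5*err^2 w.r.t. pred
            // Lock-free gradient step directly on the shared vector.
            for (int i = 0; i < kDim; ++i) w[i] -= kLearningRate * err * x[i];
        }
    };

    std::vector<std::thread> pool;
    for (int t = 0; t < kWorkers; ++t)
        pool.emplace_back(worker, static_cast<unsigned>(1234 + t));
    for (auto& th : pool) th.join();

    for (int i = 0; i < kDim; ++i)
        std::printf("w[%d] = %+.3f (true %+.3f)\n", i, w[i], w_true[i]);
    return 0;
}
```

A production variant would shard minibatches across nodes (e.g., with MPI) and could use atomic or carefully relaxed updates for strict correctness; the fully lock-free form shown here relies on the races being statistically benign, which holds best when parameter updates rarely conflict.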

This manuscript has been authored by UT-Battelle, LLC., under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).



Acknowledgments

This material is based upon work supported in part by the Laboratory Directed Research and Development program at the Oak Ridge National Laboratory, which is operated by UT-Battelle, LLC, for the U.S. Department of Energy under Contract DE-AC05-00OR22725. The work was partly supported by the Scientific Discovery through Advanced Computing (SciDAC) program funded by the U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research, with specific thanks to the FASTMath Institute.

This work was conducted at the Joint Institute for Computational Sciences (JICS) and the Innovative Computing Laboratory (ICL), sponsored by the National Science Foundation (NSF) through NSF REU Award #1659502 and NSF Award #1709069. This work used hardware donations from NVIDIA, as well as the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by NSF grant number ACI-1548562. Computational resources were available through XSEDE education allocation award TG-ASC170031.

Author information


Correspondence to Stanimire Tomov.



Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Archibald, R. et al. (2020). Integrating Deep Learning in Domain Sciences at Exascale. In: Nichols, J., Verastegui, B., Maccabe, A.B., Hernandez, O., Parete-Koon, S., Ahearn, T. (eds) Driving Scientific and Engineering Discoveries Through the Convergence of HPC, Big Data and AI. SMC 2020. Communications in Computer and Information Science, vol 1315. Springer, Cham. https://doi.org/10.1007/978-3-030-63393-6_3


  • DOI: https://doi.org/10.1007/978-3-030-63393-6_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-63392-9

  • Online ISBN: 978-3-030-63393-6

  • eBook Packages: Computer Science (R0)