Abstract
This paper presents some of the current challenges in designing deep learning artificial intelligence (AI) and integrating it with traditional high-performance computing (HPC) simulations. We evaluate existing packages for their ability to run deep learning models and applications on large-scale HPC systems efficiently, identify challenges, and propose new asynchronous parallelization and optimization techniques for current large-scale heterogeneous systems and upcoming exascale systems. These developments, along with existing HPC AI software capabilities, have been integrated into MagmaDNN, an open-source HPC deep learning framework. Many deep learning frameworks are targeted at data scientists and fall short in providing quality integration into existing HPC workflows. This paper discusses the requirements of an HPC deep learning framework and how those requirements can be met (e.g., as in MagmaDNN) through deep integration with existing HPC libraries, such as MAGMA and its modular memory management, MPI, cuBLAS, cuDNN, MKL, and HIP. Advancements are also illustrated through algorithmic enhancements in reduced- and mixed-precision arithmetic, as well as asynchronous optimization methods. Finally, we present illustrations and potential solutions for enhancing traditional compute- and data-intensive applications at ORNL and UTK with AI. The approaches and future challenges are illustrated in materials science, imaging, and climate applications.
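To make the asynchronous optimization methods mentioned above concrete, the following is a minimal sketch of Hogwild!-style (lock-free) asynchronous SGD in plain C++. It illustrates the general technique only: the synthetic linear-regression problem, the thread count, and all names here are illustrative assumptions, and this is not MagmaDNN's API.

```cpp
// Minimal Hogwild!-style asynchronous SGD sketch (illustrative only; this is
// not the MagmaDNN API). Worker threads update a shared weight vector with
// no locks; occasional conflicting writes are tolerated by the method.
#include <cstdio>
#include <random>
#include <thread>
#include <vector>

int main() {
    const int n_threads = 4, n_samples = 10000, dim = 8;
    const float lr = 0.01f;

    // Synthetic linear-regression data: y = w_true . x
    std::vector<float> w_true(dim, 0.5f), w(dim, 0.0f);
    std::vector<std::vector<float>> X(n_samples, std::vector<float>(dim));
    std::vector<float> y(n_samples, 0.0f);
    std::mt19937 gen(42);
    std::uniform_real_distribution<float> dist(-1.0f, 1.0f);
    for (int i = 0; i < n_samples; ++i)
        for (int j = 0; j < dim; ++j) {
            X[i][j] = dist(gen);
            y[i] += w_true[j] * X[i][j];
        }

    // Each worker sweeps its own shard of the samples and applies gradient
    // updates to the shared weights without any synchronization.
    auto worker = [&](int tid) {
        for (int i = tid; i < n_samples; i += n_threads) {
            float pred = 0.0f;
            for (int j = 0; j < dim; ++j) pred += w[j] * X[i][j];
            const float err = pred - y[i];
            for (int j = 0; j < dim; ++j)
                w[j] -= lr * err * X[i][j];  // deliberately lock-free update
        }
    };

    std::vector<std::thread> pool;
    for (int t = 0; t < n_threads; ++t) pool.emplace_back(worker, t);
    for (auto& th : pool) th.join();

    std::printf("w[0] after asynchronous SGD: %f (target 0.5)\n", w[0]);
    return 0;
}
```

Removing the per-step synchronization barrier is what allows workers on a large heterogeneous node to proceed at their own pace instead of waiting for the slowest participant.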
This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).
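The reduced- and mixed-precision enhancements mentioned in the abstract follow a common pattern, mixed-precision iterative refinement: solve in low precision, then correct using residuals computed in higher precision. Below is a self-contained C++ sketch of that pattern with FP32 solves and FP64 residuals on a small hand-picked system; it is a conceptual example under those assumptions, not the MAGMA or MagmaDNN implementation, and the naive solver and test matrix are illustrative only.

```cpp
// Mixed-precision iterative refinement sketch: factor/solve Ax = b in FP32,
// then refine x using FP64 residuals. Illustrative only; not the MAGMA API.
#include <cmath>
#include <cstdio>
#include <vector>

// Naive Gaussian elimination in float (no pivoting; the test matrix below is
// diagonally dominant, so this is safe for the sketch). Solves A x = b.
static std::vector<float> solve_fp32(std::vector<std::vector<float>> A,
                                     std::vector<float> b) {
    const int n = (int)A.size();
    for (int k = 0; k < n; ++k)
        for (int i = k + 1; i < n; ++i) {
            const float m = A[i][k] / A[k][k];
            for (int j = k; j < n; ++j) A[i][j] -= m * A[k][j];
            b[i] -= m * b[k];
        }
    std::vector<float> x(n);
    for (int i = n - 1; i >= 0; --i) {
        float s = b[i];
        for (int j = i + 1; j < n; ++j) s -= A[i][j] * x[j];
        x[i] = s / A[i][i];
    }
    return x;
}

int main() {
    // Small diagonally dominant test system, stored in double precision.
    const int n = 3;
    std::vector<std::vector<double>> A = {{4, 1, 0}, {1, 5, 2}, {0, 2, 6}};
    std::vector<double> b = {1, 2, 3};

    // Low-precision copies for the FP32 solver.
    std::vector<std::vector<float>> Af(n, std::vector<float>(n));
    std::vector<float> bf(n);
    for (int i = 0; i < n; ++i) {
        bf[i] = (float)b[i];
        for (int j = 0; j < n; ++j) Af[i][j] = (float)A[i][j];
    }

    // Initial FP32 solve, promoted to FP64.
    std::vector<float> xf = solve_fp32(Af, bf);
    std::vector<double> x(xf.begin(), xf.end());

    // Refinement loop: FP64 residual, FP32 correction solve, FP64 update.
    for (int iter = 0; iter < 5; ++iter) {
        std::vector<double> r(n);
        double rnorm = 0.0;
        for (int i = 0; i < n; ++i) {
            r[i] = b[i];
            for (int j = 0; j < n; ++j) r[i] -= A[i][j] * x[j];
            rnorm += r[i] * r[i];
        }
        std::printf("iter %d, residual norm %.3e\n", iter, std::sqrt(rnorm));
        std::vector<float> rf(r.begin(), r.end());
        std::vector<float> cf = solve_fp32(Af, rf);
        for (int i = 0; i < n; ++i) x[i] += (double)cf[i];
    }
    return 0;
}
```

Each refinement step costs only a low-precision solve plus a high-precision residual, which is why the same pattern pays off on hardware with fast FP16/FP32 units such as GPU tensor cores.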
Acknowledgments
This material is based upon work supported in part by the Laboratory Directed Research and Development program at Oak Ridge National Laboratory, which is operated by UT-Battelle, LLC, for the U.S. Department of Energy under Contract DE-AC05-00OR22725. The work was also supported in part by the Scientific Discovery through Advanced Computing (SciDAC) program, funded by the U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research, with specific thanks to the FASTMath Institute.
This work was conducted at the Joint Institute for Computational Sciences (JICS) and the Innovative Computing Laboratory (ICL), sponsored by the National Science Foundation (NSF) through NSF REU Award #1659502 and NSF Award #1709069. This work used hardware donations from NVIDIA, as well as the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by NSF grant number ACI-1548562. Computational resources were made available through XSEDE education allocation award TG-ASC170031.
Cite this paper
Archibald, R., et al. (2020). Integrating Deep Learning in Domain Sciences at Exascale. In: Nichols, J., Verastegui, B., Maccabe, A.B., Hernandez, O., Parete-Koon, S., Ahearn, T. (eds.) Driving Scientific and Engineering Discoveries Through the Convergence of HPC, Big Data and AI. SMC 2020. Communications in Computer and Information Science, vol. 1315. Springer, Cham. https://doi.org/10.1007/978-3-030-63393-6_3