Abstract
Gaussian processes are a widely used alternative to neural networks for non-linear system identification. The method requires inverting a large covariance matrix. In this work, we introduce our new task-based asynchronous implementation, focusing on the most popular solver, the Cholesky decomposition. Our implementation is built on HPX and its asynchronous many-task runtime system, which allows us to investigate scaling on multi-core hardware and with GPU offloading. Furthermore, we compare our HPX implementation against a high-level reference implementation based on PETSc. We demonstrate that the HPX implementation's performance is directly tied to the chosen tile size. Compared to the PETSc reference, our task-based implementation is faster throughout the entire node-level strong scaling experiment on AMD EPYC Rome, showing better parallel efficiency.
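To make the central operation concrete, the following minimal Python/NumPy sketch shows how Gaussian process regression avoids explicitly inverting the covariance matrix by using a Cholesky factorization and triangular solves. This is a toy illustration of the mathematical problem only; the kernel choice, noise level, and all function names here are illustrative assumptions, not the authors' HPX or PETSc implementation.

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0):
    # Squared-exponential covariance between two sets of 1-D inputs.
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_predict(x_train, y_train, x_test, noise=1e-2):
    # Covariance matrix of the noisy training observations.
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    # Cholesky factorization K = L L^T -- the O(n^3) step that a
    # task-based tiled implementation parallelizes.
    L = np.linalg.cholesky(K)
    # Solve K alpha = y with two triangular solves instead of forming K^-1.
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    # Posterior mean at the test inputs.
    return rbf_kernel(x_test, x_train) @ alpha

x = np.linspace(0.0, 2.0 * np.pi, 50)
y = np.sin(x)
mean = gp_predict(x, y, np.array([np.pi / 2.0]))
print(mean)  # close to sin(pi/2) = 1
```

In a tiled formulation, the single `cholesky` call is replaced by a directed acyclic graph of POTRF, TRSM, SYRK, and GEMM tasks on matrix tiles, which is what enables the asynchronous execution and the tile-size sensitivity discussed in the abstract.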
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Strack, A., Pflüger, D. (2023). Scalability of Gaussian Processes Using Asynchronous Tasks: A Comparison Between HPX and PETSc. In: Diehl, P., Thoman, P., Kaiser, H., Kale, L. (eds) Asynchronous Many-Task Systems and Applications. WAMTA 2023. Lecture Notes in Computer Science, vol 13861. Springer, Cham. https://doi.org/10.1007/978-3-031-32316-4_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-32315-7
Online ISBN: 978-3-031-32316-4