
Optimization of Random Feature Method in the High-Precision Regime

  • Original Paper
  • Published:
Communications on Applied Mathematics and Computation

Abstract

Machine learning has been widely used for solving partial differential equations (PDEs) in recent years, among which the random feature method (RFM) exhibits spectral accuracy and can compete with traditional solvers in both accuracy and efficiency. However, the optimization problem in the RFM is potentially more difficult to solve than those arising in traditional methods. Unlike much of the broader machine-learning literature, which targets tasks in the low-precision regime, our study focuses on the high-precision regime crucial for solving PDEs. In this work, we study this problem from the following aspects: (i) we analyze the coefficient matrix that arises in the RFM by studying the distribution of its singular values; (ii) we investigate whether continued training causes overfitting; (iii) we test direct, iterative, and randomized methods for solving the optimization problem. Based on these results, we find that direct methods are superior to the alternatives when memory is not an issue, while iterative methods typically reach only low accuracy, which preconditioning can improve to some extent.
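To make the setting concrete, here is a minimal sketch (not the authors' code) of the kind of experiment described above: it assembles a random-feature least-squares system for a 1-D model problem, inspects the singular-value distribution of the coefficient matrix, and compares a direct solver with an unpreconditioned iterative one. The model problem -u'' = f with exact solution u(x) = sin(pi x), the tanh features, and the sizes M, N and weight scale R are all illustrative assumptions, and the sketch omits ingredients of the full RFM such as the partition of unity.

```python
# Illustrative sketch of an RFM-style high-precision least-squares experiment.
# All problem sizes and the model PDE are assumptions, not the paper's setup.
import numpy as np
from scipy.linalg import lstsq, svdvals
from scipy.sparse.linalg import lsmr

rng = np.random.default_rng(0)
M, N, R = 200, 400, 3.0          # features, collocation points, weight scale (illustrative)
w = rng.uniform(-R, R, size=M)   # fixed random inner weights
b = rng.uniform(-R, R, size=M)   # fixed random biases
x = np.linspace(0.0, 1.0, N)     # collocation points on [0, 1]

# Features phi_m(x) = tanh(w_m * x + b_m) and their second derivatives in x.
phi = np.tanh(np.outer(x, w) + b)
phi_xx = -2.0 * phi * (1.0 - phi**2) * w**2   # (tanh(w x + b))'' = -2 t (1 - t^2) w^2

u_exact = np.sin(np.pi * x)
f = np.pi**2 * np.sin(np.pi * x)              # right-hand side of -u'' = f

# Stack PDE rows and the two boundary rows into one least-squares system A c ~ y.
A = np.vstack([-phi_xx, phi[[0, -1], :]])
y = np.concatenate([f, u_exact[[0, -1]]])

# (i) Singular-value distribution: RFM-type matrices are typically very ill-conditioned.
s = svdvals(A)
print(f"cond(A) ~ {s[0] / s[-1]:.2e}")

# (iii) Direct (SVD-based) versus iterative (LSMR) least-squares solvers.
c_direct = lstsq(A, y)[0]
c_lsmr = lsmr(A, y, atol=1e-14, btol=1e-14, conlim=1e16, maxiter=10_000)[0]
for name, c in [("direct", c_direct), ("lsmr", c_lsmr)]:
    print(f"{name:>6}: max error = {np.max(np.abs(phi @ c - u_exact)):.2e}")
```

One would expect runs of this sketch to show an extremely ill-conditioned coefficient matrix, with the SVD-based direct solve nevertheless reaching errors near machine precision while unpreconditioned LSMR stagnates several orders of magnitude higher, consistent with the conclusions stated above.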





Acknowledgements

This work is supported by the NSFC Major Research Plan "Interpretable and General-purpose Next-generation Artificial Intelligence" (No. 92370205).

Author information

Corresponding author

Correspondence to Yifei Sun.

Ethics declarations

Conflict of Interest

On behalf of all the authors, the corresponding author states that there is no conflict of interest.

Additional information

Dedicated to Professor Stanley Osher on the occasion of his 80th birthday.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chen, J., E, W. & Sun, Y. Optimization of Random Feature Method in the High-Precision Regime. Commun. Appl. Math. Comput. (2024). https://doi.org/10.1007/s42967-024-00389-8

