Abstract
Extreme learning machine (ELM) is a type of randomized neural network originally developed for linear classification and regression problems in the mid-2000s, and it has recently been extended to computational partial differential equations (PDEs). This method can yield highly accurate solutions to linear/nonlinear PDEs, but it requires the last hidden layer of the neural network to be wide to achieve a high accuracy. If the last hidden layer is narrow, the accuracy of the existing ELM method will be poor, irrespective of the rest of the network configuration. In this paper we present a modified ELM method, termed HLConcELM (hidden-layer concatenated ELM), to overcome this drawback of the conventional ELM method. The HLConcELM method can produce highly accurate solutions to linear/nonlinear PDEs whether the last hidden layer of the network is narrow or wide. The new method is based on a type of modified feedforward neural network (FNN), termed HLConcFNN (hidden-layer concatenated FNN), which incorporates a logical concatenation of the hidden layers in the network and exposes all the hidden nodes to the output-layer nodes. HLConcFNNs have the interesting property that, when additional hidden layers are appended to a given network architecture or extra nodes are added to its existing hidden layers, the representation capacity of the HLConcFNN associated with the new architecture is guaranteed to be no smaller than that of the original architecture. Here representation capacity refers to the set of all functions that can be exactly represented by the neural network of a given architecture. We present ample benchmark tests with linear/nonlinear PDEs to demonstrate the computational accuracy and performance of the HLConcELM method and its superiority over the conventional ELM of previous works.
Data Availability
The datasets related to this paper are available from the corresponding author on reasonable request.
References
Alaba, P., Popoola, S., Olatomiwa, L., Akanle, M., Ohunakin, O., Adetiba, E., Alex, O., Atayero, A., Daud, W.: Towards a more efficient and cost-sensitive extreme learning machine: a state-of-the-art review of recent trend. Neurocomputing 350, 70–90 (2019)
Basdevant, C., Deville, M., Haldenwang, P., Lacroix, J., Ouazzani, J., Peyret, R., Orlandi, P., Patera, A.: Spectral and finite difference solutions of the Burgers equation. Comput. Fluids 14, 23–41 (1986)
Braake, H., Straten, G.: Random activation weight neural net (RAWN) for fast non-iterative training. Eng. Appl. Artif. Intell. 8, 71–80 (1995)
Branch, M., Coleman, T., Li, Y.: A subspace, interior, and conjugate gradient method for large-scale bound-constrained minimization problems. SIAM J. Sci. Comput. 21, 1–23 (1999)
Byrd, R., Schnabel, R., Shultz, G.: Approximate solution of the trust region problem by minimization over two-dimensional subspaces. Math. Program. 40, 247–263 (1988)
Calabro, F., Fabiani, G., Siettos, C.: Extreme learning machine collocation for the numerical solution of elliptic PDEs with sharp gradients. Comput. Methods Appl. Mech. Eng. 387, 114188 (2021)
Cortes, C., Gonzalvo, X., Kuznetsov, V., Mohri, M., Yang, S.: Adanet: adaptive structural learning of artificial neural networks. arXiv:1607.01097 (2016)
Cyr, E., Gulian, M., Patel, R., Perego, M., Trask, N.: Robust training and initialization of deep neural networks: an adaptive basis viewpoint. Proc. Mach. Learn. Res. 107, 512–536 (2020)
Dong, S., Li, Z.: Local extreme learning machines and domain decomposition for solving linear and nonlinear partial differential equations. Comput. Methods Appl. Mech. Eng. 387, 114129 (2021)
Dong, S., Li, Z.: A modified batch intrinsic plasticity method for pre-training the random coefficients of extreme learning machines. J. Comput. Phys. 445, 110585 (2021)
Dong, S., Ni, N.: A method for representing periodic functions and enforcing exactly periodic boundary conditions with deep neural networks. J. Comput. Phys. 435, 110242 (2021)
Dong, S., Yang, J.: Numerical approximation of partial differential equations by a variable projection method with artificial neural networks. Comput. Methods Appl. Mech. Eng. 398, 115284 (2022)
Dong, S., Yang, J.: On computing the hyperparameter of extreme learning machines: algorithm and application to computational PDEs and comparison with classical and high-order finite elements. J. Comput. Phys. 463, 111290 (2022)
Driscoll, T., Hale, N., Trefethen, L.: Chebfun Guide. Pafnuty Publications, Oxford (2014)
Dwivedi, V., Srinivasan, B.: Physics informed extreme learning machine (PIELM) – a rapid method for the numerical solution of partial differential equations. Neurocomputing 391, 96–118 (2020)
Dwivedi, V., Srinivasan, B.: A normal equation-based extreme learning machine for solving linear partial differential equations. J. Comput. Inf. Sci. Eng. 22, 014502 (2022)
Weinan, E., Yu, B.: The deep Ritz method: a deep learning-based numerical algorithm for solving variational problems. Commun. Math. Stat. 6, 1–12 (2018)
Fabiani, G., Calabro, F., Russo, L., Siettos, C.: Numerical solution and bifurcation analysis of nonlinear partial differential equations with extreme learning machines. J. Sci. Comput. 89, 44 (2021)
Fokina, D., Oseledets, I.: Growing axons: greedy learning of neural networks with application to function approximation. arXiv:1910.12686 (2020)
Freire, A., Rocha-Neto, A., Barreto, G.: On robust randomized neural networks for regression: a comprehensive review and evaluation. Neural Comput. Appl. 32, 16931–16950 (2020)
Galaris, E., Fabiani, G., Calabro, F., Serafino, D., Siettos, C.: Numerical solution of stiff ODEs with physics-informed random projection neural networks. arXiv:2108.01584 (2021)
Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. The MIT Press, Cambridge (2016)
Guo, P., Chen, C., Sun, Y.: An exact supervised learning for a three-layer supervised neural network. In: Proceedings of 1995 International Conference on Neural Information Processing, pp. 1041–1044 (1995)
He, J., Xu, J.: MgNet: a unified framework for multigrid and convolutional neural network. Sci. China Math. 62, 1331–1354 (2019)
Hendrycks, D., Gimpel, K.: Gaussian error linear units (GELU). arXiv:1606.08415 (2016)
Huang, G., Chen, L., Siew, C.K.: Universal approximation using incremental constructive feedforward networks with random hidden nodes. IEEE Trans. Neural Netw. 17, 879–892 (2006)
Huang, G., Huang, G., Song, S., You, K.: Trends in extreme learning machines: a review. Neural Netw. 61, 32–48 (2015)
Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.: Densely connected convolutional networks. arXiv:1608.06993 (2018)
Huang, G.B., Zhu, Q.Y., Siew, C.K.: Extreme learning machine: a new learning scheme of feedforward neural networks. In: 2004 IEEE International Joint Conference on Neural Networks, vol. 2, pp. 985–990 (2004)
Huang, G.B., Zhu, Q.Y., Siew, C.K.: Extreme learning machine: theory and applications. Neurocomputing 70, 489–501 (2006)
Igelnik, B., Pao, Y.: Stochastic choice of basis functions in adaptive function approximation and the functional-link net. IEEE Trans. Neural Netw. 6, 1320–1329 (1995)
Jaeger, H., Lukosevicius, M., Popovici, D., Siewert, U.: Optimization and applications of echo state networks with leaky integrator neurons. Neural Netw. 20, 335–352 (2007)
Jagtap, A., Kharazmi, E., Karniadakis, G.: Conservative physics-informed neural networks on discrete domains for conservation laws: applications to forward and inverse problems. Comput. Methods Appl. Mech. Eng. 365, 113028 (2020)
Karniadakis, G., Kevrekidis, G., Lu, L., Perdikaris, P., Wang, S., Yang, L.: Physics-informed machine learning. Nat. Rev. Phys. 3, 422–440 (2021)
Karniadakis, G., Sherwin, S.: Spectral/hp Element Methods for Computational Fluid Dynamics, 2nd edn. Oxford University Press, Oxford (2005)
Katuwal, R., Suganthan, P., Tanveer, M.: Random vector functional link neural network based ensemble deep learning. arXiv:1907.00350 (2019)
Krishnapriyan, A., Gholami, A., Zhe, S., Kirby, R., Mahoney, M.: Characterizing possible failure modes in physics-informed neural networks. arXiv:2109.01050 (2021)
Kuramoto, Y.: Diffusion-induced chaos in reaction systems. Prog. Theor. Phys. Suppl. 64, 346–367 (1978)
Li, J.Y., Chow, W., Igelnik, B., Pao, Y.H.: Comments on “Stochastic choice of basis functions in adaptive function approximation and the functional-link net”. IEEE Trans. Neural Netw. 8, 452–454 (1997)
Liu, H., Xing, B., Wang, Z., Li, L.: Legendre neural network method for several classes of singularly perturbed differential equations based on mapping and piecewise optimization technology. Neural Process. Lett. 51, 2891–2913 (2020)
Liu, M., Hou, M., Wang, J., Cheng, Y.: Solving two-dimensional linear partial differential equations based on Chebyshev neural network with extreme learning machine algorithm. Eng. Comput. 38, 874–894 (2021)
Lu, L., Meng, X., Mao, Z., Karniadakis, G.: DeepXDE: a deep learning library for solving differential equations. SIAM Rev. 63, 208–228 (2021)
Lukosevicius, M., Jaeger, H.: Reservoir computing approaches to recurrent neural network training. Comput. Sci. Rev. 3, 127–149 (2009)
Maass, W., Markram, H.: On the computational power of recurrent circuits of spiking neurons. J. Comput. Syst. Sci. 69, 593–616 (2004)
Needell, D., Nelson, A., Saab, R., Salanevich, P.: Random vector functional link networks for function approximation on manifolds. arXiv:2007.15776 (2020)
Nocedal, J., Wright, S.: Numerical Optimization, 2nd edn. Springer, Berlin (2006)
Panghal, S., Kumar, M.: Optimization free neural network approach for solving ordinary and partial differential equations. Eng. Comput. 37, 2989–3002 (2021)
Pao, Y., Park, G., Sobajic, D.: Learning and generalization characteristics of the random vector functional-link net. Neurocomputing 6, 163–180 (1994)
Pao, Y., Takefuji, Y.: Functional-link net computing: theory, system architecture, and functionalities. Computer 25, 76–79 (1992)
Rahimi, A., Recht, B.: Weighted sums of random kitchen sinks: replacing minimization with randomization in learning. In: Koller, D., Schuurmans, D., Bengio, Y., Bottou, L. (eds.) Advances in Neural Information Processing Systems (NIPS), vol. 2, pp. 1316–1323 (2008)
Raissi, M., Perdikaris, P., Karniadakis, G.: Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 378, 686–707 (2019)
Rosenblatt, F.: The perceptron: a probabilistic model for information storage and organization in the brain. Psychol. Rev. 65, 386–408 (1958)
Scardapane, S., Wang, D.: Randomness in neural networks: an overview. WIREs Data Mining Knowl. Discov. 7, e1200 (2017)
Sirignano, J., Spiliopoulos, K.: DGM: a deep learning algorithm for solving partial differential equations. J. Comput. Phys. 375, 1339–1364 (2018)
Sivashinsky, G.: Nonlinear analysis of hydrodynamic instability in laminar flames—I. Derivation of basic equations. Acta Astronautica 4, 1177–1206 (1977)
Suganthan, P., Katuwal, R.: On the origins of randomization-based feedforward neural networks. Appl. Soft Comput. 105, 107239 (2021)
Sun, H., Hou, M., Yang, Y., Zhang, T., Weng, F., Han, F.: Solving partial differential equations based on Bernstein neural network and extreme learning machine algorithm. Neural Process. Lett. 50, 1153–1172 (2019)
Tang, K., Wan, X., Liao, Q.: Adaptive deep density estimation for Fokker–Planck equations. J. Comput. Phys. 457, 111080 (2022)
Verma, B., Mulawka, J.: A modified backpropagation algorithm. In: Proceedings of 1994 IEEE International Conference on Neural Networks, vol. 2, pp. 840–844 (1994)
Wan, X., Wei, S.: VAE-KRnet and its applications to variational Bayes. Commun. Comput. Phys. 31, 1049–1082 (2022)
Wang, S., Yu, X., Perdikaris, P.: When and why PINNs fail to train: a neural tangent kernel perspective. J. Comput. Phys. 449, 110768 (2022)
Wang, Y., Lin, G.: Efficient deep learning techniques for multiphase flow simulation in heterogeneous porous media. J. Comput. Phys. 401, 108968 (2020)
Webster, C.: Alan Turing’s unorganized machines and artificial neural networks: his remarkable early work and future possibilities. Evol. Intell. 5, 35–43 (2012)
Widrow, B., Greenblatt, A., Kim, Y., Park, D.: The no-prop algorithm: a new learning algorithm for multilayer neural networks. Neural Netw. 37, 182–188 (2013)
Wilamowski, B., Yu, H.: Neural network learning without backpropagation. IEEE Trans. Neural Netw. 21, 1793–1803 (2010)
Winovich, N., Ramani, K., Lin, G.: ConvPDE-UQ: convolutional neural networks with quantified uncertainty for heterogeneous elliptic partial differential equations on varied domains. J. Comput. Phys. 394, 263–279 (2019)
Yang, Y., Hou, M., Luo, J.: A novel improved extreme learning machine algorithm in solving ordinary differential equations by Legendre neural network methods. Adv. Differ. Equ. 469, 1–24 (2018)
Yang, Z., Dong, S.: An unconditionally energy-stable scheme based on an implicit auxiliary energy variable for incompressible two-phase flows with different densities involving only precomputable coefficient matrices. J. Comput. Phys. 393, 229–257 (2019)
Yang, Z., Dong, S.: A roadmap for discretely energy-stable schemes for dissipative systems based on a generalized auxiliary variable with guaranteed positivity. J. Comput. Phys. 404, 109121 (2020)
Yang, Z., Lin, L., Dong, S.: A family of second-order energy-stable schemes for Cahn–Hilliard type equations. J. Comput. Phys. 383, 24–54 (2019)
Zhang, L., Suganthan, P.: A comprehensive evaluation of random vector functional link networks. Inf. Sci. 367–368, 1094–1105 (2016)
Zheng, X., Dong, S.: An eigen-based high-order expansion basis for structured spectral elements. J. Comput. Phys. 230, 8573–8602 (2011)
Funding
This work was partially supported by US National Science Foundation (DMS-2012415).
Author information
Authors and Affiliations
Contributions
NN: software, data acquisition, data visualization, data analysis, writing of paper. SD: conceptualization, methodology, software, data acquisition, data analysis, writing of paper.
Corresponding author
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Proofs of Theorems from Sect. 2
Proof of Theorem 1
Consider an arbitrary \(u(\varvec{\theta },\varvec{\beta },{{\textbf{x}}})\in U({\varOmega },{{\textbf{M}}}_1,\sigma )\), where \(\varvec{\theta }\in {{\mathbb {R}}}^{N_{h1}}\) and \(\varvec{\beta }\in {{\mathbb {R}}}^{N_{c1}}\), with \(N_{h1}=\sum _{i=1}^{L-1}(m_{i-1}+1)m_i\) and \(N_{c1}=\sum _{i=1}^{L-1}m_i\). Let \(w_{kj}^{(i)}\) (\(1\leqslant i\leqslant L-1\), \(1\leqslant k\leqslant m_{i-1}\), \(1\leqslant j\leqslant m_i\)) and \(b^{(i)}_j\) (\(1\leqslant i\leqslant L-1\), \(1\leqslant j\leqslant m_i\)) denote the hidden-layer weight/bias coefficients of the associated HLConcFNN(\({{\textbf{M}}}_1,\sigma \)), and let \(\beta _{ij}\) (\(1\leqslant i\leqslant L-1\), \(1\leqslant j\leqslant m_i\)) denote the output-layer coefficients of HLConcFNN(\({{\textbf{M}}}_1,\sigma \)). \(u(\varvec{\theta },\varvec{\beta },{{\textbf{x}}})\) is given by (7).
Consider a function \(v(\varvec{\vartheta },\varvec{\alpha },{{\textbf{x}}})\in U({\varOmega },{{\textbf{M}}}_2,\sigma )\) with \(\varvec{\vartheta }\in {{\mathbb {R}}}^{N_{h2}}\) and \(\varvec{\alpha }\in {{\mathbb {R}}}^{N_{c2}}\), where \(N_{c2}=N_{c1}+n\), and \(N_{h2}=N_{h1}+(m_{L-1}+1)n\). We will choose \(\varvec{\vartheta }\) and \(\varvec{\alpha }\) such that \(v(\varvec{\vartheta },\varvec{\alpha },{{\textbf{x}}}) = u(\varvec{\theta },\varvec{\beta },{{\textbf{x}}})\). We construct \(\varvec{\vartheta }\) and \(\varvec{\alpha }\) by setting the hidden-layer and the output-layer coefficients of HLConcFNN(\({{\textbf{M}}}_2,\sigma \)) as follows.
The HLConcFNN(\({{\textbf{M}}}_2,\sigma \)) has L hidden layers. We set the weight/bias coefficients in its last hidden layer (with n nodes) to arbitrary values. We set those coefficients that connect the output node and the n nodes in the last hidden layer to all zeros. For the rest of the hidden-layer coefficients and the output-layer coefficients in HLConcFNN(\({{\textbf{M}}}_2,\sigma \)), we use those corresponding coefficient values from the network HLConcFNN(\({{\textbf{M}}}_1,\sigma \)).
More specifically, let \(\xi _{kj}^{(i)}\) and \(\eta _j^{(i)}\) denote the weight/bias coefficients in the hidden layers, and \(\alpha _{ij}\) denote the output-layer coefficients, of HLConcFNN(\({{\textbf{M}}}_2,\sigma \)) associated with the function \(v(\varvec{\vartheta },\varvec{\alpha },{{\textbf{x}}})\). We set these coefficients by,
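\[
\xi _{kj}^{(i)} = w_{kj}^{(i)},\quad \eta _{j}^{(i)} = b_{j}^{(i)},\quad 1\leqslant i\leqslant L-1,\ 1\leqslant k\leqslant m_{i-1},\ 1\leqslant j\leqslant m_i; \qquad \xi _{kj}^{(L)},\ \eta _{j}^{(L)}\ \text {arbitrary},\quad 1\leqslant k\leqslant m_{L-1},\ 1\leqslant j\leqslant n;
\]
\[
\alpha _{ij} = \beta _{ij},\quad 1\leqslant i\leqslant L-1,\ 1\leqslant j\leqslant m_i; \qquad \alpha _{Lj} = 0,\quad 1\leqslant j\leqslant n.
\]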
With the above coefficients, the last hidden layer of the network HLConcFNN(\({{\textbf{M}}}_2,\sigma \)) may output arbitrary fields, which however have no effect on the output field of HLConcFNN(\({{\textbf{M}}}_2,\sigma \)) because \(\alpha _{Lj}=0\) (\(1\leqslant j\leqslant n\)). The rest of the hidden nodes in HLConcFNN(\({{\textbf{M}}}_2,\sigma \)) and the output node of HLConcFNN(\({{\textbf{M}}}_2,\sigma \)) produce fields that are identical to those of the corresponding nodes in the network HLConcFNN(\({{\textbf{M}}}_1,\sigma \)). We thus conclude that \(u(\varvec{\theta },\varvec{\beta },{{\textbf{x}}})=v(\varvec{\vartheta },\varvec{\alpha },{{\textbf{x}}})\). So \(u(\varvec{\theta },\varvec{\beta },{{\textbf{x}}})\in U({\varOmega },{{\textbf{M}}}_2,\sigma )\), and the relation (9) holds. \(\square \)
Proof of Theorem 2
We use the same strategy as that in the proof of Theorem 1. Consider an arbitrary \(u(\varvec{\theta },\varvec{\beta },{{\textbf{x}}})\in U({\varOmega },{{\textbf{M}}}_1,\sigma )\), where \(\varvec{\theta }\in {{\mathbb {R}}}^{N_{h1}}\) and \(\varvec{\beta }\in {{\mathbb {R}}}^{N_{c1}}\), with \(N_{h1}=\sum _{i=1}^{L-1}(m_{i-1}+1)m_i\) and \(N_{c1}=\sum _{i=1}^{L-1}m_i\). The hidden-layer coefficients of the associated HLConcFNN(\({{\textbf{M}}}_1,\sigma \)) are denoted by \(w_{kj}^{(i)}\) (\(1\leqslant i\leqslant L-1\), \(1\leqslant k\leqslant m_{i-1}\), \(1\leqslant j\leqslant m_i\)) and \(b^{(i)}_j\) (\(1\leqslant i\leqslant L-1\), \(1\leqslant j\leqslant m_i\)), and the output-layer coefficients are denoted by \(\beta _{ij}\) (\(1\leqslant i\leqslant L-1\), \(1\leqslant j\leqslant m_i\)). \(u(\varvec{\theta },\varvec{\beta },{{\textbf{x}}})\) is given by (7).
Consider a function \(v(\varvec{\vartheta },\varvec{\alpha },{{\textbf{x}}})\in U({\varOmega },{{\textbf{M}}}_2,\sigma )\) with \(\varvec{\vartheta }\in {{\mathbb {R}}}^{N_{h2}}\) and \(\varvec{\alpha }\in {{\mathbb {R}}}^{N_{c2}}\), where \(N_{c2}=N_{c1}+1\), and \(N_{h2}=N_{h1}+(m_{s-1}+1)+m_{s+1}\) if \(1\leqslant s\leqslant L-2\) and \(N_{h2}=N_{h1}+(m_{s-1}+1)\) if \(s=L-1\). We construct \(\varvec{\vartheta }\) and \(\varvec{\alpha }\) by setting the hidden-layer and the output-layer coefficients of HLConcFNN(\({{\textbf{M}}}_2,\sigma \)) as follows.
In HLConcFNN(\({{\textbf{M}}}_2,\sigma \)) we set the weight coefficients that connect the extra node of layer s to those nodes in layer \((s+1)\) to all zeros, and we also set the weight coefficient that connects the extra node of layer s with the output node to zero. We set the weight coefficients that connect the nodes of layer \((s-1)\) to the extra node of layer s to arbitrary values, and also set the bias coefficient corresponding to the extra node of layer s to an arbitrary value. For the rest of the hidden-layer and output-layer coefficients of HLConcFNN(\({{\textbf{M}}}_2,\sigma \)), we use those corresponding coefficient values from the network HLConcFNN(\({{\textbf{M}}}_1,\sigma \)).
Specifically, let \(\xi _{kj}^{(i)}\) and \(\eta _j^{(i)}\) denote the weight/bias coefficients in the hidden layers, and \(\alpha _{ij}\) denote the output-layer coefficients, of the HLConcFNN(\({{\textbf{M}}}_2,\sigma \)) associated with \(v(\varvec{\vartheta },\varvec{\alpha },{{\textbf{x}}})\). We set these coefficients by,
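\[
\xi _{kj}^{(i)} = w_{kj}^{(i)},\quad \eta _{j}^{(i)} = b_{j}^{(i)}\quad \text {for all hidden-layer indices present in } {{\textbf{M}}}_1; \qquad \xi _{k,m_s+1}^{(s)},\ \eta _{m_s+1}^{(s)}\ \text {arbitrary},\quad 1\leqslant k\leqslant m_{s-1};
\]
\[
\xi _{m_s+1,j}^{(s+1)} = 0,\quad 1\leqslant j\leqslant m_{s+1}\ (\text {if } 1\leqslant s\leqslant L-2); \qquad \alpha _{ij} = \beta _{ij},\quad 1\leqslant i\leqslant L-1,\ 1\leqslant j\leqslant m_i; \qquad \alpha _{s,m_s+1} = 0,
\]
where the extra node of layer s is labeled by the index \(m_s+1\).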
With the above coefficients, the extra node in layer s of the network HLConcFNN(\({{\textbf{M}}}_2,\sigma \)) may output an arbitrary field, which however has no contribution to the output field of HLConcFNN(\({{\textbf{M}}}_2,\sigma \)). The rest of the hidden nodes and the output node of HLConcFNN(\({{\textbf{M}}}_2,\sigma \)) produce identical fields as the corresponding nodes in the network HLConcFNN(\({{\textbf{M}}}_1,\sigma \)). We thus conclude that \(u(\varvec{\theta },\varvec{\beta },{{\textbf{x}}})=v(\varvec{\vartheta },\varvec{\alpha },{{\textbf{x}}})\). So \(u(\varvec{\theta },\varvec{\beta },{{\textbf{x}}})\in U({\varOmega },{{\textbf{M}}}_2,\sigma )\) and the relation (10) holds. \(\square \)
Proof of Theorem 3
We use the same strategy as that in the proof of Theorem 1. Consider an arbitrary \(u(\varvec{\theta },\varvec{\beta },{{\textbf{x}}})\in U({\varOmega },{{\textbf{M}}}_1,\sigma ,\varvec{\theta })\), where \(\varvec{\beta }\in {{\mathbb {R}}}^{N_{c1}}\) with \(N_{c1}=\sum _{i=1}^{L-1}m_i\). We will try to construct an equivalent function from \(U({\varOmega },{{\textbf{M}}}_2,\sigma ,\varvec{\vartheta })\).
We consider another function \(v(\varvec{\vartheta },\varvec{\alpha },{{\textbf{x}}})\in U({\varOmega },{{\textbf{M}}}_2,\sigma ,\varvec{\vartheta })\), where \(\varvec{\alpha }\in {{\mathbb {R}}}^{N_{c2}}\) with \(N_{c2}=N_{c1}+n\), and we set the coefficients of the HLConcELM corresponding to \(v(\varvec{\vartheta },\varvec{\alpha },{{\textbf{x}}})\) as follows. Since \(\varvec{\vartheta }[1:N_{h1}]=\varvec{\theta }[1:N_{h1}]\), the random coefficients in the first \((L-1)\) hidden layers of the HLConcELM corresponding to \(v(\varvec{\vartheta },\varvec{\alpha },{{\textbf{x}}})\) are identical to those corresponding hidden-layer coefficients in the HLConcELM for \(u(\varvec{\theta },\varvec{\beta },{{\textbf{x}}})\). We set the weight/bias coefficients in the L-th hidden layer of the HLConcELM for \(v(\varvec{\vartheta },\varvec{\alpha },{{\textbf{x}}})\), which contains n nodes, to arbitrary random values. For the output-layer coefficients of the HLConcELM for \(v(\varvec{\vartheta },\varvec{\alpha },{{\textbf{x}}})\), we set those coefficients that connect the hidden nodes in the first \((L-1)\) hidden layers and the output node to be identical to those corresponding output-layer coefficients in the HLConcELM for \(u(\varvec{\theta },\varvec{\beta },{{\textbf{x}}})\), namely, \(\varvec{\alpha }[1:N_{c1}]=\varvec{\beta }[1:N_{c1}]\). We set those coefficients that connect the hidden nodes of the L-th hidden layer and the output node to be zeros in the HLConcELM for \(v(\varvec{\vartheta },\varvec{\alpha },{{\textbf{x}}})\), namely, \(\varvec{\alpha }[N_{c1}+1:N_{c2}]=0\).
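In short, the construction above amounts to
\[
\varvec{\vartheta }[1:N_{h1}]=\varvec{\theta }[1:N_{h1}],\qquad \varvec{\alpha }[1:N_{c1}]=\varvec{\beta }[1:N_{c1}],\qquad \varvec{\alpha }[N_{c1}+1:N_{c2}]=0,
\]
with the remaining entries of \(\varvec{\vartheta }\) (the weight/bias coefficients of the L-th hidden layer) assigned arbitrary random values.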
With the above coefficient settings, the output fields of those nodes in the first \((L-1)\) hidden layers of HLConcELM(\({{\textbf{M}}}_2,\sigma ,\varvec{\vartheta }\)) are identical to those corresponding nodes of HLConcELM(\({{\textbf{M}}}_1,\sigma ,\varvec{\theta }\)). The output fields of those n nodes in the L-th hidden layer of HLConcELM(\({{\textbf{M}}}_2,\sigma ,\varvec{\vartheta }\)) are arbitrary, which however have no contribution to the output field of HLConcELM(\({{\textbf{M}}}_2,\sigma ,\varvec{\vartheta }\)). The output field of the HLConcELM(\({{\textbf{M}}}_2,\sigma ,\varvec{\vartheta }\)) is identical to that of the HLConcELM(\({{\textbf{M}}}_1,\sigma ,\varvec{\theta }\)), i.e. \(v(\varvec{\vartheta },\varvec{\alpha },{{\textbf{x}}})=u(\varvec{\theta },\varvec{\beta },{{\textbf{x}}})\). We thus conclude that \(u(\varvec{\theta },\varvec{\beta },{{\textbf{x}}})\in U({\varOmega },{{\textbf{M}}}_2,\sigma ,\varvec{\vartheta })\) and the relation (13) holds. \(\square \)
Appendix B. Numerical Tests with Several Activation Functions
We have employed the Gaussian activation function for all the numerical simulations in Sect. 3. This appendix provides additional HLConcELM results using several other activation functions for solving the variable-coefficient Poisson problem from Sect. 3.1. Table 7 lists the activation functions studied below, including tanh, RePU-8, sinc, GELU and swish (in addition to Gaussian), as well as the hidden magnitude vector \({{\textbf{R}}}\) employed for each activation function. Here “RePU-8” stands for the rectified power unit of degree 8, and “GELU” denotes the Gaussian error linear unit [25].
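For reference, the activation functions of Table 7 can be realized as in the minimal sketch below, assuming the standard definitions (the Gaussian as \(\sigma (x)=e^{-x^2}\), RePU-8 as \(\max (x,0)^8\), swish as \(x\cdot \mathrm{sigmoid}(x)\), GELU in its exact erf form, and sinc as \(\sin (\pi x)/(\pi x)\), the form quoted in Appendix E); the precise forms used in the simulations are those defined in the main text.

import numpy as np
from scipy.special import erf

def gaussian(x):   # sigma(x) = exp(-x^2)
    return np.exp(-x**2)

def repu8(x):      # rectified power unit of degree 8: max(x,0)^8
    return np.maximum(x, 0.0)**8

def sinc(x):       # normalized sinc: sin(pi*x)/(pi*x), with sinc(0) = 1
    return np.sinc(x)

def gelu(x):       # Gaussian error linear unit: x * Phi(x)
    return 0.5*x*(1.0 + erf(x/np.sqrt(2.0)))

def swish(x):      # swish: x * sigmoid(x)
    return x/(1.0 + np.exp(-x))

# tanh is available directly as np.tanh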
Table 8 lists the maximum, rms and \(h^1\) errors of the HLConcELM solutions obtained using these activation functions on a neural network [2, 800, 50, 1] with two uniform sets of collocation points, \(Q=15\times 15\) and \(30\times 30\). Table 9 lists the maximum, rms and \(h^1\) errors of HLConcELM using these activation functions on two neural networks of the architecture [2, M, 50, 1] (with \(M=400\) and 800) with a fixed uniform set of \(Q=35\times 35\) collocation points. One can observe a general exponential decrease in the errors with these activation functions, except for the RePU-8 function in Table 8 (where the errors seem to saturate). The results with the RePU-8 function appear markedly less accurate than those obtained with the other activation functions studied here.
Appendix C. Additional Comparisons Between HLConcELM and Conventional ELM
This appendix provides additional comparisons between the current HLConcELM method and the conventional ELM method for the variable-coefficient Poisson problem (Sect. 3.1) and the nonlinear Helmholtz problem (Sect. 3.3).
In the comparisons between HLConcELM and conventional ELM presented in Sect. 3, the base neural-network architectures for the two methods are kept the same. HLConcELM is able to harvest the degrees of freedom in all the hidden layers of the neural network, thanks to the logical connections between all the hidden nodes and the output nodes (due to the hidden-layer concatenation). The conventional ELM, on the other hand, only exploits the degrees of freedom afforded by the last hidden layer of the network, while those provided by the preceding hidden layers are essentially “wasted” (see the discussions in Sect. 2.1). This is why the conventional ELM exhibits poor accuracy if the last hidden layer is narrow, irrespective of the rest of the network configuration. It also accounts for why the HLConcELM method can achieve a high accuracy whether the last hidden layer is narrow or wide.
Note that with HLConcELM the number of training parameters equals the total number of hidden nodes in the neural network, whereas with conventional ELM it equals the number of nodes in the last hidden layer. Under the same base network architecture (with multiple hidden layers), the number of training parameters in HLConcELM is therefore larger than that in the conventional ELM, because HLConcELM also exploits the hidden nodes from the preceding hidden layers.
In what follows we present several additional numerical tests comparing HLConcELM and conventional ELM under the configuration that the number of training parameters is kept the same for both methods. Because of their different characteristics, the base network architectures for HLConcELM and for conventional ELM in this case will inevitably not be identical. In the comparisons below we keep the two architectures close to each other, specifically by using the same depth and the same width for each hidden layer except the last for both HLConcELM and conventional ELM. The width of the last hidden layer differs between the HLConcELM network and the conventional-ELM network, with the latter being wider (in some cases considerably wider), while the number of training parameters is the same for both.
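To make the parameter counting concrete, the following sketch (with hypothetical helper names) computes the number of training parameters of the two methods from an architecture list of the form [2, m_1, ..., m_{L-1}, 1], as well as the last-hidden-layer width a conventional-ELM network needs in order to match a given HLConcELM network.

def n_train_hlconcelm(arch):
    # HLConcELM trains the output-layer coefficients attached to ALL hidden nodes
    return sum(arch[1:-1])

def n_train_conventional_elm(arch):
    # conventional ELM trains only the coefficients attached to the last hidden layer
    return arch[-2]

def matching_last_width(hlconc_arch):
    # last-hidden-layer width of a conventional-ELM network whose number of
    # training parameters equals that of the given HLConcELM network
    return n_train_hlconcelm(hlconc_arch)

# e.g. [2, 800, 50, 1] gives 850 training parameters for HLConcELM, matched by the
# conventional-ELM network [2, 800, 850, 1]; [2, 50, 50, 800, 1] gives 900, matched
# by [2, 50, 50, 900, 1].
print(n_train_hlconcelm([2, 800, 50, 1]), n_train_conventional_elm([2, 800, 850, 1]))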
Tables 10 and 11 show comparisons of the maximum and rms errors versus the number of collocation points obtained by HLConcELM and by conventional ELM for the variable-coefficient Poisson problem from Sect. 3.1. The results in Table 10 are attained with two hidden layers in the neural network and a total of 850 training parameters. The results in Table 11 correspond to three hidden layers in the neural network with a total of 900 training parameters. The HLConcELM data in Table 10 for the networks [2, 800, 50, 1] and [2, 50, 800, 1] correspond to those in Table 1. The simulation parameter values are listed in the tables or provided in the table captions. The exponential convergence of the errors with respect to the number of collocation points is evident in all test cases. The error levels from HLConcELM and the conventional ELM are close, reaching the order of \(10^{-8}\) in the maximum error and \(10^{-9}\) in the rms error. The error values resulting from HLConcELM generally appear better than those from the conventional ELM, e.g. by comparing the HLConcELM results (with [2, 800, 50, 1]) and the conventional ELM results (with [2, 800, 850, 1]) in Table 10, or the HLConcELM results (with [2, 800, 50, 50, 1]) and the conventional ELM results (with [2, 800, 50, 900, 1]) in Table 11. But this is not true for every test case; see e.g. the case \(Q=25\times 25\) between HLConcELM (with [2, 50, 800, 1]) and conventional ELM (with [2, 50, 850, 1]) in Table 10, or the cases \(Q=15\times 15\) and \(20\times 20\) between HLConcELM (with [2, 50, 50, 800, 1]) and conventional ELM (with [2, 50, 50, 900, 1]) in Table 11.
Tables 12 and 13 show the comparisons between HLConcELM and conventional ELM for the nonlinear Helmholtz problem from Sect. 3.3. The results in Table 12 correspond to two hidden layers in the neural network with a total of 530 training parameters, and those in Table 13 correspond to three hidden layers in the neural network with a total of 560 training parameters. The simulation parameter values are provided in the table captions or listed in the tables. Note that the HLConcELM data in Table 12 correspond to those in Table 4 with the networks [2, 500, 30, 1] and [2, 30, 500, 1]. The relative performance between HLConcELM and conventional ELM exhibited by these data is similar to what has been observed from Tables 10 and 11 for the variable-coefficient Poisson equation. The error levels resulting from HLConcELM and conventional ELM are quite close, on the order of \(10^{-6}\) or \(10^{-7}\) in terms of the maximum error and \(10^{-7}\) or \(10^{-8}\) in terms of the rms error. Overall the error values from HLConcELM appear slightly better than those from the conventional ELM; see e.g. those data in Table 12 and the cases between HLConcELM with [2, 30, 30, 500, 1] and conventional ELM with [2, 30, 30, 560, 1] in Table 13. But this is not consistently so for all the test cases; see e.g. the cases between HLConcELM with [2, 500, 30, 30, 1] and conventional ELM with [2, 500, 30, 560, 1] in Table 13.
It is noted that in all these test cases the neural network for the conventional ELM has a wide last hidden layer. This is consistent with the observation that the conventional ELM is only accurate when the last hidden layer is wide.
Appendix D. Laplace Equation Around a Reentrant Corner
This appendix provides a test of the HLConcELM method with the Laplace equation around a reentrant corner, where the solution can be non-smooth. Figure 25 is a sketch of the L-shaped domain \({\varOmega }={\overline{OABCDEO}}\) (with a reentrant corner at O) employed in this test. We consider the following problem on \({\varOmega }\),
where u(x, y) is the field to be solved for, \((r,\theta )\) denotes the polar coordinate, and \(k\geqslant 1\) is a prescribed integer. This problem has the following solution,
The integer k influences the regularity of the solution. If k is a multiple of 3, then the solution u(x, y) is smooth (\(C^{\infty }\)) on \({\varOmega }\). Otherwise, the solution is non-smooth, with its \(\lceil \frac{2k}{3} \rceil \)-th derivative being singular at the reentrant corner. We solve this problem by the HLConcELM method, and employ a set of uniform grid points in the sub-regions \({\overline{OABF}}\), \({\overline{OFCG}}\) and \({\overline{OGDE}}\) as the collocation points. Figure 25 shows a set of \(Q=3\times (10\times 10)\) uniform collocation points on the domain as an example. The Gaussian activation function is employed in the neural network. We employ a fixed seed value 10 for the random number generators.
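As an illustration of this collocation layout, the sketch below generates \(3\times (n\times n)\) uniform points on an L-shaped domain composed of three unit squares meeting at the reentrant corner. The coordinates used here are assumed purely for illustration; the actual domain and its vertex labels are those sketched in Fig. 25.

import numpy as np

def uniform_points(xmin, xmax, ymin, ymax, n):
    # n x n uniform grid (boundaries included) on one rectangular sub-region
    x = np.linspace(xmin, xmax, n)
    y = np.linspace(ymin, ymax, n)
    X, Y = np.meshgrid(x, y)
    return np.column_stack((X.ravel(), Y.ravel()))

def lshape_collocation(n):
    # three unit sub-squares of an (assumed) L-shaped domain with the
    # reentrant corner at the origin: [-1,1]^2 minus the quadrant [0,1]x[-1,0]
    subregions = [(-1.0, 0.0, -1.0, 0.0),
                  (-1.0, 0.0,  0.0, 1.0),
                  ( 0.0, 1.0,  0.0, 1.0)]
    return [uniform_points(*r, n) for r in subregions]

pts = lshape_collocation(10)   # Q = 3 x (10 x 10) collocation points, as in Fig. 25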
Figure 26 shows distributions of the exact solution (38), the HLConcELM solution and its point-wise absolute error, corresponding to three different solution fields with \(k=1\), 3 and 5. The values for the simulation parameters are provided in the figure caption. The HLConcELM result is extremely accurate for the case with a smooth solution (\(k=3\)), with the maximum error on the order \(10^{-11}\) in the domain. On the other hand, the HLConcELM solution is much less accurate for the non-smooth cases (\(k=1,5\)), with the maximum error around \(10^{-1}\) for \(k=1\) and around \(10^{-4}\) for \(k=5\). One can note that the computed HLConcELM solution is more accurate for a smoother solution field (larger k).
Tables 14 and 15 illustrate the convergence behavior of the HLConcELM errors with respect to the number of hidden nodes in the neural network and the number of collocation points (Q). Several cases corresponding to smooth and non-smooth solution fields are shown. The simulation parameter values are provided in the captions of these tables. The neural network architecture is given by [2, M, 50, 1], where M is either fixed at \(M=800\) or varied systematically. The set of collocation points is either fixed at \(Q=3\times (20\times 20)\) or varied systematically. For the smooth case (\(k=3\)), the HLConcELM solution exhibits an exponential convergence with respect to M and Q. For the non-smooth cases (\(k=1,2,5\)), the convergence is markedly slower. Nonetheless, for a smoother solution field (larger k), we generally observe an initial exponential decrease in the HLConcELM errors as M or Q increases, with the error reduction slowing down once M or Q reaches a certain level. For example, for the case \(k=5\) one can observe in Table 14 the initial exponential decrease in the errors with increasing M for \(M\leqslant 300\).
Appendix E. Kuramoto–Sivashinsky Equation
This appendix provides a test of the HLConcELM method with the Kuramoto–Sivashinsky equation [38, 55]. We consider the domain \((x,t)\in {\varOmega } = [a,b]\times [0,t_f]\), and the Kuramoto–Sivashinsky equation on \({\varOmega }\) with periodic boundary conditions,
In these equations, \((\alpha ,\beta ,\gamma )\) are constants, u(x, t) is the field function to be solved for, f is a prescribed source term, and g denotes the initial distribution. The domain parameters a, b and \(t_f\) will be specified below. We solve this problem by the locHLConcELM method (see Remark 6) together with the block time marching scheme (see Remark 5). The seed for the random number generators is set to 100 in the following tests.
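As a reminder of the block time marching idea, the sketch below (with hypothetical function names; solve_block stands for the per-block locHLConcELM solve) partitions the temporal domain into uniform blocks, solves the blocks one after another, and uses the computed field at the end of each block as the initial condition for the next.

import numpy as np

def block_time_marching(a, b, t_f, n_blocks, g, solve_block):
    # divide [0, t_f] into uniform time blocks and solve them sequentially;
    # solve_block(a, b, t0, t1, init) is a placeholder for the per-block solve
    # and returns a callable solution u(x, t) valid on that block
    t_edges = np.linspace(0.0, t_f, n_blocks + 1)
    init = g                          # initial distribution for the first block
    solutions = []
    for t0, t1 in zip(t_edges[:-1], t_edges[1:]):
        u_blk = solve_block(a, b, t0, t1, init)
        solutions.append(u_blk)
        # the field at the end of this block serves as the next initial condition
        init = lambda x, u_blk=u_blk, t1=t1: u_blk(x, t1)
    return t_edges, solutions         # piecewise solution over the whole domain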
Case #1: Manufactured Analytic Solution
We first consider a manufactured analytic solution to (39) to illustrate the convergence behavior of HLConcELM. We employ the following parameter values,
and the analytic solution given by
The source term f and the initial distribution g are chosen such that the expression (40) satisfies the system (39). The distribution of this solution is shown in Fig. 27a.
The distributions of the HLConcELM solution and its point-wise absolute error are shown in Fig. 27b, c. We have employed 5 uniform time blocks in the HLConcELM simulation, and a neural network architecture [2, 400, 50, 1] with the Gaussian activation function within each time block. The other simulation parameter values are provided in the caption of Fig. 27. The HLConcELM method captures the solution accurately, with the maximum error on the order \(10^{-8}\) in the spatial-temporal domain.
Tables 16 and 17 illustrate the exponential convergence behavior of the HLConcELM accuracy with respect to the collocation points and the network size for the Kuramoto–Sivashinsky equation. Table 16 lists the HLConcELM errors versus the number of collocation points (Q) obtained with two neural networks, with a narrow and wide last hidden layer, respectively. Table 17 shows the HLConcELM errors versus the number of nodes (M) in the first or the last hidden layer of the neural network, obtained with a fixed set of \(Q=25\times 25\) uniform collocation points. The captions of these tables provide the parameter values in these simulations. It can be observed that the HLConcELM errors decrease exponentially as the number of collocation points or the network size increases.
Case #2: No Exact Solution and Comparison with Chebfun
We next consider the following parameter values and settings:
The exact solution for this case is unknown. We will employ the result computed by the software package Chebfun [14], with a sufficient resolution, as the reference solution to compare with HLConcELM.
Figure 28 shows the solution distributions obtained by the locHLConcELM method and by Chebfun in the spatial-temporal domain for this case. With locHLConcELM, we have employed 20 uniform time blocks, 4 uniform sub-domains (along the x direction) within each time block, and a local neural network [2, 400, 1] with \(Q=25\times 25\) uniform collocation points on each sub-domain. The sine activation, \(\sigma (x)=\sin (x)\), has been employed with the local neural networks. The Chebfun solution is obtained with 400 Fourier grid points along the x direction and a time step size \({\varDelta } t=10^{-4}\). The locHLConcELM solution agrees well with the Chebfun solution qualitatively.
Figure 29 provides quantitative comparisons between locHLConcELM and Chebfun for this case. It compares the solution profiles obtained by these two methods at three time instants \(t=0.2\), 0.5 and 0.8 (top row), and also shows the corresponding profiles of the absolute error between these two methods (bottom row). No difference can be discerned from the solution profiles between locHLConcELM and Chebfun. The errors between these two methods generally increase over time, with the maximum error on the order \(10^{-6}\) at \(t=0.2\) and \(10^{-4}\) at \(t=0.5\) and 0.8. These results indicate that the current method has captured the solution quite accurately.
Case #3: Another Comparison With Chebfun
We consider yet another set of problem parameters as follows:
We again compare the HLConcELM result with the reference solution computed by Chebfun.
Figure 30 compares distributions of the locHLConcELM solution and the Chebfun solution. With locHLConcELM, we have employed 12 time blocks, 10 uniform sub-domains along the x direction within each time block, a local neural network [2, 300, 1] and a uniform set of \(Q=21\times 21\) collocation points on each sub-domain. The random magnitude vector is \({{\textbf{R}}}=2.5\), and the sinc activation function (\(\sigma (x)=\frac{\sin (\pi x)}{\pi x}\)) is employed. With Chebfun, we have employed 1000 Fourier grid points along the x direction and a time step size \({\varDelta } t=10^{-5}\). The distribution of the locHLConcELM solution is qualitatively similar to that of the Chebfun solution.
Figure 31 provides a quantitative comparison of the solution profiles between locHLConcELM and Chebfun at several time instants (top row), and also shows the corresponding profiles of the absolute error between the locHLConcELM solution and the Chebfun solution (bottom row). The locHLConcELM solution agrees very well with the Chebfun solution initially, and the difference between these two solutions grows over time.
Appendix F. Schrödinger Equation
This appendix provides a test of the HLConcELM method with the Schrödinger equation. We consider the domain \((x,t)\in {\varOmega }=[a,b]\times [0,t_f]\), and the Schrödinger equation on \({\varOmega }\) with periodic boundary conditions:
where h(x, t) is the complex field function to be solved for, f(x, t) is a prescribed complex source term, and g(x) is the initial distribution. Let \(h = u(x,t) + iv(x,t)\), where u and v denote the real and the imaginary parts of h, respectively. The domain parameters a, b and \(t_f\) will be specified below.
We solve this problem by the HLConcELM method, or the locHLConcELM method (see Remark 6), combined with the block time marching scheme (see Remark 5). The input layer of the neural network consists of two nodes, representing x and t, respectively. The output layer also consists of two nodes, representing the real part and the imaginary part of h(x, t), respectively. Accordingly, the system (41) is rewritten into an equivalent form in terms of the real and imaginary parts of h(x, t), and this reformulated system is employed in the HLConcELM simulation. When multiple sub-domains are employed in locHLConcELM, we impose \(C^1\) continuity conditions along the x direction and \(C^0\) continuity conditions along the t direction across the shared sub-domain boundaries (written out below). The seed for the random number generators is set to 100 in the HLConcELM simulations.
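To spell out the continuity conditions just mentioned: if the local networks on two neighboring sub-domains i and j share the boundary \(x=x_b\) or \(t=t_b\) (the superscript labeling of the local solutions is introduced here only for illustration), they read
\[
u^{(i)}(x_b,t)=u^{(j)}(x_b,t),\qquad \frac{\partial u^{(i)}}{\partial x}(x_b,t)=\frac{\partial u^{(j)}}{\partial x}(x_b,t) \qquad (C^1 \text{ in } x),
\]
\[
u^{(i)}(x,t_b)=u^{(j)}(x,t_b) \qquad (C^0 \text{ in } t),
\]
with analogous conditions imposed on the imaginary part v.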
Case #1: Manufactured Analytic Solution
We first illustrate the convergence behavior of HLConcELM using a manufactured analytic solution. We employ the following domain parameters,
and the analytic solution \(h = u+iv\), where
The source term f(x, t) and the initial distribution g(x) are chosen such that the expression (42) satisfies the system (41).
Figure 32 shows distributions of the real part u, the imaginary part v, and the norm |h| of the HLConcELM solution h(x, t), as well as their point-wise absolute errors when compared with the analytic solution (42), in the spatial-temporal domain. The neural network architecture is given by \({{\textbf{M}}}=[2, 400, 30, 2]\), and the other simulation parameter values are listed in the figure caption. The HLConcELM solution is observed to be highly accurate, with the maximum error on the order of \(10^{-8}\) for all of these quantities.
The exponential convergence of the HLConcELM accuracy is illustrated by the data in Tables 18 and 19. Table 18 lists the maximum and rms errors of the real part, the imaginary part, and the norm of h(x, t) as a function of the number of collocation points (Q) obtained by HLConcELM on two neural networks with a narrow and a wide last hidden layer, respectively. Table 19 lists the HLConcELM errors for the real/imaginary parts and the norm of h(x, t) on two network architectures having two hidden layers, with the number of nodes (M) in the first or the last hidden layer varied. The values for the simulation parameters are listed in the captions of these two tables. It is evident that the HLConcELM errors decrease approximately exponentially (before saturation) as the number of collocation points or the number of nodes in the neural network increases.
Case #2: No Exact Solution and Comparison with Chebfun
We next consider a case with no exact solution available, and so we use the numerical result obtained by Chebfun [14] as a reference to compare with the HLConcELM solution. We employ the following parameter values for the domain and the system (41),
Figure 33 illustrates the distributions of the locHLConcELM solution and the Chebfun solution for the real part, the imaginary part, and the norm of h(x, t). The Chebfun solution is obtained on 1024 Fourier grid points along the x direction with a time step size \({\varDelta } t=10^{-4}\). For locHLConcELM with block time marching, we have employed 5 uniform time blocks, 3 sub-domains along the x direction within each time block (interior sub-domain boundaries located at \(x=-0.35\) and 0.35), and a local neural network \({{\textbf{M}}}=[2,400,2]\) with the Gaussian activation function on each sub-domain. The other simulation parameter values are listed in the figure caption. No apparent difference can be discerned between the locHLConcELM solution and the Chebfun solution qualitatively.
Figure 34 provides a comparison between the locHLConcELM solution and the Chebfun solution quantitatively. Figure 34a–c compare profiles of the locHLConcELM solution and the Chebfun solution for the real part, the imaginary part and the norm of h(x, t) at three time instants \(t=0.2\), 0.5 and 0.8. The locHLConcELM solution profiles and the Chebfun solution profiles exactly overlap with one another. Figures 34d–f show profiles of the absolute error between the locHLConcELM solution and the Chebfun solution at the same time instants. One can observe that the difference between locHLConcELM and Chebfun is on the order of \(10^{-4}\), suggesting that the locHLConcELM result agrees well with the Chebfun result for this problem.
Appendix G. Two-Dimensional Advection Equation
This appendix provides a further test of the HLConcELM method with the advection equation in two spatial dimensions plus time. Note that the numerical results in Sect. 3.2 are for the one-dimensional advection equation (plus time). We consider the spatial-temporal domain \((x,y,t)\in {\varOmega }=[0,2]\times [0,2]\times [0,10]\), and the advection equation on \({\varOmega }\) with periodic boundary conditions,
This initial/boundary value problem has the following exact solution,
We solve the problem (43) by the HLConcELM method together with block time marching (see Remark 5). We employ 20 uniform time blocks in the simulation, with a size 0.5 for each time block. Within each time block we employ a neural network architecture [3, M, 1] (M varied) with the Gaussian activation function, and a uniform set of \(Q=Q_1\times Q_1\times Q_1\) collocation points (\(Q_1\) varied). The seed for the random number generators is set to 100 in the numerical tests. After the network is trained, we evaluate the neural network on another fixed set of \(Q_{eval}=51\times 51\times 51\) uniform grid points within each time block to attain the HLConcELM solution values for all time blocks. We then compare the HLConcELM solution with the exact solution (44) on the same set of \(Q_{eval}\) points within each time block, to compute the maximum (\(l^{\infty }\)) and rms (\(l^2\)) errors over the entire spatial-temporal domain \({\varOmega }\). The error values as computed above are said to be associated with the neural network [3, M, 1] with the \(Q=Q_1\times Q_1\times Q_1\) collocation points.
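The error computation described above amounts to the following sketch (hypothetical names; u_pred stands for the trained network evaluated on the \(51\times 51\times 51\) grid of one time block and u_exact for the exact solution (44) on the same grid).

import numpy as np

def block_errors(u_pred, u_exact):
    # maximum (l_inf) and rms (l_2) errors on the evaluation grid of one time block
    err = np.abs(u_pred - u_exact)
    return err.max(), np.sqrt(np.mean(err**2))

def domain_errors(per_block_pairs):
    # combine per-block data into errors over the entire spatial-temporal domain;
    # all blocks use the same number of evaluation points, so the overall rms is
    # the root of the mean of the per-block mean-squared errors
    linf = max(block_errors(up, ue)[0] for up, ue in per_block_pairs)
    rms = np.sqrt(np.mean([np.mean((up - ue)**2) for up, ue in per_block_pairs]))
    return linf, rms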
Figure 35 illustrates the distributions of the exact solution (44), the HLConcELM solution, and the point-wise absolute error of the HLConcELM solution over \({\varOmega }\). The values for the simulation parameters in HLConcELM are provided in the figure caption. The HLConcELM solution is quite accurate, with the maximum error on the order \(10^{-6}\) over the entire domain \({\varOmega }\).
The exponential convergence of the HLConcELM accuracy is illustrated by Tables 20 and 21. Table 20 lists the maximum and rms errors of HLConcELM (over \({\varOmega }\)) as a function of the number of collocation points (Q), obtained with a neural network [3, 1000, 1]. Table 21 lists the maximum and rms errors of HLConcELM as a function of the number of hidden nodes (M) in the neural network, obtained on a fixed set of \(Q=15\times 15\times 15\) uniform collocation points. The other simulation parameter values are provided in the captions of these tables. It is evident that the HLConcELM errors decrease exponentially (before saturation) with increasing number of collocation points or increasing number of hidden nodes in the network.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ni, N., Dong, S. Numerical Computation of Partial Differential Equations by Hidden-Layer Concatenated Extreme Learning Machine. J Sci Comput 95, 35 (2023). https://doi.org/10.1007/s10915-023-02162-0
Keywords
- Extreme learning machine
- Hidden layer concatenation
- Random weight neural networks
- Least squares
- Scientific machine learning
- Random basis