Numerical Computation of Partial Differential Equations by Hidden-Layer Concatenated Extreme Learning Machine

Published in: Journal of Scientific Computing

Abstract

Extreme learning machine (ELM) is a type of randomized neural network originally developed for linear classification and regression problems in the mid-2000s, and it has recently been extended to computational partial differential equations (PDEs). This method can yield highly accurate solutions to linear/nonlinear PDEs, but requires the last hidden layer of the neural network to be wide to achieve high accuracy. If the last hidden layer is narrow, the accuracy of the existing ELM method will be poor, irrespective of the rest of the network configuration. In this paper we present a modified ELM method, termed HLConcELM (hidden-layer concatenated ELM), to overcome this drawback of the conventional ELM method. The HLConcELM method can produce highly accurate solutions to linear/nonlinear PDEs irrespective of whether the last hidden layer of the network is narrow or wide. The new method is based on a type of modified feedforward neural network (FNN), termed HLConcFNN (hidden-layer concatenated FNN), which incorporates a logical concatenation of the hidden layers in the network and exposes all the hidden nodes to the output-layer nodes. HLConcFNNs have the interesting property that, given a network architecture, when additional hidden layers are appended to the network or when extra nodes are added to the existing hidden layers, the representation capacity of the HLConcFNN associated with the new architecture is guaranteed to be no smaller than that of the original network architecture. Here representation capacity refers to the set of all functions that can be exactly represented by the neural network of a given architecture. We present ample benchmark tests with linear/nonlinear PDEs to demonstrate the computational accuracy and performance of the HLConcELM method and its superiority to the conventional ELM from previous works.
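As an illustration of the hidden-layer concatenation described above, the sketch below implements a minimal HLConcFNN forward pass in NumPy: the hidden layers feed forward as in a standard FNN, but the output layer acts on the concatenation of all hidden-layer outputs. This is our own sketch under stated assumptions (Gaussian activation, a single output node); it is not the authors' implementation.

```python
import numpy as np

def hlconc_fnn(x, weights, biases, beta, sigma=lambda z: np.exp(-z**2)):
    """Minimal HLConcFNN forward pass (illustrative sketch).

    x       : (n_samples, d_in) input points
    weights : list of hidden-layer weight matrices; weights[i] maps the output
              of layer i-1 (or the input) to layer i
    biases  : list of hidden-layer bias vectors
    beta    : (total number of hidden nodes,) output-layer coefficients
    """
    hidden_outputs, a = [], x
    for W, b in zip(weights, biases):
        a = sigma(a @ W + b)       # standard feedforward propagation
        hidden_outputs.append(a)   # expose this hidden layer to the output layer
    phi = np.concatenate(hidden_outputs, axis=1)  # logical concatenation of all hidden layers
    return phi @ beta              # linear output layer over all hidden nodes
```

In the ELM setting the hidden-layer coefficients `weights` and `biases` are assigned random values and kept fixed, and only the output coefficients `beta` constitute the training parameters.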


Data Availability

The datasets related to this paper are available from the corresponding author on reasonable request.

References

  1. Alaba, P., Popoola, S., Olatomiwa, L., Akanle, M., Ohunakin, O., Adetiba, E., Alex, O., Atayero, A., Daud, W.: Towards a more efficient and cost-sensitive extreme learning machine: a state-of-the-art review of recent trend. Neurocomputing 350, 70–90 (2019)

  2. Basdevant, C., Deville, M., Haldenwang, P., Lacroix, J., Ouazzani, J., Peyret, R., Orlandi, P., Patera, A.: Spectral and finite difference solutions of the Burgers equation. Comput. Fluids 14, 23–41 (1986)

  3. Braake, H., Straten, G.: Random activation weight neural net (RAWN) for fast non-iterative training. Eng. Appl. Artif. Intell. 8, 71–80 (1995)

  4. Branch, M., Coleman, T., Li, Y.: A subspace, interior, and conjugate gradient method for large-scale bound-constrained minimization problems. SIAM J. Sci. Comput. 21, 1–23 (1999)

  5. Byrd, R., Schnabel, R., Shultz, G.: Approximate solution of the trust region problem by minimization over two-dimensional subspaces. Math. Program. 40, 247–263 (1988)

  6. Calabro, F., Fabiani, G., Siettos, C.: Extreme learning machine collocation for the numerical solution of elliptic PDEs with sharp gradients. Comput. Methods Appl. Mech. Eng. 387, 114188 (2021)

  7. Cortes, C., Gonzalvo, X., Kuznetsov, V., Mohri, M., Yang, S.: Adanet: adaptive structural learning of artificial neural networks. arXiv:1607.01097 (2016)

  8. Cyr, E., Gulian, M., Patel, R., Perego, M., Trask, N.: Robust training and initialization of deep neural networks: an adaptive basis viewpoint. Proc. Mach. Learn. Res. 107, 512–536 (2020)

  9. Dong, S., Li, Z.: Local extreme learning machines and domain decomposition for solving linear and nonlinear partial differential equations. Comput. Methods Appl. Mech. Eng. 387, 114129 (2021)

  10. Dong, S., Li, Z.: A modified batch intrinsic plasticity method for pre-training the random coefficients of extreme learning machines. J. Comput. Phys. 445, 110585 (2021)

  11. Dong, S., Ni, N.: A method for representing periodic functions and enforcing exactly periodic boundary conditions with deep neural networks. J. Comput. Phys. 435, 110242 (2021)

  12. Dong, S., Yang, J.: Numerical approximation of partial differential equations by a variable projection method with artificial neural networks. Comput. Methods Appl. Mech. Eng. 398, 115284 (2022)

  13. Dong, S., Yang, J.: On computing the hyperparameter of extreme learning machines: algorithm and application to computational PDEs and comparison with classical and high-order finite elements. J. Comput. Phys. 463, 111290 (2022)

  14. Driscoll, T., Hale, N., Trefethen, L.: Chebfun Guide. Pafnuty Publications, Oxford (2014)

  15. Dwivedi, V., Srinivasan, B.: Physics informed extreme learning machine (pielm) \(-\) a rapid method for the numerical solution of partial differential equations. Neurocomputing 391, 96–118 (2020)

  16. Dwivedi, V., Srinivasan, B.: A normal equation-based extreme learning machine for solving linear partial differential equations. J. Comput. Inf. Sci. Eng. 22, 014502 (2022)

  17. Weinan, E., Yu, B.: The deep Ritz method: a deep learning-based numerical algorithm for solving variational problems. Commun. Math. Stat. 6, 1–12 (2018)

  18. Fabiani, G., Calabro, F., Russo, L., Siettos, C.: Numerical solution and bifurcation analysis of nonlinear partial differential equations with extreme learning machines. J. Sci. Comput. 89, 44 (2021)

  19. Fokina, D., Oseledets, I.: Growing axons: greedy learning of neural networks with application to function approximation. arXiv:1910.12686 (2020)

  20. Freire, A., Rocha-Neto, A., Barreto, G.: On robust randomized neural networks for regression: a comprehensive review and evaluation. Neural Comput. Appl. 32, 16931–16950 (2020)

  21. Galaris, E., Fabiani, G., Calabro, F., Serafino, D., Siettos, C.: Numerical solution of stiff ODEs with physics-informed random projection neural networks. arXiv:2108.01584 (2021)

  22. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. The MIT Press, Cambridge (2016)

  23. Guo, P., Chen, C., Sun, Y.: An exact supervised learning for a three-layer supervised neural network. In: Proceedings of 1995 International Conference on Neural Information Processing, pp. 1041–1044 (1995)

  24. He, J., Xu, J.: MgNet: a unified framework for multigrid and convolutional neural network. Sci. China Math. 62, 1331–1354 (2019)

  25. Hendrycks, D., Gimpel, K.: Gaussian error linear units (GELU). arXiv:1606.08415 (2016)

  26. Huang, G., Chen, L., Siew, C.K.: Universal approximation using incremental constructive feedforward networks with random hidden nodes. IEEE Trans. Neural Netw. 17, 879–892 (2006)

  27. Huang, G., Huang, G., Song, S., You, K.: Trends in extreme learning machines: a review. Neural Netw. 61, 32–48 (2015)

  28. Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.: Densely connected convolutional networks. arXiv:1608.06993 (2018)

  29. Huang, G.B., Zhu, Q.Y., Siew, C.K.: Extreme learning machine: a new learning scheme of feedforward neural networks. In: 2004 IEEE International Joint Conference on Neural Networks, vol. 2, pp. 985–990 (2004)

  30. Huang, G.B., Zhu, Q.Y., Siew, C.K.: Extreme learning machine: theory and applications. Neurocomputing 70, 489–501 (2006)

  31. Igelnik, B., Pao, Y.: Stochastic choice of basis functions in adaptive function approximation and the functional-link net. IEEE Trans. Neural Netw. 6, 1320–1329 (1995)

  32. Jaeger, H., Lukosevicius, M., Popovici, D., Siewert, U.: Optimization and applications of echo state networks with leaky integrator neurons. Neural Netw. 20, 335–352 (2007)

  33. Jagtap, A., Kharazmi, E., Karniadakis, G.: Conservative physics-informed neural networks on discrete domains for conservation laws: applications to forward and inverse problems. Comput. Methods Appl. Mech. Eng. 365, 113028 (2020)

  34. Karniadakis, G., Kevrekidis, G., Lu, L., Perdikaris, P., Wang, S., Yang, L.: Physics-informed machine learning. Nat. Rev. Phys. 3, 422–440 (2021)

  35. Karniadakis, G., Sherwin, S.: Spectral/hp Element Methods for Computational Fluid Dynamics, 2nd edn. Oxford University Press, Oxford (2005)

  36. Katuwal, R., Suganthan, P., Tanveer, M.: Random vector functional link neural network based ensemble deep learning. arXiv:1907.00350 (2019)

  37. Krishnapriyan, A., Gholami, A., Zhe, S., Kirby, R., Mahoney, M.: Characterizing possible failure modes in physics-informed neural networks. arXiv:2109.01050 (2021)

  38. Kuramoto, Y.: Diffusion-induced chaos in reaction systems. Prog. Theor. Phys. Suppl. 64, 346–367 (1978)

  39. Li, J.Y., Chow, W., Igelnik, B., Pao, Y.H.: Comments on "Stochastic choice of basis functions in adaptive function approximation and the functional-link net". IEEE Trans. Neural Netw. 8, 452–454 (1997)

  40. Liu, H., Xing, B., Wang, Z., Li, L.: Legendre neural network method for several classes of singularly perturbed differential equations based on mapping and piecewise optimization technology. Neural Process. Lett. 51, 2891–2913 (2020)

  41. Liu, M., Hou, M., Wang, J., Cheng, Y.: Solving two-dimensional linear partial differential equations based on Chebyshev neural network with extreme learning machine algorithm. Eng. Comput. 38, 874–894 (2021)

  42. Lu, L., Meng, X., Mao, Z., Karniadakis, G.: DeepXDE: a deep learning library for solving differential equations. SIAM Rev. 63, 208–228 (2021)

  43. Lukosevicius, M., Jaeger, H.: Reservoir computing approaches to recurrent neural network training. Comput. Sci. Rev. 3, 127–149 (2009)

  44. Maass, W., Markram, H.: On the computational power of recurrent circuits of spiking neurons. J. Comput. Syst. Sci. 69, 593–616 (2004)

  45. Needell, D., Nelson, A., Saab, R., Salanevich, P.: Random vector functional link networks for function approximation on manifolds. arXiv:2007.15776 (2020)

  46. Nocedal, J., Wright, S.: Numerical Optimization, 2nd edn. Springer, Berlin (2006)

  47. Panghal, S., Kumar, M.: Optimization free neural network approach for solving ordinary and partial differential equations. Eng. Comput. 37, 2989–3002 (2021)

  48. Pao, Y., Park, G., Sobajic, D.: Learning and generalization characteristics of the random vector functional-link net. Neurocomputing 6, 163–180 (1994)

  49. Pao, Y., Takefuji, Y.: Functional-link net computing: theory, system architecture, and functionalities. Computer 25, 76–79 (1992)

  50. Rahimi, A., Recht, B.: Weighted sums of random kitchen sinks: replacing minimization with randomization in learning. In: Koller, D., Schuurmans, D., Bengio, Y., Bottou, L. (eds.) Advances in Neural Information Processing Systems (NIPS), vol. 2, pp. 1316–1323 (2008)

  51. Raissi, M., Perdikaris, P., Karniadakis, G.: Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 378, 686–707 (2019)

  52. Rosenblatt, F.: The perceptron: a probabilistic model for information storage and organization in the brain. Psychol. Rev. 65, 386–408 (1958)

  53. Scardapane, S., Wang, D.: Randomness in neural networks: an overview. WIREs Data Mining Knowl. Discov. 7, e1200 (2017)

  54. Sirignano, J., Spiliopoulos, K.: DGM: a deep learning algorithm for solving partial differential equations. J. Comput. Phys. 375, 1339–1364 (2018)

  55. Sivashinsky, G.: Nonlinear analysis of hydrodynamic instability in laminar flames—I. Derivation of basic equations. Acta Astronautica 4, 1177–1206 (1977)

  56. Suganthan, P., Katuwal, R.: On the origins of randomization-based feedforward neural networks. Appl. Soft Comput. 105, 107239 (2021)

  57. Sun, H., Hou, M., Yang, Y., Zhang, T., Weng, F., Han, F.: Solving partial differential equations based on Bernstein neural network and extreme learning machine algorithm. Neural Process. Lett. 50, 1153–1172 (2019)

  58. Tang, K., Wan, X., Liao, Q.: Adaptive deep density estimation for Fokker–Planck equations. J. Comput. Phys. 457, 111080 (2022)

  59. Verma, B., Mulawka, J.: A modified backpropagation algorithm. In: Proceedings of 1994 IEEE International Conference on Neural Networks, vol. 2, pp. 840–844 (1994)

  60. Wan, X., Wei, S.: VAE-KRnet and its applications to variational Bayes. Commun. Comput. Phys. 31, 1049–1082 (2022)

  61. Wang, S., Yu, X., Perdikaris, P.: When and why PINNs fail to train: a neural tangent kernel perspective. J. Comput. Phys. 449, 110768 (2022)

  62. Wang, Y., Lin, G.: Efficient deep learning techniques for multiphase flow simulation in heterogeneous porous media. J. Comput. Phys. 401, 108968 (2020)

  63. Webster, C.: Alan Turing’s unorganized machines and artificial neural networks: his remarkable early work and future possibilities. Evol. Intell. 5, 35–43 (2012)

  64. Widrow, B., Greenblatt, A., Kim, Y., Park, D.: The no-prop algorithm: a new learning algorithm for multilayer neural networks. Neural Netw. 37, 182–188 (2013)

  65. Wilamowski, B., Yu, H.: Neural network learning without backpropagation. IEEE Trans. Neural Netw. 21, 1793–1803 (2010)

  66. Winovich, N., Ramani, K., Lin, G.: ConvPDE-UQ: convolutional neural networks with quantified uncertainty for heterogeneous elliptic partial differential equations on varied domains. J. Comput. Phys. 394, 263–279 (2019)

  67. Yang, Y., Hou, M., Luo, J.: A novel improved extreme learning machine algorithm in solving ordinary differential equations by Legendre neural network methods. Adv. Differ. Equ. 469, 1–24 (2018)

  68. Yang, Z., Dong, S.: An unconditionally energy-stable scheme based on an implicit auxiliary energy variable for incompressible two-phase flows with different densities involving only precomputable coefficient matrices. J. Comput. Phys. 393, 229–257 (2019)

  69. Yang, Z., Dong, S.: A roadmap for discretely energy-stable schemes for dissipative systems based on a generalized auxiliary variable with guaranteed positivity. J. Comput. Phys. 404, 109121 (2020)

  70. Yang, Z., Lin, L., Dong, S.: A family of second-order energy-stable schemes for Cahn–Hilliard type equations. J. Comput. Phys. 383, 24–54 (2019)

  71. Zhang, L., Suganthan, P.: A comprehensive evaluation of random vector functional link networks. Inf. Sci. 367–368, 1094–1105 (2016)

  72. Zheng, X., Dong, S.: An eigen-based high-order expansion basis for structured spectral elements. J. Comput. Phys. 230, 8573–8602 (2011)

Funding

This work was partially supported by the US National Science Foundation (DMS-2012415).

Author information

Contributions

NN: software, data acquisition, data visualization, data analysis, writing of paper. SD: conceptualization, methodology, software, data acquisition, data analysis, writing of paper.

Corresponding author

Correspondence to Suchuan Dong.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Proofs of Theorems from Sect. 2

Proof of Theorem 1

Consider an arbitrary \(u(\varvec{\theta },\varvec{\beta },{{\textbf{x}}})\in U({\varOmega },{{\textbf{M}}}_1,\sigma )\), where \(\varvec{\theta }\in {{\mathbb {R}}}^{N_{h1}}\) and \(\varvec{\beta }\in {{\mathbb {R}}}^{N_{c1}}\), with \(N_{h1}=\sum _{i=1}^{L-1}(m_{i-1}+1)m_i\) and \(N_{c1}=\sum _{i=1}^{L-1}m_i\). Let \(w_{kj}^{(i)}\) (\(1\leqslant i\leqslant L-1\), \(1\leqslant k\leqslant m_{i-1}\), \(1\leqslant j\leqslant m_i\)) and \(b^{(i)}_j\) (\(1\leqslant i\leqslant L-1\), \(1\leqslant j\leqslant m_i\)) denote the hidden-layer weight/bias coefficients of the associated HLConcFNN(\({{\textbf{M}}}_1,\sigma \)), and let \(\beta _{ij}\) (\(1\leqslant i\leqslant L-1\), \(1\leqslant j\leqslant m_i\)) denote the output-layer coefficients of HLConcFNN(\({{\textbf{M}}}_1,\sigma \)). \(u(\varvec{\theta },\varvec{\beta },{{\textbf{x}}})\) is given by (7).

Consider a function \(v(\varvec{\vartheta },\varvec{\alpha },{{\textbf{x}}})\in U({\varOmega },{{\textbf{M}}}_2,\sigma )\) with \(\varvec{\vartheta }\in {{\mathbb {R}}}^{N_{h2}}\) and \(\varvec{\alpha }\in {{\mathbb {R}}}^{N_{c2}}\), where \(N_{c2}=N_{c1}+n\), and \(N_{h2}=N_{h1}+(m_{L-1}+1)n\). We will choose \(\varvec{\vartheta }\) and \(\varvec{\alpha }\) such that \(v(\varvec{\vartheta },\varvec{\alpha },{{\textbf{x}}}) = u(\varvec{\theta },\varvec{\beta },{{\textbf{x}}})\). We construct \(\varvec{\vartheta }\) and \(\varvec{\alpha }\) by setting the hidden-layer and the output-layer coefficients of HLConcFNN(\({{\textbf{M}}}_2,\sigma \)) as follows.

The HLConcFNN(\({{\textbf{M}}}_2,\sigma \)) has L hidden layers. We set the weight/bias coefficients in its last hidden layer (with n nodes) to arbitrary values. We set those coefficients that connect the output node and the n nodes in the last hidden layer to all zeros. For the rest of the hidden-layer coefficients and the output-layer coefficients in HLConcFNN(\({{\textbf{M}}}_2,\sigma \)), we use those corresponding coefficient values from the network HLConcFNN(\({{\textbf{M}}}_1,\sigma \)).

More specifically, let \(\xi _{kj}^{(i)}\) and \(\eta _j^{(i)}\) denote the weight/bias coefficients in the hidden layers, and \(\alpha _{ij}\) denote the output-layer coefficients, of HLConcFNN(\({{\textbf{M}}}_2,\sigma \)) associated with the function \(v(\varvec{\vartheta },\varvec{\alpha },{{\textbf{x}}})\). We set these coefficients by,

$$\begin{aligned} \xi _{kj}^{(i)}= & {} \left\{ \begin{array}{ll} w_{kj}^{(i)}, &{} \text {for}\ 1\leqslant i\leqslant L-1,\ 1\leqslant k\leqslant m_{i-1},\ 1\leqslant j\leqslant m_i; \\ \text {arbitrary value}, &{} \text {for}\ i=L,\ 1\leqslant k\leqslant m_{L-1},\ 1\leqslant j\leqslant n; \end{array} \right. \end{aligned}$$
(31)
$$\begin{aligned} \eta _j^{(i)}= & {} \left\{ \begin{array}{ll} b_j^{(i)}, &{} \text {for all}\ 1\leqslant i\leqslant L-1,\ 1\leqslant j\leqslant m_i; \\ \text {arbitrary value}, &{} \text {for}\ i=L,\ 1\leqslant j\leqslant n; \end{array} \right. \end{aligned}$$
(32)
$$\begin{aligned} \alpha _{ij}= & {} \left\{ \begin{array}{ll} \beta _{ij},&{} \text {for}\ 1\leqslant i\leqslant L-1,\ 1\leqslant j\leqslant m_i; \\ 0, &{} \text {for}\ i=L,\ 1\leqslant j\leqslant n. \end{array} \right. \end{aligned}$$
(33)

With the above coefficients, the last hidden layer of the network HLConcFNN(\({{\textbf{M}}}_2,\sigma \)) may output arbitrary fields, which however have no effect on the output field of HLConcFNN(\({{\textbf{M}}}_2,\sigma \)) because \(\alpha _{Lj}=0\) (\(1\leqslant j\leqslant n\)). The rest of the hidden nodes in HLConcFNN(\({{\textbf{M}}}_2,\sigma \)) and the output node of HLConcFNN(\({{\textbf{M}}}_2,\sigma \)) produce fields that are identical to those of the corresponding nodes in the network HLConcFNN(\({{\textbf{M}}}_1,\sigma \)). We thus conclude that \(u(\varvec{\theta },\varvec{\beta },{{\textbf{x}}})=v(\varvec{\vartheta },\varvec{\alpha },{{\textbf{x}}})\). So \(u(\varvec{\theta },\varvec{\beta },{{\textbf{x}}})\in U({\varOmega },{{\textbf{M}}}_2,\sigma )\), and the relation (9) holds. \(\square \)
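The construction (31)–(33) is easy to verify numerically: appending a hidden layer with arbitrary weights but zero output-layer coefficients leaves the network output unchanged. The following self-contained sketch (our own, with a Gaussian activation and illustrative layer widths) checks this for a small HLConcFNN; it is not code from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = lambda z: np.exp(-z**2)  # Gaussian activation (assumed)

def hlconc_forward(x, Ws, bs, beta):
    # HLConcFNN output: linear combination over the outputs of all hidden layers
    outs, a = [], x
    for W, b in zip(Ws, bs):
        a = sigma(a @ W + b)
        outs.append(a)
    return np.concatenate(outs, axis=1) @ beta

# HLConcFNN(M1): architecture [2, 5, 3, 1] -> hidden widths (5, 3)
x = rng.uniform(-1.0, 1.0, size=(10, 2))
Ws = [rng.normal(size=(2, 5)), rng.normal(size=(5, 3))]
bs = [rng.normal(size=5), rng.normal(size=3)]
beta = rng.normal(size=5 + 3)
u = hlconc_forward(x, Ws, bs, beta)

# HLConcFNN(M2): append a last hidden layer with n = 4 nodes.
# Its weights/biases are arbitrary (eqs. (31)-(32)); its output-layer
# coefficients are zero (eq. (33)).
Ws2 = Ws + [rng.normal(size=(3, 4))]
bs2 = bs + [rng.normal(size=4)]
alpha = np.concatenate([beta, np.zeros(4)])
v = hlconc_forward(x, Ws2, bs2, alpha)

assert np.allclose(u, v)  # the represented function is unchanged
```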

Proof of Theorem 2

We use the same strategy as that in the proof of Theorem 1. Consider an arbitrary \(u(\varvec{\theta },\varvec{\beta },{{\textbf{x}}})\in U({\varOmega },{{\textbf{M}}}_1,\sigma )\), where \(\varvec{\theta }\in {{\mathbb {R}}}^{N_{h1}}\) and \(\varvec{\beta }\in {{\mathbb {R}}}^{N_{c1}}\), with \(N_{h1}=\sum _{i=1}^{L-1}(m_{i-1}+1)m_i\) and \(N_{c1}=\sum _{i=1}^{L-1}m_i\). The hidden-layer coefficients of the associated HLConcFNN(\({{\textbf{M}}}_1,\sigma \)) are denoted by \(w_{kj}^{(i)}\) (\(1\leqslant i\leqslant L-1\), \(1\leqslant k\leqslant m_{i-1}\), \(1\leqslant j\leqslant m_i\)) and \(b^{(i)}_j\) (\(1\leqslant i\leqslant L-1\), \(1\leqslant j\leqslant m_i\)), and the output-layer coefficients are denoted by \(\beta _{ij}\) (\(1\leqslant i\leqslant L-1\), \(1\leqslant j\leqslant m_i\)). \(u(\varvec{\theta },\varvec{\beta },{{\textbf{x}}})\) is given by (7).

Consider a function \(v(\varvec{\vartheta },\varvec{\alpha },{{\textbf{x}}})\in U({\varOmega },{{\textbf{M}}}_2,\sigma )\) with \(\varvec{\vartheta }\in {{\mathbb {R}}}^{N_{h2}}\) and \(\varvec{\alpha }\in {{\mathbb {R}}}^{N_{c2}}\), where \(N_{c2}=N_{c1}+1\), and \(N_{h2}=N_{h1}+(m_{s-1}+1)+m_{s+1}\) if \(1\leqslant s\leqslant L-2\) and \(N_{h2}=N_{h1}+(m_{s-1}+1)\) if \(s=L-1\). We construct \(\varvec{\vartheta }\) and \(\varvec{\alpha }\) by setting the hidden-layer and the output-layer coefficients of HLConcFNN(\({{\textbf{M}}}_2,\sigma \)) as follows.

In HLConcFNN(\({{\textbf{M}}}_2,\sigma \)) we set the weight coefficients that connect the extra node of layer s to those nodes in layer \((s+1)\) to all zeros, and we also set the weight coefficient that connects the extra node of layer s with the output node to zero. We set the weight coefficients that connect the nodes of layer \((s-1)\) to the extra node of layer s to arbitrary values, and also set the bias coefficient corresponding to the extra node of layer s to an arbitrary value. For the rest of the hidden-layer and output-layer coefficients of HLConcFNN(\({{\textbf{M}}}_2,\sigma \)), we use those corresponding coefficient values from the network HLConcFNN(\({{\textbf{M}}}_1,\sigma \)).

Specifically, let \(\xi _{kj}^{(i)}\) and \(\eta _j^{(i)}\) denote the weight/bias coefficients in the hidden layers, and \(\alpha _{ij}\) denote the output-layer coefficients, of the HLConcFNN(\({{\textbf{M}}}_2,\sigma \)) associated with \(v(\varvec{\vartheta },\varvec{\alpha },{{\textbf{x}}})\). We set these coefficients by,

$$\begin{aligned} \xi _{kj}^{(i)}= & {} \left\{ \begin{array}{ll} w_{kj}^{(i)}, &{} \text {for all}\ (1\leqslant i\leqslant s-1,\ \text {or}\ s+2\leqslant i\leqslant L-1),\\ &{} \quad 1\leqslant k\leqslant m_{i-1},\ 1\leqslant j\leqslant m_i; \\ w_{kj}^{(s)}, &{} \text {for}\ i=s,\ 1\leqslant k\leqslant m_{s-1},\ 1\leqslant j\leqslant m_s; \\ \text {arbitrary value}, &{} \text {for}\ i=s,\ 1\leqslant k\leqslant m_{s-1},\ j=m_{s}+1; \\ w_{kj}^{(s+1)}, &{} \text {for}\ i=s+1,\ 1\leqslant k\leqslant m_{s},\ 1\leqslant j\leqslant m_{s+1}; \\ 0, &{} \text {for}\ i=s+1,\ k=m_s+1,\ 1\leqslant j\leqslant m_{s+1}; \end{array} \right. \end{aligned}$$
(34)
$$\begin{aligned} \eta _j^{(i)}= & {} \left\{ \begin{array}{ll} b_j^{(i)}, &{} \text {for all}\ 1\leqslant i\leqslant L-1,\ i\ne s,\ 1\leqslant j\leqslant m_i; \\ b_j^{(s)}, &{} \text {for}\ i=s,\ 1\leqslant j\leqslant m_s; \\ \text {arbitrary value}, &{} \text {for}\ i=s,\ j=m_s+1; \end{array} \right. \end{aligned}$$
(35)
$$\begin{aligned} \alpha _{ij}= & {} \left\{ \begin{array}{ll} \beta _{ij},&{} \text {for all}\ 1\leqslant i\leqslant L-1,\ i\ne s,\ 1\leqslant j\leqslant m_i; \\ \beta _{sj}, &{} \text {for}\ i=s,\ 1\leqslant j\leqslant m_s; \\ 0, &{} \text {for}\ i=s,\ j=m_s+1. \end{array} \right. \end{aligned}$$
(36)

With the above coefficients, the extra node in layer s of the network HLConcFNN(\({{\textbf{M}}}_2,\sigma \)) may output an arbitrary field, which however has no contribution to the output field of HLConcFNN(\({{\textbf{M}}}_2,\sigma \)). The rest of the hidden nodes and the output node of HLConcFNN(\({{\textbf{M}}}_2,\sigma \)) produce identical fields as the corresponding nodes in the network HLConcFNN(\({{\textbf{M}}}_1,\sigma \)). We thus conclude that \(u(\varvec{\theta },\varvec{\beta },{{\textbf{x}}})=v(\varvec{\vartheta },\varvec{\alpha },{{\textbf{x}}})\). So \(u(\varvec{\theta },\varvec{\beta },{{\textbf{x}}})\in U({\varOmega },{{\textbf{M}}}_2,\sigma )\) and the relation (10) holds. \(\square \)

Proof of Theorem 3

We use the same strategy as that in the proof of Theorem 1. Consider an arbitrary \(u(\varvec{\theta },\varvec{\beta },{{\textbf{x}}})\in U({\varOmega },{{\textbf{M}}}_1,\sigma ,\varvec{\theta })\), where \(\varvec{\beta }\in {{\mathbb {R}}}^{N_{c1}}\) with \(N_{c1}=\sum _{i=1}^{L-1}m_i\). We will try to construct an equivalent function from \(U({\varOmega },{{\textbf{M}}}_2,\sigma ,\varvec{\vartheta })\).

We consider another function \(v(\varvec{\vartheta },\varvec{\alpha },{{\textbf{x}}})\in U({\varOmega },{{\textbf{M}}}_2,\sigma ,\varvec{\vartheta })\), where \(\varvec{\alpha }\in {{\mathbb {R}}}^{N_{c2}}\) with \(N_{c2}=N_{c1}+n\), and we set the coefficients of the HLConcELM corresponding to \(v(\varvec{\vartheta },\varvec{\alpha },{{\textbf{x}}})\) as follows. Since \(\varvec{\vartheta }[1:N_{h1}]=\varvec{\theta }[1:N_{h1}]\), the random coefficients in the first \((L-1)\) hidden layers of the HLConcELM corresponding to \(v(\varvec{\vartheta },\varvec{\alpha },{{\textbf{x}}})\) are identical to those corresponding hidden-layer coefficients in the HLConcELM for \(u(\varvec{\theta },\varvec{\beta },{{\textbf{x}}})\). We set the weight/bias coefficients in the L-th hidden layer of the HLConcELM for \(v(\varvec{\vartheta },\varvec{\alpha },{{\textbf{x}}})\), which contains n nodes, to arbitrary random values. For the output-layer coefficients of the HLConcELM for \(v(\varvec{\vartheta },\varvec{\alpha },{{\textbf{x}}})\), we set those coefficients that connect the hidden nodes in the first \((L-1)\) hidden layers and the output node to be identical to those corresponding output-layer coefficients in the HLConcELM for \(u(\varvec{\theta },\varvec{\beta },{{\textbf{x}}})\), namely, \(\varvec{\alpha }[1:N_{c1}]=\varvec{\beta }[1:N_{c1}]\). We set those coefficients that connect the hidden nodes of the L-th hidden layer and the output node to be zeros in the HLConcELM for \(v(\varvec{\vartheta },\varvec{\alpha },{{\textbf{x}}})\), namely, \(\varvec{\alpha }[N_{c1}+1:N_{c2}]=0\).

With the above coefficient settings, the output fields of those nodes in the first \((L-1)\) hidden layers of HLConcELM(\({{\textbf{M}}}_2,\sigma ,\varvec{\vartheta }\)) are identical to those corresponding nodes of HLConcELM(\({{\textbf{M}}}_1,\sigma ,\varvec{\theta }\)). The output fields of those n nodes in the L-th hidden layer of HLConcELM(\({{\textbf{M}}}_2,\sigma ,\varvec{\vartheta }\)) are arbitrary, which however have no contribution to the output field of HLConcELM(\({{\textbf{M}}}_2,\sigma ,\varvec{\vartheta }\)). The output field of the HLConcELM(\({{\textbf{M}}}_2,\sigma ,\varvec{\vartheta }\)) is identical to that of the HLConcELM(\({{\textbf{M}}}_1,\sigma ,\varvec{\theta }\)), i.e. \(v(\varvec{\vartheta },\varvec{\alpha },{{\textbf{x}}})=u(\varvec{\theta },\varvec{\beta },{{\textbf{x}}})\). We thus conclude that \(u(\varvec{\theta },\varvec{\beta },{{\textbf{x}}})\in U({\varOmega },{{\textbf{M}}}_2,\sigma ,\varvec{\vartheta })\) and the relation (13) holds. \(\square \)

Appendix B. Numerical Tests with Several Activation Functions

Table 7 Appendix B (variable-coefficient Poisson equation): the activation functions and the corresponding hidden magnitude vector \({{\textbf{R}}}\) employed
Table 8 Appendix B (variable-coefficient Poisson equation): the max/rms/\(h^1\) errors of HLConcELM obtained with different activation functions on two uniform sets of collocation points
Table 9 Appendix B (variable-coefficient Poisson equation): the max/rms/\(h^1\) errors of HLConcELM obtained with different activation functions on two neural networks with architecture [2, M, 50, 1]

We have employed the Gaussian activation function for all the numerical simulations in Sect. 3. This appendix provides additional HLConcELM results using several other activation functions for solving the variable-coefficient Poisson problem from Sect. 3.1. Table 7 lists the activation functions studied below, including tanh, RePU-8, sinc, GELU and swish (in addition to Gaussian), as well as the hidden magnitude vector \({{\textbf{R}}}\) employed for each activation function. Here “RePU-8” stands for the rectified power unit of degree 8, and “GELU” denotes the Gaussian error linear unit [25].
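For concreteness, the activation functions of Table 7 can be written as follows. These are our own definitions based on the common formulas for these functions (with the convention sinc(x) = sin(πx)/(πx) used later in Appendix E); they are assumptions for illustration, not code from the paper.

```python
import numpy as np
from scipy.special import erf

gaussian = lambda x: np.exp(-x**2)                      # assumed form of the Gaussian activation
tanh     = np.tanh
repu8    = lambda x: np.where(x > 0.0, x, 0.0)**8       # rectified power unit of degree 8
sinc     = np.sinc                                      # numpy convention: sin(pi*x)/(pi*x)
gelu     = lambda x: 0.5*x*(1.0 + erf(x/np.sqrt(2.0)))  # Gaussian error linear unit [25]
swish    = lambda x: x/(1.0 + np.exp(-x))               # x * sigmoid(x)
```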

Table 8 lists the maximum, rms and \(h^1\) errors of the HLConcELM solutions obtained using these activation functions on a neural network [2, 800, 50, 1] with two uniform sets of collocation points \(Q=15\times 15\) and \(30\times 30\). Table 9 lists the maximum, rms and \(h^1\) errors of HLConcELM using these activation functions on two neural networks of the architecture [2, M, 50, 1] (with \(M=400\) and 800) with a fixed uniform set of \(Q=35\times 35\) collocation points. One can observe a general exponential decrease in the errors with these activation functions, except for the RePU-8 function in Table 8 (where the errors seem to saturate). The results with the RePU-8 function appear markedly less accurate than those obtained with the other activation functions studied here.

Appendix C. Additional Comparisons Between HLConcELM and Conventional ELM

Table 10 Appendix C (variable-coefficient Poisson equation): comparison of the maximum and rms errors versus the number of collocation points (Q) obtained by HLConcELM and conventional ELM
Table 11 Appendix C (variable-coefficient Poisson equation): comparison of the maximum and rms errors versus the number of collocation points (Q) obtained by HLConcELM and conventional ELM

This appendix provides additional comparisons between the current HLConcELM method and the conventional ELM method for the variable-coefficient Poisson problem (Sect. 3.1) and the nonlinear Helmholtz problem (Sect. 3.3).

In those comparisons between HLConcELM and conventional ELM presented in Sect. 3, the base neural-network architectures for HLConcELM and conventional ELM are maintained to be the same. HLConcELM is able to harvest the degrees of freedom in all the hidden layers of the neural network, thanks to the logical connections between all the hidden nodes and the output nodes (due to the hidden-layer concatenation). On the other hand, the conventional ELM only exploits the degrees of freedom afforded by the last hidden layer of the network, while those degrees of freedom provided by the preceding hidden layers are essentially "wasted" (see the discussions in Sect. 2.1). This is why the conventional ELM exhibits poor accuracy if the last hidden layer is narrow, irrespective of the rest of the network configuration. It also accounts for why the HLConcELM method can achieve high accuracy irrespective of whether the last hidden layer is narrow or wide.

Note that with HLConcELM the number of training parameters equals the total number of hidden nodes in the neural network, while with conventional ELM it equals the number of nodes in the last hidden layer. Under the same base network architecture (with multiple hidden layers), the number of training parameters in HLConcELM is larger than that in the conventional ELM, because HLConcELM also exploits the hidden nodes from the preceding hidden layers.

In what follows we present several additional numerical tests to compare HLConcELM and conventional ELM, under the configuration that the number of training parameters in both HLConcELM and conventional ELM is maintained to be the same. Because of their different characteristics, the base network architectures for HLConcELM and for conventional ELM in this case will inevitably not be identical. In the comparisons below we try to keep the two architectures close to each other, specifically by using the same depth, and the same width for each hidden layer except the last, for both HLConcELM and conventional ELM. The width of the last hidden layer in the HLConcELM network and in the conventional-ELM network is different, with the conventional ELM being wider (and in some cases considerably wider), while the number of training parameters is kept the same in both.
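This parameter counting can be made concrete with a small helper (our own sketch, not from the paper), with architectures written as [input width, hidden widths..., output width] as in the tables: for HLConcELM the number of training parameters is the total number of hidden nodes, whereas for conventional ELM it is the width of the last hidden layer.

```python
def num_train_params_hlconcelm(arch):
    """Training parameters of HLConcELM = total number of hidden nodes."""
    return sum(arch[1:-1])

def num_train_params_elm(arch):
    """Training parameters of conventional ELM = width of the last hidden layer."""
    return arch[-2]

# The architectures compared in Table 10 (850 training parameters each):
assert num_train_params_hlconcelm([2, 800, 50, 1]) == 850
assert num_train_params_elm([2, 800, 850, 1]) == 850
assert num_train_params_hlconcelm([2, 50, 800, 1]) == 850
assert num_train_params_elm([2, 50, 850, 1]) == 850
```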

Table 12 Appendix C (nonlinear Helmholtz equation): comparison of the maximum and rms errors versus the number of collocation points (Q) obtained by HLConcELM and conventional ELM
Table 13 Appendix C (nonlinear Helmholtz equation): comparison of the maximum and rms errors versus the number of collocation points (Q) obtained by HLConcELM and conventional ELM

Tables 10 and 11 show comparisons of the maximum and rms errors versus the number of collocation points obtained by HLConcELM and by conventional ELM for the variable-coefficient Poisson problem from Sect. 3.1. The results in Table 10 are attained with two hidden layers in the neural network and a total of 850 training parameters. The results in Table 11 correspond to three hidden layers in the neural network with a total of 900 training parameters. The HLConcELM data in Table 10 for the networks [2, 800, 50, 1] and [2, 50, 800, 1] correspond to those in Table 1. The simulation parameter values are listed in the tables or provided in the table captions. The exponential convergence of the errors with respect to the number of collocation points is evident in all test cases. The error levels from HLConcELM and the conventional ELM are close, reaching around \(10^{-8}\) in terms of the maximum error and \(10^{-9}\) in terms of the rms error. The error values resulting from HLConcELM in general appear better than those from the conventional ELM, e.g. by comparing the HLConcELM results (with [2, 800, 50, 1]) and the conventional ELM results (with [2, 800, 850, 1]) in Table 10, or comparing the HLConcELM results (with [2, 800, 50, 50, 1]) and the conventional ELM results (with [2, 800, 50, 900, 1]) in Table 11. But this is not true for every test case; see e.g. the case Q=25\(\times \)25 between HLConcELM (with [2, 50, 800, 1]) and conventional ELM (with [2, 50, 850, 1]) in Table 10, or the cases Q=15\(\times \)15 and 20\(\times \)20 between HLConcELM (with [2, 50, 50, 800, 1]) and conventional ELM (with [2, 50, 50, 900, 1]) in Table 11.

Tables 12 and 13 show the comparisons between HLConcELM and conventional ELM for the nonlinear Helmholtz problem from Sect. 3.3. The results in Table 12 correspond to two hidden layers in the neural network with a total of 530 training parameters, and those in Table 13 correspond to three hidden layers in the neural network with a total of 560 training parameters. The simulation parameter values are provided in the table captions or listed in the tables. Note that the HLConcELM data in Table 12 correspond to those in Table 4 with the networks [2, 500, 30, 1] and [2, 30, 500, 1]. The relative performance between HLConcELM and conventional ELM exhibited by these data is similar to what has been observed from Tables 10 and 11 for the variable-coefficient Poisson equation. The error levels resulting from HLConcELM and conventional ELM are quite close, on the order of \(10^{-6}\) or \(10^{-7}\) in terms of the maximum error and \(10^{-7}\) or \(10^{-8}\) in terms of the rms error. Overall the error values from HLConcELM appear slightly better than those from the conventional ELM; see e.g. those data in Table 12 and the cases between HLConcELM with [2, 30, 30, 500, 1] and conventional ELM with [2, 30, 30, 560, 1] in Table 13. But this is not consistently so for all the test cases; see e.g. the cases between HLConcELM with [2, 500, 30, 30, 1] and conventional ELM with [2, 500, 30, 560, 1] in Table 13.

It is noted that in all these test cases the neural network for the conventional ELM has a wide last hidden layer. This is consistent with the observation that the conventional ELM is only accurate when the last hidden layer is wide.

Appendix D. Laplace Equation Around a Reentrant Corner

Fig. 25

Appendix D (reentrant corner): sketch of the L-shaped domain \({\overline{OABCDEO}}\) with a reentrant corner at O. The sketch shows an example set of \(Q=3\times (10\times 10)\) uniform collocation points, with \(10\times 10\) points in each of the three regions \({\overline{OABF}}\), \({\overline{OFCG}}\) and \({\overline{OGDE}}\)

This appendix provides a test of the HLConcELM method with the Laplace equation around a reentrant corner, where the solution is not smooth. Figure 25 is a sketch of the L-shaped domain \({\varOmega }={\overline{OABCDEO}}\) (with a reentrant corner at O) employed in this test. We consider the following problem on \({\varOmega }\),

$$\begin{aligned}&\frac{\partial ^2 u}{\partial x^2} + \frac{\partial ^2 u}{\partial y^2} = 0, \quad (x,y)\in {\varOmega }, \end{aligned}$$
(37a)
$$\begin{aligned}&u(x,y) = 0, \quad (x,y)\in {\overline{OA}},\ {\overline{OE}},\end{aligned}$$
(37b)
$$\begin{aligned}&u(x,y) = r^{\frac{2k}{3}}\sin \left( \frac{2k}{3}\theta \right) , \quad (x,y)\in {\overline{AB}},\ {\overline{BC}},\ {\overline{CD}},\ {\overline{DE}}, \end{aligned}$$
(37c)

where \(u(x,y)\) is the field to be solved for, \((r,\theta )\) denote the polar coordinates, and \(k\geqslant 1\) is a prescribed integer. This problem has the following solution,

$$\begin{aligned} u(x,y) = r^{\frac{2k}{3}}\sin \left( \frac{2k}{3}\theta \right) , \quad (x,y)\in {\varOmega }. \end{aligned}$$
(38)

The integer k influences the regularity of the solution. If k is a multiple of 3, then the solution \(u(x,y)\) is smooth (\(C^{\infty }\)) on \({\varOmega }\). Otherwise, the solution is non-smooth, with its \(\lceil \frac{2k}{3} \rceil \)-th derivative being singular at the reentrant corner. We solve this problem by the HLConcELM method, and employ a set of uniform grid points in the sub-regions \({\overline{OABF}}\), \({\overline{OFCG}}\) and \({\overline{OGDE}}\) as the collocation points. Figure 25 shows a set of \(Q=3\times (10\times 10)\) uniform collocation points on the domain as an example. The Gaussian activation function is employed in the neural network. We employ a fixed seed value 10 for the random number generators.
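A short sketch of the exact solution (38) and of the uniform collocation points on the three sub-regions is given below. The concrete coordinates (corner O at the origin, three unit-square sub-regions) are our own assumption for illustration, since the domain dimensions are not restated in this excerpt; the angular range \(\theta \in [0,3\pi /2]\) is inferred from the boundary conditions (37b).

```python
import numpy as np

def exact_solution(x, y, k):
    """Exact solution (38): u = r^(2k/3) * sin(2k*theta/3), with theta measured
    from the edge OA so that u vanishes on OA (theta = 0) and OE (theta = 3*pi/2)."""
    r = np.sqrt(x**2 + y**2)
    theta = np.arctan2(y, x)
    theta = np.where(theta < 0.0, theta + 2.0*np.pi, theta)  # map to [0, 2*pi)
    return r**(2.0*k/3.0) * np.sin(2.0*k/3.0*theta)

def collocation_points(n):
    """Q = 3 x (n x n) uniform points, one n x n grid per sub-region
    (assumed sub-regions: [0,1]x[0,1], [-1,0]x[0,1], [-1,0]x[-1,0])."""
    grids = []
    for x0, y0 in [(0.0, 0.0), (-1.0, 0.0), (-1.0, -1.0)]:
        X, Y = np.meshgrid(np.linspace(x0, x0 + 1.0, n),
                           np.linspace(y0, y0 + 1.0, n))
        grids.append(np.column_stack([X.ravel(), Y.ravel()]))
    return np.vstack(grids)
```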

Fig. 26

Appendix D (reentrant corner): Distributions of the exact solution (a, d, g), the HLConcELM solution (b, e, h), and the point-wise absolute error of the HLConcELM solution (c, f, i) to the Laplace equation. a–c \(k=1\) (non-smooth), d–f \(k=3\) (smooth), and g–i \(k=5\) (non-smooth) in the solution field. In HLConcELM, neural network [2, 800, 50, 1], Gaussian activation function, \(Q=3\times (20\times 20)\) uniform collocation points, \({{\textbf{R}}}=(5.0,0.1)\)

Table 14 Appendix D (reentrant corner): the maximum/rms errors of the HLConcELM solution versus the number of nodes in the first hidden layer (M) for solution fields with different regularity (k parameter)
Table 15 Appendix D (reentrant corner): the maximum/rms errors of the HLConcELM solution versus the number of collocation points (Q) for solution fields with different regularity

Figure 26 shows distributions of the exact solution (38), the HLConcELM solution and its point-wise absolute error, corresponding to three different solution fields with \(k=1\), 3 and 5. The values for the simulation parameters are provided in the figure caption. The HLConcELM result is extremely accurate for the case with a smooth solution (\(k=3\)), with the maximum error on the order \(10^{-11}\) in the domain. On the other hand, the HLConcELM solution is much less accurate for the non-smooth cases (\(k=1,5\)), with the maximum error around \(10^{-1}\) for \(k=1\) and around \(10^{-4}\) for \(k=5\). One can note that the computed HLConcELM solution is more accurate for a smoother solution field (larger k).

Tables 14 and 15 illustrate the convergence behavior of the HLConcELM errors with respect to the number of hidden nodes in the neural network and the number of collocation points (Q). Several cases corresponding to smooth and non-smooth solution fields are shown. The simulation parameter values are provided in the captions of these tables. The neural network architecture is given by [2, M, 50, 1], where M is either fixed at \(M=800\) or varied systematically. The set of collocation points is either fixed at \(Q=3\times (20\times 20)\) or varied systematically. For the smooth case (\(k=3\)), the HLConcELM solution exhibits an exponential convergence with respect to M and Q. For the non-smooth cases (\(k=1,2,5\)), the convergence is markedly slower. Nonetheless, for smoother solution fields (larger k) we can generally observe an initial exponential decrease in the HLConcELM errors as M or Q increases, with the error reduction slowing down once M or Q reaches a certain level. For example, with the case \(k=5\) one can observe in Table 14 the initial exponential decrease in the errors with increasing M for \(M\leqslant 300\).

Appendix E. Kuramoto–Sivashinsky Equation

Fig. 27

(Appendix E) Kuramoto–Sivashinsky equation (case #1): Distributions of a the exact solution, b the HLConcELM solution and c its point-wise absolute error. In HLConcELM, 5 uniform time blocks, neural network \({{\textbf{M}}}=[2,400,50,1]\) and \(Q=25\times 25\) uniform collocation points per time block, hidden magnitude vector \({{\textbf{R}}}=(1.64,0.05)\), Gaussian activation function

This appendix provides a test of the HLConcELM method with the Kuramoto–Sivashinsky equation [38, 55]. We consider the domain \((x,t)\in {\varOmega } = [a,b]\times [0,t_f]\), and the Kuramoto–Sivashinsky equation on \({\varOmega }\) with periodic boundary conditions,

$$\begin{aligned}&\frac{\partial u}{\partial t} + \alpha u\frac{\partial u}{\partial x} + \beta \frac{\partial ^2u}{\partial x^2} + \gamma \frac{\partial ^4u}{\partial x^4} = f(x,t), \end{aligned}$$
(39a)
$$\begin{aligned}&u(a,t) = u(b,t), \quad \left. \frac{\partial u}{\partial x}\right| _{(a,t)}= \left. \frac{\partial u}{\partial x}\right| _{(b,t)}, \end{aligned}$$
(39b)
$$\begin{aligned}&\left. \frac{\partial ^2 u}{\partial x^2}\right| _{(a,t)}= \left. \frac{\partial ^2 u}{\partial x^2}\right| _{(b,t)}, \quad \left. \frac{\partial ^3 u}{\partial x^3}\right| _{(a,t)}= \left. \frac{\partial ^3 u}{\partial x^3}\right| _{(b,t)}, \end{aligned}$$
(39c)
$$\begin{aligned}&u(x,0) = g(x). \end{aligned}$$
(39d)

In these equations, \((\alpha ,\beta ,\gamma )\) are constants, \(u(x,t)\) is the field function to be solved for, f is a prescribed source term, and g denotes the initial distribution. The domain parameters a, b and \(t_f\) will be specified below. We solve this problem by the locHLConcELM method (see Remark 6) together with the block time marching scheme (see Remark 5). The seed for the random number generators is set to 100 in the following tests.
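The block time marching scheme referred to above advances the solution one time block at a time, with the solution at the end of a block serving as the initial condition for the next block. A schematic of this loop is sketched below; `solve_on_block` is a hypothetical placeholder for the per-block (loc)HLConcELM solve, not a function from the paper.

```python
import numpy as np

def block_time_marching(t0, tf, n_blocks, u_init, solve_on_block):
    """March over uniform time blocks; each block is solved independently.

    solve_on_block(t_start, t_end, u_start) is assumed to return a callable
    u_block(x, t) representing the solution on that time block.
    """
    block_edges = np.linspace(t0, tf, n_blocks + 1)
    solutions, u_start = [], u_init
    for t_start, t_end in zip(block_edges[:-1], block_edges[1:]):
        u_block = solve_on_block(t_start, t_end, u_start)
        solutions.append((t_start, t_end, u_block))
        # the block's terminal state becomes the next block's initial condition
        u_start = lambda x, ub=u_block, te=t_end: ub(x, te)
    return solutions
```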

Table 16 (Appendix E) Kuramoto–Sivashinsky equation (case #1): the maximum/rms errors of the HLConcELM solution versus the number of collocation points (Q) on two neural networks
Table 17 (Appendix E) Kuramoto–Sivashinsky equation (case #1): the maximum/rms errors of the HLConcELM solution versus the number of nodes (M) on two network architectures [2, M, 50, 1] and [2, 50, M, 1] (M varied)

Case #1: Manufactured Analytic Solution We first consider a manufactured analytic solution to (39) to illustrate the convergence behavior of HLConcELM. We employ the following parameter values,

$$\begin{aligned} a = 0, \quad b = 2, \quad t_f = 1, \quad \alpha = 1, \quad \beta = 1, \quad \gamma = 0.1, \end{aligned}$$

and the analytic solution given by

$$\begin{aligned} \begin{aligned} u(x,t) =&\left[ \frac{3}{2}\cos \left( \pi x + \frac{7\pi }{20} \right) + \frac{27}{20}\cos \left( 2\pi x - \frac{3\pi }{5} \right) \right] \left[ \frac{3}{2}\cos \left( \pi t + \frac{7\pi }{20} \right) \right. \\&\left. + \frac{27}{20}\cos \left( 2\pi t - \frac{3\pi }{5} \right) \right] . \end{aligned} \end{aligned}$$
(40)

The source term f and the initial distribution g are chosen such that the expression (40) satisfies the system (39). The distribution of this solution is shown in Fig. 27a.
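The source term for this manufactured solution can be generated symbolically. The sketch below (our own, using SymPy) substitutes the expression (40) into the left-hand side of (39a) with the parameter values above to obtain f, and evaluates \(g(x)=u(x,0)\).

```python
import sympy as sp

x, t = sp.symbols('x t')
alpha, beta, gamma = 1, 1, sp.Rational(1, 10)  # parameter values of case #1

# Manufactured solution (40)
space = sp.Rational(3, 2)*sp.cos(sp.pi*x + 7*sp.pi/20) \
      + sp.Rational(27, 20)*sp.cos(2*sp.pi*x - 3*sp.pi/5)
time  = sp.Rational(3, 2)*sp.cos(sp.pi*t + 7*sp.pi/20) \
      + sp.Rational(27, 20)*sp.cos(2*sp.pi*t - 3*sp.pi/5)
u = space*time

# Source term so that u satisfies (39a); initial condition g(x) = u(x, 0)
f = sp.diff(u, t) + alpha*u*sp.diff(u, x) + beta*sp.diff(u, x, 2) \
    + gamma*sp.diff(u, x, 4)
g = u.subs(t, 0)
f_num = sp.lambdify((x, t), f, 'numpy')  # callable on the collocation points
```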

The distributions of the HLConcELM solution and its point-wise absolute error are shown in Fig. 27b, c. We have employed 5 uniform time blocks in the HLConcELM simulation, and a neural network architecture [2, 400, 50, 1] with the Gaussian activation function within each time block. The other simulation parameter values are provided in the caption of Fig. 27. The HLConcELM method captures the solution accurately, with the maximum error on the order \(10^{-8}\) in the spatial-temporal domain.

Tables 16 and 17 illustrate the exponential convergence behavior of the HLConcELM accuracy with respect to the collocation points and the network size for the Kuramoto–Sivashinsky equation. Table 16 lists the HLConcELM errors versus the number of collocation points (Q) obtained with two neural networks, with a narrow and wide last hidden layer, respectively. Table 17 shows the HLConcELM errors versus the number of nodes (M) in the first or the last hidden layer of the neural network, obtained with a fixed set of \(Q=25\times 25\) uniform collocation points. The captions of these tables provide the parameter values in these simulations. It can be observed that the HLConcELM errors decrease exponentially as the number of collocation points or the network size increases.

Fig. 28

(Appendix E) Kuramoto–Sivashinsky equation (case #2): Distributions of a the locHLConcELM solution and b the Chebfun solution. In Chebfun, 400 Fourier grid points in x, time step size \({\varDelta } t=10^{-4}\). In locHLConcELM, 20 uniform time blocks, 4 uniform sub-domains along x within each time block, neural network \({{\textbf{M}}}=[2,400,1]\) and \(Q=25\times 25\) uniform collocation points on each sub-domain, hidden magnitude vector \({{\textbf{R}}}=8.0\), sine activation function

Fig. 29

(Appendix E) Kuramoto–Sivashinsky equation (case #2): Comparison of solution profiles between locHLConcELM and Chebfun at \(t=0.2\) (a), \(t=0.5\) (b), and \(t=0.8\) (c). Profiles of the absolute error between the Chebfun and the locHLConcELM solutions at \(t=0.2\) (d), \(t=0.5\) (e), and \(t=0.8\) (f). Simulation settings and parameters follow those of Fig. 28

Case #2: No Exact Solution and Comparison with Chebfun We next consider the following parameter values and settings:

$$\begin{aligned} \begin{aligned}&a = -1, \quad b = 1, \quad t_f = 1, \quad \alpha = 5, \quad \beta = 0.5, \quad \gamma = 0.005, \\&f(x,t) = 0, \quad g(x) = -\sin (\pi x). \end{aligned} \end{aligned}$$

The exact solution for this case is unknown. We will employ the result computed by the software package Chebfun [14], with a sufficient resolution, as the reference solution to compare with HLConcELM.

Figure 28 shows the solution distributions obtained by the locHLConcELM method and by Chebfun in the spatial-temporal domain for this case. With locHLConcELM, we have employed 20 uniform time blocks, 4 uniform sub-domains (along the x direction) within each time block, and a local neural network [2, 400, 1] with \(Q=25\times 25\) uniform collocation points on each sub-domain. The sine activation, \(\sigma (x)=\sin (x)\), has been employed with the local neural networks. The Chebfun solution is obtained with 400 Fourier grid points along the x direction and a time step size \({\varDelta } t=10^{-4}\). The locHLConcELM solution agrees well with the Chebfun solution qualitatively.

Figure 29 provides quantitative comparisons between locHLConcELM and Chebfun for this case. It compares the solution profiles obtained by these two methods at three time instants \(t=0.2\), 0.5 and 0.8 (top row), and also shows the corresponding profiles of the absolute error between these two methods (bottom row). No difference can be discerned from the solution profiles between locHLConcELM and Chebfun. The errors between these two methods generally increase over time, with the maximum error on the order \(10^{-6}\) at \(t=0.2\) and \(10^{-4}\) at \(t=0.5\) and 0.8. These results indicate that the current method has captured the solution quite accurately.

Fig. 30

(Appendix E) Kuramoto–Sivashinsky equation (case #3): Distributions of a the locHLConcELM solution and b the Chebfun solution. In Chebfun, 1000 Fourier grid points in x, time step size \({\varDelta } t=10^{-5}\). In locHLConcELM, 12 time blocks (time block size: 0.025 for the first 8 time blocks, 0.0125 for the last 4 time blocks), 10 uniform sub-domains along x within each time block, neural network \({{\textbf{M}}}=[2,300,1]\) and \(Q=21\times 21\) uniform collocation points within each sub-domain, hidden magnitude vector \({{\textbf{R}}}=2.5\), sinc activation function

Fig. 31

(Appendix E) Kuramoto–Sivashinsky equation (case #3): Comparison of solution profiles between locHLConcELM and Chebfun at a \(t=0.05\), b \(t=0.1\), c \(t=0.15\), and d \(t=0.2\). Profiles of the absolute error between the locHLConcELM solution and the Chebfun solution at e \(t=0.05\), f \(t=0.1\), g \(t=0.15\), and h \(t=0.2\). Simulation settings and parameters follow those of Fig. 30

Case #3: Another Comparison With Chebfun We consider still another set of problem parameters as follows:

$$\begin{aligned} \begin{aligned}&a = -1, \quad b = 1, \quad t_f = 0.25, \quad \alpha = 6, \quad \beta = 0.5, \quad \gamma = 0.001, \\&f(x,t) = 0, \quad g(x) = -\sin (\pi x). \end{aligned} \end{aligned}$$

We again compare the HLConcELM result with the reference solution computed by Chebfun.

Figure 30 compares distributions of the locHLConcELM solution and the Chebfun solution. With locHLConcELM, we have employed 12 time blocks, 10 uniform sub-domains along the x direction within each time block, a local neural network [2, 300, 1] and a uniform set of \(Q=21\times 21\) collocation points on each sub-domain. The random magnitude vector is \({{\textbf{R}}}=2.5\), and the sinc activation function (\(\sigma (x)=\frac{\sin (\pi x)}{\pi x}\)) is employed. With Chebfun, we have employed 1000 Fourier grid points along the x direction and a time step size \({\varDelta } t=10^{-5}\). The distribution of the locHLConcELM solution is qualitatively similar to that of the Chebfun solution.

Figure 31 provides a quantitative comparison of the solution profiles between locHLConcELM and Chebfun at several time instants (top row), and also shows the corresponding profiles of the absolute error between the locHLConcELM solution and the Chebfun solution (bottom row). The locHLConcELM solution agrees very well with the Chebfun solution initially, and the difference between these two solutions grows over time.

Appendix F. Schrodinger Equation

This appendix provides a test of the HLConcELM method with the Schrodinger equation. We consider the domain \((x,t)\in {\varOmega }=[a,b]\times [0,t_f]\), and the Schrodinger equation on \({\varOmega }\) with periodic boundary conditions:

$$\begin{aligned}&i\frac{\partial h}{\partial t} + \frac{1}{2}\frac{\partial ^2h}{\partial x^2} + |h|^2 h = f(x,t), \end{aligned}$$
(41a)
$$\begin{aligned}&h(a,t) = h(b,t), \quad \left. \frac{\partial h}{\partial x}\right| _{(a,t)} = \left. \frac{\partial h}{\partial x}\right| _{(b,t)}, \end{aligned}$$
(41b)
$$\begin{aligned}&h(x,0) = g(x), \end{aligned}$$
(41c)

where \(h(x,t)\) is the complex field function to be solved for, \(f(x,t)\) is a prescribed complex source term, and \(g(x)\) is the initial distribution. Let \(h = u(x,t) + iv(x,t)\), where u and v denote the real and the imaginary parts of h, respectively. The domain parameters a, b and \(t_f\) will be specified below.

We solve this problem by the HLConcELM method, or the locHLConcELM method (see Remark 6), combined with the block time marching scheme (see Remark 5). The input layer of the neural network consists of two nodes, representing x and t, respectively. The output layer also consists of two nodes, representing the real part and the imaginary part of \(h(x,t)\), respectively. Accordingly, the system (41) is re-written into an equivalent form in terms of the real part and the imaginary part of \(h(x,t)\). The reformulated system is employed in the HLConcELM simulation. When multiple sub-domains are employed in locHLConcELM, we impose \(C^1\) continuity conditions along the x direction and \(C^0\) continuity conditions along the t direction across the shared sub-domain boundaries. The seed for the random number generators is set to 100 in the HLConcELM simulations.
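For reference, writing \(h = u + iv\) and splitting the source term as \(f = f_r + if_i\) (the notation \(f_r\), \(f_i\) is introduced here for illustration), the real-valued reformulation of equation (41a) reads

$$\begin{aligned} -\frac{\partial v}{\partial t} + \frac{1}{2}\frac{\partial ^2 u}{\partial x^2} + \left( u^2+v^2\right) u = f_r(x,t), \qquad \frac{\partial u}{\partial t} + \frac{1}{2}\frac{\partial ^2 v}{\partial x^2} + \left( u^2+v^2\right) v = f_i(x,t), \end{aligned}$$

with the periodicity conditions (41b) and the initial condition (41c) imposed on u and v componentwise.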

Fig. 32

(Appendix F) Schrodinger equation (case #1): Distributions of the real part (a) and its point-wise absolute error (d), the imaginary part (b) and its point-wise absolute error (e), the norm (c) and its point-wise absolute error (f), of the HLConcELM solution \(h(x,t)\). Neural network [2, 400, 30, 2], Gaussian activation function, \(Q=25\times 25\) uniform collocation points, hidden magnitude vector \({{\textbf{R}}}=(1.7,0.01)\)

Table 18 (Appendix F) Schrodinger equation (case #1): the maximum/rms errors of the HLConcELM solution versus the number of collocation points (Q) on two neural networks
Table 19 (Appendix F) Schrodinger equation (case #1): the maximum/rms errors of the HLConcELM solution versus the number of nodes (M) on two network architectures [2, M, 30, 2] and [2, 30, M, 2] (M varied)

Case #1: Manufactured Analytic Solution We first illustrate the convergence behavior of HLConcELM using a manufactured analytic solution. We employ the following domain parameters,

$$\begin{aligned} a = -1,\quad b = 1, \quad t_f = 1.5, \end{aligned}$$

and the analytic solution \(h = u+iv\), where

$$\begin{aligned} \left\{ \begin{aligned} u(x,t) =&\left[ \frac{3}{2}\sin \left( \pi x + \frac{7\pi }{20} \right) + \frac{27}{20}\sin \left( 2\pi x - \frac{3\pi }{5} \right) \right] \left[ \frac{3}{2}\sin \left( \pi t + \frac{7\pi }{20} \right) \right. \\&\left. + \frac{27}{20}\sin \left( 2\pi t - \frac{3\pi }{5} \right) \right] , \\ v(x,t) =&\left[ \frac{5}{4}\cos \left( \pi x + \frac{7\pi }{20} \right) + \frac{3}{2}\cos \left( 2\pi x - \frac{3\pi }{5} \right) \right] \left[ \frac{5}{4}\cos \left( \pi t + \frac{7\pi }{20} \right) \right. \\&\left. + \frac{3}{2}\cos \left( 2\pi t - \frac{3\pi }{5} \right) \right] . \end{aligned} \right. \end{aligned}$$
(42)

The source term \(f(x,t)\) and the initial distribution \(g(x)\) are chosen such that the expression (42) satisfies the system (41).

Figure 32 shows distributions of the real part u, the imaginary part v, and the norm |h| of the HLConcELM solution \(h(x,t)\), as well as their point-wise absolute errors when compared with the analytic solution (42), in the spatial-temporal domain. The neural network architecture is given by \({{\textbf{M}}}=[2, 400, 30, 2]\), and the other simulation parameter values are listed in the figure caption. The HLConcELM solution is observed to be highly accurate, with the maximum error on the order of \(10^{-8}\) for all of these quantities.

The exponential convergence of the HLConcELM accuracy is illustrated by the data in Tables 18 and 19. Table 18 lists the maximum and rms errors of the real part, the imaginary part, and the norm of \(h(x,t)\) as a function of the number of collocation points (Q) obtained by HLConcELM on two neural networks with a narrow and a wide last hidden layer, respectively. Table 19 lists the HLConcELM errors for the real/imaginary parts and the norm of \(h(x,t)\) on two network architectures having two hidden layers, with the number of nodes (M) in the first or the last hidden layer varied. The values for the simulation parameters are listed in the captions of these two tables. It is evident that the HLConcELM errors decrease approximately exponentially (before saturation) as the number of collocation points or the number of nodes in the neural network increases.

Fig. 33 (Appendix F) Schrödinger equation (case #2): Distributions of a, b the real part, c, d the imaginary part, and e, f the norm, of h(x, t) obtained by HLConcELM (a, c, e) and by Chebfun (b, d, f). In Chebfun, 1024 Fourier grid points in x and time step size \({\varDelta } t=10^{-4}\). In HLConcELM, 5 uniform time blocks, 3 sub-domains along the x direction within each time block (sub-domain boundary points \({\mathcal {X}}=[-1,-0.35,0.35,1]\)), local neural network [2, 400, 2] and \(Q=25\times 25\) uniform collocation points on each sub-domain, \({{\textbf{R}}}=2.0\), Gaussian activation function

Fig. 34 (Appendix F) Schrödinger equation (case #2): Comparison of the solution profiles between locHLConcELM and Chebfun at a \(t=0.2\), b \(t=0.5\), and c \(t=0.8\). Profiles of the absolute error between the locHLConcELM solution and the Chebfun solution at d \(t=0.2\), e \(t=0.5\), and f \(t=0.8\). Simulation settings and parameters follow those of Fig. 33

Case #2: No Exact Solution and Comparison with Chebfun We next consider a case in which no exact solution is available, so we use the numerical result obtained by Chebfun [14] as the reference for comparison with the HLConcELM solution. We employ the following parameter values for the domain and the system (41),

$$\begin{aligned} a = -1, \quad b = 1, \quad t_f = 1, \quad f(x,t) = 0, \quad g(x) = \frac{7}{4}\left[ \cos (\pi x) + 1 \right] . \end{aligned}$$

Figure 33 illustrates the distributions of the locHLConcELM solution and the Chebfun solution for the real part, the imaginary part, and the norm of h(x, t). The Chebfun solution is obtained on 1024 Fourier grid points along the x direction with a time step size \({\varDelta } t=10^{-4}\). For locHLConcELM with block time marching, we have employed 5 uniform time blocks, 3 sub-domains along the x direction within each time block (interior sub-domain boundaries located at \(x=-0.35\) and 0.35), and a local neural network \({{\textbf{M}}}=[2,400,2]\) with the Gaussian activation function on each sub-domain. The other simulation parameter values are listed in the figure caption. Qualitatively, no apparent difference can be discerned between the locHLConcELM solution and the Chebfun solution.
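To make the decomposition concrete, the following numpy sketch (ours, not the authors' code) lays out the time blocks, x sub-domains, and uniform collocation grids described above; the per-sub-domain networks and the block time marching itself are omitted.

```python
import numpy as np

# Sub-domain layout for case #2: 5 uniform time blocks on [0, 1], 3 sub-domains
# along x within each time block with boundaries X = [-1, -0.35, 0.35, 1], and
# a uniform set of Q = 25 x 25 collocation points on each sub-domain.
time_block_bounds = np.linspace(0.0, 1.0, 6)     # boundaries of the 5 time blocks
x_bounds = [-1.0, -0.35, 0.35, 1.0]              # x sub-domain boundaries
Q1 = 25                                          # collocation points per direction

collocation_grids = []                           # one uniform grid per sub-domain
for k in range(len(time_block_bounds) - 1):      # time blocks (solved sequentially)
    for j in range(len(x_bounds) - 1):           # x sub-domains within the block
        x = np.linspace(x_bounds[j], x_bounds[j + 1], Q1)
        t = np.linspace(time_block_bounds[k], time_block_bounds[k + 1], Q1)
        X, T = np.meshgrid(x, t, indexing="ij")  # 25 x 25 uniform collocation grid
        collocation_grids.append((k, j, X, T))
```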

Figure 34 provides a quantitative comparison between the locHLConcELM solution and the Chebfun solution. Figure 34a–c compares profiles of the locHLConcELM solution and the Chebfun solution for the real part, the imaginary part and the norm of h(x, t) at three time instants, \(t=0.2\), 0.5 and 0.8. The locHLConcELM solution profiles and the Chebfun solution profiles essentially overlap with one another. Figure 34d–f shows profiles of the absolute error between the locHLConcELM solution and the Chebfun solution at the same time instants. The difference between locHLConcELM and Chebfun is on the order of \(10^{-4}\), indicating that the locHLConcELM result agrees well with the Chebfun result for this problem.

Appendix G. Two-Dimensional Advection Equation

This appendix provides a further test of the HLConcELM method with the advection equation in two spatial dimensions plus time. Note that the numerical results in Sect. 3.2 are for the one-dimensional advection equation (plus time). We consider the spatial-temporal domain \((x,y,t)\in {\varOmega }=[0,2]\times [0,2]\times [0,10]\), and the advection equation on \({\varOmega }\) with periodic boundary conditions,

$$\begin{aligned}&\frac{\partial u}{\partial t} - \frac{\partial u}{\partial x} - \frac{\partial u}{\partial y} = 0, \end{aligned}$$
(43a)
$$\begin{aligned}&u(0,y,t) = u(2,y,t), \quad u(x,0,t) = u(x,2,t), \end{aligned}$$
(43b)
$$\begin{aligned}&u(x,y,0) = \cos \left[ \pi \left( x+y-1 \right) \right] . \end{aligned}$$
(43c)

This initial/boundary value problem has the following exact solution,

$$\begin{aligned} u(x,y,t) = \cos \left[ \pi \left( x+y+t-1 \right) \right] . \end{aligned}$$
(44)
Fig. 35 (Appendix G) 2D Advection equation: Distributions of a the exact solution, b the HLConcELM solution, and c its point-wise absolute error in the spatial-temporal domain. In HLConcELM (with block time marching), 20 uniform time blocks, neural network [3, 1000, 1] and \(Q=15\times 15\times 15\) uniform collocation points within each time block, hidden magnitude vector \({{\textbf{R}}}=0.6\), Gaussian activation function

Table 20 (Appendix G) 2D Advection equation: the maximum and rms errors of the HLConcELM solution versus the number of uniform collocation points
Table 21 (Appendix G) 2D Advection equation: the maximum and rms errors of the HLConcELM solution versus the number of nodes in the hidden layer

We solve the problem (43) by the HLConcELM method together with block time marching (see Remark 5). We employ 20 uniform time blocks in the simulation, each time block having a size of 0.5. Within each time block we employ a neural network architecture [3, M, 1] (M varied) with the Gaussian activation function, together with a uniform set of \(Q=Q_1\times Q_1\times Q_1\) collocation points (\(Q_1\) varied). The seed for the random number generators is set to 100 in the numerical tests. After the network is trained, we evaluate the neural network on another fixed set of \(Q_{eval}=51\times 51\times 51\) uniform grid points within each time block to obtain the HLConcELM solution values for all time blocks. We then compare the HLConcELM solution with the exact solution (44) on the same set of \(Q_{eval}\) points within each time block, and compute the maximum (\(l^{\infty }\)) and rms (\(l^2\)) errors over the entire spatial-temporal domain \({\varOmega }\). The error values computed in this way are said to be associated with the neural network [3, M, 1] and the \(Q=Q_1\times Q_1\times Q_1\) collocation points.
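As a minimal sketch of this error computation (our own code, not the authors'; the trained-network evaluator predict_block is a hypothetical placeholder), the global maximum and rms errors over \({\varOmega }\) can be assembled from the per-block evaluation grids as follows:

```python
import numpy as np

def exact_solution(x, y, t):
    """Exact solution (44) of the 2D advection problem, as stated above."""
    return np.cos(np.pi * (x + y + t - 1.0))

def global_errors(predict_block, n_blocks=20, t_final=10.0, q_eval=51):
    """Maximum (l^inf) and rms (l^2) errors over the whole domain Omega.

    predict_block(X, Y, T, k) is a hypothetical placeholder that evaluates the
    trained HLConcELM network of time block k on the given grid points.
    """
    block_size = t_final / n_blocks                  # 10 / 20 = 0.5
    x = np.linspace(0.0, 2.0, q_eval)
    y = np.linspace(0.0, 2.0, q_eval)
    sq_sum, n_pts, max_err = 0.0, 0, 0.0
    for k in range(n_blocks):
        t = np.linspace(k * block_size, (k + 1) * block_size, q_eval)
        X, Y, T = np.meshgrid(x, y, t, indexing="ij")
        err = np.abs(predict_block(X, Y, T, k) - exact_solution(X, Y, T))
        max_err = max(max_err, err.max())            # l^inf over all time blocks
        sq_sum += np.sum(err**2)                     # accumulate for the rms error
        n_pts += err.size
    return max_err, np.sqrt(sq_sum / n_pts)
```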

Figure 35 illustrates the distributions of the exact solution (44), the HLConcELM solution, and the point-wise absolute error of the HLConcELM solution over \({\varOmega }\). The values for the simulation parameters in HLConcELM are provided in the figure caption. The HLConcELM solution is quite accurate, with the maximum error on the order of \(10^{-6}\) over the entire domain \({\varOmega }\).

The exponential convergence of the HLConcELM accuracy is illustrated by Tables 20 and 21. Table 20 lists the maximum and rms errors of HLConcELM (over \({\varOmega }\)) as a function of the number of collocation points (Q), obtained with a neural network [3, 1000, 1]. Table 21 lists the maximum and rms errors of HLConcELM as a function of the number of hidden nodes (M) in the neural network, obtained on a fixed set of \(Q=15\times 15\times 15\) uniform collocation points. The other simulation parameter values are provided in the captions of these tables. It is evident that the HLConcELM errors decrease exponentially (before saturation) with increasing number of collocation points or increasing number of hidden nodes in the network.
