1 Introduction

Over the last decades, several neural network approaches for solving differential equations have been developed [1,2,3]. The application and extension of these approaches are a topic of recent research, including work on different network architectures like Legendre [4] and polynomial neural networks [5] as well as computational studies [6, 7].

One of the early proposed methods [8] introduced a trial solution (TS) in order to define a cost function using one feedforward neural network. The TS is supposed to satisfy given initial or boundary values by construction. It is also referred to as a neural form (NF) in this context [8, 9], a term we adopt from here on. Let us note that such NFs represent a general tool that enables the solution of ordinary differential equations (ODEs), partial differential equations (PDEs) and systems of ODEs/PDEs alike. We will refer to this approach as the trial solution method (TSM). The initial method from [8] was later extended by a NF with two feedforward neural networks, which makes it possible to handle boundary value problems with irregular boundaries [10] and yields broader possibilities for constructing the TS [9]. In the latter context, let us also mention [11], where an algorithm is proposed that creates a TS based on grammatical evolution. Focusing on initial value problems (IVPs), one approach learns solution bundles [12], making the trained neural forms reusable for enquired initial values.

A technique related to TSM that avoids the explicit construction of trial solutions was proposed in [13]. The given initial or boundary values of the underlying differential equation are included in the cost function as additional terms, so that the NF can be set equal to the neural network output directly. We will refer to this approach as the modified trial solution method (mTSM).

The fact that the neural network output computation resembles a linear combination of basis functions leads to a network architecture as presented in [14] (for PDEs). In that work, one hidden layer incorporates two sets of activation functions, one of which is supposed to satisfy the PDE, while the second deals with the boundary conditions. The basis function coefficients are set to be the connecting weights from the hidden layer to the output neuron, and the sum over all basis functions and coefficients makes up the NF.

Motivated by the construction principle of collocation methods in numerical analysis, we propose in this paper a novel extension of the NF approach. Our neural form extension is based on the observation that the NF using one feedforward neural network as employed in [8] may be interpreted as a first-order collocation polynomial. We propose to extend the corresponding polynomial order of the neural form. The novel construction includes several feedforward neural networks, one for each order. Compared to a collocation method from standard numerics, the networks take on the role of coefficients in the collocation polynomial expansion.

Furthermore, we aim to approximate initial value problems on fairly large domains. Therefore, and based on the NF structures, we also propose a fragmentation of the computational domain into subdomains. In each subdomain, we solve the initial value problem with a collocation neural form, proceeding in time from one domain fragment to the adjacent subdomain. The interfacing grid point of each subdomain provides the initial value for the next subdomain. At first glance, one may think of similarities to domain decomposition methods for PDEs in numerical analysis, cf. [15, 16]. We also show how to combine the domain fragmentation with the newly developed collocation polynomial neural forms.

Let us note that this article is a substantial extension of our conference paper [17]. There, we briefly introduced just the principle of the collocation-based polynomial NF. Here, we present a much more detailed investigation of the collocation-based NF and, as a substantial novelty, its expansion and combination with our novel domain fragmentation method for solving initial value problems. This enables us to discuss both methods in a merged context, which we find highly suitable. Moreover, we have updated the notation to a more suitable form.

Related work and problem statement In our previous work [18], we have shown that even simple feedforward neural networks (5 hidden layer neurons) are capable of solving a stiff initial value problem. The investigated neural forms approaches are based on [8, 13]; additional related work has been addressed in the preceding introduction paragraphs. The best results in the abovementioned computational study were provided by random weight initialisation, but with the drawback of a spread between the results of different random initialisations for unchanged computational parameters. In this work, we therefore aim to improve the results obtained with constant weight initialisation. The main advantage of the latter is that results are exactly reproducible with unchanged computational parameters. Based on the neural forms approach [8], we have already shown that a polynomial extension leads to a significant increase in numerical accuracy for constant weight initialisation [17]. In the present work, we combine the polynomial neural forms with domain fragmentation, splitting the solution domain into smaller subdomains. This technique solves the neural forms on such subdomains, with the initial values provided by the previous subdomains. The inherent flexibility enables the approach to achieve useful results, even on fairly large domains.

2 Setting up the neural form (NF)

In this section, we first recall the TSM and its modified version mTSM, cf. [8, 13]. Then, we proceed with details on the feedforward neural networks that we employ, followed by a description of the novel collocation-based neural form and the subdomain approach.

2.1 Construction of the neural form

Consider an initial value problem written in a general form as

$$\begin{aligned} G\left( t,u(t),{\dot{u}}(t)\right) =0,\quad u(t_0)=u_0,\quad t\in D\subset {\mathbb {R}} \end{aligned}$$
(1)

with given initial value \(\displaystyle u(t_0)=u_0\). In order to connect \(\displaystyle G\) with a neural network, several approaches introduce a NF as a differentiable function \(\displaystyle {\tilde{u}}(t,\mathbf {p})\), where the vector \(\displaystyle \mathbf {p}\) contains the network weights. With the collocation method, we discretise the domain \(\displaystyle D\) by a uniform grid with \(\displaystyle n+1\) grid points \(\displaystyle t_i\) (\(\displaystyle t_0<t_1<\ldots <t_n\)), so that the initial value problem (1) leads to the formulation

$$\begin{aligned} G\left( t_i,{\tilde{u}}(t_i,\mathbf {p}),\dot{{\tilde{u}}} (t_i,\mathbf {p})\right) ={\mathbf {0}} \end{aligned}$$
(2)

Let us note that, in a slight abuse of notation, we identify \(\displaystyle G\left( t_i,{\tilde{u}}(t_i,\mathbf {p}),\dot{{\tilde{u}}}(t_i,\mathbf {p})\right) \) with the vector of corresponding entries (grid points), since this allows us to give many formulae a more elegant, compact notation.

In order to satisfy the given initial value, TSM [8] employs the NF as a sum of two terms

$$\begin{aligned} {\tilde{u}}(t,\mathbf {p})=A(t)+F(t,N(t,\mathbf {p})) \end{aligned}$$
(3)

where \(\displaystyle A(t)\) is supposed to match the initial condition (with the simplest choice being \(\displaystyle A(t)=u(t_0)\)), while \(\displaystyle F(t,N(t,\mathbf {p}))\) is constructed to eliminate the impact of \(\displaystyle N(t,\mathbf {p})\) at \(\displaystyle t_0\). The choice of \(\displaystyle F(t,N(t,\mathbf {p}))\) determines the influence of \(\displaystyle N(t,\mathbf {p})\) over the domain.

Since the NF as used in this work satisfies given initial values by construction, we define the corresponding cost function incorporating Eq. (3) as

$$\begin{aligned} E[\mathbf {p}]=\frac{1}{2}\bigg \Vert G\left( t_i,{\tilde{u}}(t_i,\mathbf {p}), \dot{{\tilde{u}}}(t_i,\mathbf {p})\right) \bigg \Vert _2^2 \end{aligned}$$
(4)

which is subject to minimisation. Although Eq. (2) denotes a system of equations which may be solvable w.r.t. \(\displaystyle \mathbf {p}\), the actual equation of interest is Eq. (4) and its optimisation will return suitable neural network weights.

Let us now turn to the mTSM approach after [13], which chooses the NF to be the neural network output directly

$$\begin{aligned} {\tilde{u}}(t,\mathbf {p})=N(t,\mathbf {p}) \end{aligned}$$
(5)

Since no condition is imposed on the NF by the initial value in this way, the initial condition is added to the cost function when relying on Eq. (5):

$$\begin{aligned} E[\mathbf {p}]= & {} \frac{1}{2}\bigg \Vert G\left( t_i,{\tilde{u}}(t_i,\mathbf {p}),\dot{{\tilde{u}}}(t_i,\mathbf {p})\right) \bigg \Vert _2^2\nonumber \\&+\frac{1}{2}\bigg \Vert N(t_0,\mathbf {p})-u(t_0)\bigg \Vert _2^2 \end{aligned}$$
(6)

2.2 Neural network architecture and optimisation

In this section, we will describe how a feedforward neural network with one hidden layer operates in our setting. Specific variants will be addressed in the corresponding sections.

As depicted in Fig. 1, we employ one hidden layer, with H hidden layer neurons supplemented by one bias neuron. Having in addition one bias neuron in the input layer and a linear output layer neuron, the neural network output reads

$$\begin{aligned} N(t,\mathbf {p})=\sum _{j=1}^{H}\rho _j\sigma _j+\gamma \end{aligned}$$
(7)

Thereby, \(\displaystyle \sigma _j=\sigma (z_j)=1/(1+e^{-z_j})\) represents the sigmoid activation function with the weighted sum \(\displaystyle z_j=\nu _jt+\eta _j\). Here, \(\displaystyle \nu _j\) (input layer neuron), \(\displaystyle \eta _j\) (input layer bias neuron), \(\displaystyle \rho _j\) (hidden layer neurons) and \(\displaystyle \gamma \) (hidden layer bias neuron) denote the weights, which are stored in the weight vector \(\displaystyle \mathbf {p}\). The input layer passes the domain data t (in practice the grid points \(t_i\)), weighted by \(\nu _j\) and \(\eta _j\), to the hidden layer for processing. The neural network output \(N(t,\mathbf {p})\) is then the sum of the values \(\displaystyle \sigma (z_j)\), weighted by \(\displaystyle \rho _j\), with \(\displaystyle \gamma \) added. With \(\displaystyle N(t,\mathbf {p})\) given, the neural forms and cost functions in Eqs. (4) and (6) are obtained.

Fig. 1 The architecture of our incorporated feedforward neural network with one hidden layer bias
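
To make the notation of Eq. (7) concrete, the following minimal NumPy sketch evaluates the network output for one hidden layer with H sigmoid neurons. The flat weight layout and the function names are illustrative assumptions of ours and are not prescribed by the method.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def network_output(t, p, H=5):
    """Evaluate N(t, p) from Eq. (7) for an array of domain points t.

    The weight vector p is assumed to be laid out as
    [nu_1..nu_H, eta_1..eta_H, rho_1..rho_H, gamma]; this flat layout
    is an implementation choice, not part of the method itself.
    """
    t = np.atleast_1d(t).astype(float)
    nu, eta, rho, gamma = p[:H], p[H:2*H], p[2*H:3*H], p[3*H]
    z = np.outer(t, nu) + eta            # weighted sums z_j = nu_j * t + eta_j
    return sigmoid(z) @ rho + gamma      # N(t, p) = sum_j rho_j * sigma(z_j) + gamma
```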

As usual, the cost function gradient is used to update \(\displaystyle \mathbf {p}\) in order to find a (local) minimum in the weight space. One training cycle, called an epoch, consists of a full iteration over all training points. The gradient with respect to the network weights indicates how each weight influences the cost. With this information, each incorporated weight can be adjusted individually to lead the network into a local minimum during the training process. For optimising the cost function, we consider here Adam (adaptive moment estimation) [19], a stochastic gradient descent method using adaptive learning rates for every weight.

If a weight update is performed after the gradient computation for a single grid point, we call this method single batch training (SBtraining) here. An alternative procedure, performing the weight update after a complete iteration over all grid points and averaging the cost function gradient and training error, is denoted here as full batch training (FBtraining).
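
The difference between the two batch training variants may be sketched as follows; grad_fn and update_fn are hypothetical placeholders for the gradient of the cost function in Eq. (4) or (6) at the given grid points and for the optimiser step, respectively.

```python
import numpy as np

def sb_epoch(p, grid, grad_fn, update_fn):
    """Single batch training (SBtraining): update after every grid point."""
    for t_i in grid:
        p = update_fn(p, grad_fn(p, [t_i]))
    return p

def fb_epoch(p, grid, grad_fn, update_fn):
    """Full batch training (FBtraining): average the gradient over all grid points."""
    g = np.mean([grad_fn(p, [t_i]) for t_i in grid], axis=0)
    return update_fn(p, g)
```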

Let us comment in some detail on the relation between grid points and training points. Our setting is an unsupervised learning framework, where grid points are used for domain discretisation, and where the unknowns are values of the ODE solution at exactly these grid points. Thus, in our setting, the grid points are identical with the training points. Let us stress in this context, that the grid points by themselves stay fixed during network optimisation.

3 The novel collocation neural form (CNF)

Making Eq. (3) precise for our grid-based setting, a suitable choice for the neural form of TSM with given \(\displaystyle u(0)=u_0\) is

$$\begin{aligned} {\tilde{u}}(t,\mathbf {p})=u_0+N(t,\mathbf {p})t \end{aligned}$$
(8)

where t will be evaluated at the grid points \(t_i\). Please note that if the initial point differs from \(\displaystyle t_0=0\), this results in a shift \(\displaystyle t\rightarrow (t-t_0)\) in Eq. (8). Structurally, Eq. (8) shows similarities to a first-order polynomial \(\displaystyle q_1(t)=a_0+a_1t\). Motivated by the expansion of an m-th order collocation function polynomial [20]

$$\begin{aligned} q_m(t)=a_0+\sum _{k=1}^ma_kt^k \end{aligned}$$
(9)

we are led to set up our collocation-based NF (CNF) approach for TSM:

$$\begin{aligned} {\tilde{u}}_C(t,\mathbf{P }_m)=u_0+\sum _{k=1}^mN_k(t,\mathbf {p}_k)t^k \end{aligned}$$
(10)

The weight vector of the \(\displaystyle k\)-th network is denoted by \(\displaystyle \mathbf {p}_k\), and we collect the \(\displaystyle m\) weight vectors in the matrix \(\displaystyle \mathbf{P }_m=(\mathbf {p}_1,\ldots ,\mathbf {p}_m)\).

The use of higher-order monomials \(\displaystyle t^k\) as in Eq. (10) not only generalises previous methods, but may also enable better stability and accuracy properties, as we show in this paper. Let us also observe that the neural networks take on the role of coefficient functions for the monomials \(\displaystyle t^k\). We conjecture that this construction makes sense since, in this way, several possible multipliers (not only \(\displaystyle t\) as in Eq. (8)) are included in the neural form construction. It is important to mention that the new neural form construction in Eq. (10) fulfils the initial condition.

Let us stress that the proposed ansatz (Eq. (10)) includes \(\displaystyle m\) neural networks, where \(\displaystyle N_k(t,\mathbf {p}_k)\) represents the \(\displaystyle k\)-th neural network

$$\begin{aligned} N_k(t,\mathbf {p}_k)=\sum _{j=1}^H\rho _{j,k}\sigma (\nu _{j,k}t+\eta _{j,k})+\gamma _k \end{aligned}$$
(11)

The corresponding cost function is then given as in Eq. (4).
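
As a minimal sketch, the CNF ansatz of Eqs. (10) and (11) and the cost of Eq. (4) may be implemented as follows, under the assumption of the flat per-network weight layout used in the earlier network sketch; the gradient computation needed for training is left to the optimiser of choice.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def net_and_derivative(t, p, H=5):
    """Return N_k(t, p_k) of Eq. (11) and its derivative dN_k/dt at the points t."""
    nu, eta, rho, gamma = p[:H], p[H:2*H], p[2*H:3*H], p[3*H]
    s = sigmoid(np.outer(t, nu) + eta)
    N = s @ rho + gamma
    dN = (s * (1.0 - s)) @ (rho * nu)    # chain rule through the sigmoid
    return N, dN

def cnf_tsm(t, P, u0):
    """Collocation neural form of Eq. (10) and its time derivative.

    P is a list of m weight vectors (one per network), u0 the initial value.
    """
    t = np.asarray(t, dtype=float)
    u = np.full_like(t, u0)
    du = np.zeros_like(t)
    for k, p_k in enumerate(P, start=1):
        N, dN = net_and_derivative(t, p_k)
        u += N * t**k
        du += dN * t**k + N * k * t**(k - 1)
    return u, du

def cost(P, t, u0, G):
    """Cost of Eq. (4); G(t, u, du) returns the residual at the grid points."""
    u, du = cnf_tsm(t, P, u0)
    return 0.5 * np.sum(G(t, u, du)**2)
```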

We extend the mTSM method in a similar way as we obtained the TSM extension in Eq. (10):

$$\begin{aligned} {\tilde{u}}_C(t,\mathbf{P }_m)=N_1(t,\mathbf {p}_1)+\sum _{k=2}^mN_k(t,\mathbf {p}_k)t^{k-1} \end{aligned}$$
(12)

Thereby, the first neural network \(\displaystyle N_1(t,\mathbf {p}_1)\) is set to learn the initial condition in the same way as stated in Eq. (6).

From now on, we will refer to the number of neural networks in the neural form as the collocation neural form order, denoted by \(\displaystyle m\).

4 The novel subdomain collocation neural form (SCNF)

The previously described TSM and mTSM approaches use the IVP structure together with the given initial value in order to train the neural networks on a certain domain. In a prior experimental study [7], we found that especially TSM tends to struggle with approximating the solution on larger domains, whereas on small domains the numerical error tends to remain small. Since the domain variable t effectively acts as a scaling of \(\displaystyle N(t, \mathbf {p})\), we conjecture that a large domain may introduce the need for a higher number of training points or for a more complex neural network architecture.

These circumstances motivate us to introduce a second stage of discretising the domain. That is, we split the solution domain \(\displaystyle D\) into subdomains \(\displaystyle D_l,l=1,\ldots ,h\), with \(\displaystyle n+1\) grid points \(\displaystyle t_{i,l}\) in each subdomain. Now, the CNF is solved separately in each subdomain. The interfacing grid points overlap, i.e. the computed value \(\displaystyle {\tilde{u}}_C(t_{n,l-1},\mathbf{P }_{m,l-1})\) at the last grid point of a subdomain \(D_{l-1}\) is set to be the new initial value \(\displaystyle {\tilde{u}}_C(t_{0,l},\mathbf{P }_{m,l})\) for the next subdomain \(\displaystyle D_l\).

Since the CNF for TSM is constructed to satisfy the given initial values, we require the subdomain CNF (SCNF) to retain that characteristic. Therefore, the SCNF is constructed to satisfy the new initial values in each subdomain, namely

$$\begin{aligned} {\tilde{u}}_C(t_{i,l},\mathbf{P }_{m,l})= & {\tilde{u}}_C(t_{0,l},\mathbf{P }_{m,l} )\nonumber \\&+\sum _{k=1}^mN_k(t_{i,l},\mathbf {p}_{k,l})(t_{i,l}-t_{0,l})^k \end{aligned}$$
(13)

The neural networks are now scaled by \(\displaystyle (t_{i,l}-t_{0,l})^k\), which in fact may avoid large scaling factors, depending on the subdomain size. The arising cost function, analogous to Eq. (4), is

$$\begin{aligned} E_l[\mathbf{P }_{m,l}]=\frac{1}{2}\bigg \Vert G\left( t_{i,l},{\tilde{u}}_C (t_{i,l},\mathbf{P }_{m,l}),\dot{{\tilde{u}}}_C(t_{i,l},\mathbf{P }_{m,l}) \right) \bigg \Vert _2^2 \end{aligned}$$
(14)

Proceeding to mTSM, we also adopt the CNF approach and set the first neural network to learn the new initial values in each subdomain. That is, the SCNF reads

$$\begin{aligned} {\tilde{u}}_C(t_{i,l},\mathbf{P }_{m,l})= & N_1(t_{i,l},\mathbf {p}_{1,l})\nonumber \\&+\sum _{k=2 }^mN_k(t_{i,l},\mathbf {p}_{k,l})(t_{i,l}-t_{0,l})^{k-1} \end{aligned}$$
(15)

and the corresponding cost function

$$\begin{aligned} E_l[\mathbf{P }_{m,l}]&=\frac{1}{2}\bigg \Vert G\left( t_{i,l},{\tilde{u} }_C(t_{i,l},\mathbf{P }_{m,l}),\dot{ {\tilde{u}}}_C(t_{i,l},\mathbf{P }_{m,l}) \right) \bigg \Vert _2^2 \\&\quad +\frac{1}{2}\bigg \Vert N(t_{0,l},\mathbf {p}_{1,l})-{\tilde{u}}_C(t_{0,l},\mathbf{P }_{m,l})\bigg \Vert _2^2 \nonumber \end{aligned}$$
(16)

Let us note at this point that \(\displaystyle G\) in Eqs. (14) and (16) shares the same structure as the general problem in Eq. (1). However, the original solution function \(\displaystyle u(t)\) has been replaced by the SCNF \(\displaystyle {\tilde{u}}_C(t_{i,l},\mathbf{P }_{m,l})\). Therefore, \(\displaystyle G\) as it appears in the cost function now relies on one or more neural networks, depending on the neural form order. Once trained, each subdomain has its own learned weight matrix \(\displaystyle \mathbf{P }_{m,l}\), which can later be used to recreate the solution or to evaluate it at points between the training grid points.

To keep track of all terms and indices, let us summarise them: the \(\displaystyle i\)-th grid point in the \(\displaystyle l\)-th subdomain is denoted by \(\displaystyle t_{i,l}\), while \(\displaystyle t_{0,l}\) is the initial point of the subdomain \(\displaystyle D_l\) with the initial value \(\displaystyle {\tilde{u}}_C(t_{0,l},\mathbf{P }_{m,l})\). That is, \(\displaystyle t_{n,l-1}\) and \(\displaystyle t_{0,l}\) are overlapping grid points. In \(\displaystyle D_1\), \({\tilde{u}}_C(t_{0,1},\mathbf{P }_{m,1})=u(t_0)\) holds. The matrix \(\displaystyle \mathbf{P }_{m,l}\) contains the m neural network weight vectors of the corresponding subdomain \(\displaystyle l\). Finally, \(\displaystyle N_k(t_{i,l},\mathbf {p}_{k,l})\) denotes the \(\displaystyle k\)-th neural network in \(\displaystyle D_l\).
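
The subdomain marching described above may be summarised in the following sketch; train_cnf is a hypothetical placeholder for the per-subdomain optimisation of Eqs. (13)-(16) and is assumed to return a callable that evaluates the trained neural form on its subdomain.

```python
import numpy as np

def solve_scnf(D, h, n, u0, train_cnf):
    """March the collocation neural form through h subdomains of D = (a, b)."""
    a, b = D
    edges = np.linspace(a, b, h + 1)
    solutions, u_init = [], u0
    for l in range(h):
        t_sub = np.linspace(edges[l], edges[l + 1], n + 1)  # grid points t_{i,l}
        u_tilde = train_cnf(t_sub, u_init)
        solutions.append((t_sub, u_tilde))
        u_init = u_tilde(t_sub[-1])  # value at t_{n,l} becomes the next initial value
    return solutions
```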

5 Experiments and results

This section is divided into experiments on the collocation neural form (CNF), followed by experiments on the subdomain collocation neural form (SCNF). Prior to this, we will provide detailed information about how the weight initialisation for the different neural networks is realised. The discussion of constant weight initialisation is also one of the main subjects in the experimental section. As stated before, the specific neural network configurations will be addressed in the subsequent experiments.

Weight initialisation with \(\displaystyle \mathbf {p}^{init}_{const}\) applies to all corresponding neural networks, so that they all use the same initial values. Increasing m for the initialisation with \(\displaystyle \mathbf {p}^{init}_{rnd}\) works systematically: for \(\displaystyle m=1\), a set of random weights for the neural network is generated. For \(\displaystyle m=2\) (now with two neural networks), the first neural network is again initialised with the weights generated for \(m=1\), while a new set of weights is generated for the second network. This procedure continues analogously for all higher orders \(\displaystyle m\) in all experiments. To achieve comparability, the same randomly initialised weights are used in all experiments.

For optimisation, we use Adam, whose parameters are fixed, as employed in [19], to \(\displaystyle \alpha =1\hbox {e}{-}3\), \(\displaystyle \beta _1=9\hbox {e}{-}1\), \(\displaystyle \beta _2=9.99\hbox {e}{-}1\) and \(\displaystyle \epsilon =1\hbox {e}{-}8\).
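
For reference, a single Adam update with exactly these hyperparameters can be written as below; this is the textbook update rule of [19] rather than code taken from our implementation.

```python
import numpy as np

def adam_step(p, g, state, alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update [19]; state = (m, v, t) holds the moment estimates
    and the step counter, initialised as (zeros_like(p), zeros_like(p), 0)."""
    m, v, t = state
    t += 1
    m = beta1 * m + (1.0 - beta1) * g
    v = beta2 * v + (1.0 - beta2) * g**2
    m_hat = m / (1.0 - beta1**t)
    v_hat = v / (1.0 - beta2**t)
    p = p - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return p, (m, v, t)
```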

5.1 Experiments on the collocation neural form (CNF)

In this section, we want to test our novel CNF approach with the initial value problem

$$\begin{aligned} {\dot{u}}(t)+5u(t)=0,\quad u(0)=1 \end{aligned}$$
(17)

which has the analytical solution \(\displaystyle u(t)=e^{-5t}\) and is solved over the entire domain \(\displaystyle D=[0,2]\) (without domain fragmentation). Equation (17) involves a damping mechanism, making this a simple model for stiff phenomena [21].

The numerical error \(\displaystyle \varDelta u\) shown in the subsequent diagrams in this section is defined as the averaged \(\displaystyle l_1\)-norm of the difference between the exact solution and the corresponding CNF

$$\begin{aligned} \varDelta u=\frac{1}{n+1}\big \Vert u(t_i)-{\tilde{u}}_C(t_i,\mathbf{P }_m)\big \Vert _1 \end{aligned}$$
(18)

If not stated otherwise, the fixed computational parameters in the subsequent experiments are: 1 input layer bias, 1 hidden layer with 5 sigmoid neurons, 1e5 training epochs, 10 training points, \(\displaystyle D=[0,2]\) and the weight initialisation values \(\displaystyle \mathbf {p}^{init}_{const}=-10\) and \(\displaystyle \mathbf {p}^{init}_{rnd}\in [-10.5,-9.5]\). These values may seem arbitrarily chosen, but we found them to work well for both TSM and mTSM, so that a useful comparison is possible.
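
For illustration, the residual of Eq. (17), its analytical solution and the error measure of Eq. (18) may be coded as follows; u_tilde stands for the CNF values at the grid points.

```python
import numpy as np

def G(t, u, du):
    """Residual of Eq. (17): u'(t) + 5 u(t) = 0 with u(0) = 1."""
    return du + 5.0 * u

def u_exact(t):
    """Analytical solution u(t) = exp(-5t)."""
    return np.exp(-5.0 * t)

def numerical_error(t, u_tilde):
    """Averaged l1 error of Eq. (18) over the n+1 grid points."""
    return np.mean(np.abs(u_exact(t) - u_tilde))
```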

Weight initialisation Let us comment in some more detail on weight initialisation. It plays an important role and determines the starting point for gradient descent. Poorly chosen, the optimisation method may fail to find a suitable local minimum. The initial neural network weights are commonly chosen as small random values [22]. Let us note that this is sometimes considered a computational characteristic of stochastic gradient descent optimisation. Another option is a constant initialisation. This method is not commonly used for the optimisation of neural networks, since random weight initialisation may lead to better results. However, constant initialisation reliably returns results of reasonable quality if the computational parameters of the network remain unchanged.

As previous experiments have documented [7, 8, 13], both TSM and mTSM are able to solve differential equations up to a certain degree of accuracy. However, an example illustrating the accuracy of five computations with random weights \(\displaystyle \mathbf {p}^{init}_{rnd}\) and constant weights \(\displaystyle \mathbf {p}^{init}_{const}\), respectively, shows that the quality of the approximations may vary considerably, see Table 1. As observed in many experiments, even a small discrepancy between several sets of random weights initialised in the same range may lead to a significant difference in accuracy. On the other hand, network initialisation with constant values very often gives reliable results with the proposed novel approach. This motivates us to study the effects of constant network initialisation in detail.

Table 1 Results for five different realisations during optimisation (mTSM, \(\displaystyle m=2\))

5.1.1 CNF experiment: number of training epochs

The first experiment shows, for different m, how the numerical error \(\displaystyle \varDelta u\) behaves depending on the number of training epochs. The diagrams display only every hundredth data point.

In Fig. 2a, with TSM and \(\displaystyle \mathbf {p}^{init}_{const}\), results for \(\displaystyle m=1\) (blue) do not provide any useful approximation, independent of the selected batch training method. With a second neural network for \(\displaystyle m=2\) (orange) in the neural form, \(\displaystyle \varDelta u\) decreases by approximately one order of magnitude, so that we now obtain a solution which can be considered to rank at the lower end of reliability. However, the most interesting result in Fig. 2a is \(\displaystyle m=5\) (green), with the best accuracy at the end of the optimisation process but with the drawback of occurring oscillations. These may arise from the chosen optimisation method.

Fig. 2 Experiment in 5.1.1 Number of training epochs, (blue) \(\displaystyle m=1\), (orange) \(\displaystyle m=2\), (yellow) \(\displaystyle m=3\), (purple) \(\displaystyle m=4\), (green) \(\displaystyle m=5\) (color figure online)

Table 2 shows the numerical error \(\displaystyle \varDelta u(t_i)\) at individual grid points \(\displaystyle t_i\). For both the CNF and Runge-Kutta 4 (RK4), the results were computed with ten grid points; here, the CNF approach with \(\displaystyle \mathbf {p}^{init}_{const}\) performs better than Runge-Kutta 4. However, further refining the grid for Runge-Kutta 4 will result in significantly lower \(\displaystyle \varDelta u(t_i)\).

Table 2 Numerical error comparison at individual grid points with \(m=5\) and \(\displaystyle \mathbf {p}^{init}_{const}\)

For mTSM with SBtraining and \(\displaystyle \mathbf {p}^{init}_{const}\), already \(\displaystyle m=1\) converges to a solution accuracy that can be considered reliable. However, we observe in Fig. 2b that only the transition from \(\displaystyle m=1\) (blue) to \(\displaystyle m=2\) (orange) improves \(\displaystyle \varDelta u\), while heavy oscillations start to occur.

In results not documented here, obtained with \(\displaystyle \mathbf {p}^{init}_{rnd}\), m has only a minor influence on the accuracy. In particular, FBtraining for mTSM shows the same trend for both initialisation methods, with only minor differences in the last epochs.

Let us note that the displayed results show the best approximations using constant or random initialisation. This means, we obtain the best results for TSM with FBtraining, \(\displaystyle m=5\) (green) and for mTSM with SBtraining, \(\displaystyle m\ge 2\), respectively.

Concluding this experiment, we were able to obtain better results with \(\mathbf {p}^{init}_{const}\) than with \(\displaystyle \mathbf {p}^{init}_{rnd}\). Increasing m to at least order five seems to be a good option for TSM and FBtraining, whereas higher m may provide even better approximations. For mTSM, we cannot observe benefits for m above order 2.

Moreover, we see especially that the increase in the order of the neural form in (10) appears to have a similar impact on solution accuracy as the discretisation order in classical numerical analysis.

5.1.2 CNF experiment: domain size variation

Investigating the methods with respect to different domain sizes provides information on the reliability of computations on larger domains. The domains in this experiment read \(\displaystyle D=[0,t_{end}]\), and we directly compare \(\displaystyle \mathbf {p}^{init}_{const}\) with \(\displaystyle \mathbf {p}^{init}_{rnd}\).

In Fig. 3a, b, we observe that TSM, from around \(t_{end}=3.5\) onwards, incrementally plateaus to unreliable approximations. Increasing m improves \(\displaystyle \varDelta u\) on small domains and shifts the observable step-like accuracy degeneration towards larger domains.

Fig. 3 Experiment in 5.1.2 Domain size variation, (blue) \(m=1\), (orange) \(m=2\), (yellow) \(m=3\), (purple) \(m=4\), (green) \(m=5\) (color figure online)

However, even with \(m=5\) (green), the results from domain size \(t_{end}=3.5\) onwards are unreliable. Before the first plateau, higher m provide significantly better \(\displaystyle \varDelta u\) for \(\displaystyle \mathbf {p}^{init}_{const}\), while there are only minor changes for \(\displaystyle \mathbf {p}^{init}_{rnd}\) with the TSM method. This holds for both SBtraining and FBtraining, and one can say that in this experiment, TSM works better with \(\displaystyle \mathbf {p}^{init}_{rnd}\), even without increasing m.

Turning to the mTSM extension, we observe in Fig. 3c with SBtraining the existence of a certain point from which different m return equal values, whereas FBtraining returns (close to) equal results for all investigated domain sizes. However, we see some evidence that using \(m=2\) (orange) rather than \(\displaystyle m=1\) (blue) shows an overall good performance. A further increase of m is not necessary with this approach, confirming the results from Experiment 5.1.1.

Let us also note that, with mTSM, a small domain seems to favour \(\displaystyle \mathbf {p}^{init}_{const}\), which then provides better results than \(\displaystyle \mathbf {p}^{init}_{rnd}\).

5.1.3 CNF experiment: number of training points variation

The behaviour of numerical methods highly depends on the chosen number of grid points, so in this experiment we analogously investigate the influence of the number of training points (nTP). In every computation, the domain \(\displaystyle D\) is discretised by equidistant grid points.

As in the previous experiments, m shows a major influence on the results with TSM, and the best approximations are provided by \(\displaystyle \mathbf {p}^{init}_{const}\) with \(\displaystyle m=5\) (green), as shown in Fig. 4a. An interesting behaviour (observed also in a different context in Fig. 2a) is the equivalence between \(\displaystyle m=3\) (yellow) and \(\displaystyle m=4\) (purple). Both converge to almost exactly the same \(\displaystyle \varDelta u\), where one may assume a saturation in m. However, a further increase of the order decreases the numerical error again by one order of magnitude.

Fig. 4 Experiment in 5.1.3 Number of training points variation, (blue) \(\displaystyle m=1\), (orange) \(\displaystyle m=2\), (yellow) \(\displaystyle m=3\), (purple) \(\displaystyle m=4\), (green) \(\displaystyle m=5\) (color figure online)

Turning to mTSM in Fig. 4b, we again find a major increase in accuracy after a transition from \(\displaystyle m=1\) (blue) to \(\displaystyle m=2\). For nTP \(=50\), values for \(\displaystyle m\ge 2\) converge to the same results as provided by TSM with \(\displaystyle m=5\).

Concluding this experiment, we again find evidence that increasing m in the proposed approach improves the accuracy for \(\displaystyle \mathbf {p}^{init}_{const}\). However, increasing nTP does not seem to improve the accuracy from a certain point on, unlike for classic numerical methods. One could argue, though, that the analogue of the number of grid points in numerical methods is here the number of epochs.

5.2 Experiments on the subdomain collocation neural form (SCNF)

Although the test equation in Sect. 5.1 is stiff, its solution is at the same time very smooth, and the equation is solved on a small domain. However, Fig. 3 in Experiment 5.1.2 shows that TSM does not provide reliable solutions on larger domains. Hence, we want to show that the novel SCNF approach is able to work even on a fairly large domain with a different initial value problem. Therefore, we use the following test equation

$$\begin{aligned} {\dot{u}}(t)-t\sin (10t)+u(t)=0,\quad u(0)=-1 \end{aligned}$$
(19)

with the analytical solution

$$\begin{aligned} u(t)= & {} \sin (10t)\bigg (\frac{99}{10201}+\frac{t}{101}\bigg )\nonumber \\&+\cos (10t)\bigg (\frac{20}{10201}-\frac{10t}{101}\bigg )-\frac{10221}{10201}e^{-t} \end{aligned}$$
(20)

The solution is shown in Fig. 5 for \(t\in [0,15]\) and features heavily oscillating and increasing characteristics, similar to instabilities. Although our approach is not limited to certain types of IVPs, we find Eq. (19) to represent possible real-world behaviour and consider it a suitable example IVP.

The numerical error \(\displaystyle \varDelta u_l\) is now defined as the averaged \(\displaystyle l_1\)-norm of the difference between the exact solution and the corresponding SCNF in each subdomain

$$\begin{aligned} \varDelta u_l=\frac{1}{n+1}\big \Vert u(t_{i,l})-{\tilde{u}}_C(t_{i,l},\mathbf{P }_{m,l})\big \Vert _1 \end{aligned}$$
(21)

whereas \(\displaystyle \varDelta u\) averages the numerical error of the subdomains

$$\begin{aligned} \varDelta u=\frac{1}{h}\sum _{l=1}^h\varDelta u_l \end{aligned}$$
(22)

The weight initialisation works as described in Sect. 5, and the values are fixed to \(\displaystyle \mathbf {p}^{init}_{const}=0\) and \(\displaystyle \mathbf {p}^{init}_{rnd}\in [-0.5,0.5]\). In the subsequent experiments, the solution domain is kept constant at \(\displaystyle D=[0,15]\), and the neural networks are trained for 1e5 epochs.
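
For illustration, the residual of Eq. (19), the analytical solution of Eq. (20) and the error measures of Eqs. (21) and (22) may be coded as follows; the sketch assumes the per-subdomain output format (grid points together with a callable neural form) of the earlier SCNF marching sketch.

```python
import numpy as np

def G(t, u, du):
    """Residual of Eq. (19): u'(t) - t*sin(10t) + u(t) = 0 with u(0) = -1."""
    return du - t * np.sin(10.0 * t) + u

def u_exact(t):
    """Analytical solution of Eq. (20)."""
    return (np.sin(10.0 * t) * (99.0 / 10201.0 + t / 101.0)
            + np.cos(10.0 * t) * (20.0 / 10201.0 - 10.0 * t / 101.0)
            - 10221.0 / 10201.0 * np.exp(-t))

def subdomain_errors(solutions):
    """Delta u_l of Eq. (21) per subdomain and their average Delta u of Eq. (22)."""
    err_l = [np.mean(np.abs(u_exact(t) - u_tilde(t))) for t, u_tilde in solutions]
    return err_l, np.mean(err_l)
```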

Fig. 5 Analytical solution for initial value problem (19)

In addition, we use the method of training the neural networks incrementally, which has been employed in [13]. That is, we initially train the neural networks for the first grid point, afterwards for the first two grid points, and so on. We continue this procedure up to a FBtraining over all grid points in each subdomain. The initial weight initialisation is the same in each subdomain.
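
A sketch of this incremental learning procedure on one subdomain may look as follows, with train_full_batch standing in as a hypothetical placeholder for the FBtraining loop over the given grid points.

```python
def incremental_training(t_sub, p, train_full_batch):
    """Train on the first grid point, then on the first two, and so on,
    until a full batch over all grid points of the subdomain is reached."""
    for i in range(1, len(t_sub) + 1):
        p = train_full_batch(t_sub[:i], p)
    return p
```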

Please note at this point that we provide an explicit comparison to the Runge-Kutta 4 method only in the last experiment of this section. We find that our approach provides by construction a very high flexibility to deal with many types of initial value problems that may require specific constructions in classic numerical analysis (e.g. symplectic integration). However, in simple settings, the network-based approach is currently usually not competitive with state-of-the-art numerical solvers (w.r.t. computational efficiency), which have been developed and refined over decades. We think that, because of the much higher flexibility of the network-based tool, this comparison would not be entirely adequate.

As an example of this flexibility, a recent neural network approach [12] makes it superfluous to restart the computation of numerical solutions when considering multiple, different initial conditions. We also demonstrate the flexibility in this paper in Sect. 5.2.5, where we show how invariants can simply be added to the cost function. Numerical methods often need special constructions to handle such problems.

For our test example in Eq. (19), we find that a graphical comparison in the subsequent experiments does not provide further information, since the visual differences between the analytical solution and the solution computed with Runge-Kutta 4 with adaptive time stepping are minor.

A scaling experiment The original TSM neural form (Eq. (3)) is theoretically capable of approximating every continuous function, according to the universal approximation theorem [23]. However, Table 3 shows results for a TSM neural form with a single neural network. For different domains, we scaled the number of hidden layer neurons linearly and averaged ten computations for each domain with the same computational parameters.

Table 3 TSM neural form, \(\displaystyle \mathbf {p}^{init}_{rnd}\)

The results in Table 3 convey the following message: increasing the domain size forces the neural network to incorporate more hidden layer neurons and grid points. Indeed, to reach, e.g. an (averaged) \(\displaystyle \varDelta u=9.5873\hbox {e}{-}\)4 for \(\displaystyle D=[0,3]\), training the neural network required 50 hidden layer neurons and 75 grid points. In general, determining a suitable architecture in terms of the number of hidden layer neurons and training points is a challenging task.

In the subsequent experiments, we find the SCNF to be able to solve the initial value problem with neural networks comprising a small, fixed number of hidden layer neurons and training points in each subdomain. At the same time, this allows various important parameters to be defined in a simple and straightforward way.

5.2.1 SCNF experiment: CNF versus SCNF

In the first experiment, we compare results of a SCNF with those of a CNF solved over the entire domain. For comparability, the total number of training points is constant, namely nTP \(=1000\) for the red line and nTP \(=10\) per subdomain with 100 subdomains for the black/dotted line. However, a comparison of two CNFs with the same architecture would not be meaningful because the domain size has a significant influence. Therefore, we decided to realise the CNF (red) with a neural network incorporating 1 input layer bias and 100 sigmoid neurons with 1 bias. The SCNF (black/dotted) features neural networks with 1 input layer bias and 5 sigmoid neurons with 1 bias per subdomain. Both CNF and SCNF incorporate \(\displaystyle m=3\). In addition, we did not increase the domain size incrementally for this experiment, in order to reduce the number of parameters that hinder comparability.

The CNF solution (red) in Fig. 6 does not show a useful approximation in any of the experiments. In total, the number of hidden layer neurons and training points that would be needed to obtain a useful approximation seems to be much higher. Nonetheless, the SCNF approach (black/dotted), working with the same number of training points, was able to solve the initial value problem in a satisfactory way. From a qualitative perspective, both TSM and mTSM provide similar results with \(\displaystyle \mathbf {p}^{init}_{const}\) and \(\displaystyle \mathbf {p}^{init}_{rnd}\).

Fig. 6 Experiment in 5.2.1 CNF versus SCNF, (orange) analytical solution, (red) CNF solution, (black/dotted) SCNF solution (color figure online)

Concluding this experiment, we see that the SCNF method provides a useful solution to the initial value problem. In addition, the small number of incorporated hidden layer neurons enables a much more effective training of the neural networks.

5.2.2 SCNF experiment: variation of the polynomial neural forms order (m)

The ability to approximate the initial value problem with the SCNF, depending on different m, is the subject of this experiment. Here, the SCNFs include 1 input layer bias and 5 sigmoid neurons with 1 bias. The solution domain is split into 60 subdomains with 10 grid points in each subdomain. Again, we employ incremental learning in the subdomains.

Results for TSM with \(\displaystyle m=1\) in Fig. 7a and mTSM with \(\displaystyle m=1\) in Fig. 7b indicate that the original TSM and mTSM methods are not useful over larger domains, even when employing domain fragmentation. However, the first-order SCNF is able to get back on the solution trend, although several subdomains do not provide correct approximations. In total, both solutions for \(\displaystyle m=1\) (especially mTSM) cannot be considered reliable.

That changes for \(\displaystyle m=2\), at least for TSM in Fig. 7c. Here, we find, with the exception of some local extreme points, the SCNF to be a reasonable approximation of the initial value problem. This statement, however, does not hold for mTSM. Although the general trend now is much closer to the analytical solution, there are still subdomains which do not approximate the solution well.

The results shown in Table 4 represent the \(\displaystyle l_{\infty }\)-norm of the differences between the analytical and the computed solution, for mTSM as displayed in Fig. 7, measured at the last grid points of \(\displaystyle D_{l-1}\), namely \(\displaystyle t_{n,l-1}\), and the corresponding initial points of \(\displaystyle D_l\), \(\displaystyle t_{0,l}\). We propose to consider this measure, since it indicates how well the solution can be met across the subdomain interfaces. We find that increasing m has a major influence on the accuracy.
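
Under our reading of this measure, it may be computed as in the following sketch, which again assumes the per-subdomain output format of the SCNF marching sketch and a callable u_exact for the analytical solution.

```python
import numpy as np

def interface_linf_error(solutions, u_exact):
    """Maximum deviation from the analytical solution at the overlapping
    interface grid points t_{n,l-1} and t_{0,l} (the measure of Table 4)."""
    errs = []
    for l in range(1, len(solutions)):
        t_prev, u_prev = solutions[l - 1]
        t_next, u_next = solutions[l]
        errs.append(abs(u_exact(t_prev[-1]) - u_prev(t_prev[-1])))
        errs.append(abs(u_exact(t_next[0]) - u_next(t_next[0])))
    return max(errs)
```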

Fig. 7 Experiment 5.2.2 m variation, \(\displaystyle \mathbf {p}^{init}_{const}\), (orange) analytical solution, (black/dotted) SCNF solution (color figure online)

Table 4 \(\displaystyle l_{\infty }\)-norm for mTSM interface grid points, \(\displaystyle \mathbf {p}^{init}_{const}\)

We conjecture that learning the subdomain initial values becomes easier for mTSM the more neural networks are incorporated. That is mainly because the first neural network can, so to say, focus on learning the initial values, while the other networks are more engaged with the IVP structure. We think that this conjecture is supported by the decreasing discrepancy of the overlap at the interfaces for higher orders m.

The overall best solutions in this experiment are provided by \(\displaystyle m=3\) (Fig. 7e, f) for both TSM and mTSM.

We tend to favour TSM over mTSM, since the initial value in each subdomain is satisfied by the corresponding SCNF by construction (the learned value at \(\displaystyle t_{n,l-1}\) is set to be the initial value at \(\displaystyle t_{0,l}\)) and does not have to be learned again.

5.2.3 SCNF experiment: number of subdomain variation

In this experiment, we investigate the influence of the total number of subdomains on the numerical error \(\displaystyle \varDelta u\). Figure 8 shows the behaviour for \(\displaystyle \mathbf {p}^{init}_{rnd}\) (blue) and \(\displaystyle \mathbf {p}^{init}_{const}\) (yellow). The SCNFs incorporate \(\displaystyle m=3\), 1 input layer bias, 5 sigmoid neurons with 1 bias and 10 grid points in each subdomain. We again employ incremental learning in the subdomains.

Fig. 8 Experiment in 5.2.3 Number of subdomain variation, (blue) \(\displaystyle \mathbf {p}^{init}_{rnd}\), (yellow) \(\displaystyle \mathbf {p}^{init}_{const}\) (color figure online)

Let us first comment on the SCNF for TSM in Fig. 8a. Despite minor differences between the solutions corresponding to \(\displaystyle \mathbf {p}^{init}_{rnd}\) and \(\displaystyle \mathbf {p}^{init}_{const}\) for smaller numbers of subdomains, both initialisation methods show a very similar trend. A saturation regime seems to appear for around 350 subdomains with \(\displaystyle \varDelta u\approx 1\hbox {e}{-}5\).

Turning to mTSM in Fig. 8b, we again observe a similar behaviour between the methods with \(\displaystyle \mathbf {p}^{init}_{rnd}\) and \(\displaystyle \mathbf {p}^{init}_{const}\). Although the differences almost disappear for larger numbers of subdomains, we find that even at 400 subdomains, the numerical error \(\Delta u\) cannot compete with TSM here.

Let us note again that the chosen weight initialisation approach for \(\displaystyle \mathbf {p}^{init}_{rnd}\) (see Sect. 5) means that the random weights are initialised in the same way in each subdomain. In undocumented tests, we observed that the results may show slight to significant variations when the random weights are generated independently for each network over the subdomains. However, the results shown here using \(\displaystyle \mathbf {p}^{init}_{rnd}\) represent a rather typical trend.

In conclusion, one can obtain very good approximations with the TSM SCNF approach for both weight initialisation methods. That means, choosing \(\displaystyle \mathbf {p}^{init}_{const}\) over \(\displaystyle \mathbf {p}^{init}_{rnd}\) has no downsides, which leads us to again support the use of constant weight initialisation.

5.2.4 SCNF experiment: numerical error in the subdomains

The last experiment investigates the numerical error \(\varDelta u_l\) in each subdomain \(D_l\), depending on different m. Again, the SCNFs feature 1 input layer bias and 5 sigmoid neurons with 1 bias. The solution is computed with 100 subdomains, 10 grid points each, and incremental learning in the subdomains.

Throughout Fig. 9, \(\displaystyle m=1\) shows the least good results. However, if we compare mTSM with \(\displaystyle \mathbf {p}^{init}_{const}\) in Fig. 9b to the results for 60 subdomains in Fig. 7b, increasing the number of domain fragments by 40 subdomains seems to prevent the solution from diverging. Random weight initialisation works better for \(\displaystyle m=1\), especially with TSM.

Fig. 9 Experiment in 5.2.4 Numerical error in the subdomains, (blue) \(\displaystyle m=1\), (orange) \(\displaystyle m=3\), (yellow) \(\displaystyle m=5\)

Solutions provided by \(\displaystyle m=3\) and \(\displaystyle m=5\) are much better than for \(\displaystyle m=1\), and increasing the order clearly tends to increase the accuracy. For TSM with both \(\displaystyle m=3\) and \(\displaystyle m=5\), as well as for mTSM with \(\displaystyle m=5\), the numerical error saturates from a certain subdomain on. Let us note that, for both \(\displaystyle \mathbf {p}^{init}_{const}\) and \(\displaystyle \mathbf {p}^{init}_{rnd}\), the differences in the overall numerical error \(\displaystyle \varDelta u\) are not significant in these cases.

In this experiment, we again tend to favour TSM with \(\displaystyle \mathbf {p}^{init}_{const}\). Although \(\displaystyle m=1\) does not work well, the higher orders shown provide good approximations with saturation regimes. The results confirm our preference for constant initialisation, because \(\displaystyle \mathbf {p}^{init}_{const}\) does not depend on a fortunate generation of random weights.

5.2.5 SCNF experiment: system of initial value problems

In this section, we study a non-linear system of initial value problems. The example system we consider reads

$$\begin{aligned} {\dot{u}}(t)&=a_uv(t)w(t) \end{aligned}$$
(23)
$$\begin{aligned} {\dot{v}}(t)&=a_vw(t)u(t) \end{aligned}$$
(24)
$$\begin{aligned} {\dot{w}}(t)&=a_wu(t)v(t) \end{aligned}$$
(25)

with

$$\begin{aligned} a_u=\frac{I_v-I_w}{I_vI_w},\quad a_v=\frac{I_w-I_u}{I_wI_u},\quad a_w=\frac{I_u-I_v}{I_uI_v} \end{aligned}$$
(26)

where \(\displaystyle I_u,I_v,I_w\) are non-zero real numbers, and initial values for \(\displaystyle u,v,w\) are given. The equations describe the angular momentum of a free rigid body [24, 25] with its centre of mass at the origin. These coupled initial value problems are often denoted as Euler equations and feature time-invariant characteristics, since the independent variable \(\displaystyle t\) (time) does not explicitly appear on the right-hand side. The quadratic invariant expression

$$\begin{aligned} R^2=u^2(t)+v^2(t)+w^2(t) \end{aligned}$$
(27)

conserves the so-called magnitude and describes a sphere, while another quadratic invariant

$$\begin{aligned} H=\frac{1}{2}\bigg (\frac{u^2(t)}{I_u}+\frac{v^2(t)}{I_v}+\frac{w^2(t)}{I_w}\bigg ) \end{aligned}$$
(28)

conserves the kinetic energy and describes an ellipsoid. Both invariant quantities force the solution to stay on the intersection formed by the sphere and the ellipsoid. Since Eqs. (27) and (28) remain unchanged over time along the solutions, we have

$$\begin{aligned} H&=\frac{1}{2}\bigg (\frac{u^2(0)}{I_u}+\frac{v^2(0)}{I_v}+\frac{w^2(0)}{I_w}\bigg ) \end{aligned}$$
(29)
$$\begin{aligned} R^2&=u^2(0)+v^2(0)+w^2(0) \end{aligned}$$
(30)

The corresponding initial values \(\displaystyle u(0), v(0), w(0)\) are given as in Fig. 10, and the fixed principal moments of inertia (see [24], Section 14.3) have the values

$$\begin{aligned} I_u=2,\quad I_v=1,\quad I_w=\frac{2}{3} \end{aligned}$$
(31)

assigned. In general, the flexibility of the neural network approach allows given invariant expressions to be directly added to the cost function. For systems of initial value problems, the cost function can be obtained by assigning each solution function its own SCNF and summing up the resulting \(\displaystyle l_2\)-norms, here together with the invariant quantities:

$$\begin{aligned} E[\mathbf {p}]&=\frac{1}{2}\bigg \Vert \dot{{\tilde{u}}}_C-a_u{\tilde{v}}_C{\tilde{w}}_C\bigg \Vert _2^2 +\frac{1}{2}\bigg \Vert \dot{{\tilde{v}}}_C-a_v{\tilde{w}}_C {\tilde{u}}_C\bigg \Vert _2^2\\&\quad +\frac{1}{2}\bigg \Vert \dot{{\tilde{w}}}_C-a_w{\tilde{u}}_C{\tilde{v}}_C\bigg \Vert _2^2\nonumber \\&\quad +\frac{1}{2}\bigg \Vert {\tilde{u}}_C^2+{\tilde{v}}^2_C +{\tilde{w}}^2_C-R^2\bigg \Vert _2^2\nonumber \\&\quad +\frac{1}{2}\bigg \Vert \frac{{\tilde{u} }_C^2}{2I_u}+\frac{{\tilde{v}}^2_C}{2I_v}+\frac{{\tilde{w}}^2_C}{2I_w} -H\bigg \Vert _2^2 \nonumber \end{aligned}$$
(32)
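
A sketch of the cost in Eq. (32), together with the coefficients of Eq. (26) and the invariants fixed by Eqs. (29)-(31), may look as follows; the SCNF values and their time derivatives at the grid points of one subdomain are assumed to be given.

```python
import numpy as np

# Principal moments of Eq. (31) and the coefficients of Eq. (26).
I_u, I_v, I_w = 2.0, 1.0, 2.0 / 3.0
a_u = (I_v - I_w) / (I_v * I_w)
a_v = (I_w - I_u) / (I_w * I_u)
a_w = (I_u - I_v) / (I_u * I_v)

def invariants(u0, v0, w0):
    """R^2 and H fixed by the initial values, Eqs. (29)-(30)."""
    R2 = u0**2 + v0**2 + w0**2
    H = 0.5 * (u0**2 / I_u + v0**2 / I_v + w0**2 / I_w)
    return R2, H

def cost_rigid_body(u, du, v, dv, w, dw, R2, H):
    """Cost of Eq. (32): three residual terms plus two invariant penalties."""
    terms = [du - a_u * v * w,
             dv - a_v * w * u,
             dw - a_w * u * v,
             u**2 + v**2 + w**2 - R2,
             u**2 / (2 * I_u) + v**2 / (2 * I_v) + w**2 / (2 * I_w) - H]
    return 0.5 * sum(np.sum(r**2) for r in terms)
```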

In order to visualise the invariant behaviour, the computed results in Fig. 10 are obtained for \(\displaystyle t\in [0,30]\), which allows the solution to pass its own initial points more than once. The solution domain is fragmented into 40 subdomains with ten training points in each subdomain. Due to overlapping grid points at the subdomain interfaces, the total number of unique training points is nTP \(=361\), and the same number was used to obtain the computational results with Runge-Kutta 4. Each SCNF has \(\displaystyle m=3\) and therefore features three neural networks. The latter are initialised with zeros, and other training parameters and methods remain unchanged (see Sect. 5.2).

In Fig. 10, we display both the Runge-Kutta 4 (coloured/solid) and the SCNF (black/dots) solution. As mentioned above, the curves lie on the intersections formed between a sphere and an ellipsoid. Changing the point of view in Fig. 10 reveals the contours of the intersections between the corresponding spheres and ellipsoids for the different initial conditions (Fig. 11).

Fig. 10 Experiment in 5.2.5 Runge-Kutta 4 (coloured/solid) and SCNF (black/dots) solution of the free rigid body problem in Eq. (23) for different initial values, (blue) {\(u(0) = \cos (1.1)\); \(v(0)=0.6\); \(w(0) = \sin (1.1)\)}, (orange) {\(u(0) = \cos (1)\); \(v(0)=0.7\); \(w(0) = \sin (1)\)}, (yellow) {\(u(0) = \cos (1.2)\); \(v(0)=0.5\); \(w(0) = \sin (1.2)\)}

Fig. 11 Experiment in 5.2.5 Top view of Fig. 10 to visualise the spherical and ellipsoidal intersections, colours are adopted from Fig. 10

Figure 12 shows the training error arising from the cost function in Eq. (32) over the incorporated subdomains. Since the SCNF solution in each subdomain is computed independently (except for the provided initial value), the training error can differ significantly, even between two adjacent subdomains. Additionally, the flexibility of these independent computations enables the training error to decrease again, even if the previous subdomain has shown large errors.

Fig. 12 Experiment in 5.2.5 Training error over the subdomains for {\(u(0)=\cos (1.2)\); \(v(0)=0.5\); \(w(0)=\sin (1.2)\)}

In conclusion, the SCNF approach can be easily extended to handle systems of initial value problems, even with nonlinear and time invariant characteristics. A qualitative comparison between the SCNF solution and the Runge-Kutta 4 solution shows only very minor visual differences. As mentioned in Sect. 5.2, a quantitative comparison may favour Runge-Kutta 4 in terms of the numerical error in this experiment. However, the Runge-Kutta 4 method is not a symplectic integrator and may not preserve quadratic invariants over time.

6 Conclusion and future work

The proposed CNF and SCNF approaches, merging collocation polynomial basis functions with neural networks and domain fragmentation, show clear benefits over the previous neural form constructions. We have studied in detail the constant weight initialisation for our novel CNF approach with a basic stiff initial value problem. Depending on the batch learning method, the collocation-based extension seems to have benefits for both TSM and mTSM. For the TSM CNF, this effect is more significant than for the mTSM extension.

Focussing on mTSM and the CNF approach, using two neural networks, one for learning the initial value and one multiplied by t, seems to have some advantages over other possible mTSM settings. Considering approximation quality as most imperative, we find mTSM with \(\displaystyle m=2\) to provide the overall best results for the investigated initial value problem.

We find that the proposed SCNF approach combines many advantages of the new developments. Employing higher-order CNF methods, it is possible to solve initial value problems over large domains with very high accuracy, and at the same time with reasonable optimisation effort. Moreover, many computational parameters can be fixed easily for this setting, which is a significant issue with other TSM and mTSM variations.

As another important conclusion, the experiments have shown that we can favour constant weight initialisation over random weight initialisation. Nonetheless, the complexity of the IVP has an impact on the architecture and on the values of the initial constant weights. As we have pointed out in a computational study [18], DE- and network-related parameters may not be independent of each other. However, the underlying relation between problem complexity and the necessary neural network architecture remains a subject of future work.

When focussing on constant weight initialisation, we find a further investigation of how to find suitable (constant) initial weights to be of interest. The same holds for the sensitivity of the neural network parameters.

Future research may also include work on other possible collocation functions and on combining the networks with other discretisation methods. In addition to that, let us note that we find optimal control problems to be a possible and relevant potential field of applications for our method, see, for instance, [26] for recent research in that area.