Collocation polynomial neural forms and domain fragmentation for solving initial value problems

Several neural network approaches for solving differential equations employ trial solutions with a feedforward neural network. There are different means to incorporate the trial solution in the construction, for instance, one may include it directly in the cost function. Used within the corresponding neural network, the trial solution defines the so-called neural form. Such neural forms represent general, flexible tools by which one may solve various differential equations. In this article, we consider time-dependent initial value problems, which require an adequate setup of the neural form framework. The neural forms presented in the literature for such a setting so far can be considered as first-order polynomials. In this work, we propose to extend the polynomial order of the neural forms. The novel collocation-type construction includes several feedforward neural networks, one for each order. Additionally, we propose a fragmentation of the computational domain into subdomains. The neural forms are solved on each subdomain, with the interfacing grid points overlapping in order to provide initial values over the whole fragmentation. We illustrate in experiments that the combination of collocation neural forms of higher order and domain fragmentation makes it possible to solve initial value problems over large domains with high accuracy and reliability.


Introduction
Over the last decades, several neural network approaches for solving differential equations have been developed [1,2,3]. The application and extension of these approaches is a topic of recent research, including work on different network architectures such as Legendre [4] and polynomial neural networks [5], as well as computational studies [6,7].
One of the early proposed methods [8] introduced a trial solution (TS) in order to define a cost function using one feedforward neural network. The TS is supposed to satisfy given initial or boundary values by construction. It is also referred to as neural form (NF) in this context [8,9], a terminology we will adopt from here on. Let us note that such NFs represent a general tool that makes it possible to solve ordinary differential equations (ODEs), partial differential equations (PDEs) and systems of ODEs/PDEs alike. We will refer to this approach as the trial solution method (TSM). Later, the initial method from [8] was extended by a NF with two feedforward neural networks, which allows dealing with boundary value problems for irregular boundaries [10] and yields broader possibilities for constructing the TS [9]. In the latter context, let us also mention [11], where an algorithm is proposed to create a TS based on grammatical evolution. Focusing on initial value problems (IVPs), one approach learns solution bundles [12], making the trained neural forms reusable for enquired initial values.
A technique related to TSM that avoids the explicit construction of trial solutions has been proposed in [13]. The given initial or boundary values from the underlying differential equation are included in the cost function as additional terms, so that the NF can be set equal to the neural network output. We will refer to this approach as the modified trial solution method (mTSM).
The fact that the neural network output computation resembles a linear combination of basis functions leads to a network architecture as presented in [14] (for PDEs). In that work, one hidden layer incorporates two sets of activation functions, one of which is supposed to satisfy the PDE while the second deals with boundary conditions. The basis function coefficients are set to be the connecting weights from the hidden layer to the output neuron, and the sum over all basis functions and coefficients makes up the NF.
Motivated by the construction principle of collocation methods in numerical analysis, we propose in this paper a novel extension of the NF approach. Our neural form extension is based on the observation that the NF using one feedforward neural network, as employed in [8], may be interpreted as a first-order collocation polynomial. We propose to extend the corresponding polynomial order of the neural form. The novel construction includes several feedforward neural networks, one for each order. Compared to a collocation method from standard numerics, the networks take on the role of coefficients in the collocation polynomial expansion.
Furthermore, we aim to approximate initial value problems on fairly large domains. To this end, and based on the NF structures, we also propose a fragmentation of the computational domain into subdomains. In each subdomain, we solve the initial value problem with a collocation neural form, proceeding in time from one domain fragment to the adjacent subdomain. The interfacing grid points in any subdomain provide the initial value for the next subdomain. At first glance one may think of similarities to domain decomposition methods for PDEs in numerical analysis, cf. [15,16]. We also show how to combine the domain fragmentation with the newly developed collocation polynomial neural forms.
Let us note that this article is a substantial extension of our conference paper [17]. There we briefly introduced just the principle of the collocation-based polynomial NF. Here we present a much more detailed investigation of the collocation-based NF and, as a substantial novelty, its expansion and combination with our novel domain fragmentation method for solving initial value problems. This enables us to discuss both methods in a merged context, which we find highly suitable. Moreover, we have updated the notation to a more suitable form.

Related work and problem statement
In our previous work [18] we have shown that even simple feedforward neural networks (5 hidden layer neurons) are capable of solving a stiff initial value problem. The neural forms approaches investigated there are based on [8,13]; please find additional related work addressed in the previous introduction paragraphs. The best results in the above-mentioned computational study were provided by random weight initialisation, but with the drawback of a spread between the results obtained with different random initialisations for unchanged computational parameters. Here we want to improve on constant weight initialisation instead. Its main advantage is that results are exactly reproducible with unchanged computational parameters. Based on the neural forms approach [8], we have already shown that a polynomial extension leads to a significant increase of numerical accuracy for constant weight initialisation [17]. In the present work we now take up the polynomial neural forms and employ domain fragmentation to split the solution domain into smaller subdomains. This technique solves the neural forms on such subdomains, with the initial values provided by the previous subdomains. The inherent flexibility enables the approach to achieve useful results, even on fairly large domains.

Setting up the neural form (NF)
In this section, we first recall the TSM and its modified version mTSM, cf. [8,13]. Then we proceed with details on the feedforward neural networks we employ, followed by a description of the novel collocation-based neural form and the subdomain approach.

Construction of the neural form
Consider an initial value problem written in a general form as

G(t, u(t), u̇(t)) = 0, t ∈ D, (1)

with given initial value u(t_0) = u_0. In order to connect G with a neural network, several approaches introduce a NF as a differentiable function ũ(t, p), where the vector p contains the network weights. With the collocation method we discretise the domain D by a uniform grid with n + 1 grid points t_i (t_0 < t_1 < . . . < t_n), so that the initial value problem (1) leads to the formulation

G(t_i, ũ(t_i, p), ũ'(t_i, p)) = 0, i = 0, . . ., n. (2)

Let us note that, in a slight abuse of notation, we identify G(t_i, ũ(t_i, p), ũ'(t_i, p)) with the vector of corresponding entries (grid points), since this enables us to give many formulae a more elegant, compact notation.
In order to satisfy the given initial value, TSM [8] employs the NF as a sum of two terms,

ũ(t, p) = A(t) + F(t, N(t, p)), (3)

where A(t) is supposed to match the initial condition (with the simplest choice being A(t) = u(t_0)), while F(t, N(t, p)) is constructed to eliminate the impact of N(t, p) at t_0. The choice of F(t, N(t, p)) determines the influence of N(t, p) over the domain.
Since the NF as used in this work satisfies given initial values by construction, we define the corresponding cost function incorporating Eq. (3) as

E[p] = (1/(n+1)) Σ_{i=0}^{n} [G(t_i, ũ(t_i, p), ũ'(t_i, p))]², (4)

which is subject to minimisation. Although Eq. (2) denotes a system of equations which may be solvable w.r.t. p, the actual equation of interest is Eq. (4), and its optimisation will return suitable neural network weights.
Let us now turn to the mTSM approach after [13]. The mTSM approach chooses the NF to be equivalent to the neural network output directly:

ũ(t, p) = N(t, p). (5)

Since no condition is imposed on the NF by the initial value in this way, the conditions are added to the cost function when relying on Eq. (5):

E[p] = (1/(n+1)) Σ_{i=0}^{n} [G(t_i, ũ(t_i, p), ũ'(t_i, p))]² + [ũ(t_0, p) − u_0]². (6)

Neural network architecture and optimisation
In this section we describe how a feedforward neural network with one hidden layer operates in our setting. Specific variants will be addressed in the corresponding sections.
As depicted in Fig. 1 we employ one hidden layer, with H hidden layer neurons supplemented by one bias neuron. Having in addition one bias neuron in the input layer and a linear output layer neuron, the neural network output reads

N(t, p) = Σ_{j=1}^{H} ρ_j σ(z_j) + γ. (7)

Thereby σ_j = σ(z_j) = 1/(1 + e^{−z_j}) represents the sigmoid activation function with the weighted sum z_j = ν_j t + η_j. Here, ν_j (input layer neuron), η_j (input layer bias neuron), ρ_j (hidden layer neurons) and γ (hidden layer bias neuron) denote the weights, which are stored in the weight vector p. The input layer passes the domain data t (in practice t_i), weighted by ν_j and η_j, to the hidden layer for processing. The neural network output N(t, p) is again a weighted sum of the values ρ_j σ(z_j), with γ added. With N(t, p) given, the neural forms and cost functions in Eqs. (4) and (6) are obtained.
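The network output described above, together with its derivative with respect to t (which is needed to evaluate G in the cost function), can be sketched as follows. The weight names ν_j, η_j, ρ_j, γ follow the notation in the text; the dictionary layout of p and the seeded random values are illustrative choices of this sketch, not prescribed by the method:

```python
import numpy as np

def network(t, p):
    """Network output N(t, p) = sum_j rho_j * sigma(nu_j * t + eta_j) + gamma."""
    z = p["nu"] * t + p["eta"]            # weighted sums z_j
    sigma = 1.0 / (1.0 + np.exp(-z))      # sigmoid activations
    return np.dot(p["rho"], sigma) + p["gamma"]

def network_dt(t, p):
    """Analytic derivative dN/dt, used when evaluating G(t, u, u')."""
    z = p["nu"] * t + p["eta"]
    sigma = 1.0 / (1.0 + np.exp(-z))
    return np.dot(p["rho"] * p["nu"], sigma * (1.0 - sigma))

# Example: H = 5 hidden neurons, reproducible random weights in [-0.5, 0.5]
rng = np.random.default_rng(0)
p = {name: rng.uniform(-0.5, 0.5, size=5) for name in ("nu", "eta", "rho")}
p["gamma"] = rng.uniform(-0.5, 0.5)
```

The analytic derivative is exact, so no numerical differentiation of the network is needed during training.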
Fig. 1: The architecture of our incorporated feedforward neural network with one hidden layer bias.
As usual, the cost function gradient is used to update p in order to find a (local) minimum in the weight space. One training cycle is called an epoch and consists of a full iteration over all training points. The gradient with respect to the network weights determines their influence on the network output. With this information, each incorporated weight can be adjusted individually to lead the network into a local minimum during the training process. For optimising the cost function we consider Adam (adaptive moment estimation) [19], a stochastic gradient descent method using adaptive learning rates for every weight.
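A minimal sketch of a single Adam update, following the standard formulation of [19]. The quadratic toy cost ||w||² with gradient 2w stands in for our actual cost function; all variable names and the learning rate are illustrative only:

```python
import numpy as np

def adam_step(w, g, state, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: an adaptive step for every single weight."""
    state["t"] += 1
    state["m"] = b1 * state["m"] + (1 - b1) * g        # first moment estimate
    state["v"] = b2 * state["v"] + (1 - b2) * g**2     # second moment estimate
    m_hat = state["m"] / (1 - b1 ** state["t"])        # bias corrections
    v_hat = state["v"] / (1 - b2 ** state["t"])
    return w - lr * m_hat / (np.sqrt(v_hat) + eps)

# Minimise the toy cost ||w||^2, whose gradient is 2w
w = np.array([1.0, -2.0])
state = {"t": 0, "m": np.zeros(2), "v": np.zeros(2)}
for _ in range(1000):
    w = adam_step(w, 2.0 * w, state)
```

Since the second-moment estimate normalises each coordinate by its own gradient history, the effective step size differs per weight.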
If a weight update is performed after the gradient computation for a single grid point, we call this method single batch training (SBtraining) here. The alternative procedure of performing the weight update after a complete iteration over all grid points, averaging the cost function gradient and training error, is denoted here as full batch training (FBtraining).
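The difference between the two batch modes can be sketched with a toy least-squares cost standing in for the neural form cost function; the single weight w, the data, and the learning rate are purely illustrative:

```python
import numpy as np

# Toy cost: fit a single weight w so that w * t_i matches y_i at the grid points
t = np.linspace(0.1, 1.0, 10)
y = 2.0 * t                          # data generated with the "true" weight 2

def grad(w, ti, yi):
    """Gradient of the per-point cost (w * ti - yi)^2 with respect to w."""
    return 2.0 * (w * ti - yi) * ti

lr = 0.1

# SBtraining: one weight update per grid point
w_sb = 0.0
for epoch in range(100):
    for ti, yi in zip(t, y):
        w_sb -= lr * grad(w_sb, ti, yi)

# FBtraining: one weight update per epoch, with the gradient averaged
w_fb = 0.0
for epoch in range(100):
    g = np.mean([grad(w_fb, ti, yi) for ti, yi in zip(t, y)])
    w_fb -= lr * g
```

Both modes converge here; SBtraining takes many small steps per epoch, while FBtraining takes one averaged step.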
Let us comment in some detail on the relation between grid points and training points. Our setting is an unsupervised learning framework, where grid points are used for domain discretisation, and where the unknowns are values of the ODE solution at exactly these grid points. Thus, in our setting, the grid points are identical with the training points. Let us stress in this context that the grid points themselves stay fixed during network optimisation.

The novel collocation neural form (CNF)
Making Eq. (3) precise for our grid-based setting, a suitable choice for the neural form of TSM with given u(0) = u_0 is

ũ(t, p) = u_0 + t N(t, p), (8)

where t will be evaluated at the grid points t_i. Please note that if the initial point is different from t_0 = 0, this results in a shift t → (t − t_0) in Eq. (8). Compared to a first-order polynomial q_1(t) = a_0 + a_1 t, one may find similarities in the structure. Motivated by the expansion of an m-th order collocation function polynomial [20],

q_m(t) = Σ_{k=0}^{m} a_k t^k, (9)

we are led to set up our collocation-based NF (CNF) approach for TSM:

ũ_C(t, P_m) = u_0 + Σ_{k=1}^{m} N_k(t, p_k) t^k. (10)

The weight vector of the k-th neural network is denoted by p_k, and we define the matrix P_m of the m weight vectors, P_m = (p_1, . . ., p_m).
The use of higher-order monomial powers t^k as in Eq. (10) not only generalises previous methods, but may also enable better stability and accuracy properties, as we show in this paper. Let us also observe that the neural networks take on the roles of coefficient functions for the values of t^k. We conjecture at this point that this construction makes sense since, in this way, several possible multipliers (not only t as in Eq. (8)) are included in the neural form construction. It is important to mention that the new neural form construction in Eq. (10) fulfils the initial condition.
Let us stress that the proposed ansatz (Eq. (10)) includes m neural networks, where N_k(t, p_k) represents the k-th neural network. The corresponding cost function is then given as in Eq. (4). We extend the mTSM method in a similar way as we obtained the TSM extension in Eq. (10):

ũ_C(t, P_m) = Σ_{k=1}^{m} N_k(t, p_k) t^{k−1}, (11)

with the corresponding cost function as in Eq. (6),

E[P_m] = (1/(n+1)) Σ_{i=0}^{n} [G(t_i, ũ_C(t_i, P_m), ũ'_C(t_i, P_m))]² + [ũ_C(t_0, P_m) − u_0]². (12)

Thereby the first neural network N_1(t, p_1) is set to learn the initial condition in the same way as stated in Eq. (6).
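The TSM variant of the collocation neural form can be sketched as follows. The helper functions and the dictionary layout of the weight vectors are illustrative assumptions of this sketch; the point is that the construction fulfils the initial condition at t = 0 regardless of the network weights, since every network output is multiplied by t^k with k ≥ 1:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def network(t, p):
    """One-hidden-layer network N_k(t, p_k) with H sigmoid neurons."""
    return np.dot(p["rho"], sigmoid(p["nu"] * t + p["eta"])) + p["gamma"]

def cnf_tsm(t, u0, P):
    """Collocation neural form (TSM): u0 + sum_{k=1}^{m} N_k(t, p_k) * t^k."""
    return u0 + sum(network(t, pk) * t**k for k, pk in enumerate(P, start=1))

# A CNF of order m = 3 with reproducible random weights, H = 5
rng = np.random.default_rng(1)
def random_weights(H=5):
    p = {name: rng.uniform(-0.5, 0.5, size=H) for name in ("nu", "eta", "rho")}
    p["gamma"] = rng.uniform(-0.5, 0.5)
    return p

P = [random_weights() for _ in range(3)]
```

Evaluating `cnf_tsm(0.0, u0, P)` returns exactly u0 for any weights, which is the built-in initial condition of Eq. (10).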
From now on we will refer to the number of neural networks in the neural form as the collocation neural form order, denoted by m.

The novel subdomain collocation neural form (SCNF)
The previously described TSM and mTSM approaches use the IVP structure together with the given initial value in order to train the neural networks on a certain domain. In a prior experimental study [7] we found that especially TSM tends to struggle with approximating the solution on larger domains, whereas on small domains the numerical error tends to remain small. Since the domain variable t effectively acts as a scaling of N(t, p), we conjecture that a large domain size may introduce the need for a higher number of training points or for a more complex neural network architecture. These circumstances motivate us to introduce a second stage of discretising the domain. That is, we split the solution domain D into subdomains D_l, l = 1, . . ., h, with n + 1 grid points t_{i,l} in each subdomain. Now the CNF is solved separately in each subdomain. The interfacing grid points overlap, i.e. the computed value ũ_C(t_{n,l−1}, P_{m,l−1}) at the last grid point of any subdomain D_{l−1} is set to be the new initial value ũ_C(t_{0,l}, P_{m,l}) for the next subdomain D_l.
Since the CNF for TSM is constructed in order to satisfy the given initial values, we force the subdomain CNF (SCNF) to retain that characteristic. Therefore the SCNF is constructed to satisfy the new initial value in each subdomain, namely

ũ_C(t_{i,l}, P_{m,l}) = u_{0,l} + Σ_{k=1}^{m} N_k(t_{i,l}, p_{k,l}) (t_{i,l} − t_{0,l})^k, (13)

with u_{0,l} = ũ_C(t_{n,l−1}, P_{m,l−1}) for l > 1 and u_{0,1} = u(t_0). The neural networks are now scaled by (t_{i,l} − t_{0,l})^k, which in fact may avoid large scaling factors, depending on the subdomain size. The arising cost function, similar to Eq. (4), is

E[P_{m,l}] = (1/(n+1)) Σ_{i=0}^{n} [G(t_{i,l}, ũ_C(t_{i,l}, P_{m,l}), ũ'_C(t_{i,l}, P_{m,l}))]². (14)

Proceeding to mTSM, we also adopt the CNF approach and set the first neural network to learn the new initial value in each subdomain. That is, the SCNF reads

ũ_C(t_{i,l}, P_{m,l}) = Σ_{k=1}^{m} N_k(t_{i,l}, p_{k,l}) (t_{i,l} − t_{0,l})^{k−1}, (15)

and the corresponding cost function is

E[P_{m,l}] = (1/(n+1)) Σ_{i=0}^{n} [G(t_{i,l}, ũ_C(t_{i,l}, P_{m,l}), ũ'_C(t_{i,l}, P_{m,l}))]² + [ũ_C(t_{0,l}, P_{m,l}) − u_{0,l}]². (16)

Let us note at this point that G in Eqs. (14) and (16) shares the same structure as the general problem in Eq. (1). However, the original solution function u(t) has been replaced by the SCNF ũ_C(t_{i,l}, P_{m,l}). Therefore, the G involved in the cost function now relies on one or more neural networks, depending on the neural form order. Once trained, each subdomain has its unique learned weight matrix P_{m,l}, which can later be used to recreate the solution or to evaluate it at grid points intermediate to the training points.
In order to keep an overview of all terms and indices, we summarise them again: The i-th grid point in the l-th subdomain is denoted by t_{i,l}, while t_{0,l} is the initial point in the subdomain D_l with the initial value ũ_C(t_{0,l}, P_{m,l}). That is, t_{n,l−1} and t_{0,l} are overlapping grid points. In D_1, ũ_C(t_{0,1}, P_{m,1}) = u(t_0) holds. The matrix P_{m,l} contains the set of the m neural network weight vectors in the corresponding subdomain l. Finally, N_k(t_{i,l}, p_{k,l}) denotes the k-th neural network in D_l.
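The hand-over of initial values at the overlapping interface grid points can be sketched as follows. Training a CNF per subdomain is beyond a short example, so a plain explicit Euler step for u̇ = −5u stands in for the subdomain solver; the fragmentation and the interface logic are the point of the sketch, and all names are illustrative:

```python
import numpy as np

def solve_subdomain(t_grid, u0):
    """Stand-in for training an SCNF on one subdomain: explicit Euler
    integration of u' = -5u starting from the handed-over value u0."""
    u = [u0]
    for a, b in zip(t_grid[:-1], t_grid[1:]):
        u.append(u[-1] + (b - a) * (-5.0 * u[-1]))
    return np.array(u)

def fragment(t0, t_end, h, n):
    """Split [t0, t_end] into h subdomains D_l with n + 1 grid points each;
    the last grid point of D_{l-1} coincides with the first of D_l."""
    edges = np.linspace(t0, t_end, h + 1)
    return [np.linspace(a, b, n + 1) for a, b in zip(edges[:-1], edges[1:])]

subdomains = fragment(0.0, 2.0, h=10, n=10)
u0 = 1.0                     # u(t_0) in D_1
solutions = []
for grid in subdomains:
    u = solve_subdomain(grid, u0)
    solutions.append(u)
    u0 = u[-1]               # value at t_{n,l-1} becomes initial value at t_{0,l}
```

By construction, the computed solution is continuous across every interface, since the interface value is passed on verbatim.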

Experiments and results
This section is divided into experiments on the collocation neural form (CNF), followed by experiments on the subdomain collocation neural form (SCNF). Prior to this, we provide detailed information about how the weight initialisation for the different neural networks is realised. The discussion of constant weight initialisation is also one of the main subjects of the experimental section. As stated before, the specific neural network configurations will be addressed in the subsequent experiments.
Weight initialisation with p_init_const applies to all corresponding neural networks, so that they use the same initial values. Increasing m for the initialisation with p_init_rnd works systematically. For m = 1, a set of random weights for the neural network is generated. For m = 2 (now with two neural networks), the first neural network is again initialised with the generated weights from m = 1, while for the second neural network a new set of weights is generated. This scheme carries over to all higher orders in all subsequent experiments. To achieve comparability, the same randomly initialised weights are used in all experiments.
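The described initialisation scheme may be sketched as follows. Using one fixed random stream per network index is an illustrative way (our own choice, not prescribed by the method) to guarantee that increasing m only appends freshly generated weights while reusing the previously generated ones:

```python
import numpy as np

def init_weights(m, H=5, mode="rnd"):
    """Initial weights for the m networks of a CNF of order m.

    mode="const": every network starts from the same constant weights (here 0).
    mode="rnd":   network k always draws from its own fixed random stream, so
                  a run with higher order m reuses the weights of the first
                  networks and only appends new ones.
    """
    if mode == "const":
        return [np.zeros(4 * H) for _ in range(m)]   # p_init_const = 0
    return [np.random.default_rng(seed=k).uniform(-0.5, 0.5, size=4 * H)
            for k in range(m)]
```

With this layout, the first network of a run with m = 3 receives exactly the weights it would have received for m = 1, which makes results across orders comparable.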

Experiments on the collocation neural form (CNF)
In this section, we want to test our novel CNF approach with the initial value problem

u̇(t) + 5u(t) = 0, u(0) = 1, (17)

which has the analytical solution u(t) = e^{−5t} and is solved over the entire domain D = [0, 2] (without domain fragmentation). Eq. (17) involves a damping mechanism, making it a simple model for stiff phenomena [21].
The numerical error ∆u shown in subsequent diagrams in this section is defined as the l1-norm of the difference between the exact solution and the corresponding CNF,

∆u = Σ_{i=0}^{n} |u(t_i) − ũ_C(t_i, P_m)|. (18)

If we do not say otherwise, the fixed computational parameters in the subsequent experiments are:

Weight initialisation
Let us comment in some more detail on weight initialisation. The weight initialisation plays an important role and determines the starting point for gradient descent. Poorly chosen, the optimisation method may fail to find a suitable local minimum. The initial neural network weights are commonly chosen as small random values [22].
Let us note that this is sometimes considered a computational characteristic of stochastic gradient descent optimisation. Another option is to choose a constant initialisation. This method is not commonly used for the optimisation of neural networks, since random weight initialisation may lead to better results. However, constant initialisation reliably returns results of reasonable quality if the computational parameters of the network remain unchanged.
As previous experiments have documented [7,8,13], both TSM and mTSM are able to solve differential equations up to a certain degree of accuracy. However, an example illustrating the accuracy of five computations with random weights p_init_rnd and constant weights p_init_const, respectively, shows that the quality of the approximations may vary considerably, see Table 1. As observed in many experiments, even a small discrepancy in the initialisation with several sets of random weights in the same range may lead to a significant difference in accuracy. On the other hand, network initialisation with constant values very often gives reliable results with the proposed novel approach. This motivates us to study the effects of constant network initialisation in detail.

CNF Experiment: number of training epochs
The first experiment shows for different m how the numerical error ∆u behaves depending on the number of training epochs.The diagrams only display every hundredth data point.
In Fig. 2(a), with TSM and p_init_const, results for m = 1 (blue) do not provide any useful approximation, independent of the batch training method selected. With a second neural network for m = 2 (orange) in the neural form, ∆u is lowered by approximately one order of magnitude, so that we now obtain a solution which can be considered to rank at the lower end of reliability. However, the most interesting result in Fig. 2(a) is m = 5 (green), with the best accuracy at the end of the optimisation process, but with the drawback of occurring oscillations. These may be caused by the chosen optimisation method.
Table 2 shows the numerical error ∆u(t_i) at individual grid points t_i. For both CNF and Runge-Kutta 4 (RK4) the results were computed with ten grid points, with the CNF approach using p_init_const performing better than Runge-Kutta 4. However, further refining the grid for Runge-Kutta 4 results in a significantly lower ∆u(t_i).
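For reference, the classical Runge-Kutta 4 scheme used for this comparison can be sketched as follows on a fixed grid of ten points for Eq. (17); the grid refinement mentioned above then simply means increasing the number of points:

```python
import numpy as np

def rk4(f, t_grid, u0):
    """Classical Runge-Kutta 4 on a fixed grid, serving as reference solver."""
    u = [u0]
    for t, t_next in zip(t_grid[:-1], t_grid[1:]):
        h = t_next - t
        k1 = f(t, u[-1])
        k2 = f(t + h / 2, u[-1] + h / 2 * k1)
        k3 = f(t + h / 2, u[-1] + h / 2 * k2)
        k4 = f(t + h, u[-1] + h * k3)
        u.append(u[-1] + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4))
    return np.array(u)

# Eq. (17): u' = -5u, u(0) = 1, ten grid points on D = [0, 2]
t = np.linspace(0.0, 2.0, 10)
u_rk4 = rk4(lambda s, u: -5.0 * u, t, 1.0)
err = np.abs(u_rk4 - np.exp(-5.0 * t))
```

On this coarse grid the RK4 error is of moderate size, while refining the grid reduces it rapidly due to the fourth-order convergence.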
For mTSM with SBtraining and p_init_const, already m = 1 converges to a solution accuracy that can be considered reliable. However, we observe in Fig. 2(b) that only the transition from m = 1 (blue) to m = 2 (orange) affects ∆u by increasing accuracy, while heavy oscillations start to occur. In results not documented here, m has only a minor influence on the accuracy for p_init_rnd. Especially FBtraining for mTSM shows the same trend for both initialisation methods, with only minor differences in the last epochs.
Let us note that the displayed results show the best approximations using constant or random initialisation. That is, we obtain the best results for TSM with FBtraining and m = 5 (green), and for mTSM with SBtraining and m ≥ 2, respectively.
Concluding this experiment, we were able to obtain better results with p_init_const than with p_init_rnd. Increasing m to at least order five seems to be a good option for TSM with FBtraining, whereas even higher orders may provide better approximations. For mTSM we cannot observe benefits for m above order 2.
Moreover, we see especially that the increase in the order of the neural form in Eq. (10) appears to have a similar impact on solution accuracy as the discretisation order in classical numerical analysis.

CNF Experiment: domain size variation
Investigating the methods for different domain sizes provides information on the reliability of computations on larger domains. The domains in this experiment read D = [0, t_end], and here we directly compare p_init_const with p_init_rnd. In Figs. 3(a) and 3(b), we observe TSM from around t_end = 3.5 onwards to incrementally plateau towards unreliable approximations. Increasing m improves ∆u on small domains and shifts the observable step-like accuracy degeneration towards larger domains.
However, even with m = 5 (green), the results from domain size t_end = 3.5 towards larger sizes are unreliable. Prior to the first plateau, higher m provides significantly better ∆u for p_init_const, while there are only minor changes for p_init_rnd with the TSM method. This holds for both SBtraining and FBtraining, and one can say that in this experiment TSM works better with p_init_rnd, even without increasing m. Turning to the mTSM extension, we observe for SBtraining in Fig. 3(c) the existence of a certain point from which on different m return equal values, whereas FBtraining returns (close to) equal results for all investigated domain sizes. However, we see some evidence for the use of m = 2 (orange) over m = 1 (blue), showing an overall good performance. A further increase of m is not necessary with this approach, confirming the results from Experiment 5.1.1.
Let us also note that with mTSM a small domain seems to favour p_init_const, which then provides better results than p_init_rnd. The behaviour of numerical methods highly depends on the chosen number of grid points, so that in the next experiment we analogously investigate the influence of the number of training points (nTP). In every computation, the domain D is discretised by equidistant grid points.

CNF Experiment: number of training points variation
As in the previous experiments, m shows a major influence on the results with TSM, and the best approximations are provided by p_init_const with m = 5 (green), as seen in Fig. 4(a). An interesting behaviour (observed also in a different context in Fig. 2(a)) is the equivalence between m = 3 (yellow) and m = 4 (purple). Both converge to almost exactly the same ∆u, where one may assume a saturation in m. However, a further increase of the order decreases the numerical error again by one order of magnitude.
Turning to mTSM in Fig. 4(b), we again find a major increase in accuracy after the transition from m = 1 (blue) to m = 2. For nTP = 50, values for m ≥ 2 converge to the same results as provided by TSM with m = 5.
Concluding this experiment, we again find evidence that increasing m in the proposed approach provides improved accuracy for p_init_const. However, increasing nTP does not seem to improve the accuracy beyond a certain point, unlike for numerical methods. One could argue, though, that the analogue of the number of grid points in numerical methods is here the number of epochs.

Experiments on the subdomain collocation neural form (SCNF)
In Section 5.1, the test equation is stiff, but its solution is at the same time very smooth and the equation is solved on a small domain. We now turn to the more demanding initial value problem in Eq. (19). Its solution, shown in Fig. 5 for t ∈ [0, 15], incorporates heavily oscillating and increasing characteristics, similar to instabilities. Although our approach is not limited to certain types of IVPs, we find Eq. (19) to represent possible real-world behaviour and consider it suitable to serve as an example IVP.

The numerical error ∆u_l is now defined as the averaged l1-norm of the difference between the exact solution and the corresponding SCNF in each subdomain,

∆u_l = (1/(n+1)) Σ_{i=0}^{n} |u(t_{i,l}) − ũ_C(t_{i,l}, P_{m,l})|,

whereas ∆u averages the numerical error over all subdomains,

∆u = (1/h) Σ_{l=1}^{h} ∆u_l.

The weight initialisation works as employed in Section 5, and the values are fixed to p_init_const = 0 and p_init_rnd ∈ [−0.5, 0.5]. In the subsequent experiments, the solution domain is kept constant at D = [0, 15] and the neural networks are trained for 1e5 epochs.
In addition, we use the method of training the neural networks incrementally, which has been employed in [13]. That is, we initially train the neural networks on the first grid point, afterwards on the first two grid points, and so on. We continue this procedure up to a full batch training over all grid points in each subdomain. The initial weight initialisation is the same in each subdomain.
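The incremental training schedule may be sketched as follows; `train_step` stands in for one full batch training pass over the given points and is an illustrative placeholder:

```python
def incremental_training(grid, train_step):
    """Incremental learning on one subdomain (cf. [13]): first train on the
    first grid point only, then on the first two, and so on, until a full
    batch over all grid points has been trained."""
    for i in range(1, len(grid) + 1):
        train_step(grid[:i])   # FBtraining restricted to the first i points

# Example: record how many points each training stage sees
seen = []
incremental_training([0.0, 0.25, 0.5, 0.75, 1.0],
                     lambda pts: seen.append(len(pts)))
```

Each stage reuses the weights reached in the previous stage, so the network grows into the subdomain from its initial point outwards.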
Please note at this point that we provide an explicit comparison to the Runge-Kutta 4 method only in the last experiment of this section. Our approach provides, by construction, a very high flexibility to deal with many types of initial value problems that may require specific constructions in classical numerical analysis (e.g. by symplectic integration). However, in simple settings we will usually find the network-based approach at the moment not competitive with numerical state-of-the-art solvers (w.r.t. computational efficiency), which have been developed and refined over decades. We think that, because of the much higher flexibility of the network-based tool, this comparison would not be entirely adequate.
As an example for this flexibility, a recent neural network approach [12] makes it superfluous to restart the computation of numerical solutions when considering multiple, different initial conditions. We also demonstrate the flexibility in this paper in Section 5.2.5, where we show how invariants can simply be added to the cost function. Numerical methods, in contrast, often need special constructions for handling such problems.
For our test example in Eq. (19), we find that a graphical comparison in the subsequent experiments does not provide further information, since the visual differences between the analytical solution and the solution by Runge-Kutta 4 with adaptive time stepping are minor.
A scaling experiment
The original TSM neural form (Eq. (3)) is theoretically capable of approximating every continuous function, according to the universal approximation theorem [23]. However, Table 3 shows results for a TSM neural form with a single neural network. For different domains we scaled the number of hidden layer neurons linearly and averaged ten computations for each domain with the same computational parameters. The results in Table 3 convey the following message: Increasing the domain size forces the neural network to incorporate more hidden layer neurons and grid points. Indeed, to reach e.g. the (averaged) ∆u = 9.5873e-4 for D = [0, 3], the neural network required 50 hidden layer neurons and 75 grid points. In general, determining a suitable architecture in terms of the number of hidden layer neurons and training points is a challenging task.
In the subsequent experiments we find the SCNF able to solve the initial value problem with neural networks comprising a small, fixed number of hidden layer neurons and training points in each subdomain. At the same time, this allows us to define various important parameters in a simple and straightforward way. In the first experiment we compare results of a SCNF with a CNF that is solved over the entire domain. For comparability, the total number of training points is constant, namely nTP = 1000 for the red line and nTP = 10 with 100 subdomains for the black/dotted line. However, the comparison of two CNFs with the same architecture would not be meaningful, because the domain size has a significant influence. Therefore we decided to realise the CNF (red) with a neural network incorporating 1 input layer bias and 100 sigmoid neurons with 1 bias. The SCNF (black/dotted) features neural networks with 1 input layer bias and 5 sigmoid neurons with 1 bias per subdomain. Both CNF and SCNF incorporate m = 3. In addition, we did not increase the domain size incrementally in this experiment, in order to reduce the number of parameters that impede comparability.

SCNF Experiment: CNF versus SCNF
The CNF solution (red) shows no useful approximation throughout all experiments in Fig. 6. In total, the number of hidden layer neurons and training points that would be needed to obtain a useful approximation seems to be much higher. Nonetheless, the SCNF approach (black/dotted), working with the same number of training points, was able to solve the initial value problem in a satisfactory way. From a qualitative perspective, both TSM and mTSM together with p_init_const and p_init_rnd provide similar results. Concluding this experiment, we see that the SCNF method provides a useful solution to the initial value problem. In addition, the small number of incorporated hidden layer neurons enables a much more effective training of the neural networks.
SCNF Experiment: collocation neural form order variation
The ability to approximate the initial value problem with the SCNF, depending on different m, is the subject of this experiment. Here the SCNFs include 1 input layer bias and 5 sigmoid neurons with 1 bias. The solution domain is split into 60 subdomains with 10 grid points in each subdomain. Again, we employ incremental learning in the subdomains.

m    t_n,l−1    t_0,l
1    2.7834     4.8758
2    0.5763     0.7478
3    0.0846     0.0848
Table 4: l∞-norm for mTSM interface grid points, p_init_const

Results for TSM with m = 1 in Fig. 7(a) and mTSM with m = 1 in Fig. 7(b) indicate that the original TSM and mTSM methods are not useful over larger domains, even when employing domain fragmentation. However, the SCNF of first order is able to get back on the solution trend, although several subdomains do not provide correct approximations. In total, both solutions for m = 1 (especially mTSM) cannot be considered reliable.
That changes for m = 2, at least for TSM in Fig. 7(c). Here we find, with the exception of some local extreme points, the SCNF to be a reasonable approximation of the initial value problem. This statement, however, does not hold for mTSM. Although the general trend is now much closer to the analytical solution, there are still subdomains which do not approximate the solution well.
Results shown in Table 4 represent the l∞-norm of the differences between the analytical and the computed solution, for mTSM as displayed in Fig. 7, measured at the last grid points in D_{l-1}, namely t_{n,l-1}, and the corresponding initial points in D_l, t_{0,l}. We propose to consider this measure, since it indicates how well the solution is matched across the subdomains. We find that increasing m has a major influence on the accuracy.
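The interface measure reported in Table 4 can be computed directly from the per-subdomain solutions. A minimal sketch (array contents below are our own toy values, not data from the paper):

```python
import numpy as np

def interface_linf(sub_sol):
    """Given a list of per-subdomain solution arrays, return the l-infinity
    norm of the mismatch between the last grid value of D_{l-1} and the
    first grid value of D_l over all interfaces."""
    gaps = [abs(u_prev[-1] - u_next[0])
            for u_prev, u_next in zip(sub_sol[:-1], sub_sol[1:])]
    return max(gaps)

# toy example: three subdomain solutions with small interface mismatches
sub_sol = [np.array([0.0, 0.5, 1.0]),
           np.array([1.02, 1.5, 2.0]),
           np.array([1.99, 2.5, 3.0])]
print(interface_linf(sub_sol))  # largest jump at an interface
```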
We conjecture that learning the subdomain initial values becomes easier for mTSM the more neural networks are incorporated. That is mainly because the first neural network can, so to speak, focus on learning the initial values, while the other networks are more engaged with the structure of the IVP. This conjecture is supported by the decreasing discrepancy of the overlap at the interfaces for higher orders m.
The overall best solutions in this experiment are provided by m = 3 (Fig. 7(e), 7(f)), for both TSM and mTSM.
We tend to favour TSM over mTSM, since the initial value in each subdomain is satisfied by the corresponding SCNF by construction (the learned value at t_{n,l-1} is set to be the initial value at t_{0,l}) and does not have to be learned again.
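The hand-over of initial values across overlapping subdomains described above can be sketched structurally as follows; the subdomain construction mirrors the fragmentation in the paper, while the stand-in explicit Euler step merely replaces the trained neural form on each subdomain:

```python
import numpy as np

def fragment(t0, t_end, n_sub, n_grid):
    """Split [t0, t_end] into n_sub subdomains of n_grid grid points each,
    where the last grid point of D_{l-1} coincides with the first grid
    point of D_l (overlapping interface points)."""
    edges = np.linspace(t0, t_end, n_sub + 1)
    return [np.linspace(a, b, n_grid) for a, b in zip(edges[:-1], edges[1:])]

def solve_fragmented(rhs, u0, subdomains):
    """Solve sequentially: the value computed at t_{n,l-1} becomes the
    initial value at t_{0,l} of the next subdomain.  Here a simple explicit
    Euler step acts as a placeholder for the trained SCNF."""
    init, values = u0, []
    for grid in subdomains:
        u = [init]
        for t_prev, t_next in zip(grid[:-1], grid[1:]):
            u.append(u[-1] + (t_next - t_prev) * rhs(t_prev, u[-1]))
        values.append(np.array(u))
        init = u[-1]          # hand over the interface value
    return values

subs = fragment(0.0, 10.0, n_sub=60, n_grid=10)
sol = solve_fragmented(lambda t, u: -u, 1.0, subs)  # u' = -u, u(0) = 1
```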

SCNF Experiment: variation of the number of subdomains
In this experiment we investigate the influence of the total number of subdomains on the numerical error ∆u. Fig. 8 shows the behaviour for p_init,rnd (blue) and p_init,const (yellow). The SCNFs incorporate m = 3, 1 input layer bias, 5 sigmoid neurons with 1 bias, and 10 grid points in each subdomain. We again employ incremental learning in the subdomains.
Let us first comment on the SCNF for TSM in Fig. 8(a). Despite minor differences between the solutions corresponding to p_init,rnd and p_init,const for smaller numbers of subdomains, both initialisation methods show a very similar trend. A saturation regime appears at around 350 subdomains with ∆u ≈ 1e-5. Turning to mTSM in Fig. 8(b), we again observe similar behaviour for p_init,rnd and p_init,const, although here the differences only disappear for larger numbers of subdomains. We find that even at 400 subdomains the numerical error ∆u cannot compete with TSM.
Let us note again that the chosen weight initialisation approach for p_init,rnd (see Section 5) means that the random weights are initialised in the same way in each subdomain. In additional, unreported tests we observed that the results may show slight to significant variations when the random weights are generated independently for each network over the subdomains. However, the results shown here using p_init,rnd represent a rather typical trend.
In conclusion, one can obtain very good approximations with the TSM SCNF approach for both weight initialisation methods. That means choosing p_init,const over p_init,rnd has no downsides, which leads us to again support the use of constant weight initialisation.

SCNF Experiment: numerical error in the subdomains
The last experiment investigates the numerical error ∆u_l in each subdomain D_l, depending on different m. Again, the SCNFs feature 1 input layer bias and 5 sigmoid neurons with 1 bias. The solution is computed with 100 subdomains, 10 grid points each, and incremental learning in the subdomains.
Throughout Fig. 9, m = 1 shows the poorest results. However, comparing mTSM with p_init,const in Fig. 9(b) to the results for 60 subdomains in Fig. 7(b), increasing the domain fragmentation by 40 subdomains seems to prevent the solution from diverging. Random weight initialisation works better for m = 1, especially with TSM.
Solutions provided by m = 3 and m = 5 are much better than for m = 1, and increasing the order clearly tends to increase the accuracy. For TSM with both m = 3 and m = 5, as well as for mTSM with m = 5, the numerical error saturates from a certain subdomain on. Let us note that for both p_init,const and p_init,rnd the differences in the overall numerical error ∆u are not significant in these cases.
In this experiment, we again tend to favour TSM with p_init,const. Although m = 1 does not work well, the higher orders shown provide good approximations with saturation regimes. The results confirm our preference for constant initialisation, because p_init,const does not depend on a fortunate generation of random weights.

SCNF Experiment: system of initial value problems
In this section we study a non-linear system of initial value problems. The example system we consider reads

u'(t) = ((I_v − I_w)/(I_v I_w)) v(t) w(t),
v'(t) = ((I_w − I_u)/(I_w I_u)) w(t) u(t),
w'(t) = ((I_u − I_v)/(I_u I_v)) u(t) v(t),

where I_u, I_v, I_w are non-zero real numbers, and initial values for u, v, w are given.
The equations describe the angular momentum of a free rigid body [24,25] with the centre of mass at the origin. These coupled initial value problems are often denoted as Euler equations and feature time-invariant characteristics, since the independent variable t (time) does not explicitly appear on the right-hand side.
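Such Euler equations are straightforward to cross-check with a classical Runge-Kutta 4 integration. The following sketch uses illustrative placeholder values for the principal moments of inertia and the initial state, not the values from the paper:

```python
import numpy as np

# Euler rigid body equations for the angular momentum (u, v, w).
# The principal moments of inertia are illustrative placeholders.
Iu, Iv, Iw = 2.0, 1.0, 2.0 / 3.0

def rhs(y):
    u, v, w = y
    return np.array([(Iv - Iw) / (Iv * Iw) * v * w,
                     (Iw - Iu) / (Iw * Iu) * w * u,
                     (Iu - Iv) / (Iu * Iv) * u * v])

def rk4(y, h, steps):
    """Classical Runge-Kutta 4 integration of y' = rhs(y)."""
    for _ in range(steps):
        k1 = rhs(y)
        k2 = rhs(y + 0.5 * h * k1)
        k3 = rhs(y + 0.5 * h * k2)
        k4 = rhs(y + h * k3)
        y = y + h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return y

# the two quadratic invariants: magnitude (sphere) and kinetic energy (ellipsoid)
sphere = lambda y: y[0]**2 + y[1]**2 + y[2]**2
ellipsoid = lambda y: y[0]**2 / Iu + y[1]**2 / Iv + y[2]**2 / Iw

y0 = np.array([np.cos(1.1), 0.0, np.sin(1.1)])
yT = rk4(y0, h=0.01, steps=3000)   # integrate up to t = 30
```

For these step sizes the invariant drift of Runge-Kutta 4 stays small, although the scheme does not conserve the quadratic invariants exactly.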
The quadratic invariant

u²(t) + v²(t) + w²(t) = u²(0) + v²(0) + w²(0),    (27)

conserves the so-called magnitude and describes a sphere, while another quadratic invariant,

u²(t)/I_u + v²(t)/I_v + w²(t)/I_w = u²(0)/I_u + v²(0)/I_v + w²(0)/I_w,    (28)

conserves the kinetic energy and describes an ellipsoid. Both invariant quantities force the solution to stay on the intersection formed by the sphere and the ellipsoid, since Eqs. (27) and (28) remain unchanged over time along the solutions. The corresponding initial values u(0), v(0), w(0) are given as in Fig. 10, and the principal moments of inertia are fixed (see [24], Section 14.3). In general, the flexibility of the neural network approach allows given invariant expressions to be added directly to the cost function. For systems of initial value problems, the cost function can be obtained by assigning each solution function its own SCNF and summing up the resulting l2-norms, here together with the invariant quantities. In order to visualise the invariant behaviour, the computed results in Fig. 10 are obtained for t ∈ [0, 30], which allows the solution to pass its own initial points more than once. The solution domain is fragmented into 40 subdomains with ten training points in each subdomain. Due to overlapping grid points at the subdomain interfaces, the total number of unique training points is nTP = 361; the same number was used to obtain the computational result with Runge-Kutta 4. Each SCNF has m = 3 and therefore features three neural networks. The latter are initialised with zeros; other training parameters and methods remain unchanged (see Section 5.2).
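The idea of augmenting the cost with the invariants can be sketched independently of the network details. In the toy numpy sketch below, the residual and invariant terms follow the description above, while the grid values and the moments of inertia are illustrative stand-ins:

```python
import numpy as np

Iu, Iv, Iw = 2.0, 1.0, 2.0 / 3.0   # illustrative principal moments of inertia

def cost(u, v, w, du, dv, dw, c_sphere, c_ellipsoid):
    """Sum of squared ODE residuals on the training grid plus penalty
    terms that tie the two quadratic invariants to their initial values
    c_sphere and c_ellipsoid."""
    r1 = du - (Iv - Iw) / (Iv * Iw) * v * w
    r2 = dv - (Iw - Iu) / (Iw * Iu) * w * u
    r3 = dw - (Iu - Iv) / (Iu * Iv) * u * v
    inv1 = u**2 + v**2 + w**2 - c_sphere              # Eq. (27)
    inv2 = u**2 / Iu + v**2 / Iv + w**2 / Iw - c_ellipsoid  # Eq. (28)
    return sum(np.sum(r**2) for r in (r1, r2, r3, inv1, inv2))

# sanity check: a steady solution u = const, v = w = 0 has zero residual
# and satisfies both invariants, so its cost vanishes
n = 10
u = np.full(n, 0.8); v = np.zeros(n); w = np.zeros(n)
du = dv = dw = np.zeros(n)
c = cost(u, v, w, du, dv, dw, c_sphere=0.8**2, c_ellipsoid=0.8**2 / Iu)
```

In the SCNF setting, u, v, w and their derivatives would of course be evaluated from the three neural forms rather than given as fixed arrays.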
In Fig. 10, we display both the Runge-Kutta 4 (coloured/solid) and the SCNF (black/dots) solutions. As mentioned above, the curves lie on the intersections formed between a sphere and an ellipsoid. Changing the point of view in Fig. 10 reveals the contours of the intersections between the corresponding spheres and ellipsoids for the different initial conditions. In conclusion, the SCNF approach can easily be extended to handle systems of initial value problems, even with non-linear and time-invariant characteristics. A qualitative comparison between the SCNF solution and the Runge-Kutta 4 solution shows only very minor visual differences. As mentioned in Section 5.2, a quantitative comparison may favour Runge-Kutta 4 in terms of the numerical error in this experiment. However, the Runge-Kutta 4 method is not a symplectic integrator and may not preserve the quadratic invariants over time.

Conclusion and future work
The proposed CNF and SCNF approaches, merging collocation polynomial basis functions with neural networks and domain fragmentation, show clear benefits over the previous neural form constructions. We have studied in detail the constant weight initialisation for our novel CNF approach with a basic stiff initial value problem. Depending on the batch learning method, the collocation-based extension has benefits for both TSM and mTSM; for the TSM CNF, this effect is more significant than for the mTSM extension. Focusing on mTSM and the CNF approach, using two neural networks, one for learning the initial value and one multiplied by t, has advantages over other possible mTSM settings. Considering approximation quality as most important, we find mTSM with m = 2 to provide the overall best results for the investigated initial value problem.
We find that the proposed SCNF approach combines many advantages of the new developments. Employing higher order CNF methods, it is possible to solve initial value problems over large domains with very high accuracy, and at the same time with reasonable optimisation effort. Moreover, many computational parameters can easily be fixed in this setting, which is a significant issue for other TSM and mTSM variations.
As another important conclusion, the experiments show that constant weight initialisation can be favoured over random weight initialisation. Nonetheless, the complexity of the IVP has an impact on the architecture and on the values of the initial constant weights. As we have pointed out in a computational study [18], DE- and network-related parameters may not be independent of each other. However, the underlying relation between problem complexity and the necessary neural network architecture remains future work.
When focusing on constant weight initialisation, we consider a further investigation of how to find suitable (constant) initial weights to be of interest. The same holds for the sensitivity of the neural network parameters.
Future research may also include work on other possible collocation functions and on combining the networks with other discretisation methods. In addition, let us note that we consider optimal control problems to be a relevant potential field of application for our method; see for instance [26] for recent research in that area.

Figure 11: Experiment in 5.2.5. Top view of Fig. 10 to visualise the spherical and ellipsoidal intersections; colours are adopted from Fig. 10.

Table 2: Numerical error comparison at individual grid points with m = 5 and p_init,const

Table 3: TSM neural form, p_init,rnd