Optimal Parameters for Numerical Solvers of PDEs

In this paper we introduce a procedure for identifying optimal methods in parametric families of numerical schemes for initial value problems in partial differential equations. The procedure maximizes accuracy by adaptively computing optimal parameters that minimize a defect-based estimate of the local error at each time step. Viable refinements are proposed to reduce the computational overheads involved in the solution of the optimization problem, and to maintain conservation properties of the original methods. We apply the new strategy to recently introduced families of conservative schemes for the Korteweg-de Vries equation and for a nonlinear heat equation. Numerical tests demonstrate the improved efficiency of the new technique in comparison with existing methods.


Introduction
Many highly effective methods for initial value problems in partial differential equations appear as parametric families of numerical schemes.These include exponential splittings [8,23], where the free parameters constitute the coefficients of the splitting, and rational Krylov methods [19], where the free parameters are the poles of rational approximants.
A new technique that uses symbolic algebra to develop bespoke finite difference methods that preserve multiple local conservation laws has been recently introduced in [14].This approach has been further refined in [15], and new families of conservative schemes have been introduced for a range of partial differential equations (PDEs) in [13][14][15][16].These numerical schemes feature certain free parameters that can be arbitrarily chosen without compromising the preservation of the conservation laws.
A convenient choice of the free parameters yields numerical solutions with superior accuracy in all these cases.Coefficients of exponential splittings are typically determined a-priori using algebraic means in the pursuit of high order accuracies [25] and may be specialized for specific PDEs [27].Optimal pole selection for rational Krylov methods remains an active area of research and strategies include a-priori choices based on analytical reasoning [17] and a-posteriori fitting [7].Optimal parameters for the finite difference methods in [13][14][15][16] are identified using a brute force sweep through the entire parameter space, and comparisons against reference solutions show that suitable choices of the parameters yield errors up to 20 times smaller than existing methods for the proposed benchmark tests.
In practice, the optimal choice of such free parameters depends heavily on initial conditions and may also vary with time-step.Consequently, while the results in [13][14][15][16] do highlight the potential advantages of choosing good parameters, there is no known algorithm for identifying them.In order to overcome this issue we propose here a new approach for adaptively identifying optimal parameters for families of numerical schemes for PDEs, where convenient values are not known a priori.
For obtaining estimates of the optimal parameters we adaptively minimize an estimate of the local error introduced by the time integrator.In order for this approach to be effective, we assume throughout the paper that the spatial approximation is accurate and that the error in the solution is mainly due to the time discretization.This is not too restrictive an assumption, as in many instances PDEs are approximated very accurately in space, using for example spectral semidiscretizations.In the case of finite difference schemes, this amounts to either considering higher order discretizations in space, or restricting attention to cases where ∆t ≫ ∆x.Large time-steps reduce computational expenses and are generally desirable, except for potential stability concerns.In particular, ∆t ≫ ∆x is a typical setting when using implicit schemes.
In the proposed approach, at each time-step of a single step numerical scheme, we seek to compute the optimal parameters that minimize the local error.This requires a reasonably accurate but inexpensive estimate of the local error and its dependence on the parameters.For an a posteriori estimate of the local error, we resort to the "defect" based approach outlined in [6].In the context of backward error analysis, the defect measures the discrepancy between the differential equation satisfied by the numerical solution and the original equation [26].Defect based error estimates have been utilized widely in the development of time-adaptive methods for ordinary differential equations (ODEs) [12,21] and PDEs [3,4,6], but, to the best of our knowledge, these have not been employed for the estimation of optimal parameters.Unlike adaptive techniques for choosing time-steps, where the local error can be assumed to decrease monotonically with the time-step, in the proposed approach an optimization problem needs to be solved for finding the values of the parameters.The optimization problem for minimizing the local error estimate is solved in an efficient manner by computing the defect on a coarser (but still accurate) spatial grid, and utilizing an iterative method with a Gauss-Newton approximation to the Hessian for achieving locally quadratic convergence.
We apply the new procedure to families of schemes introduced in [14] for the Korteweg de Vries (KdV) and a nonlinear heat (NLH) equation.The main feature of these schemes is that each of them preserves a specific discretization of a conservation laws.However, since the discrete conservation laws also depend on the parameters, these cannot be preserved by using an adaptive approach.Where conservation of these properties is of paramount importance, we suggest a more conservative version of the algorithm that uses fixed parameters derived from a sequence of values obtained adaptively.The resulting schemes have significantly higher accuracy with moderate overheads.Despite the defect based approach being asymptotic in the time-step, ∆t, in practice the procedure also works well in the large time-steps regime and, in some cases, also confers a notable stability advantage.
In Section 2 we discuss the validity of a defect based approximation of the local error for the purpose of adaptively identifying optimal parameters.In Section 3 we outline the defect based approach used for finding optimal values of free parameters in a numerical scheme, and introduce the two algorithms briefly outlined above.In Section 4, we apply the new techniques to families of conservative schemes introduced in [14] for the KdV equation and the NLH equation, giving explicit expressions for the defect.In Section 5, we show numerical results that demonstrate the effectiveness of the proposed algorithms in finding good estimates of the optimal parameters, together with their higher accuracy and efficiency in comparison to a default choice of the parameters and other schemes from the literature.

Defect based approximation of local error
We consider a PDE, written as an Initial Value Problem on the Hilbert space H, where A : H → H. Boundary conditions and non-autonomous PDEs can also be incorporated into our approach in a straightforward manner, as demonstrated with concrete examples in Section 4.2.Following spatial discretization, the solution of (2.1) is approximated by the solution of the system of ODEs, where here and henceforth Dz denotes the total derivative with respect to z, and u(t) represents a finite dimensional approximation of u(t).For instance, this could involve a finite difference approximation on a uniform grid on the domain [a, b] with Dirichlet boundaries, Let T be the final time of integration, tn = n∆t, n = 0, . . ., N, and ∆t = T /N be the time nodes and stepsize, respectively, um,n an approximation of u(xm, tn), and un the column vector whose m-th entry is um,n.
The exact solution of (2.2) is described by the flow Similarly, a single step numerical scheme for (2.2) can be described by the numerical flow, un+1 = Φ(∆t, un).
Note that the numerical flow Φ also exists for implicit methods, even if not specified in an explicit form.
In this manuscript, we consider numerical schemes in the form where Ω is a compact subset of R K , and Φ depends on a vector of free parameters χ ∈ Ω that effect the accuracy of the scheme.In our theoretical discussion we assume that the vector field A, the exact flow E and the numerical flow Φ are smooth with respect to all arguments.The local error in the numerical method (2.3) is defined as In general, L(∆t, un, χ) is not a computable quantity since the exact solution E (∆t, un) is not available in practice.Consequently, we resort to defect-based approximations [5,6] to obtain a posteriori estimates.The defect or residual of Φ, quantifies the extent to which the numerical flow Φ fails to satisfy (2.2).For nonlinear parabolic PDEs and time-reversible equations, the following nonlinear variationof-constant formula holds true [4,11].Henceforth, ∂ k f denotes the Fréchet derivative of a function f with respect to the k-th argument.
Lemma 1 (Gröbner-Alekseev formula) The analytical solutions of the following initial value problems u(0) = u0, are related through the nonlinear variation-of-constants formula Applying Lemma 1 to (2.2) and (2.5) yields the following formula for the local error (2.4), The following theorem shows that if (2.3) is a method of order p, then the quantity is an asymptotically correct estimator of the local truncation error (2.4), uniformly for χ ∈ Ω. Theorem 2 The error estimate (2.7) is such that uniformly for any value χ ∈ Ω, i.e. with C independent of χ.

Defect based identification of optimal parameters
In this section, we propose the use of the defect based error estimate (2.7) for finding, at every time-step, optimal parameters χ * n ∈ Ω ⊂ R K , defined as where un and ∆t are fixed.The result of Theorem 2 assures us that and guarantees that the choice of parameters χ * n keeps the true local error, L, close to L(∆t, un, χ * n ) in the asymptotic limit ∆t → 0 and, therefore, small.Remark 1 The application of defect based error estimates for choosing optimal parameters differs from their application in context of time-adaptivity in a couple of crucial aspects.
1. Since L is neither monotonous in χ, nor are we interested in asymptotic limits for small χ (unlike the case of ∆t in context of time-adaptivity), the defect based estimate L(∆t, un, χ) needs to be computed for multiple values of χ within an optimization routine.
2. The perturbation of L by a term ρ independent of χ has no effect on χ * .This is in contrast to time-adaptivity where we seek the largest ∆t * such that L(∆t * , un, χ) 2 < δ for some user specified tolerance δ, and ρ = 0 effects the choice of ∆t * .
The first observation in Remark 1 suggests that the application of defect based estimates for choosing optimal parameters can be prohibitively expensive.However, the second suggests that we can resort to inexpensive approximations of the defect and still hope to arrive at a good choice of parameters.In Section 3.2, we see that under reasonable assumptions, the number of optimization steps is not expected to be large and just a few steps of Gauss-Newton iteration are ever required.In the large time-step regime the approximation of defect on a coarse spatial grid also proves to be sufficient for the purposes described here.Overall, this leads to a procedure for identifying optimal parameters with very reasonable overheads, producing highly efficient schemes.

Optimization procedure
In practice, we minimize the square of the defect, using the gradient, g = (DχR(∆t, un, χ)) ⊤ R(∆t, un, χ), where DχR is the Jacobian of the defect with respect to χ, and a Gauss-Newton approximation to the Hessian, ∇ 2 f ≈ HGN := (DχR(∆t, un, χ)) ⊤ (DχR(∆t, un, χ)) . (3.3) Utilizing the Gauss-Newton Hessian in the context of trust region algorithms yields a sequence of parameters χ k n that quadratically converge to the optimal χ * n , with reliable global convergence properties [24].At the same time, the procedure remains relatively inexpensive for a small number of parameters, K, since we only need to compute the first derivatives of the defect with regards to χ.These can be computed either analytically or approximately using finite differences.
The defect, R(∆t, un, χ k n ), is computed using (2.5).This requires the computation of a temporary solution, u k n+1 = Φ(∆t, un, χ k n ), and of D∆tΦ(∆t, un, χ k n ), the latter of which can be computed analytically as outlined with concrete examples in Section 4.
Note that, in general, at each iteration a trust region algorithm may be used to compute R at candidate parameters χ = χ k+1 n before deciding to accept or reject the candidate and/or update the trust region radius ∆ k n .For a detailed introduction to trust region algorithms, we refer the reader to [24].

Practical considerations for efficiency
The evaluations of defect can be very expensive, as they require the computation of the temporary solution u k n+1 at every iteration.Each of these is as expensive as a step of the original numerical solver Φ.However, in practice we identify χ * n by optimizing the defect on a coarse spatial grid, resorting to the fine computational grid only for evaluating un+1 once χ * n has been identified.
The coarse grid is obtained as a subgrid of the fine grid with resolution r∆x, with r an integer that divides M + 1.Let Pr denote the projection operator from the fine grid to the coarse grid.At the k-th iteration of Gauss-Newton algorithm, we evaluate the defect (2.5) on the coarse grid, as This requires the computation of u k n+1 = Φ(∆t, un, χ k n ).On a grid with resolution r∆x, the dimension of the problem is reduced by a factor r.This typically leads to a significant speedup in the computation of χ * n .This speedup is expected to be particularly pronounced in 2 or 3 dimensional problems, where the coarse grid is smaller by factors of r 2 and r 3 , respectively, than the finer computational grid.
For a method of order p in space and time, a O(r p ∆x p ) term of error is introduced in the evaluation of the defect.This error is negligible if r∆x ≪ ∆t and is not expected to have a significant effect on the estimate of the optimal parameters χ * n in light of Remark 1.
Remark 2 With larger values of r, the additional cost of identifying χ * n becomes marginal, while the advantages of identifying good parameters could still be significant.We can expect, however, that once r is large enough such that r∆x ≪ ∆t is no longer valid, spatial discretization errors will start to dominate and the computation of defect may become too inaccurate to be useful.In light of these observations, as a rule of thumb, the largest r we recommend is the largest divisor of M + 1 such that r < ∆t/∆x.Further gains can be obtained by exploiting temporal smoothness of the optimal parameters χ * .If the solution u(t) is smooth in time, it is reasonable to assume that the optimal parameters are described by a Lipschitz function χ Thus, the previous value of the optimal parameter serves as a good first guess for the next time-step, χ 0 n = χ * n−1 .With C and ∆t small enough, χ 0 n can be expected close enough to χ * n so that conditions of trust-region are guaranteed to be satisfied and quadratic convergence is guaranteed.In such a case, it suffices to use the simple Newton-type iteration, in place of the trust-region algorithm, where g and HGN are given by (3.2) and (3.3), respectively.In practice, just a couple of steps of (3.4) suffice, with the exception of the first iteration (n = 0), when the arbitrary value of the initial guess may be very far from the optimal one, requiring more optimization steps and the use of trust-regions.Gauss-Newton algorithm is iterated until the stopping criterion, is satisfied for a suitable tolerance.Then we set χ * n = χ k+1 n and the solution at the next time-step is obtained on the finer computational grid as The overall procedure introduced in this section is summarized in Algorithm 3.1.

Modifications for conservative schemes
Algorithm 3.1 is very effective in improving accuracy of standard numerical methods with free parameters and of geometric integrators that preserve a structure that is independent of the parameters.However, the schemes described in [14] preserve some conservation laws that depend on the parameters.Therefore, a time adaptive approach undermines the preservation of these conservation laws.
In order to obtain numerical solutions that satisfy the physical constraints given by these conservation laws, we modify Algorithm 3.1 as follows.
1. Project the initial condition on a grid with resolution r∆x.
2. Find the optimal χ * 0 and use it to advance a step in time on the coarse grid.Iterate this step till the final time, obtaining the optimal parameters χ * n at n = 0, . . ., N − 1 steps.
We summarize this conservative version of our procedure in Algorithm 3.2.Note that the numerical solution obtained on the coarse grid, un, is used only for estimating the optimal parameters and later discarded.
Algorithm 3.2: Identifying fixed optimal parameters Since our estimate of the optimal parameters in Algorithm 3.2 relies on the solution of the problem on a coarser grid, this algorithm is also affected by the accumulation of errors in space, which may be particularly pronounced for large values of r.On the other hand, the parameters χ * n need not be identified with too high an accuracy since we are only interested in a good average choice, χ * .
Considering the average parameters for solving the problem on the fine grid is a good option particularly for solutions with simple and smooth dynamics (e.g.travelling waves) as the values obtained at the different time-step are all reasonably close to each other.
Remark 3 Note that at the end of step 2, the full sequence of optimal parameters, χ * 0 , . . ., χ * N−1 , and local error estimates are available to the user, and it is possible to consider alternative options than the average to derive different single fixed values of the parameters.

Approximation of defect for specific schemes
In this section we consider the application of the proposed approach to two partial differential equations -the KdV equation and a nonlinear heat equation -with suitable initial and boundary conditions.In particular, we present the computation of defect (2.5) for numerical schemes introduced in [14], which is required in Algorithms 3.1 and 3.2.
We define the forward shifts in space and time, respectively, the forward difference and average operators, and the centred difference operator, approximating the space derivatives of degree 2k and 2k − 1, respectively, with second order of accuracy.Action of these operators on vectors is defined entrywise.Moreover, we denote with • the Hadamard product whose action is entrywise multiplication of vectors.
For the two equations considered here, families of second order finite difference methods that depend on one or more free parameters have been introduced in [14].We consider in this section some of these families of schemes.

Remark 4
We restrict attention to the setting where ∆t > ∆x, and the parameters χ featured in the numerical schemes from [14] are O(∆t 2 ).These small parameters χ correspond to perturbation terms that have no counterpart in the continuous problem, and whose contributions vanish in the limit ∆t → 0. Thus, it is reasonable to restrict our search to a neighbourhood of 0 of size O(∆t 2 ), Ω ⊂ B C∆t 2 (0; R K ), for some constant C. Therefore, when applying the algorithms outlined in Section 3 we can expect that χ = 0 is a reasonable initial guess, and we can find convenient values of the parameters by using the simple iteration (3.4) without the aid of trust region methods.

Remark 5
The schemes considered here are all implicit.We assume that J , the Jacobian operator defined by the partial derivatives of the scheme with respect to un+1, is never singular.Solutions can then be obtained by iteration of Newton's method.
An important property of the numerical schemes considered here is that they preserve some conservation laws.Conservation laws are defined as total divergences, DxF + DtG, that vanish on solutions of the PDE.The functions F and G are the flux and the density of the conservation law and depend on x, t, u and its derivatives.The methods in [14] preserve second order approximations of specific conservation laws in the form D∆x F (x, un, un+1, χ) + D∆t G(x, un, χ) = 0, where here and henceforth tildes represent approximations of the corresponding continuous terms, and x is the column vector whose m-th entry is xm.

Schemes for KdV equation
In this section we apply the approach outlined in Section 3 to parametric families of schemes for the KdV equation, ut + 1 2 u 2 + uxx x = 0, (4.1) with initial condition u(x, 0) = u0(x).For simplicity, we restrict attention to periodic or zero boundary conditions.However, as shown in Section 4.2, the entire discussion can also be adapted to boundary conditions of a different type.The KdV equation has infinitely many independent conservation laws.The first three, in increasing order, , describe the local conservation laws of mass, momentum and energy, respectively.For this equation we define the semidiscrete operator A in (2.2) in the most natural way, as The results obtained are independent of the particular form of this operator under spatial discretization, as we work under the assumption that the leading source of error is given by the time integration.
Optimal values of α are then computed according to (3.1), where the defect is calculated according to (2.5) with (4.3) and (4.6).Therefore, EC(α) also preserves the following approximations of the global mass and energy: These two global invariants are independent of α, and therefore they are conserved by both algorithms introduced in Section 3.

Momentum conserving methods
A two-parameter family of mass and momentum conserving schemes described in [14] is Solutions of MC(β, γ) satisfy the local mass conservation law given by (4.7), and the local momentum conservation law, where Note that these schemes are fully implicit.Substituting (2.3) in (4.7) and differentiating in time gives, The defect and the local error estimate (2.7) are then computed using (4.3) and (4.9).

Schemes for a nonlinear heat equation
In this section we consider the nonlinear heat equation, with initial condition and Dirichlet boundary conditions, u(x, 0) = u0(x), u(a, t) = ϕL(t), u(b, t) = ϕR(t).
Equation (4.10) has only two independent conservation laws, We denote with CS(λ) the one-parameter family of methods described in [14], given by D∆t(un + λD2,∆xun) = 1 2 D2,∆x(un • un+1). (4.12) The scheme CS(λ) has the following discrete versions of the conservation laws (4.11), D∆x F2 + D∆t G2 = 0, (4.14) In order to evaluate the defect, we consider a centred difference space discretization of (4.10) in the form (2.2) with where ej is the j-th unit vector, and we define D2,∆x = D2,∆x|ϕ L =ϕ R =0 in order to isolate the contribution of the boundary conditions.The methods in (4.12) are linearly implicit so they can be written in the form (2.

Numerical examples
In this section we consider a range of benchmark problems for the KdV equation (4.1) and the nonlinear heat equation (4.10), and investigate the performance of the methods described in Section 4 with optimal parameters obtained by the two algorithms introduced in Section 3.
Comparisons between different numerical schemes are based on: • Relative error in the solution at the final time t = T , defined as where • denotes the discrete L 2 norm and uexact is the solution of (2.1).
• Error in the variation of the global densities.If the method preserves the k-th conservation law, D∆x F k (x, un, un+1, χ) + D∆t G k (x, un, χ) = 0, the error in the global variation of G k is defined as where ej denotes the j-th column vector of the standard basis of R M +1 , and 1 ∈ R M +1 is the column vector with all entries equal to one.When the boundary conditions are periodic, we consider instead the maximum error on the k-th global invariant, defined as • Computational cost, measured in terms of computation time in seconds.For each of the family of schemes described in Section 4, we consider the following choices of the vector of free parameters: • χ = 0, default choice, fixed at each step.
• χ = χ * , globally optimal fixed value that minimizes the solution error for this specific problem.This value is obtained by brute force search based on empirical comparisons with the exact solution and is not available a priori.Thus, this choice of parameters does not constitute a reasonable numerical algorithm and we do not provide the computation time for it since it is excessive.
• {χ * r,n } n 0 , sequence of values that minimize the local error at each time-step obtained using Algorithm 3.1 projecting on a grid with resolution r∆x.
• χ = χ * r , fixed value obtained using Algorithm 3.2 with projection on a coarse grid with spatial resolution r∆x.
As discussed in Remark 4, we apply both Algorithm 3.1 and Algorithm 3.2 with the simplified Gauss-Newton step (3.4).
For all the experiments in this section, ∆t and ∆x are such that 4 < ∆t/∆x < 10.In order to verify the validity of the observations in Remark 2, we apply the proposed algorithms with r = 1, 2, 4, 10.

KdV equation
In this section we solve the KdV equation (4.1), comparing schemes in Section 4.1 with different choices of the parameters against schemes known in literature -namely, the multisymplectic method, and the narrow box scheme, . Both of these schemes preserve a discrete conservation law of the mass, given by their definition, but not of the momentum or the energy.These schemes are more compact than those defined in Section 4.1 and not centred on the grid.Therefore, we evaluate the error in the global momentum and energy according to (5.2) with For methods EC(α) (resp.MC(β, γ)) we evaluate the error (5.2) in the conservation of the momentum (resp.the energy) with F2 and G2 (resp.We first find estimates for the optimal parameters of the schemes EC(α) and MC(β, γ) by applying the two new algorithms.In Figure 5.1 we show the sequences of optimal values given by Algorithm 3.1 and Algorithm 3.2, respectively, for the scheme EC.Similarly, in Figure 5.2 we show the sequences obtained for the scheme MC.
Since the solution only travels along the same direction with constant speed, after a few initial steps the sequences of parameters stabilize around constant values.
The sequences obtained by the two algorithms with r = 1, 2, 4, are all very close to each other (within a maximum distance of 5 • 10 −3 ).In agreement with Remark 2, those obtained with r = 10 > ∆t/∆x show the effect of the space error on the coarser grid.This shows that the accuracy in the solution can be compromised when r is too large.Nevertheless, these parameters are not too far from the values obtained on the computational grid, and still show a good level of accuracy in all cases, further reducing the computational overheads involved in the solution of the optimization problem.
In Table 5.1 we compare different schemes.The results obtained show that:   • Schemes EC and MC with fixed values of the parameters exactly preserve two conservation laws, and therefore two global invariants.
• According to Remark 6, due to the conservative boundary conditions, the sequence of schemes EC(α * r,n ) n 0 preserves the global mass and the global energy.Similarly, according to Remark 7, schemes MC(β * r,n , γ * r,n ) n 0 preserve the global mass but not the global momentum.
• The computational cost of both of the proposed algorithms decreases as r increases.With respect to solving the optimization problem on the same grid of the differential problem (r = 1), the overall computational time reduces to less than a half when r = 2, and to less than a fourth when r = 4, making the computation cost of the methods proposed in this paper comparable to that of the other schemes in literature.As expected, setting r = 10, the computational time further reduces.
• All the approximations obtained with the sequences of parameters given by Algorithm 3.1 and Algorithm 3.2 are more accurate than the solutions of the schemes from the literature and of EC(0) and MC(0,0).However, schemes EC(α * ) and MC(β * , γ * ), where the parameters are obtained by brute force, are more precise.This shows that there exist sequences of parameters yielding higher accuracy than those obtained using the new al- gorithms.This is due to the fact that our approach is based on a minimization of (2.7) as an estimate of the local error.This improves local error but does not necessarily lead to a minimization of the global error.
• The accuracy of the solutions obtained using the two new algorithms is only marginally affected by the different choices of r < ∆t/∆x.However, for r = 10 > ∆t/∆x the spatial error has a visible effect on the sequences of parameters obtained, as is evident in Figures 5.1 and 5.2.In this case, the accuracy in the solution is only marginally affected but in general this may not be the case.In agreement with Remark 2, setting r = 4 gives the best compromise between reliability and speed of computation.
• Algorithm 3.1 and Algorithm 3.2 with r = 4 are significantly more efficient than the schemes from the literature: while computation times are comparable, the solution error is roughly three times lower.
In the left plot of Figure 5.3 we show the initial profile and, as an example, the solution of EC(α * 4 ) at the final time.In the right plot we show the absolute error given by EC(α * 4 ) at every point, in comparison with EC(0), EC(α * ), and the narrow box scheme.For all schemes, the bulk of the error is detected around the final location of the soliton and is due to a delay introduced by all numerical schemes.However, the maximum error introduced by EC(α * 4 ) is less than the 40% of that given by EC(0) and by the narrow box scheme, and only slightly larger than the error given by EC(α * ).
As a second numerical test we consider the interaction of two solitons over [a, b] = [−30, 30] and till time T = 15.The initial condition is obtained from the exact solution on R, In Figures 5.4 and 5.5 we show the sequences of parameters obtained for the schemes EC and MC, respectively.For sake of clarity, we omit in Figure 5.5 the sequences obtained for r = 2 as these would be indistinguishable from those obtained setting r = 1.Again, all the sequences obtained by solving the optimization problems on a grid up to four times coarser are very close (within a distance of 6 • 10 −3 ) to those obtained on the computational grid.
We notice that the parameters rapidly change when the solitons interact, but only slowly variate before and after the interaction.
In Table 5.2 we compare the accuracy, efficiency and conservation properties of different schemes.The same observations done for the case of the motion of a solitary wave hold also in this case.As before, the computational times of the new algorithms with r = 4 are comparable to those of the methods from the literature, and their greater efficiency is evident through solution errors that are much lower.
In the left plot of Figure 5.6, we show the initial condition together with the solution of the sequence MC(β * 4,n , γ * 4,n ) n≥0 .In the right plot, we show the pointwise error of MC(β * 4,n , γ * 4,n ) n≥0 , MC(0,0), and MC(β * , γ * ) at the final time.We omit the results for the multisymplectic and the narrow box scheme, as they are similar to that of MC(0,0).The methods obtained using the approaches introduced in this paper are very accurate around the final location of the faster soliton, where the bulk of the error given by MC(0, 0) and by the schemes from the literature is located.However, the error around the slower soliton is larger and small oscillations can be seen far from the solitons.MC(β * , γ * ) gives a slightly smaller error around the slower soliton, but more oscillations are introduced.

Nonlinear heat equation
In this section we apply the new algorithms to the nonlinear heat equation (4.10) using the methods CS(λ) from Section 4.2.We consider here two benchmark problems with (weak) energy solutions that are not classic solutions, having at least one point of non differentiability.Although in Section 3 smoothness of the solution is assumed, we show that the strategies introduced in this paper are also effective in this setting.
In order to converge, explicit and implicit finite difference methods for (4.10) in literature typically require ∆t = O(∆x 2 ) and ∆t = O(∆x), respectively [9,10,18,20,22].Under such small time-step restrictions, the Crank-Nicolson method applied to the semidiscretization (4.15) turns out to be very accurate and efficient.However, it fails to converge for the two benchmark tests in this section with ∆t > ∆x.Such instabilities may also occur when using a CS(λ) method with a default choice of the parameter, λ = 0.In contrast, we find that the two proposed algorithms in Section 3, based on optimization of defect based error estimate, are able to avoid these instabilities.
The first benchmark problem is given by equation (4.10) with initial and boundary conditions, 3) The solution of this problem is a linear wave travelling in an undisturbed medium with unit We discretize the initial boundary value problem described by (4.10) and (5.3), choosing ∆x = 0.025 and ∆t = 0.12.The graphs in Figure 5.7 show the sequences of parameters obtained from Algorithm 3.1 and Algorithm 3.2.Although the values are different for the first time-steps, for the smaller values of r the two procedures converge to the same value.In Table 5.3 we compare the different choices for the parameter λ.The error in the conservation laws is calculated according to (5.1) and the figures in the table show that these are preserved to machine accuracy by all the methods that use a fixed value of the free parameter.
The two algorithms introduced in this paper give equally accurate solutions.By increasing r, the computation time decreases, while the accuracy in the solution is only marginally affected.Moreover, both new algorithms avoid instabilities that occur when λ = 0.This is shown on the left of Figure 5.8 where, as an example, we show the solution of CS(λ * 4 ) and CS(0) at the final time.
On the right of Figure 5.8, we plot the pointwise errors given by CS(λ * ) and CS(λ * 4 ).The error obtained with λ = λ * 4 is almost entirely located at the interface where the solution is non differentiable.Fixing λ = λ * , the L 2 error is lower and the solution is more accurate around the interface.However, spurious oscillations are seen where the true solution is smooth.
The second benchmark problem is (4.10) with where f+ = max(f, 0).The solution of this problem is the Barenblatt profile, This solution has compact support and is not differentiable at the interface points, which move outward at a finite speed.We solve this problem with ∆x = 0.02 and ∆t = 0.09.
We show in Figure 5.9 that in this case the two proposed algorithms generate sequences of parameters that approach a small negative value for all the considered values of r.       5.4 shows that all the schemes preserve a discrete version of the global invariants.When the value of the parameter is fixed during the iteration, this is a consequence of the preservation of the local conservation laws.When a sequence of different parameters is used, this is instead due to the zero boundary conditions (see Remark 8).Although in this case the conservation laws are not preserved locally, the error in the solution is the lowest.
The figures in Table 5. 4 show that with increasing r the computation time decreases while the accuracy is not significantly affected even for r = 10.This is a consequence of the fact that the sequence of parameters obtained for different values of r quickly approach each other after just a few time-steps (see Figure 5.9).Once again, we find in Figure 5.10 (left) that the sequence of parameters {λ * 4,n } n 0 obtained from Algorithm 3.1 allows us to avoid the numerical instability that occurs under the default choice, λ = 0.
On the right half of Figure 5.10 we show the solution error of CS(λ * 4,n ) n 0 and of CS(λ * ).The solution error obtained using the sequence of parameters given by Algorithm 3.1 is mainly located at the points where the solution is not differentiable.Although λ = λ * minimizes the L 2 norm of the error with respect to any other fixed value of λ, we notice that the error at the interface can be further reduced by changing the parameter at every time-step.Moreover, for λ = λ * , relatively large components of error appear near ±2.5 where the solution is otherwise smooth.These are not visible in the solution errors obtained using either the adaptive sequences given by Algorithm 3.1 or the fixed parameters given by Algorithm 3.2.

Conclusions and future perspectives
In this paper we have proposed two approaches for identifying an optimal method in a parameter dependent family of numerical schemes, based on a minimization of the defect as an estimate of the local error.The first approach uses different (adaptive) values of the parameters at every time-step.In the second approach fixed values of the parameters are derived from a sequence.The latter approach does not compromise parameter depending conservation properties of geometric integrators.
The new algorithms solve an optimization problem at each time-step in order to identify the optimal values of the parameter.In principle, this can increase the computational cost of the original method prohibitively.However, in the large time-step regime, it is possible to solve the optimization problem on coarser spatial grids without compromising the accuracy of the optimal parameters, significantly decreasing the computational time.
The new approaches have been applied to families of schemes for the KdV equation and a nonlinear heat equation that preserve local conservation laws.The proposed numerical tests show that, on one hand, the new strategies effectively identify very accurate methods in each considered family of schemes.On the other hand, introducing a coarse grid for the solution of the optimization problem tremendously improves the efficiency of the new strategies.Overall, the computational time is comparable to that of other schemes in literature, while the accuracy of the proposed approach is much superior.
Since the former is independent of λ, this is a conserved invariant when using either of Algorithm 3.1 and Algorithm 3.2.In general, the latter is conserved only by Algorithm 3.2.However, if the boundary conditions are such that

Table 5 .
1: One-soliton problem for KdV (4.1).Errors in solution and conservation laws, and computation time.

Table 5 .
2: Two-soliton problem for KdV (4.1).Errors in solution and conservation laws, and computation time.

Table 5 .
3: NLH (4.10) with initial and boundary conditions(5.3).Errors in solution and conservation laws and computation time.

Table 5 .
4: NLH (4.10) with initial and boundary conditions(5.4).Errors in solution and conservation laws and computation time.