Adjoint-based exact Hessian computation

We consider a scalar function depending on the numerical solution of an initial value problem, and its second-derivative (Hessian) matrix with respect to the initial value. The need to extract Hessian information, or to solve a linear system having the Hessian as its coefficient matrix, arises in many research fields such as optimization, Bayesian estimation, and uncertainty quantification. From the perspective of memory efficiency, these tasks often employ a Krylov subspace method, which does not require holding the Hessian matrix explicitly and only needs the multiplication of the Hessian with a given vector. One way to obtain an approximation of such a Hessian-vector multiplication is to integrate the so-called second-order adjoint system numerically. However, the error in the approximation can be significant even if the numerical integration of the second-order adjoint system is sufficiently accurate. This paper presents a novel algorithm that computes the intended Hessian-vector multiplication exactly and efficiently. To this end, we give a new concise derivation of the second-order adjoint system and show that the intended multiplication can be computed exactly by applying a particular numerical method to the second-order adjoint system. In the discussion, symplectic partitioned Runge–Kutta methods play an essential role.


Introduction
We consider an initial value problem for a d-dimensional time-dependent vector x driven by an ordinary differential equation (ODE) of the form

d/dt x(t; θ) = f(x(t; θ)), x(0; θ) = θ, (1.1)

where t is time, the function f : R^d → R^d is assumed to be sufficiently differentiable, and θ is the initial value of x. Such an ODE is often solved numerically. Let x_n(θ) be the numerical solution that approximates the analytic solution at t = t_n, i.e., x_n(θ) ≈ x(t_n; θ) = x(nh; θ), where n is an integer and h is the time step size. This study is interested in numerically computing derivatives of a twice differentiable scalar function C : R^d → R, which depends on the numerical solution at a certain time t_N, e.g., the gradient vector ∇_θ C(x_N(θ)) and the second-derivative (Hessian) matrix H_θ C(x_N(θ)) of the function C with respect to θ. Calculating the gradient ∇_θ C(x_N(θ)) is often required to solve an optimization problem

min_θ C(x_N(θ)). (1.2)

One simple way of obtaining an approximation to the gradient is to integrate the system (1.1) numerically multiple times for perturbed θ. For example, the difference quotient

(C(x_N(θ + Δ e_i)) − C(x_N(θ))) / Δ,

where Δ is a small scalar constant and e_i is the i-th column of the d-dimensional identity matrix, can be seen as an approximation to the i-th component of the gradient. However, when the dimensionality d or the number of time steps N is large, this approach becomes computationally expensive, which makes it difficult to obtain a sufficiently accurate approximation. Instead, in various fields such as optimal design in aerodynamics [5], variational data assimilation in meteorology and oceanography [3], inversion problems in seismology [4], and neural networks [2], the adjoint method has been used to approximate the gradient: the gradient is approximated by integrating the so-called adjoint system numerically.
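The finite-difference approach above can be sketched as follows. The linear decay ODE f(x) = −x, the step size, and the quadratic cost are hypothetical choices for this sketch, not taken from the paper; the point is that the gradient requires d + 1 forward integrations of (1.1) and is only accurate to O(Δ).

```python
import numpy as np

# Forward integration of (1.1) by the explicit Euler method for a toy
# linear ODE f(x) = -x; the choices of f, h, and N are hypothetical.
def x_N(theta, h=0.01, N=100):
    x = np.asarray(theta, dtype=float).copy()
    for _ in range(N):
        x = x + h * (-x)            # x_{n+1} = x_n + h f(x_n)
    return x

def C(x):
    return float(x @ x)             # a simple quadratic cost

# One-sided differences: d + 1 integrations of (1.1) in total, which
# becomes expensive when the dimension d or step count N is large.
def fd_gradient(theta, delta=1e-6):
    theta = np.asarray(theta, dtype=float)
    base = C(x_N(theta))
    g = np.empty(theta.size)
    for i in range(theta.size):
        e = np.zeros(theta.size)
        e[i] = delta
        g[i] = (C(x_N(theta + e)) - base) / delta
    return g
```

For this linear example x_N = (1 − h)^N θ, so the exact gradient is 2(1 − h)^{2N} θ, which the difference quotient reproduces only up to an O(Δ) error.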
This approach is more efficient than the simple approach in most cases, but the accuracy of the approximation is still limited when discretization errors accumulate significantly. More recently, Sanz-Serna [15] showed that, if x_N(θ) is the solution of a Runge-Kutta method, the gradient ∇_θ C(x_N(θ)) can be calculated exactly by solving the adjoint system with a particular choice of Runge-Kutta method: the computed gradient coincides with the exact gradient up to round-off in floating point arithmetic. Given a system of ODEs and a Runge-Kutta method applied to the system, the recipe presented in [15] finds the intended Runge-Kutta method for the adjoint system. It is worth noting that, though the computation based on the recipe can be viewed as automatic differentiation with backward accumulation, the recipe provides new insights in the context of adjoint methods and is quite useful for practitioners. Hessian matrices also arise in several contexts. For example, if we apply the Newton method to the problem (1.2), a linear system whose coefficient matrix is the Hessian with respect to θ needs to be solved. Further, the inverse of the Hessian is used to quantify the uncertainty of the estimation in the Bayesian context [9,19]; this case also requires solving a linear system whose coefficient matrix is the Hessian.
There are, however, several difficulties in solving such a linear system numerically. As is the case with the gradient, the simplest way of obtaining all elements of the Hessian is to integrate the system (1.1) multiple times for perturbed initial values. However, this approach is noticeably expensive and may further suffer from the discretization error; calculating all elements of the Hessian by this simple approach is therefore often computationally prohibitive. If we apply a Krylov subspace method such as the conjugate gradient method or the conjugate residual method [14], there is no need to have the full entries of the Hessian; instead, all we need to do is compute a Hessian-vector multiplication, i.e., (H_θ C(x_N(θ)))γ for a given vector γ ∈ R^d. It was pointed out in [20,21] that, if C is a function of the exact solution to (1.1), the Hessian-vector multiplication (H_θ C(x(t_N; θ)))γ can be obtained by solving the so-called second-order adjoint system backwardly. This property indicates that solving the second-order adjoint system numerically gives an approximation to the intended Hessian-vector multiplication (H_θ C(x_N(θ)))γ. However, when the numerical solutions to the original system (1.1) or the second-order adjoint system are not sufficiently accurate, the error between the intended Hessian-vector multiplication (H_θ C(x_N(θ)))γ and its approximation could be substantial.
In this paper, we extend the aforementioned technique, which was proposed by Sanz-Serna [15] to get the exact gradient, to calculate the exact Hessian-vector multiplication. More precisely, focusing on Runge-Kutta methods and their numerical solutions, we shall propose an algorithm that computes the Hessian-vector multiplication (H θ C(x N (θ )))γ exactly. For this aim, we give a new concise derivation of the second-order adjoint system, which makes it possible to discuss the second-order adjoint system within the framework of the conventional (first-order) adjoint system and to apply the technique [15] to the second-order adjoint system. We show that the intended Hessian-vector multiplication can be calculated by applying a particular choice of Runge-Kutta method to the second-order adjoint system.
In Sect. 2, we give a brief review of adjoint systems and the paper by Sanz-Serna [15]. The main results are shown in Sect. 3, where we present a new concise derivation of the second-order adjoint system and show how to compute the intended Hessian-vector multiplication exactly. Section 4 is devoted to numerical experiments. Concluding remarks are given in Sect. 5.

Preliminaries
In this section, we give a brief review of adjoint systems and the paper by Sanz-Serna [15]. In Sect. 2.1, we focus on the continuous case, where C is a function of the exact solution to (1.1), and explain how the gradient ∇_θ C(x(t_N; θ)) and the Hessian-vector multiplication (H_θ C(x(t_N; θ)))γ are obtained based on the adjoint system and the second-order adjoint system, respectively. In Sect. 2.2, we explain that the gradient ∇_θ C(x_N(θ)) can be calculated by solving the adjoint system using a particular choice of Runge-Kutta method.
The second-order adjoint system reads [20,21]

d/dt ξ(t) = −∇_x f(x(t))^⊤ ξ(t) − (∇_x² f(x(t)) δ(t))^⊤ λ(t), (2.5)

where δ(t) is the solution to the variational system (2.1), λ(t) is the solution to the adjoint system (2.2), and ∇_x² f(x)δ denotes the matrix with entries Σ_k ∂²f_i/∂x_j∂x_k δ_k. In [21], the second-order adjoint system is introduced as the variational system of the adjoint system (2.2). Suppose that the initial state for (2.1) is δ(0) = γ and the final state for (2.2) is λ(t_N) = ∇_x C(x(t_N; θ)). Then, solving the second-order adjoint system (2.5) with the final state ξ(t_N) = (H_x C(x(t_N; θ)))δ(t_N) gives the intended Hessian-vector multiplication at t = 0, i.e., ξ(0) = (H_θ C(x(t_N; θ)))γ. We here skip the original proof of [20] and shall explain this property based on a new derivation of the second-order adjoint system in Sect. 3.

Exact gradient calculation
We consider the discrete case, where C is a function of the numerical solution to (1.1) obtained by a Runge-Kutta method. Sanz-Serna [15] showed that the gradient ∇_θ C(x_N(θ)) can be computed exactly by applying a particular choice of Runge-Kutta method to the adjoint system (2.2). We briefly review the procedure. Assume that the original system (1.1) is discretized by an s-stage Runge-Kutta method

x_{n+1} = x_n + h Σ_{i=1}^s b_i f(X_{n,i}), X_{n,i} = x_n + h Σ_{j=1}^s a_{ij} f(X_{n,j}). (2.6)

We discretize the adjoint system (2.2) with another s-stage Runge-Kutta method

λ_{n+1} = λ_n − h Σ_{i=1}^s B_i ∇f(X_{n,i})^⊤ Λ_{n,i}, Λ_{n,i} = λ_n − h Σ_{j=1}^s A_{ij} ∇f(X_{n,j})^⊤ Λ_{n,j}. (2.7)

In the continuous case, the adjoint system gives the gradient ∇_θ C(x(t_N; θ)) due to the property λ(t_N)^⊤ δ(t_N) = λ(0)^⊤ δ(0). Therefore, in the discrete case, to obtain the exact gradient ∇_θ C(x_N(θ)), the numerical solution to the adjoint system must satisfy λ_N^⊤ δ_N = λ_0^⊤ δ_0. In [15], it is proved that if the Runge-Kutta method for the adjoint system is chosen such that the pair of Runge-Kutta methods for the original system and the adjoint system constitutes a symplectic partitioned Runge-Kutta method, the property λ_N^⊤ δ_N = λ_0^⊤ δ_0 is guaranteed and the gradient ∇_θ C(x_N(θ)) is obtained exactly, as shown in Theorem 2.1. We note that symplecticity is a fundamental concept in numerical analysis for ODEs, and symplectic numerical methods are well known in the context of geometric numerical integration. For more details, we refer the reader to [7, Chapter VI] (for symplectic partitioned Runge-Kutta methods, see also [1,17]).

Theorem 2.1 [15] Let x_1(θ), ..., x_N(θ) be the approximate solutions (to x(h; θ), ..., x(Nh; θ)) obtained by applying the Runge-Kutta method (2.6) characterized by the coefficients a_{ij} and b_i to (1.1). Assume that the coefficients A_{ij} and B_i of the Runge-Kutta method for the adjoint system (2.2) satisfy the relations

b_i A_{ij} + B_j a_{ji} = b_i B_j, B_i = b_i, i, j = 1, ..., s.

Then, solving the adjoint system (2.2) with λ_N = ∇_x C(x_N(θ)) by using the Runge-Kutta method (2.7) characterized by A_{ij} and B_i gives the exact gradient at n = 0, i.e., λ_0 = ∇_θ C(x_N(θ)).
The combination of Runge-Kutta methods for the original and adjoint systems can be seen as a partitioned Runge-Kutta method for the coupled system.

Remark 2.1 The conditions in Theorem 2.1 indicate that

A_{ij} = b_j − b_j a_{ji} / b_i, B_i = b_i, i, j = 1, ..., s,

which makes sense only when every weight b_i is nonzero. However, for some Runge-Kutta methods, such as the Runge-Kutta-Fehlberg method, one or more weights b_i vanish. For such cases, the above conditions cannot be used to find an appropriate Runge-Kutta method for the adjoint system. We refer the reader to the Appendix of [15] for a workaround. For clarity, in this paper, we always assume that every weight b_i is nonzero.

Remark 2.2 As explained in [15], the overall accuracy of the partitioned Runge-Kutta method for the coupled system may be lower than that of the Runge-Kutta method for (1.1). Such an undesirable property is called order reduction. We need to take the reduction into account especially when we intend to compute ∇_θ C(x(t_N; θ)) as accurately as possible in the context of, for example, sensitivity analysis (see, for example, [6,11] for the relevant order conditions).

We note that the Runge-Kutta method (2.7) for the adjoint system (2.2) can be rewritten equivalently in a form that steps from λ_{n+1} to λ_n; this expression is convenient when the adjoint system is solved backwardly.
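The coefficient relations of Remark 2.1 are straightforward to implement. The following sketch builds A_{ij} and B_i from a_{ij} and b_i and verifies the symplecticity condition b_i A_{ij} + B_j a_{ji} = b_i B_j; the classical four-stage Runge-Kutta method is a hypothetical example chosen because all of its weights are nonzero.

```python
import numpy as np

# Butcher coefficients of the classical 4-stage Runge-Kutta method.
a = np.array([[0.0, 0.0, 0.0, 0.0],
              [0.5, 0.0, 0.0, 0.0],
              [0.0, 0.5, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
b = np.array([1/6, 1/3, 1/3, 1/6])

# Adjoint coefficients from Remark 2.1 (requires all b_i != 0):
#   B_i = b_i,   A_ij = b_j - b_j * a_ji / b_i.
A = b[None, :] - b[None, :] * a.T / b[:, None]
B = b.copy()

# Residual of the symplecticity condition b_i A_ij + B_j a_ji - b_i B_j = 0.
cond = b[:, None] * A + B[None, :] * a.T - np.outer(b, B)
```

By construction the residual vanishes up to round-off, so the pair (a, b) and (A, B) constitutes a symplectic partitioned Runge-Kutta method.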

Hessian-vector multiplication
Given an arbitrary vector γ , we are interested in calculating the Hessian-vector multiplication (H θ C(x N (θ )))γ exactly.
In Sect. 3.1, we give a new concise derivation of the second-order adjoint system (2.5). The idea of the derivation plays an important role in Sect. 3.2, where we show how to calculate the exact Hessian-vector multiplication (H θ C(x N (θ )))γ .

Concise derivation of the second-order adjoint system
Let us couple the original system (1.1) and the variational system (2.1). This leads to the following system:

d/dt [x; δ] = [f(x); ∇f(x)δ],

which can be written as

d/dt y = g(y) (3.1)

by introducing the augmented vector y = [x^⊤, δ^⊤]^⊤. The adjoint system for (3.1) is given by

d/dt φ = −∇_y g(y)^⊤ φ, (3.2)

where φ = [ξ^⊤, λ^⊤]^⊤. Note that the system (3.2) is the combination of the second-order adjoint system (2.5) and the adjoint system (2.2). As explained in Sect. 2.1, the Hessian-vector multiplication (H_θ C(x(t_N; θ)))γ is obtained by solving the second-order adjoint system (2.5) backwardly. This property was proved in Theorem 2 of [20], but we give another proof building on (3.2).

Proposition 3.1 Let x, δ and λ be the solutions to the original system (1.1) with the initial state x(0) = θ, to the variational system (2.1) with the initial state δ(0) = γ, and to the adjoint system (2.2) with the final state λ(t_N) = ∇_x C(x(t_N; θ)), respectively. Then, the solution ξ to the second-order adjoint system (2.5) with the final state ξ(t_N) = (H_x C(x(t_N; θ)))δ(t_N) satisfies ξ(0) = (H_θ C(x(t_N; θ)))γ for any vector γ.

We note that the solution to the variational system (2.1) depends on both θ and γ. Building on the discussion in Sect. 2.1, we see that solving the adjoint system (3.2) backwardly with the final states λ(t_N) = ∇_x C(x(t_N; θ)) and ξ(t_N) = (H_x C(x(t_N; θ)))δ(t_N) leads to the Hessian-vector multiplication at t = 0, i.e., ξ(0) = (H_θ C(x(t_N; θ)))γ, as well as the gradient λ(0) = ∇_θ C(x(t_N; θ)). This discussion places the second-order adjoint system within the framework of the first-order adjoint system.
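In block form (using the convention that ∇²f(x)δ denotes the matrix with entries Σ_k ∂²f_i/∂x_j∂x_k δ_k), the Jacobian of g and the resulting adjoint can be written out explicitly, which makes the claimed decomposition of (3.2) visible at a glance:

```latex
\nabla_y g(y) =
\begin{pmatrix}
  \nabla f(x) & 0 \\
  \nabla^2 f(x)\,\delta & \nabla f(x)
\end{pmatrix},
\qquad
\frac{\mathrm{d}}{\mathrm{d}t}\varphi = -\nabla_y g(y)^{\top}\varphi ,
```

so that, with $\varphi = [\xi^{\top}, \lambda^{\top}]^{\top}$,

```latex
\frac{\mathrm{d}}{\mathrm{d}t}
\begin{pmatrix} \xi \\ \lambda \end{pmatrix}
= -
\begin{pmatrix}
  \nabla f(x)^{\top} & \bigl(\nabla^2 f(x)\,\delta\bigr)^{\top} \\
  0 & \nabla f(x)^{\top}
\end{pmatrix}
\begin{pmatrix} \xi \\ \lambda \end{pmatrix},
```

whose first row is the second-order adjoint system (2.5) and whose second row is the adjoint system (2.2).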

Exact Hessian-vector multiplication
From the discussion in Sects. 2.2 and 3.1, we readily see that the exact Hessian-vector multiplication (H_θ C(x_N(θ)))γ is obtained by solving the coupled adjoint system (3.2) with a particular choice of Runge-Kutta method: if the coupled system (3.1) is discretized by the Runge-Kutta method (3.4) characterized by a_{ij} and b_i, and the coupled adjoint system (3.2) is solved backwardly, with the final states λ_N = ∇_x C(x_N(θ)) and ξ_N = (H_x C(x_N(θ)))δ_N, by the Runge-Kutta method (3.5) whose coefficients A_{ij} and B_i satisfy the conditions of Theorem 2.1, then ξ_0 = (H_θ C(x_N(θ)))γ (Theorem 3.1).
We omit the proof of this theorem because it proceeds in the same way as that for Theorem 2.1, considering (3.1) and (3.2) instead of (1.1) and (2.2), respectively, and the function (3.3) instead of C(x).
As is the case with the gradient computation, an expression equivalent to (3.5) that steps from φ_{n+1} to φ_n (with internal stages Φ_{n,i}, i = 1, ..., s) is convenient when the coupled adjoint system is solved backwardly.

Actual computation procedure
In this subsection, we summarize the actual computational procedure.
In what follows, "RK1" denotes the Runge-Kutta method with coefficients a_{ij} and b_i, and "RK2" the Runge-Kutta method with coefficients A_{ij} and B_i determined by the conditions in Theorem 3.1. The procedure for computing (H_θ C(x_N(θ)))γ is as follows.
Step 1 Integrate (3.1) by using RK1 (see (3.4)) to obtain x_1(θ), ..., x_N(θ) and δ_1(θ, γ), ..., δ_N(θ, γ). The computational cost for the δ-equation (variational system) is usually cheaper than that for the x-equation; for example, when f is nonlinear and RK1 is implicit, we need to solve a nonlinear system for the x-equation in each time step, whereas only a linear system must be solved for the δ-equation.
Step 2 Set the final states λ_N = ∇_x C(x_N(θ)) and ξ_N = (H_x C(x_N(θ)))δ_N(θ, γ).
Step 3 Integrate the coupled adjoint system (3.2) backwardly by using RK2 (see (3.5)) to obtain ξ_0 = (H_θ C(x_N(θ)))γ.
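The procedure above (forward integration of (3.1), then backward integration of (3.2)) can be sketched for the simplest case of the explicit Euler method: with s = 1, a_11 = 0, b_1 = 1, the conditions of Theorem 2.1 give A_11 = B_1 = 1, and the backward scheme reduces to φ_n = φ_{n+1} + h ∇_y g(y_n)^⊤ φ_{n+1}. The pendulum-like right-hand side and the quadratic cost below are hypothetical choices for this sketch, not the paper's experiment.

```python
import numpy as np

# Hypothetical toy problem: x = (Q, P), f(x) = (P, -sin Q), C(x) = x.x.
def f(x):
    return np.array([x[1], -np.sin(x[0])])

def df(x):                          # Jacobian grad_x f(x)
    return np.array([[0.0, 1.0], [-np.cos(x[0]), 0.0]])

def d2f_dot(x, delta):              # matrix (grad^2 f(x) delta)
    M = np.zeros((2, 2))            # only f_2 = -sin Q is nonlinear
    M[1, 0] = np.sin(x[0]) * delta[0]
    return M

def grad_C(x):
    return 2.0 * x

def hess_C(x):
    return 2.0 * np.eye(2)

h, N = 0.01, 200

# Step 1: integrate the coupled system (3.1) forward by explicit Euler.
def forward(theta, gamma):
    xs, ds = [np.array(theta, float)], [np.array(gamma, float)]
    for _ in range(N):
        x, d = xs[-1], ds[-1]
        xs.append(x + h * f(x))
        ds.append(d + h * df(x) @ d)
    return xs, ds

# Steps 2-3: set the final states, then integrate (3.2) backwardly.
def hessian_vector(theta, gamma):
    xs, ds = forward(theta, gamma)
    lam = grad_C(xs[-1])                 # lambda_N = grad_x C(x_N)
    xi = hess_C(xs[-1]) @ ds[-1]         # xi_N = H_x C(x_N) delta_N
    for n in range(N - 1, -1, -1):       # update xi first: it uses lambda_{n+1}
        xi = xi + h * (df(xs[n]).T @ xi + d2f_dot(xs[n], ds[n]).T @ lam)
        lam = lam + h * df(xs[n]).T @ lam
    return xi                            # xi_0 = (H_theta C(x_N(theta))) gamma

def gradient(theta):                     # exact discrete gradient, for comparison
    xs, _ = forward(theta, np.zeros(2))
    lam = grad_C(xs[-1])
    for n in range(N - 1, -1, -1):
        lam = lam + h * df(xs[n]).T @ lam
    return lam
```

The result agrees with central differences of the exact discrete gradient up to the finite-difference error, and assembling the full 2×2 Hessian column by column (γ = e_1, e_2) yields a matrix that is symmetric up to round-off, illustrating the exactness claimed by Theorem 3.1.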
In the above procedure, solving the x-equation (original system) is usually the most computationally expensive part.
When we apply a Newton-type method to solve the optimization problem (1.2), we need to solve a linear system having the Hessian H_θ C(x_N(θ)) as its coefficient matrix in every Newton iteration. A Krylov subspace method is one choice for solving such a linear system, and it requires computing Hessian-vector multiplications repeatedly for the same Hessian but different vectors. We note that, when repeating the above procedure, we can skip solving the x-equation, which is the most computationally expensive part, and the λ-equation, since neither depends on the vector γ.
As far as the authors know, there is no consensus on a standard choice of numerical integrators for the adjoint systems. A simple strategy is to solve the adjoint systems as accurately as possible (by interpolating the internal stages of the x- and δ-equations if necessary). However, this strategy fails to obtain the exact Hessian-vector multiplication and makes the computation of the adjoint system more expensive. The proposed method is therefore preferable in terms of both the exactness of the Hessian-vector multiplication and the computational cost for the adjoint systems.

Numerical verification
This section validates the proposed method through three numerical experiments using (I) the simple pendulum (Sect. 4.1), (II) the one-dimensional Allen-Cahn equation (Sect. 4.2), and (III) a one-dimensional inhomogeneous wave equation (Sect. 4.3). The Allen-Cahn and wave equations are often employed as testbeds in the research fields of data assimilation and inversion problems. Experiment (I) aims at confirming that the proposed method works well on a small-scale problem that enables us to check all of the elements in the Hessian matrix. The performance of the proposed method on larger-scale problems is examined in experiments (II) and (III), through an initial value problem (experiment II) and a parameter-field inversion problem (experiment III).
In all three experiments, we compare results of the proposed method with those of other numerical integrators. Using the same Runge-Kutta method for (3.1), we compare two Runge-Kutta methods for the coupled adjoint system (3.2): one is the method determined by Theorem 3.1, and the other is selected such that it has the same number of stages and the same order of accuracy as the Runge-Kutta method for (3.1).

Simple pendulum
We verify the discussion of Sect. 3.2 by a numerical experiment for the simple pendulum problem (4.1), which is employed as a toy problem. We discretize the system (4.1) by the explicit Euler method. The function C is defined by C(x) = C(Q, P) = Q² + QP + P² + P⁴. The step size is set to h = 0.01. As a reference, we obtain the analytic Hessian H_θ C(x_5(θ))|_{θ=[1,1]^⊤} at N = 5 with the help of symbolic computation (note that symbolic computation is possible only when N is relatively small); the result is the symmetric matrix (4.2). We compute ξ_0 using the proposed approach: the coupled adjoint system (3.2) is solved by the Runge-Kutta method determined by Theorem 3.1. For comparison, we also solve the coupled adjoint system (3.2), i.e., the pair of the (first-order) adjoint system and the second-order adjoint system, by applying the explicit Euler method backwardly (more precisely, the explicit method with the exchanges φ_{n+1} ↔ φ_n, y_{n+1} ↔ y_n and h ↔ −h); this formula is obviously explicit when the coupled adjoint system (3.2) is solved backwardly in time. We then obtain the approximated Hessian

[ 2.234679307434870  0.771449812673337
  0.763169390266670  13.09133376424467 ],

which differs from (4.2) substantially. We also note that this matrix is no longer symmetric.

Allen-Cahn equation
We consider an initial value problem of a time-dependent field variable ψ(t, z) driven by the one-dimensional Allen-Cahn equation under the Neumann boundary condition ψ_z(t, 0) = ψ_z(t, 1) = 0. In the following numerical experiments, the coefficients and initial value are set to (α, β, κ) = (10, 0.001, −1) and ψ(0, z) = cos(πz). We discretize the equation in space with d grid points and grid spacing Δz = 1/(d − 1), and apply the second-order central difference to the second derivative in space to obtain a semi-discrete scheme, where Ψ ∈ R^d is the discretized ψ; we write •^(m) for a quantity • at z = (m − 1)Δz to simplify the notation. In the following, we solve the semi-discrete scheme and its variational system by the implicit Euler method, that is, we discretize (3.1) by y_{n+1} = y_n + h g(y_{n+1}).

A cost function considered here involves the Euclidean norm ‖·‖_2; the vector θ ∈ R^d is an initial condition for the semi-discrete scheme, and θ̄ is the discretized ψ(0, z), i.e., θ̄^(m) = cos(π(m − 1)Δz) for m = 1, ..., d. The proposed method discretizes the coupled adjoint system (3.2) by φ_n = φ_{n+1} + h ∇_y g(y_{n+1})^⊤ φ_n.

First, we check the extent to which the symmetry is preserved or broken in the Hessian matrices by measuring the "degree of asymmetry"

τ(M) = ‖M − M^⊤‖_max / ‖M‖_max

for a given matrix M, where ‖M‖_max = max_{i,j} |M_{ij}| denotes the maximum norm. In this subsection, we also use the operator norm ‖·‖_∞ induced by the vector maximum norm. We observed that τ(H) = 2.344 × 10⁻⁵.

Third, we check what happens when solving a linear system Hv = r with respect to v. We employ the conjugate residual (CR) method, with a tolerance of 1.0 × 10⁻⁸ for the relative residual measured by the maximum norm. The vector r is set to r = Hv_exact, where v_exact = (1, 0, ..., 0)^⊤. Figure 1 compares two approaches: "proposed" uses H while "approximation" uses H̃.
It is observed from the top panel that the relative residual decreases monotonically for both approaches, but faster convergence is observed for the proposed approach. Note that the CR method still works in double precision even though the symmetry of H̃ is broken. However, the bottom panel, which plots the error ‖v − v_exact‖_∞, shows a significant error for H̃ even after the CR method itself converges, while the proposed approach reaches the exact solution. In the context of uncertainty quantification [9,19], the information we need is not H̃⁻¹ but H⁻¹; from this viewpoint, the calculation using H̃ is problematic. In particular, the fact that the CR method itself converges could increase the risk of overconfidence. In contrast, the calculation based on the proposed approach is of practical importance.

Fig. 1 The convergence behavior of the CR method. Both the relative residual and error are measured by the maximum norm.

As a complementary study, let us investigate why the difference between v_exact and the solution to H̃ṽ = r computed by the CR method is so significant, given that the difference between H and H̃, while not negligible in double precision, is still much smaller than O(1). We compute the condition number of H̃, and the result is cond_∞(H̃) = 2.993 × 10⁵ (cond_∞(H) = 2.696 × 10⁵). Because the condition number is moderate, we suspect that the CR method for H̃ṽ = r actually finds an accurate solution to H̃ṽ = r. In general, for the solutions to Hv = r and H̃ṽ = r, the bound (4.5) holds; its right-hand side is calculated to be 4.892 × 10⁴. Since ‖v‖_∞ = ‖v_exact‖_∞ = 1 in the above problem setting, we see that ‖v − ṽ‖_∞ could be as large as O(10⁴), which explains the undesirable behavior observed in the bottom panel of Fig. 1. We note that the difference between v and ṽ could also result in slow convergence of the Newton method (the slow convergence is discussed in Sect. 4.3).
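The degree of asymmetry used above is straightforward to compute; a minimal sketch follows (the 2×2 matrix is a made-up example, not data from the experiments):

```python
import numpy as np

# Degree of asymmetry:  tau(M) = ||M - M^T||_max / ||M||_max,
# where ||M||_max = max_{i,j} |M_{ij}|.
def tau(M):
    return np.max(np.abs(M - M.T)) / np.max(np.abs(M))

M = np.array([[2.0, 1.0],
              [1.5, 4.0]])        # deliberately asymmetric toy matrix
```

Here τ(M) = 0.5/4 = 0.125, while any symmetric matrix (e.g. M + M^⊤) gives τ = 0.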

Wave equation
We validate the proposed method through a problem of estimating an inner structure from the waveform of a displacement field driven by the one-dimensional inhomogeneous wave equation (4.6) under a periodic boundary condition, where u(t, z) is a displacement field and the spatially dependent function w(z) is a "structure field" to be estimated. In the following numerical experiments, the initial conditions of u and its time derivative u_t are set to u(0, z) = 16z²(L − z)²/L⁴ and u_t(0, z) = 0, and we assume that u(0, z) and u_t(0, z) are known but w is not; that is, the structure field w is the unique control variable that determines the time evolution of the waveform of u. Such problems of estimating an unobservable inner structure field from the acoustic response of target materials appear ubiquitously in various scientific fields such as seismology [4] and engineering [8,13]. We discretize (4.6) in space via a staggered grid with d grid points and grid spacing Δz = L/d, and then obtain the semi-discrete scheme (4.7), where U ∈ R^d, V ∈ R^d, and W ∈ R^d are the discretized u, u_t, and w, respectively, and •^(m) indicates a quantity • at z = (m − 1)Δz. Throughout this subsection, the parameters L and Δz are fixed to L = 64 and Δz = 1 (i.e., d = 64), and the semi-discrete scheme (4.7) and its variational system are solved by the following 2-stage, second-order explicit Runge-Kutta (Heun) method:

y_{n+1} = y_n + (h/2)(p_{n,1} + p_{n,2}), p_{n,1} = g(Y_{n,1}), p_{n,2} = g(Y_{n,2}), Y_{n,1} = y_n, Y_{n,2} = y_n + h p_{n,1}.
The cost function C considered here is a sum of squared residuals between the time evolution of U, which depends on the W to be estimated, and that of an "observation" of U, where T_obs is a set of observation times, and U^obs is the observation of U obtained by solving (4.7) assuming a "true" structure field w_true(z) = 0.5 + 0.25 sin(4πz/L). Note that we use the Heun method with the common step size when calculating U_n(W) and U_n^obs, meaning that C is exactly zero when W is given by the discretized w_true. The set of observation times is fixed to T_obs = {t_obs | t_obs = 0.2j, 0 ≤ j ≤ 10, j ∈ Z} in the following experiments.
First, as in the first experiment of Sect. 4.2, we start by checking the symmetry of the Hessian. The Hessian is evaluated at the point where W is given by the discretized w_true, i.e., at the global minimum point. Let H and H̃ respectively be the Hessian evaluated by the proposed method and the Hessian obtained by applying the Heun method to the coupled adjoint system (3.2) backwardly (more precisely, the Heun method with the exchanges φ_{n+1} ↔ φ_n, y_{n+1} ↔ y_n and h ↔ −h):

φ_n = φ_{n+1} + (h/2)(q_{n,1} + q_{n,2}), q_{n,1} = ∇_y g(y_n)^⊤ Φ_{n,1}, q_{n,2} = ∇_y g(y_{n+1})^⊤ Φ_{n,2}, Φ_{n,2} = φ_{n+1}, Φ_{n,1} = φ_{n+1} + h q_{n,2}. (4.8)

Although the difference between (4.8) and the scheme of the proposed method,

φ_n = φ_{n+1} + (h/2)(q_{n,1} + q_{n,2}), q_{n,1} = ∇_y g(Y_{n,1})^⊤ Φ_{n,1}, q_{n,2} = ∇_y g(Y_{n,2})^⊤ Φ_{n,2}, Φ_{n,2} = φ_{n+1}, Φ_{n,1} = φ_{n+1} + h q_{n,2}, (4.9)

lies only in the arguments of ∇_y g, this small difference causes a significant difference in the symmetry of the Hessian. Figure 2 shows the degrees of asymmetry τ(H) and τ(H̃) as functions of the step size h used for the time integrators. The proposed method clearly suppresses the degree of asymmetry, whereas H̃ fails to retain symmetry when h is large. The degree of asymmetry τ(H̃) scales as h², which reflects the accuracy of the Heun method; τ(H), in contrast, does not show such a convergence property but rather a slow increase proportional to h⁻¹ as h gets smaller, mainly due to an accumulation of round-off error. This result demonstrates that the proposed method can calculate an "exact" Hessian up to round-off.

Next, we check the speed of convergence of the optimization based on the Newton method. When conducting a Newton-type method to optimize a nonlinear function like C, a linear system (H + D)ν = −∇C needs to be solved to get a descent direction ν in each iteration step, where D is a method-dependent diagonal matrix [10,12]. Whether a correct descent direction is obtained largely depends not only on the symmetry of the Hessian, as seen in the third experiment of Sect.
4.2, but also on the error involved in the gradient ∇C, which needs to be calculated by solving an adjoint system. This experiment sets the step size to h = 0.2 for all of the time integrators and employs the Levenberg-Marquardt-regularized Newton (LMRN) method [12] to optimize the cost function C with an initial guess set to W = 0.5, and then measures the performance by the number of computations of the coupled adjoint system (3.2) needed in the optimization of C (we call this the "number of backward evaluations" for simplicity). The LMRN method employs a tolerance of 1.0 × 10⁻⁸ for the residual measured by the maximum norm, and the CR method is used to solve the linear systems with the same tolerance as in Sect. 4.2. The result of the experiment shows that, although the optimum solutions in both cases agree with the true structure field, there is a significant difference in the speed of convergence. The lines "proposed" and "approximation" in Fig. 3 show the convergence behavior toward the optimum solutions, using (4.9) and (4.8) respectively, as a function of the number of backward evaluations. The optimization using H based on the proposed method is more than twice as fast as that using H̃. We confirmed that the former is robustly faster than the latter by checking other cases with different settings.

Conclusion
In this paper, we have presented a concise derivation of the second-order adjoint system and a procedure for computing a matrix-vector multiplication exactly, where the matrix is the Hessian of a function of the numerical solution of an initial value problem with respect to the initial value. The fact that the second-order adjoint system can be reformulated as part of a larger adjoint system is the key to obtaining the exact Hessian-vector multiplication based on the technique of Sanz-Serna.
The proposed method can be used either to obtain the exact Hessian or to solve a linear system having the Hessian as the coefficient matrix by a Krylov subspace method. Particularly in the latter case, the proposed method can contribute to rapid convergence, since the accuracy of the Hessian-vector multiplication directly affects the speed of convergence. The importance of calculating the exact Hessian was illustrated for the Allen-Cahn and wave equations.
We plan to test the method for quantifying the uncertainty of estimation in more practical problems. We note that in many applications an ODE system arises from discretizing a time-dependent partial differential equation (PDE) in space. However, discretizing a PDE in space before taking the adjoint may lead to strongly nonphysical behavior [16] (this issue has been partially addressed in [18]). Since the proposed method is independent of the spatial discretization, we have to take the spatial discretization into account when applying the proposed method to practical problems. Considering the temporal and spatial discretizations together may provide an optimal combination for given problems.