The discrete adjoint method for parameter identification in multibody system dynamics

The adjoint method is an elegant approach for the computation of the gradient of a cost function to identify a set of parameters. An additional set of differential equations has to be solved to compute the adjoint variables, which are further used for the gradient computation. However, the accuracy of the numerical solution of the adjoint differential equation has a great impact on the gradient. Hence, an alternative approach is the discrete adjoint method, where the adjoint differential equations are replaced by algebraic equations. Therefore, a finite difference scheme is constructed for the adjoint system directly from the numerical time integration method. The method provides the exact gradient of the discretized cost function subjected to the discretized equations of motion.


Introduction
In the last few years the complexity of the multibody systems has grown tremendously. In particular, industrial simulations of large systems include a high number of bodies, resulting in a vast number of degrees of freedom. In general, the bodies are linked to the ground or to other bodies by formulating algebraic constraint equations.
In most cases, multibody systems are described in descriptor form, given by a system of differential algebraic equations (DAE) M(q, u)q + C T q (q)λ = Q(q,q, u, t), in which q denotes the generalized coordinates, andq andq its time derivatives, M is the symmetric system mass matrix, and Q represents the vector of generalized and gyroscopic B T. Lauß thomas.lauss@fh-wels.at 1 Faculty of Engineering and Environmental Sciences, University of Applied Sciences Upper Austria, Stelzhamerstrasse 23, 4600 Wels, Austria forces. Due to the algebraic constraints C(q, t) = 0, the equations of motion have to be extended by constraint forces of the form C T q λ, where C q represents the constraint Jacobian matrix, and the vector λ includes the Lagrange multipliers. Moreover, u is a vector of parameters influencing the system behavior. We consider an optimization problem for the multibody system, which can be described in the general form as follows: Find the vector of unknown parameters u such that the cost function given by is minimized, where t f is the final end time. The integrand represents the mean deviation between a system output s(q,q,q, λ) and a given measured signals(t).
Various authors formulate the parameter identification problem as an optimization task. Vyasarayani et al. [21] solve the underlying optimization problem using a combination of the Gauss-Newton and single shooting methods. Therein, homotopy continuation is used to find a global minimum of the cost function. The gradient is computed by the sensitivities, and the Hessian is only of first-order accuracy since second-order terms are neglected.
Serban et al. [19] also realized the parameter identification in multibody systems by minimizing a cost function by the Levenberg-Marquardt method. The derivatives that are required for the optimization are computed through sensitivity analysis. In addition, a local identifiability test is developed in this contribution.
Oberpeilsteiner et al. [15] designed an optimal input by maximizing the information content of the parameters to identify. The required Jacobian matrix are computed with the adjoint method, and the optimization is done with the gradient method. Finally, the optimal input is used for a parameter identification.
Apart from the previous described methods, the adjoint method is already used in a wide range of parameter identification problems in engineering sciences. Especially, in the field of multibody systems, the computation of the gradient of the cost function, as, for example, in (2), is often the bottleneck for computational efficiency, and the adjoint method serves as the most efficient strategy in this case. The basic idea of the adjoint method is the introduction of additional adjoint variables determined by a set of adjoint differential equations from which the gradient can be computed straightforwardly. This main idea directly corresponds to the gradient technique for trajectory optimization pioneered by Bryson and Ho [3]. There are two strategies for this purpose: the equations of motion of the multibody system and adjoint equations may either be separately discretized from their representations as differential-algebraic equations, or, alternatively, the equations of motion of the multibody system may be discretized first, whereas the discrete adjoint equations are derived directly from the discrete multibody system equations; for more details, see [3].
The piecewise adjoint method presented in [18] formulates the coordinate partitioning underlying ordinary differential equations as a boundary value problem, which is solved by multiple shooting methods. The sensitivity analysis for differential-algebraic and partial differential equations using adjoint methods has also been in the focus of the group around Petzold, Cao, Li, and Serban [16]. The adjoint method has been used for sensitivity analysis in multibody systems as well by Eberhard [4], presenting a continuous, hybrid form of automatic differentiation. In [20], the use of the adjoint method for solving dynamical inverse problems is described, but rather academic examples are discussed. A recent paper [11] shows how the adjoint method can be applied efficiently to a multibody system described by differential-algebraic equations of index three. It also presents the structure of the adjoint equations depending on the Jacobian matrices of the system equations. However, the nu-merical solution of the adjoint system presented in [11] raises several questions concerning stability and accuracy with respect to time discretization.
An alternative and more natural approach is the discrete adjoint method (DAM), which constructs a finite difference scheme for the adjoint system directly from the numerical procedure used to solve the equations of motion. The method delivers the exact gradient of the discretized cost function subjected to the discretized equations of motion. Instead of using automatic differentiation techniques [1], the Jacobian matrices can be determined analytically for multibody systems in a very simple structure if redundant generalized coordinates are used to describe the motion of the bodies. In [7] the discrete adjoint method is derived and applied for optimal control problems only, focusing especially on the description of the adjoint method for explicit and implicit solvers for optimal control applications, as well as the interpolation of the gradients and the Hessian (BFGS).
The current paper focuses on parameter identification of multibody systems. The discrete adjoint equations are derived for the computation of the gradient of the cost function using the HHT-solver [6,13] for the solution of the system equations.
The advantage of the presented method is that the cost function may also depend on the accelerations if the discrete adjoint method is used. The reason is that the accelerations are included in the state vector of the HHT-solver. In contrast to the discrete adjoint method, in the continuous approach, the accelerations have to be expressed by the equations of motion, leading to a complex Jacobian matrix [12]. Practically speaking, the new approach allows us to use measured data from acceleration sensors in a straightforward manner as a reference trajectory in the cost function for the parameter identification.

Discrete adjoint method for implicit time integration methods
The discrete adjoint method is an elegant and efficient way to compute the gradient of a cost function. Following [7], the discretized multibody system equations can be rewritten in the form in which u ∈ R m denotes the vector of parameters to identify. The sequence of state vectors x 0 , . . . , x N ∈ R n are given at times t 0 , . . . , t N ∈ R. A discretized version of the cost function in (2) is given by where η i is a weighting factor. Hence, the goal is to find a set of parameters u such that the scalar cost function (4) is minimized. To introduce the adjoint gradient computation, the cost function J is extended by zero terms, representing the system equation (3), and reads where p 1 , . . . , p N denote the adjoint variables. Due to (3),J and J are equal for any choice of the adjoints, and so the corresponding gradients with respect to the parameters to identify are also equal. We will further choose p i such that the gradient computation becomes as easy as possible. Hence, the variation of the cost function is given by with the abbreviations f i+1 = f(x i+1 , x i , t i+1 , t i , u) and s i = s(x i ) and the corresponding Jacobian matrix ∂s i /∂x i , which is the partial derivative of the system output with respect to the states x evaluated at x i . Following the trivial index shift the variation of the cost functionJ can be reformulated in terms of δx i and δu as The adjoint state variables should be defined in such a way that the expressions in the brackets corresponding to the variation of the states vanish, and hence the complicated relations between δx i and δu need not be computed. Note that δx 0 vanishes. If we define the adjoint variables p i by the equations which can be solved successively for p i , starting with the boundary condition for p N and proceeding with i = (N − 1), . . . , 1, then the variation of the cost function is reduced to Hence, the gradient is given by In summary, to find the gradient of the cost function, we can proceed as follows. First, the difference equations (3) have to be solved forward in time with an initial guess for the actual values of the parameters u. Furthermore, the adjoint equations (9) have to be solved backward in time starting from the boundary conditions for p N . In combination with the adjoint variables, the gradient of the cost function with respect to the parameters u can be evaluated from (11). The gradient information can be used in various ways to find a set of parameters that minimizes the cost function. In the simplest form, the parameters are updated by walking in the direction of the negative gradient −∂J /∂u, that is, by setting u new = u − κ(∂J /∂u). If the number κ > 0 is sufficiently small, then the updated set of parameters always reduces the cost function J (steepest-descent method).
It should be noted that the convergence of a quasi-Newton method is much better than the convergence of the steepest-descent method, especially near the optimum. Hence, it is recommended to use a quasi-Newton method where the Hessian is approximated from the gradient information, for example, with the BFGS formula. The Hessian could also be estimated from the first-order sensitivity matrix [19], but as the adjoint method circumvents its computation, this approach seems not appropriate in this context.

Application to the HHT solver
In this section the implicit iteration scheme (3) is specified for the HHT-solver, which is a widely used time integration method in multibody system dynamics. The Jacobian matrices required for the computation of the discrete adjoint variables, and further for the gradient computation, are derived in this section.
A precursor of the HHT-method is the Newmark method [14], which approximates the positions and velocities of a second-order system by the formulas with parameters β and γ . Herein, q i ,q i , andq i denote positions, velocities, and accelerations at t = t i . The formulas (12) are used to discretize the motion equations (1) using the integration step size h = t i −t i−1 . Choosing γ = 1 2 and β = 1 4 , the method is equivalent to the trapezoidal rule, which is implicit and A-stable and leads to a second-order integration formula. The disadvantage of the trapezoidal rule is that no numerical damping is introduced, and so this method is impractical for stiff problems. Hence, the HHT-method was developed as an improvement, which includes numerical damping and preserves its A-stability while achieving second-order accuracy.
The method considers the second-order term in the motion equations (1) at t = t i whereas all other terms are weighted between t i−1 and t i by using the weighting factor α, resulting in As indicated in [13], the parameter α has to be chosen in the interval α ∈ [− 1 3 , 0] to possess the advertised stability. The parameter β and γ of (12) are expressed by α by setting β = (1 − α) 2 /4 and γ = (1 − 2α)/2.
Finally, for constrained systems, the constraint equation satisfied at t = t i is By introducing the state vector Eqs. (12), (13), and (14) have the form (3) and may be solved for x i if x i−1 is known. Rewriting (12), (13), and (14) yields where the equations are summarized in , which equals the right-hand side of the implicit recursive scheme (3). The Jacobian matrices ∂f i ∂x i and ∂f i+1 ∂x i , which are required for the computation of the adjoint variables according to (9), are given further. Note that q i+1 = q i+1 (q i , v i , a i , a i+1 ) due to the Newmark integration formulas (12). First, the partial derivative of the implicit iteration scheme f i with respect to the actual states x i at time t i is calculated and given by with abbreviations It is worth noting here that the lower right block of Jacobian matrix (16) corresponds to the Jacobian matrix (12) in Negrut et al. [13]. In this paper the constraints are scaled by a factor of 1 βh 2 to improve the condition number of the Jacobian matrix. In [5] a formal proof is given, which discusses the nonsingular character of the block Jacobian matrix (16). Furthermore, for the computation of the discrete adjoint variables, the inverse of the transposed matrix (16) is required. Due to the special structure of the Jacobian matrix, the adjoint variables based on the generalized coordinates and on the generalized velocities can be calculated directly by separating the matrix. The discrete adjoint variables based on the generalized acceleration and on the Lagrange multiplier can be calculated by inverting only the right lower part of the matrix (16). Hence, only the condition number of this submatrix is of interest.
In addition, the partial derivatives of the implicit iteration scheme (3) with respect to the states x i are required, which reads with the following partial derivatives of f 3,i+1 with respect to x i : The partial derivatives of f 4,i+1 with respect to x i are In (18) and (19), h = t i+1 − t i is the integration step size. The gradient computation according to (11) requires the following Jacobian matrix:

The discrete adjoints for a simple harmonic oscillator
In this section the discrete adjoint gradient computation is shown on a simple academic example. The reader should get a feel for the proposed method. Let us consider a harmonic one mass oscillator with a linear damping parameter d and a linear stiffness parameter c as a simple example. The goal is to compute the gradient of a cost function of the form (4) with respect to the parameters c and d with the proposed discrete adjoint method. The state vector x i = (q i , v i , a i ) T consists of the position, velocity, and acceleration of the mass. The systems mass matrix is M = m, the force vector is Q i = −c q i − d v i , and the system output is s(x i ) = a i . For this simple example, the constraint equation (15) 4 is not required, and therefore the dimension of the Jacobian is reduced. Moreover, all terms related to the constraints equation C(q i ) = 0 are zero. Hence, the Jacobian matrices (16) and (17) can be simply rewritten as which are constant for linear examples as it is the case here. Inserting both matrices into (9) results in an algebraic system of equations, which must solved successively for i = N − 1, . . . , 1. Starting from p N = 0 and proceeding with the discrete adjoint variables p i can be computed. Here,ā(t i ) denotes the measured accelerations at time t i . For simplicity, we set α = 0 and use an equidistant step size η i = h, which results in the explicit iteration scheme Finally, the gradient can be computed as The state variables q i , v i , and a i that are required for the gradient computation must be obtained from a forward simulation of the system in advance. The respective finite difference scheme is not shown here, but can easily derived by specializing (15).

Example: engine mount
As an illustrative example, we consider an engine mount, which is installed in every commercial car. The main purpose of this component is, on the one hand, the damping of lowfrequency oscillations and, on the other hand, the isolation of high-frequency vibrations from the chassis produced by the engine. Hence, the frequency selective damping is essential for this component. Further information on hydro-mounts can be found in the work by Amelunxen [2], and its functional principle is shown in Fig. 1.
In the development of combustion engines the behavior of the engine mount is of particular interest. Hence, a sufficient accurate mathematical model is important for valid simulation results. In this contribution an analogous model, shown in Fig. 2, is used. Note that in the literature other mathematical formulations can be found (see [2,9,17]). The goal of the present example is to identify a set of parameters such that a measurable output matches a desired trajectory. First of all, we introduce the vector of degrees of freedom q = (x 1 , x 2 , x 3 , x 4 ) T and a vector of parameters to identify u = (c E1 , c E2 , d E , d H 2 ) T . In Table 1 the parameters of the model (see Fig. 2) are summarized, and optimal and initial values for the four parameters to identify are given.
The system equations can be formulated in the form (1). The mass matrix is a constant diagonal matrix given by M = diag(m L , 0, m M , m H ) with a zero mass corresponding to the second degree of freedom x 2 . The external force vector results in with the given time-dependent force which is shown in Fig. 3. The function represents an exponential frequency sweep with parameters ω 0 = 4 π and C = 25 and the amplitude A = 100 N. The elastomer force consists of a nonlinear spring and a linear damper, whereas the membrane force F M (x 3 ,ẋ 3 ) = c M x 3 + d Mẋ3 is described by a linear spring and a linear damper. Finally, the hydro-damper force is given by The algebraic constraint equation links the redundant generalized coordinates.
The cost function to be minimized is the quadratic deviation of the system output and a measured trajectorys(t), which is often called an RMS-error (root-mean-square error). The desired trajectorys(t) originates from a simulation of the system equations with the final (optimal) set of parameter u opt . Instead of using virtual test data, a measured trajectory from a test-bench can be used here. In the present example the system output s(q) is the accelerationẍ 1 . On the one hand, acceleration sensors are cheaper than other measure equipments, and, on the other hand, the discrete adjoint method in combination with the HHT-solver makes a straightforward implementation possible. In Fig. 5 the comparison between the desired trajectorys(t) and the system output s(q) is plotted over the time, in  which s(q) results from a simulation with the initial parameter. The large deviation between the initial and the optimal solution results from the incorrect values of the parameters to identify. The reduction of the cost function during the optimization is shown in Fig. 4. After approximately 60 iterations, the value of the cost function is about 10 −18 , which means that Fig. 7 Convergence of the parameter to identify the deviation of the system output and the desired trajectory is almost zero. In Fig. 6 the RMS-error e(t) = s(q) −s(t) is shown during the optimization. It can be seen that the error is reduced significantly with increasing number of iterations.
The convergence of the four parameters over the iterations is shown in Fig. 7. It can be seen that the linear stiffness parameter c E1 converges very fast, whereas the cubic damping parameter d H 2 converges slower.

Conclusion
In this paper, we show a new approach for the computation of the gradient of a cost function associated with a dynamical system for a parameter identification problem. We present the discrete adjoint method for an implicit discretization scheme and the required Jacobian matrices in detail for the HHT-solver as a representative of a widely used implicit solver in multibody dynamics. Note that the discrete adjoint system depends on the integration scheme of the system equations.
The presented method has two main advantages in comparison with the traditional adjoint method in the continuous case (see, e.g., [11,20]): First, no separate solver is required to solve the adjoint differential algebraic system backward in time. The computation of the adjoint variables depends only on the recursive iteration scheme used to solve the system equations. Hence, only a system of algebraic equations has to be solved successively. The second advantage is that in combination with the HHT-solver, the cost function may also depend on the accelerations, if the discrete adjoint method is used. The reason is that the accelerations are included in the state vector of the solver method. Hence, the Jacobian matrices that are necessary for the discrete adjoint computations remain similar to the Jacobian matrices that are required for the HHT-solver. Otherwise, in the continuous case, the accelerations are not included in the state vector, but have to be expressed by the motion equations in the cost function, which lead to complex Jacobian matrices [12]. The straightforward and efficient considerations of the acceleration in the cost function have the advantage that the measured signals from acceleration sensors can be used directly for parameter identification in practice. Due to the simple use and low price of acceleration sensors, this strategy is a promising approach in the field of parameter identification.
The theory described in this paper is a powerful tool for parameter identification in time domain. In most cases the results lead to a best-fit solution, which means that high-frequency components with low amplitudes are not considered. However, the discrete adjoint method can also be used to identify parameters influencing the system at special frequencies. Hence, the basic idea is to compute the Fourier coefficients for the relevant oscillations and include the amplitude spectrum in the cost function. In [8] the parameters of a torsional vibration damper of a four-cylinder combustion engine are identified in combination with adjoint Fourier coefficient and the discrete adjoint method.