1 Introduction

There are two alternative ways to handle an optimal control problem numerically. The so-called indirect methods first derive the necessary conditions for optimality in the continuous-time setting by applying Pontryagin’s maximum principle and then discretize the resulting equations. In contrast, direct methods first discretize the continuous problem, turning it into a finite-dimensional one, and then apply a discrete version of Pontryagin’s maximum principle. In both cases, the original objective is augmented with the constraints, which are enforced by Lagrange multipliers. The Lagrange multipliers enforcing the plant (the dynamic equations) of the problem are commonly called adjoint or co-state variables. In the multibody systems literature this approach is commonly referred to as the adjoint method and, when used as a direct method, as the discrete adjoint method. In this contribution, we apply the discrete adjoint method to optimal control problems in which variational integrators approximate the dynamics.

In general, for direct approaches, the discretization of the ODE governing the dynamics induces a specific discretization of the adjoint variables, in particular for symplectic methods such as variational integrators [1–3]. Variational, and thus symplectic, numerical methods are worth considering because they can benefit the solution of boundary value problems [4]. For the optimal control of constrained ODEs, discretizations with conservation properties are of interest as well [5, 6].

The optimal control of mechanical PDEs, such as string and beam dynamics, is an active field of research [7–9]. The discrete adjoint method has been used for the optimization of flexible multibody systems [10] as well as for parameter identification in rigid body dynamics [11, 12]. The discrete adjoint method for variational integrators with holonomic constraints is discussed in [13]. Because the discrete adjoint method is derived for a specific discretization of the dynamics, it matches the chosen integrator. It is therefore natural to apply it to structure-preserving integrators [14].

In this work we briefly summarize variational integrators and then show how to derive the discrete adjoint equations for this class of integrators. The basic principles, the derivation of boundary conditions, and the discretization of forces are explained. The discrete adjoint method is then extended to variational integrators for holonomically constrained ODEs. The convergence behavior of both methods is investigated with the example of a mathematical pendulum. Finally, the method is applied to the constrained PDE case of geometrically exact beam dynamics.

2 Discrete adjoint method for variational integrators

2.1 Variational integrators

This section illustrates the derivation of the equations of motion for forced systems via variational principles in the continuous and discrete setting [2, 15]. These equations have to be fulfilled as constraints for the optimal control problem.

Consider a Lagrangian mechanical system whose configuration space is the \(n\)-dimensional smooth manifold \(\mathcal{Q}\). The motion of the system is represented by a curve \(q: [0, T] \to \mathcal{Q}\), \(t \mapsto q(t)\). We denote the velocity at time \(t\) by \(\dot{q}(t) \in \mathcal{T}_{q(t)}\mathcal{Q}\). The Lagrangian is a function defined on the tangent bundle \(\mathcal{TQ}\) of \(\mathcal{Q}\), \(\mathcal{L}: \mathcal{TQ} \to \mathbb{R}\). It usually represents the difference of kinetic and potential energy. An external Lagrangian control force is a map \(f_{\mathcal{L}}: \mathcal{TQ} \times \mathcal{U} \to \mathcal{T}^{*} \mathcal{Q}\), where \(\mathcal{U} \subseteq \mathbb{R}^{l}\), \(l \leq n\), is the space of admissible controls. A control is thus a curve \(u: [0,T] \to \mathcal{U}\). The total virtual work of such a system vanishes:

$$ \delta \int _{0}^{T} \mathcal{L}\big(q(t), \dot{q}(t) \big)~dt + \int _{0}^{T} f_{\mathcal{L}} \big(q(t), \dot{q}(t), u(t)\big) \, \delta q(t) ~dt =0,\quad \forall \delta q(t). $$
(1)

This is the Lagrange-d’Alembert principle (with controls), which states that the total virtual work evaluated over a physical trajectory \(q\) of the system (and a control \(u\)) vanishes for all variations \(\delta q(t)\) with fixed end-points \(\delta q(0) = \delta q(T) = 0\). This leads to the equations of motion, the forced Euler-Lagrange equations:

$$ -\frac{d}{dt} \frac{\partial \mathcal{L}(q, \dot{q})}{\partial \dot{q}} + \frac{\partial \mathcal{L}(q, \dot{q})}{\partial q} + f_{\mathcal{L}}(q, \dot{q}, u) = 0. $$
(2)

This principle is an extension of Hamilton’s principle to include nonconservative forces such as control or dissipative forces. A forced variational integrator is derived by approximating the action and the virtual work of the nonconservative forces and subsequently taking variations in the discrete setting [15–17]. The time interval \([0,T]\) is discretized by the time nodes \(t_{0}, \ldots, t_{N}\), and we consider a discrete configuration path \(\{ q_{n} \}_{n=0}^{N}\) with \(q_{n} \approx q(t_{n})\) and a linear approximation of \(q(t)\) in \([t_{n}, t_{n+1}]\). The input variable is approximated as \(u_{n} \approx u(t_{n})\), and the control path \(u_{d} = \{ u_{n}\}_{n=0}^{N-1}\) is taken to be constant in each time interval \([t_{n}, t_{n+1}]\). The action integral is approximated via the discrete Lagrangian \(L_{d}\), and the virtual work of the nonconservative forces via the left and right discrete forces \(f_{d}^{-}\) and \(f_{d}^{+}\):

$$\begin{aligned} \int _{t_{n}}^{t_{n+1}} \mathcal{L} (q(t),\dot{q}(t)) ~dt &\approx L_{d}(q_{n}, q_{n+1}) , \end{aligned}$$
(3)
$$\begin{aligned} \int _{t_{n}}^{t_{n+1}} f_{\mathcal{L}} (q(t),\dot{q}(t),u(t)) ~\delta q(t) ~dt &\approx f_{d}^{-}(q_{n}, q_{n+1}, u_{n}) ~\delta q_{n} \\ &+ f_{d}^{+}(q_{n}, q_{n+1}, u_{n}) ~\delta q_{n+1}. \end{aligned}$$
(4)

The discrete total virtual work vanishes:

$$ \sum _{n=0}^{N-1} \left [ \delta L_{d}(q_{n}, q_{n+1}) + f_{d}^{-}(q_{n}, q_{n+1}, u_{n}) ~\delta q_{n} + f_{d}^{+}(q_{n}, q_{n+1}, u_{n}) ~ \delta q_{n+1} \right ]=0, \; \forall \delta q_{n} $$
(5)

with \(\delta q_{0} = \delta q_{N} = 0\). The discrete Lagrange-d’Alembert principle leads to the discrete, forced Euler-Lagrange equations, which are derived via discrete variation and subsequent rearrangement of terms for fixed boundary conditions. With \(D_{k}\) denoting the slot derivative with respect to the \(k\)th argument, they read

$$ D_{1} L_{d}(q_{n}, q_{n+1}) + D_{2} L_{d}(q_{n-1}, q_{n}) + f_{d}^{-}(q_{n}, q_{n+1}, u_{n}) + f_{d}^{+}(q_{n-1}, q_{n}, u_{n-1}) = 0 $$
(6)

for \(n=1,~\ldots,~N-1\). This equation takes two positions at the current and the previous time node and defines the relation with the next one. Given \(q_{n-1}\), \(q_{n}\), \(u_{n-1}\), and \(u_{n}\), this equation determines a unique \(q_{n+1}\) provided the discrete Lagrangian is regular, i.e., the matrix \(D_{1} D_{2} L_{d} = D_{2} D_{1} L_{d}\) is regular.
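As a small worked illustration (not taken from the source), assume a scalar system with Lagrangian \(\mathcal{L}(q,\dot{q}) = \frac{1}{2} m \dot{q}^{2} - V(q)\), a uniform time step \(h\), and a midpoint approximation of the action. The symbols \(m\), \(V\), and \(h\) are assumptions made only for this example. The discrete Lagrangian and its slot derivatives are then

$$ L_{d}(q_{n}, q_{n+1}) = h \left[ \frac{m}{2} \left( \frac{q_{n+1}-q_{n}}{h} \right)^{2} - V\left( \frac{q_{n}+q_{n+1}}{2} \right) \right], $$

$$\begin{aligned} D_{1} L_{d}(q_{n}, q_{n+1}) &= - m \frac{q_{n+1}-q_{n}}{h} - \frac{h}{2} V'\left( \frac{q_{n}+q_{n+1}}{2} \right), \\ D_{2} L_{d}(q_{n-1}, q_{n}) &= \hphantom{-} m \frac{q_{n}-q_{n-1}}{h} - \frac{h}{2} V'\left( \frac{q_{n-1}+q_{n}}{2} \right), \end{aligned}$$

so that, in the absence of forces, (6) becomes

$$ m \frac{q_{n+1} - 2 q_{n} + q_{n-1}}{h^{2}} = - \frac{1}{2} \left[ V'\left( \frac{q_{n}+q_{n+1}}{2} \right) + V'\left( \frac{q_{n-1}+q_{n}}{2} \right) \right], $$

which is a central-difference approximation of \(m \ddot{q} = -V'(q)\). In this example the regularity condition amounts to \(D_{2} D_{1} L_{d} = -\frac{m}{h} - \frac{h}{4} V''\big(\frac{q_{n}+q_{n+1}}{2}\big) \neq 0\), which holds for sufficiently small \(h\) whenever \(V''\) is bounded.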

The initial conditions are usually defined on \(\mathcal{TQ}\) as position and velocity or on \(\mathcal{T}^{*} \mathcal{Q}\) as position and momentum, but not on \(\mathcal{Q} \times \mathcal{Q}\) as two positions at different points in time. To initialize this time stepping scheme, both a continuous and a discrete version of the Legendre transformation are needed.

The continuous Legendre transformation \(\mathbb{F}\mathcal{L}: \mathcal{TQ} \to \mathcal{T^{*}Q}\), \((q,\dot{q}) \mapsto ( q, p = D_{2} L(q,\dot{q}) )\) connects the Lagrangian and the Hamiltonian formulations of dynamics. It allows us to compute an initial momentum \(p^{0}\) from an initial configuration and velocity \((q^{0}, \dot{q}^{0})\). In the discrete setting, the (forced) discrete Legendre transformation defines two distinct maps from the discrete state space to the cotangent bundle, \(\mathbb{F}^{\pm}L_{d}: \mathcal{Q} \times \mathcal{Q} \times \mathcal{U} \to \mathcal{T}^{*}\mathcal{Q}\), defined by

$$\begin{aligned} \mathbb{F}^{-} L_{d}: (q_{n}, q_{n+1}, u_{n}) &\mapsto (q_{n}, p_{n}^{-}) \\ &= \Big(q_{n}, -D_{1} L_{d}(q_{n}, q_{n+1}) - f_{d}^{-}(q_{n}, q_{n+1}, u_{n}) \Big), \end{aligned}$$
(7a)
$$\begin{aligned} \mathbb{F}^{+} L_{d}: (q_{n}, q_{n+1}, u_{n}) &\mapsto (q_{n+1}, p_{n+1}^{+}) \\ &= \Big(q_{n+1}, D_{2} L_{d}(q_{n}, q_{n+1}) + f_{d}^{+}(q_{n}, q_{n+1}, u_{n})\Big), \end{aligned}$$
(7b)

with the left and right discrete momenta \(p_{n}^{-}\) and \(p_{n}^{+}\). These allow us to interpret the discrete Euler-Lagrange equations (6) as a matching of momenta \(p_{n}^{-} = p_{n}^{+}\) for \(n=1,~\ldots,~N-1\).

To initialize the algorithm, given a configuration \(q^{0}\), a velocity \(\dot{q}^{0}\), and an initial control \(u_{0}\), the relation

$$ D_{2} L(q^{0}, \dot{q}^{0}) = p^{0} = -D_{1} L_{d} (q^{0}, q_{1}) -f_{d}^{-}(q^{0}, q_{1}, u_{0}) $$
(8)

determines \(q_{1}\).

2.2 Derivation of the discrete adjoint method for variational integrators

Similar to the discrete variational principle in Sect. 2.1, now the discrete adjoint method for variational integrators in (6) is derived via a discrete variational principle, and the structure and the resulting numerical method for the adjoint equations are illustrated.

Here, we concentrate on a discrete objective \(J_{d}\) containing a quadratic Mayer term:

$$ J_{M}(q_{N},p_{N}) = \frac{1}{2} (q_{N}-q^{N})^{T} S_{q} (q_{N}-q^{N}) + \frac{1}{2} (p_{N}-p^{N})^{T} S_{p} (p_{N}-p^{N}), $$
(9)

where \(S_{q}\) and \(S_{p}\) are positive semidefinite matrices. The Mayer term is used to relax the enforcement of the end state conditions \((q^{N}, p^{N})\), introducing weights for the reaching of the configuration and the momentum at the last time step \(N\).

The discrete adjoint method is derived by augmenting the objective with the variational integrator (6) and (8) as constraints and by taking variations of the augmented objective [2].

The resulting nonlinear constrained optimization problem reads

$$ \underset{q_{d}, u_{d}}{\min}\, J_{d}(q_{d}, u_{d}) = J_{M}(q_{N},p_{N}) + \sum _{n=0}^{N-1} \frac{1}{2} u_{n}^{T} R u_{n} $$
(10a)

subject to:

$$\begin{aligned} \hphantom{XXXXX} q_{0} &= q^{0}, \end{aligned}$$
(10b)
$$\begin{aligned} p^{0} &= - D_{1} L_{d}(q^{0}, q_{1}) - f_{d}^{-}(q^{0}, q_{1}, u_{0}), \end{aligned}$$
(10c)
$$\begin{aligned} 0 &= D_{1} L_{d}(q_{n}, q_{n+1}) + f_{d}^{-}(q_{n}, q_{n+1}, u_{n}) \\ &+ D_{2} L_{d}(q_{n-1}, q_{n}) + f_{d}^{+}(q_{n-1}, q_{n}, u_{n-1}), \quad \text{for}~n=1,~\ldots,~N-1, \end{aligned}$$
(10d)
$$\begin{aligned} p_{N} &= D_{2} L_{d}(q_{N-1}, q_{N}) + f_{d}^{+}(q_{N-1}, q_{N}, u_{N-1}). \end{aligned}$$
(10e)

The quantities \(p^{0}\) and \(q^{0}\) are prescribed initial conditions at the initial time node. The objective also includes a Lagrange term, which is quadratic in the control and \(R\) is a positive-definite weight matrix. Equation (10e) defining \(p_{N}\) corresponds to the discrete Legendre transformation \(\mathbb{F}^{+}L_{d}(q_{N-1},q_{N},u_{N-1})\).

Remark 1

The dependence on \(q_{N-1}\) and \(q_{N}\) of the momentum term (10e) of the Mayer term makes it more prone to producing larger contributions than the configuration term. This can make the optimization process unstable and possibly not convergent. To improve this, an iterative approach may be used where the end momentum of the \((i)\)th iteration \(p_{N}^{(i)}\) is used to inform the choice of a modified desired end momentum \(\tilde{p}^{N}\) such that

$$ \Vert p_{N}^{(i)}-\tilde{p}^{N}(p_{N}^{(i)},p^{N}) \Vert \leq \Vert p_{N}^{(i)}- p^{N} \Vert $$

with \(\tilde{p}^{N}(p^{N},p^{N}) = p^{N}\). The procedure can be initialized by considering a first iteration with \(S_{p} = 0\) and ended once \(\Vert p_{N}^{(i)}- p^{N} \Vert \) is sufficiently small to allow us to substitute \(\tilde{p}^{N}\) by \(p^{N}\) in a final iteration.
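One simple way to satisfy the two conditions of Remark 1 is to move the target only a fraction of the way from the current end momentum towards the desired one. The sketch below illustrates this; the function name `relaxed_target` and the relaxation factor `alpha` are hypothetical and are not prescribed by the text, which leaves the choice of \(\tilde{p}^{N}\) open.

```python
import math

def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in v))

def relaxed_target(p_iter, p_goal, alpha):
    """Hypothetical modified end momentum: a point between p_iter and p_goal.

    p_iter : end momentum p_N^(i) of the current iteration
    p_goal : desired end momentum p^N
    alpha  : relaxation factor in (0, 1]; alpha = 1 returns p_goal itself
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    return [pi + alpha * (pg - pi) for pi, pg in zip(p_iter, p_goal)]

# The distance to the relaxed target is alpha times the distance to p_goal,
# so the inequality of Remark 1 holds by construction.
p_iter, p_goal = [3.0, -1.0], [0.0, 0.0]
p_tilde = relaxed_target(p_iter, p_goal, 0.25)
```

With this choice, \(\tilde{p}^{N}(p^{N},p^{N}) = p^{N}\) holds as well, and increasing `alpha` towards 1 over the iterations recovers the original target.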

The objective \(J_{d}\) is augmented to \(\tilde{J}_{d}\) by the initial conditions and the discrete Euler-Lagrange equations via adjoint variables \(\lambda _{n} \approx \lambda (t_{n})\), which form the discrete adjoint path \(\lambda _{d} = \{\lambda _{n}\}_{n=0}^{N-1}\). The indices are chosen such that \(\lambda _{n}\) pairs with the corresponding momenta \(p^{\pm}_{n}\):

$$\begin{aligned} \tilde{J}_{d}(q_{d}, u_{d}, \lambda _{d}) &= J_{M} (q_{N},p_{N}(q_{N-1},q_{N},u_{N-1})) + \sum _{n=0}^{N-1} \frac{1}{2} u_{n}^{T} R u_{n} \\ &+ \lambda _{0}^{T} \Big[ p^{0} + D_{1} L_{d}(q^{0}, q_{1}) + f_{d}^{-}(q^{0}, q_{1}, u_{0}) \Big] \\ &+ \sum _{n=1}^{N-1} \lambda _{n}^{T} \Big[ D_{1} L_{d}(q_{n}, q_{n+1}) + D_{2} L_{d}(q_{n-1}, q_{n}) \\ &+ f_{d}^{-}(q_{n}, q_{n+1}, u_{n}) + f_{d}^{+}(q_{n-1}, q_{n}, u_{n-1}) \Big] . \end{aligned}$$
(11)

The discrete variation of the augmented objective has to vanish, \(\delta \tilde{J}_{d} = 0\), for all variations \(\delta u_{n}\), \(\delta \lambda _{n}\), and \(\delta q_{n}\) with the boundary condition \(\delta q_{0} = 0\), since \(q_{0} = q^{0}\) is directly enforced at the initial time node in problem (10a)–(10e). The variation of the three types of variables leads to three sets of equations. The variation with respect to the adjoint variables recovers the initial condition (10c) and the discrete Euler-Lagrange equations (10d). The variation with respect to the configuration variables yields the adjoint equations, which after rearrangement of terms read

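$$\begin{aligned} &\lambda _{N-1}^{T} \Big[ D_{2} D_{1} L_{d}(q_{N-1}, q_{N}) + D_{2} f_{d}^{-}(q_{N-1}, q_{N}, u_{N-1}) \Big] \\ &= - S_{q} (q_{N} - q^{N}) - S_{p} \big[ p_{N}(q_{N-1}, q_{N}, u_{N-1}) - p^{N} \big] \\ &\times \Big[ D_{2} D_{2} L_{d}(q_{N-1}, q_{N}) + D_{2} f_{d}^{+}(q_{N-1}, q_{N}, u_{N-1}) \Big], \end{aligned}$$
(12a)
$$\begin{aligned} &\lambda _{N-2}^{T} \Big[ D_{2} D_{1} L_{d}(q_{N-2}, q_{N-1}) + D_{2} f_{d}^{-}(q_{N-2}, q_{N-1}, u_{N-2}) \Big] \\ &+ \lambda _{N-1}^{T} \Big[ D_{2} D_{2} L_{d}(q_{N-2}, q_{N-1}) + D_{2} f_{d}^{+}(q_{N-2}, q_{N-1}, u_{N-2}) \\ &+ D_{1} D_{1} L_{d}(q_{N-1}, q_{N}) + D_{1} f_{d}^{-}(q_{N-1}, q_{N}, u_{N-1}) \Big] \\ &= - S_{p} \big[ p_{N}(q_{N-1}, q_{N}, u_{N-1}) - p^{N} \big] \\ &\times \Big[ D_{1} D_{2} L_{d}(q_{N-1}, q_{N}) + D_{1} f_{d}^{+}(q_{N-1}, q_{N}, u_{N-1}) \Big], \end{aligned}$$
(12b)
$$\begin{aligned} 0 &= \lambda _{n-1}^{T} \Big[ D_{2} D_{1} L_{d}(q_{n-1}, q_{n}) + D_{2} f_{d}^{-}(q_{n-1}, q_{n}, u_{n-1}) \Big] \\ &+ \lambda _{n}^{T} \Big[ D_{2} D_{2} L_{d}(q_{n-1}, q_{n}) + D_{2} f_{d}^{+}(q_{n-1}, q_{n}, u_{n-1}) \\ &+ D_{1} D_{1} L_{d}(q_{n}, q_{n+1}) + D_{1} f_{d}^{-}(q_{n}, q_{n+1}, u_{n}) \Big] \\ &+ \lambda _{n+1}^{T} \Big[ D_{1} D_{2} L_{d}(q_{n}, q_{n+1}) + D_{1} f_{d}^{+}(q_{n}, q_{n+1}, u_{n}) \Big] \quad \text{for}~n=N-2,~\ldots,~1. \end{aligned}$$
(12c)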

The discrete variational principle directly provides the boundary conditions (12a) and (12b) for the two last adjoint variables, as no boundary conditions for the state variables are prescribed at these time nodes. The variation w.r.t. the input \(u_{n}\) yields the optimality conditions. Note that the last equation is different:

$$\begin{aligned} 0&= R u_{n} + \lambda ^{T}_{n} D_{3} f_{d}^{-}(q_{n}, q_{n+1}, u_{n}) \\ &+ \lambda ^{T}_{n+1} D_{3} f_{d}^{+}(q_{n}, q_{n+1}, u_{n}),\qquad \text{for}~n=0,~\ldots,~N-2, \end{aligned}$$
(13a)
$$\begin{aligned} 0&= R u_{N-1} + \lambda ^{T}_{N-1} D_{3} f_{d}^{-}(q_{N-1}, q_{N}, u_{N-1}) \\ & + S_{p} \left [ \, p_{N}(q_{N-1},q_{N},u_{N-1}) - p^{N} \right ] D_{3} f_{d}^{+}(q_{N-1}, q_{N}, u_{N-1}). \end{aligned}$$
(13b)

The discrete Euler-Lagrange equations (6) can be solved forward in time. Given the resulting configuration path, the adjoint equations (12a)–(12c) can then be solved backward in time to determine the discrete adjoint variables, and the input equations (13a)–(13b) are used to update the input. Such a direct shooting algorithm uses the equations derived above as they stand and is thus simple to implement. However, an appropriately small time step \(h\) is necessary for stable integration in both directions in time. The discrete optimization problem with respect to \(q_{d}\), \(u_{d}\), and \(\lambda _{d}\) can also be solved by applying an interior point algorithm [18] or sequential quadratic programming [19]. In these approaches, the variational integrator enters the optimization as equality constraints, as in (10a)–(10e).

2.3 Application of the discrete adjoint method to a mathematical pendulum

Let us consider a mathematical pendulum as depicted in Fig. 1, in minimal coordinates \(q=\varphi \) with the Lagrangian \(\mathcal{L}(\varphi , \dot{\varphi}) = \frac{1}{2} m l^{2} \dot{\varphi}^{2} - m g l \cos (\varphi )\), actuated by a torque \(f=u\). The discrete Lagrangian approximated with the midpoint rule is \(L_{d}(\varphi _{n}, \varphi _{n+1}) = \frac{1}{2} h m l^{2} ( \frac{\varphi _{n+1} - \varphi _{n}}{h})^{2} - h m g l \cos ( \frac{\varphi _{n+1} + \varphi _{n}}{2})\) with the time step \(h\). The discrete forces are \(f_{d}^{\pm}(\varphi _{n}, \varphi _{n+1},u_{n}) = \frac{1}{2} h u_{n}\). The desired end configuration in this problem is \(q^{N} = \varphi ^{N} = \pi \), and the end momentum has to vanish, \(p^{N} = 0\). The first slot derivatives of the discrete Lagrangian used for the discrete Euler-Lagrange equations are as follows:

$$\begin{aligned} D_{1} L_{d}(\varphi _{n}, \varphi _{n+1}) &= - m l^{2} \frac{\varphi _{n+1}-\varphi _{n}}{h} + \frac{h}{2} \, m g l \, \sin \left (\frac{\varphi _{n+1}+\varphi _{n}}{2}\right ), \end{aligned}$$
(14)
$$\begin{aligned} D_{2} L_{d}(\varphi _{n-1}, \varphi _{n}) &= \hphantom{-} m l^{2} \frac{\varphi _{n}-\varphi _{n-1}}{h} + \frac{h}{2} \, m g l \, \sin \left (\frac{\varphi _{n}+\varphi _{n-1}}{2}\right ). \end{aligned}$$
(15)

The time stepping scheme (6) for the configuration is

$$ \begin{aligned} 0=\frac{\varphi _{n+1} - 2 \varphi _{n} + \varphi _{n-1}}{h} &- \frac{h}{2} \frac{g}{l} \, \text{sin}\left ( \frac{\varphi _{n+1} + \varphi _{n}}{2}\right ) \\ & - \frac{h}{2} \frac{g}{l} \, \text{sin}\left ( \frac{\varphi _{n} + \varphi _{n-1}}{2}\right ) - \frac{h}{m l^{2}} \, \frac{u_{n} + u_{n-1}}{2}. \end{aligned} $$
(16)

It is initialized with

$$ 0 = p^{0} - m l^{2} \frac{\varphi _{1}-\varphi _{0}}{h} + \frac{h}{2} \, m g l \, \text{sin}\left (\frac{\varphi _{1}+\varphi _{0}}{2} \right ) + h \frac{u_{0}}{2}. $$
(17)
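The forward time stepping described by (6) and (8) can be sketched as follows for the pendulum. This is a minimal illustration, not the authors' implementation: it works with the unscaled residual \(D_{1} L_{d} + D_{2} L_{d} + f_{d}^{-} + f_{d}^{+}\) built from (14) and (15), solves each implicit step with a scalar Newton iteration, and uses the hypothetical names `d1_ld`, `d2_ld`, and `simulate` as well as assumed values for `tol` and `max_iter`.

```python
import math

def d1_ld(a, b, h, m=1.0, l=1.0, g=9.81):
    """D_1 L_d(a, b) of the midpoint discrete Lagrangian, cf. (14)."""
    return -m * l**2 * (b - a) / h + 0.5 * h * m * g * l * math.sin(0.5 * (a + b))

def d2_ld(a, b, h, m=1.0, l=1.0, g=9.81):
    """D_2 L_d(a, b) of the midpoint discrete Lagrangian, cf. (15)."""
    return m * l**2 * (b - a) / h + 0.5 * h * m * g * l * math.sin(0.5 * (a + b))

def simulate(phi0, p0, u, h, m=1.0, l=1.0, g=9.81, tol=1e-12, max_iter=50):
    """Return [phi_0, ..., phi_N] for piecewise-constant torques u_0, ..., u_{N-1}."""
    phi = [phi0]
    for n in range(len(u)):
        if n == 0:
            # First step from (8): p^0 + D_1 L_d(phi_0, phi_1) + f_d^-(u_0) = 0.
            known = p0 + 0.5 * h * u[0]
            x = phi0 + h * p0 / (m * l**2)          # initial guess for phi_1
        else:
            # Scheme (6): D_1 L_d(phi_n, phi_{n+1}) + D_2 L_d(phi_{n-1}, phi_n)
            #             + f_d^-(u_n) + f_d^+(u_{n-1}) = 0.
            known = d2_ld(phi[n - 1], phi[n], h, m, l, g) + 0.5 * h * (u[n - 1] + u[n])
            x = 2.0 * phi[n] - phi[n - 1]           # initial guess by extrapolation
        for _ in range(max_iter):
            r = d1_ld(phi[n], x, h, m, l, g) + known
            if abs(r) < tol:
                break
            # Newton step with the slope D_2 D_1 L_d(phi_n, x).
            slope = -m * l**2 / h + 0.25 * h * m * g * l * math.cos(0.5 * (phi[n] + x))
            x -= r / slope
        phi.append(x)
    return phi

# Example call: 20 steps of size h = 0.01 with constant torque u_n = 1.
path = simulate(0.1, 0.0, [1.0] * 20, 0.01)
```

Without gravity and torque the scheme reduces to uniform motion, \(\varphi _{n} = \varphi _{0} + n h p^{0} / (m l^{2})\), which gives a quick sanity check of such an implementation.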

Inserting the second derivatives of the discrete Lagrangian into the adjoint equations (12c) leads to

$$ \begin{aligned} 0= & \frac{\lambda _{n-1}^{T} - 2 \lambda _{n}^{T} + \lambda _{n+1}^{T}}{h} \\ &- \frac{\lambda _{n}^{T} + \lambda _{n-1}^{T}}{2} \, \frac{h}{2} \frac{g}{l} \, \cos \left (\frac{\varphi _{n} + \varphi _{n-1}}{2} \right ) \\ &- \frac{\lambda _{n+1}^{T} +\lambda _{n}^{T}}{2} \, \frac{h}{2} \frac{g}{l} \, \cos \left (\frac{\varphi _{n+1} + \varphi _{n}}{2} \right ). \end{aligned} $$
(18)

Two equations according to (12a) and (12b) are necessary to initialize the backward integration (18) in time:

$$\begin{aligned} 0 &= \lambda _{N-1}^{T} \left [ - \frac{m l^{2}}{h} + \frac{h}{4} \, m g l \, \cos \left (\frac{\varphi _{N}+\varphi _{N-1}}{2}\right ) \right ] + S_{q} (\varphi _{N} - \pi ) \\ &+ S_{p} \left [ m l^{2} \frac{\varphi _{N}-\varphi _{N-1}}{h} + \frac{h}{2} \, m g l \, \sin \left ( \frac{\varphi _{N}+\varphi _{N-1}}{2}\right ) \right ] \\ &\times \left [ \frac{m l^{2}}{h} + \frac{h}{4} \, m g l \, \cos \left ( \frac{\varphi _{N}+\varphi _{N-1}}{2}\right )\right ], \end{aligned}$$
(19a)
$$\begin{aligned} 0 &= \frac{2\lambda ^{T}_{N-1} - \lambda ^{T}_{N-2}}{h} \\ &+ \frac{\lambda _{N-1}^{T} + \lambda _{N-2}^{T}}{2} \, \frac{h}{2} \, \frac{g}{l} \cos \left (\frac{\varphi _{N-1}+\varphi _{N-2}}{2} \right ) + \lambda _{N-1}^{T} \, \frac{h}{4} \frac{g}{l} \, \cos \left (\frac{\varphi _{N}+\varphi _{N-1}}{2}\right ) \\ &+ S_{p} \left [ m l^{2} \frac{\varphi _{N}-\varphi _{N-1}}{h} + \frac{h}{2} \, m g l \, \sin \left ( \frac{\varphi _{N}+\varphi _{N-1}}{2}\right ) \right ] \\ & \times \left [-\frac{1}{h} + \frac{h}{4} \, \frac{g}{l} \, \cos \left (\frac{\varphi _{N}+\varphi _{N-1}}{2}\right )\right ]. \end{aligned}$$
(19b)

The equations for the input are as follows:

$$\begin{aligned} 0&=R h \, u_{n} + h \frac{\lambda ^{T}_{n} + \lambda ^{T}_{n+1}}{2},\quad \text{for}~n=0,~\ldots,~N-2 , \end{aligned}$$
(20a)
$$\begin{aligned} 0&=R h \, u_{N-1} + \frac{h}{2} \lambda ^{T}_{N-1} \\ & + \, \frac{h}{2} S_{p} \left [ m l^{2} \frac{\varphi _{N}-\varphi _{N-1}}{h} + \frac{h}{2} \, m g l \, \sin \left ( \frac{\varphi _{N}+\varphi _{N-1}}{2}\right ) \right ]. \end{aligned}$$
(20b)
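The forward sweep, the backward adjoint sweep, and the evaluation of the input equations can be combined into a gradient computation for single shooting. The following sketch does this for the pendulum directly from the general forms (12a)–(12c) and (13a)–(13b), so that no rescaling by \(m l^{2}\) is involved. Several choices here are assumptions for illustration only: the function names, the parameter dictionary `PAR` and its values, the fixed number of Newton steps, the evaluation of \(p_{N}\) from (10e) including its force term, and the auxiliary value `lam[N]` \(= S_{p}(p_{N}-p^{N})\), which is not an adjoint variable in the text but lets (12b) and (13b) share the loop with (12c) and (13a). The objective follows (10a), i.e. \(R\) enters without an additional factor \(h\).

```python
import math

def simulate(u, par):
    """Forward sweep: first step from (8), then scheme (6). Returns [phi_0, ..., phi_N]."""
    h, m, l, g = par["h"], par["m"], par["l"], par["g"]
    k, w = m * l**2 / h, 0.5 * h * m * g * l
    phi = [par["phi0"]]
    for n in range(len(u)):
        if n == 0:
            known = par["p0"] + 0.5 * h * u[0]
            x = phi[0] + par["p0"] / k
        else:
            a, b = phi[n - 1], phi[n]
            known = k * (b - a) + w * math.sin(0.5 * (a + b)) + 0.5 * h * (u[n - 1] + u[n])
            x = 2.0 * b - a
        for _ in range(12):  # fixed number of Newton steps on D_1 L_d(phi_n, x) + known = 0
            r = -k * (x - phi[n]) + w * math.sin(0.5 * (phi[n] + x)) + known
            x -= r / (-k + 0.5 * w * math.cos(0.5 * (phi[n] + x)))
        phi.append(x)
    return phi

def end_momentum(phi, u, par):
    """p_N from the discrete Legendre transform (10e), including the force term."""
    h, m, l, g = par["h"], par["m"], par["l"], par["g"]
    a, b = phi[-2], phi[-1]
    return m * l**2 * (b - a) / h + 0.5 * h * m * g * l * math.sin(0.5 * (a + b)) + 0.5 * h * u[-1]

def objective(u, par):
    """Discrete objective (10a) with the quadratic Mayer term (9)."""
    phi = simulate(u, par)
    e_q = phi[-1] - par["phi_end"]
    e_p = end_momentum(phi, u, par) - par["p_end"]
    return 0.5 * par["Sq"] * e_q**2 + 0.5 * par["Sp"] * e_p**2 + sum(0.5 * par["R"] * v * v for v in u)

def adjoint_gradient(u, par):
    """Gradient of the objective w.r.t. u_0, ..., u_{N-1} via one backward adjoint sweep."""
    h, m, l, g = par["h"], par["m"], par["l"], par["g"]
    N = len(u)
    phi = simulate(u, par)
    k = m * l**2 / h
    # Midpoint rule on [t_n, t_{n+1}]: D1D1 = D2D2 = k + c[n], D1D2 = D2D1 = -k + c[n].
    c = [0.25 * h * m * g * l * math.cos(0.5 * (phi[n] + phi[n + 1])) for n in range(N)]
    lam = [0.0] * (N + 1)
    lam[N] = par["Sp"] * (end_momentum(phi, u, par) - par["p_end"])  # auxiliary value
    # Boundary condition (12a), solved for lambda_{N-1}.
    lam[N - 1] = (-par["Sq"] * (phi[N] - par["phi_end"]) - lam[N] * (k + c[N - 1])) / (-k + c[N - 1])
    # (12b) for n = N-1 and (12c) for n = N-2, ..., 1, each solved for lambda_{n-1}.
    for n in range(N - 1, 0, -1):
        lam[n - 1] = -(lam[n] * (2.0 * k + c[n - 1] + c[n]) + lam[n + 1] * (-k + c[n])) / (-k + c[n - 1])
    # Left-hand sides of (13a) and (13b) with D_3 f_d^- = D_3 f_d^+ = h/2.
    return [par["R"] * u[n] + 0.5 * h * (lam[n] + lam[n + 1]) for n in range(N)]

PAR = dict(h=0.05, m=1.0, l=1.0, g=9.81, phi0=0.1, p0=0.0,
           Sq=10.0, Sp=1.0, R=0.1, phi_end=math.pi, p_end=0.0)
u_guess = [0.5 + 0.1 * math.sin(n) for n in range(20)]
grad = adjoint_gradient(u_guess, PAR)
```

In a shooting iteration, the input could then be updated by a plain gradient step such as `u_new = [v - step * gv for v, gv in zip(u_guess, grad)]` with a suitably small, problem-dependent `step`. Comparing `grad` against finite differences of `objective` is a useful check of the adjoint recursion.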

The convergence of the configuration \(q_{d}\) and the adjoint variables \(\lambda _{d}\) is illustrated in Figs. 2(a) and 2(b), respectively. A simulation time of \(T=2\) and a constant input of \(u_{n}=1\) for \(n=0,~\ldots,~N-1\) are used; the pendulum has a length of \(l=1\) and a mass of \(m=1\), and the gravitational acceleration is \(g=9.81\). The input weight is \(R=10^{-5} h\). The weights in the Mayer term are \(S_{q} = 10^{3}\) and \(S_{p} = 10^{-2}\). These values were chosen to obtain solutions that achieve the upswing of the pendulum to the upper equilibrium point with minimal effort. Larger values for the input weighting lead to solutions with the end configuration at the lower equilibrium of the pendulum. The absolute error in these plots is computed as the infinity norm of the difference between the variables and a reference solution \((q_{\mathrm{ref}}, \lambda _{\mathrm{ref}})\) obtained from a simulation with a fine discretization of \(h=10^{-5}\), i.e., \(\left \Vert q_{d} - q_{\mathrm{ref}} \right \Vert _{ \infty}\) and \(\left \Vert \lambda _{d} - \lambda _{\mathrm{ref}} \right \Vert _{\infty}\), respectively. The convergence rates for the configuration and the adjoint variables are equal: we observe second-order convergence. This is in accordance with the theoretical results in [2]. These convergence results are obtained by forward integration of the time stepping scheme (16) and the subsequent backward solution of (18) using the configuration variables calculated with the same time step.
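The observed order of convergence can be estimated from errors measured at several step sizes by assuming an error model of the form \(C h^{p}\). The helper below is a minimal sketch of that computation; the function name `observed_orders` is hypothetical, and the error values in the example are synthetic numbers generated from \(3 h^{2}\), not results from the paper.

```python
import math

def observed_orders(steps, errors):
    """Estimate p in error ~ C * h**p from consecutive (h, error) pairs."""
    orders = []
    for i in range(len(steps) - 1):
        ratio_err = errors[i] / errors[i + 1]
        ratio_h = steps[i] / steps[i + 1]
        orders.append(math.log(ratio_err) / math.log(ratio_h))
    return orders

# Synthetic example: errors that scale exactly like h**2.
steps = [0.1, 0.05, 0.025]
errors = [3.0 * h**2 for h in steps]
orders = observed_orders(steps, errors)
```

For measured infinity-norm errors of a second-order scheme, the entries of `orders` would be expected to approach 2 as the step size decreases.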

Fig. 1 Torque-controlled mathematical pendulum

Fig. 2 Error of the configuration \(q_{d}\) and adjoint variable \(\lambda _{d}\)

The optimized motion of the pendulum is depicted in Figs. 3(a), 3(b), 3(c), and 3(d). The momentum \(p\) and the kinetic energy \(T\) are close to zero at the end of the simulation with the optimized input acting on the pendulum.

Fig. 3 Optimization results for the pendulum, using the discrete adjoint method and single shooting

Remark 2

Pontryagin’s maximum principle leads to necessary conditions for optimality in the continuous-time setting. The resulting adjoint equations are \(\ddot{\lambda}^{T} - g/l \lambda \cos \varphi = 0\) and the control equations are \(R u + \lambda = 0\). It can be checked that the discrete equations (18) and (20a) are the corresponding discrete versions of these equations when discretized using a midpoint rule. The discrete boundary conditions (19a)–(19b) and (20b), however, are not so easy to relate to their continuous counterparts. We plan to address this very issue in a future publication.
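The first part of this check can be written out for the scalar pendulum, where the transposes can be dropped. Dividing (18) by \(h\) gives

$$ \frac{\lambda _{n-1} - 2 \lambda _{n} + \lambda _{n+1}}{h^{2}} - \frac{g}{l} \, \frac{1}{2} \left[ \frac{\lambda _{n} + \lambda _{n-1}}{2} \cos \left( \frac{\varphi _{n} + \varphi _{n-1}}{2} \right) + \frac{\lambda _{n+1} + \lambda _{n}}{2} \cos \left( \frac{\varphi _{n+1} + \varphi _{n}}{2} \right) \right] = 0. $$

The first term is the central second difference approximating \(\ddot{\lambda}(t_{n})\), and the bracket averages the midpoint values of \(\lambda \cos \varphi \) on the two intervals adjacent to \(t_{n}\), so the expression is a discrete counterpart of \(\ddot{\lambda} - \frac{g}{l} \lambda \cos \varphi = 0\). In the same way, dividing (20a) by \(h\) gives \(R u_{n} + \frac{1}{2}(\lambda _{n} + \lambda _{n+1}) = 0\), in which the average of \(\lambda _{n}\) and \(\lambda _{n+1}\) approximates \(\lambda \) at the midpoint of \([t_{n}, t_{n+1}]\), matching \(R u + \lambda = 0\).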

3 Discrete adjoint method for variational integration of constrained dynamics

3.1 Variational integration of constrained dynamics

The derivation of variational integrators for constrained systems that use null space projection and nodal reparametrization [20] is briefly summarized in the following, using steps similar to those in Sect. 2. The discrete adjoint method for such systems is then derived in analogy to Sect. 2.2.

Up to now, we have worked in local coordinates directly on the configuration manifold \(\mathcal{Q}\). However, it can be advantageous to take \(\mathcal{Q}\) to be an ambient (vector) space parametrized by redundant coordinates and to restrict the motion by constraints. Given a scleronomic, holonomic constraint function \(g: \mathcal{Q} \to \mathbb{R}^{m}\), the constraint submanifold is then

$$ \mathcal{M} := \{q \in \mathcal{Q} ~|~ g(q)=0\}. $$
(21)

We assume that the Jacobian \(\frac{\partial g}{\partial q}\) has full rank \(m\), so the dimension of the constraint manifold is \(n-m\), the number of degrees of freedom of the mechanical system. We also assume consistent initial conditions \((q^{0},\dot{q}^{0})\) that fulfill the constraints on configuration level \(g(q^{0})=0\) as well as on velocity level \(\frac{d}{dt} g(q^{0}) = \frac{\partial g(q^{0})}{\partial q} \dot{q}^{0}=0\).

A Lagrange multiplier \(\nu \) is used to enforce the constraint by appending the term \(- g(q)^{T} \nu \) to the Lagrangian in the action integral. Thus, the Lagrange-d’Alembert principle in this setting reads

$$ 0 = \delta \int _{0}^{T} \left [ \mathcal{L} \big(q(t), \dot{q}(t) \big) - g(q)^{T} \nu \right ] ~dt + \int _{0}^{T} f_{\mathcal{L}} \big(q(t), \dot{q}(t), u(t) \big) \delta q~ dt,\quad \forall \delta q, \delta \nu $$
(22)

with \(\delta q(0) = \delta q(T) = 0\). The constraint part of the action integral is approximated with the trapezoidal rule:

$$ \int _{t_{n}}^{t_{n+1}} g(q(t))^{T} \nu (t) ~dt \approx \frac{1}{2} \big[ g_{d}(q_{n})^{T} \nu _{n} + g_{d}(q_{n+1})^{T} \nu _{n+1} \big] $$
(23)

with \(g_{d}(q_{n}) = h g(q_{n})\). Including this term in the discrete variational principle (5) and taking variations \(\delta q_{n}\) and \(\delta \nu _{n}\) of the discrete action sum with \(\delta q_{0} = \delta q_{N} = 0\) leads, after rearrangement of terms, to the discrete, constrained Euler-Lagrange equations

$$\begin{aligned} 0 &= D_{1} L_{d}(q_{n}, q_{n+1}) + D_{2} L_{d}(q_{n-1}, q_{n}) + \frac{\partial g_{d}(q_{n})}{\partial q_{n}}^{T} \nu _{n} \\ &+ f_{d}^{-}(q_{n}, q_{n+1}, u_{n}) + f_{d}^{+}(q_{n-1}, q_{n}, u_{n-1}), \end{aligned}$$
(24a)
$$ 0 = g(q_{n+1}) $$
(24b)

of dimension \(n+m\). To reduce the dimension of (24a) from \(n\) to \(n-m\) and to eliminate the Lagrange multipliers, thus avoiding the conditioning problems associated with them, (24a) is premultiplied by the transpose of a discrete null space matrix \(P(q_{n}) \in \mathbb{R}^{n \times (n-m)}\), whose columns span the tangent space \(T_{q_{n}}\mathcal{M}\) and which depends only on quantities at the current time node, so that the constraint forces are eliminated. Further, a nodal reparametrization \(q_{n+1} = F_{d}(q_{n}, v_{n+1})\) with \(v_{n+1} \in \mathcal{V} \subseteq \mathbb{R}^{n-m}\) is used to fulfill the constraints identically, \(g(F_{d}(q_{n}, v_{n+1})) = 0\), \(\forall v_{n+1} \in \mathcal{V}\), for \(n=0,~\ldots,~N-1\). Together with the null space matrix, the reparametrization \(F_{d} : \mathcal{Q} \times \mathcal{V} \to \mathcal{M}\) leads to the integration scheme

$$\begin{aligned} P^{T}(q_{n}) &\big[ D_{1} L_{d}\big(q_{n}, F_{d}(q_{n}, v_{n+1}) \big) + D_{2} L_{d}\big(q_{n-1}, q_{n} \big) \\ &+ f_{d}^{-}\big(q_{n}, F_{d}(q_{n}, v_{n+1}), u_{n} \big) \\ &+ f_{d}^{+} \big(q_{n-1}, q_{n}, u_{n-1} \big) \big] =0,\quad \text{for}~n=1,~\ldots,~N-1 \end{aligned}$$
(25)

that has to be iteratively solved for \(v_{n+1}\) in each time step, given \(q_{n-1}\), \(q_{n}\), \(u_{n-1}\) and \(u_{n}\).

The redundant control forces \(f(q,u) = B^{T}(q) \tau (u) \in \mathbb{R}^{n}\) depend on the generalized control forces \(\tau (u) \in \mathbb{R}^{n - m}\) and the input transformation matrix \(B^{T}(q) \in \mathbb{R}^{n \times (n-m)}\) that must be chosen such that the consistency with the constraints and consistency of momentum maps are ensured [6]. The discrete approximations of the redundant forces

$$\begin{aligned} f_{d}^{-}(q_{n},q_{n+1},u_{n}) &= \frac{h}{2} B^{T}(q_{n}) \tau (u_{n}), \\ f_{d}^{+}(q_{n},q_{n+1},u_{n}) &= \frac{h}{2} B^{T}(q_{n+1}) \tau (u_{n}) \end{aligned}$$

capture the effect of the generalized forces acting during the time interval \([t_{n}, t_{n+1}]\). As before, \(u\) is approximated as constant in each time interval.

3.2 Derivation of the discrete adjoint method for variational integration of constrained dynamics

The constrained setting with null space projection and reparametrization for a mechanical system leads to implicit equations of minimal dimension. The discrete adjoint method applied to such a system leads to adjoint variables of minimal dimension \(n-m\). It also involves the null space projection for the adjoint equations.

The starting point is a problem such as (10a)–(10e), but now constrained by the discrete Euler-Lagrange equations for the constrained system with null space projection and nodal reparametrization (25) as in [6]. Similar to the procedure outlined in Sect. 2.2, the objective is augmented with the discrete Euler-Lagrange equations. As these equations are defined on \(\mathcal{M}\) using the nodal reparametrization \(F_{d}(q_{n}, v_{n+1})\), adjoint variables of the same dimension as \(v_{n+1}\) are necessary.

An objective \(J_{d}\) consisting of a Mayer term and an integral term quadratic in the control, similar to the discrete adjoint method for systems without constraints in equation (10a)–(10e) is considered:

$$ J_{d} = \frac{1}{2} (q_{N} - q^{N})^{T} S_{q} (q_{N} - q^{N}) + \sum _{n=0}^{N-1} \frac{1}{2} u_{n}^{T} R u_{n}. $$
(26)

To simplify matters, the Mayer term of the momentum has been omitted here, since it can be handled in the same way as in the unconstrained case. The variation of the augmented objective has to vanish with respect to all variables, i.e., for \(\delta q_{n}\), \(\delta \lambda _{n}\), \(\delta u_{n}\), and \(\delta v_{n+1}\) at all time steps. The variation of the redundant configuration \(\delta q_{n}\) is expressed in terms of the variation of the minimal coordinate \(\delta v_{n}\) as

$$ \delta q_{n} = D_{2} F_{d}(q_{n-1}, v_{n}) ~\delta v_{n}. $$
(27)

The Jacobian matrix \(\frac{\partial F_{d}}{\partial v_{n}}\) is a null space matrix [21]. After applying this relation, the adjoint equations become

$$\begin{aligned} &\lambda _{N-1}^{T} P^{T}(q_{N-1}) \left [ D_{2} D_{1} L_{d}(q_{N-1}, F_{d}(q_{N-1}, v_{N})) \right ] \\ &= - S_{q} (q_{N} - q^{N}) ~D_{2} F_{d}(q_{N-1}, v_{N}), \end{aligned}$$
(28a)
$$\begin{aligned} &\lambda _{N-2}^{T} P^{T}(q_{N-2}) \Big[ D_{2} D_{1} L_{d}(q_{N-2}, F_{d}(q_{N-2}, v_{N-1})) \Big] \\ &= - \left \lbrace \lambda _{N-1}^{T} D_{1} P^{T}(q_{N-1}) \Big[ D_{1} L_{d}(q_{N-1}, F_{d}(q_{N-1}, v_{N}))+D_{2} L_{d}(q_{N-2}, q_{N-1}) \right . \\ &+ f_{d}^{-} (q_{N-1}, F_{d}(q_{N-1}, v_{N}), u_{N-1}) + f_{d}^{+}(q_{N-2}, q_{N-1}, u_{N-2}) \Big] \\ &+ \left . \lambda _{N-1}^{T} P^{T}(q_{N-1}) \Big[ D_{1} D_{1} L_{d}(q_{N-1}, F_{d}(q_{N-1}, v_{N})) + D_{2} D_{2} L_{d}(q_{N-2}, q_{N-1}) \Big] \right \rbrace \\ &\times D_{2} F_{d}(q_{N-2}, v_{N-1}), \end{aligned}$$
(28b)
$$\begin{aligned} &\lambda _{n-1}^{T} P^{T}(q_{n-1}) \Big[ D_{2} D_{1} L_{d}(q_{n-1}, F_{d}(q_{n-1}, v_{n})) \Big] \\ &= - \left \lbrace \lambda _{n}^{T} D_{1} P^{T}(q_{n}) \Big[ D_{1} L_{d}(q_{n}, F_{d}(q_{n}, v_{n+1})) + D_{2} L_{d}(q_{n-1}, q_{n}) \right . \\ &+ f_{d}^{-}(q_{n}, F_{d}(q_{n}, v_{n+1}), u_{n}) + f_{d}^{+}(q_{n-1}, q_{n}, u_{n-1}) \Big] \\ &+ \lambda _{n}^{T} P^{T}(q_{n}) \Big[ D_{1} D_{1} L_{d}(q_{n}, F_{d}(q_{n}, v_{n+1}))+ D_{2} D_{2} L_{d}(q_{n-1}, q_{n}) \Big] \\ &+\left .\lambda _{n+1}^{T} P^{T}(q_{n+1}) D_{1} D_{2} L_{d}(q_{n}, q_{n+1}) \vphantom{\sum} \right \rbrace D_{2} F_{d}(q_{n-1}, v_{n})\,~\text{for}~n=N-2,~\ldots,~1. \end{aligned}$$
(28c)

The variations with respect to the input variables vanish if

$$\begin{aligned} 0 &= R u_{n} + \lambda _{n}^{T} P^{T}(q_{n}) D_{3} f_{d}^{-} (q_{n}, F_{d}(q_{n}, v_{n+1}), u_{n}) \\ &+ \lambda _{n+1}^{T} P^{T}(q_{n+1})D_{3} f_{d}^{+} (q_{n}, F_{d}(q_{n}, v_{n+1}), u_{n})~\text{for}~n=1,~\ldots,~N-2, \end{aligned}$$
(29a)
$$\begin{aligned} 0 &= R u_{N-1} + \lambda _{N-1}^{T} P^{T}(q_{N-1}) D_{3} f_{d}^{-}(q_{N-1}, F_{d}(q_{N-1}, v_{N}), u_{N-1}) \end{aligned}$$
(29b)

hold. The evaluation of these equations can be used to update the input variables in a shooting method.

3.3 Discrete adjoint method for a mathematical pendulum described as constrained system

The mathematical pendulum is described as a constrained system in the ambient space \(\mathcal{Q} = \mathbb{R}^{2}\) with redundant coordinates \(q=[x \quad y]^{T}\) and the constraint equation \(g(q)=1/2(x^{2} + y^{2} - l^{2} )\). The null space matrix is \(P(q_{n})^{T} = [-y_{n} \quad x_{n}]\), the input transformation matrix is \(B(q_{n})^{T} = [\frac{-y_{n}}{2l^{2}} \quad \frac{x_{n}}{2l^{2}}]\), the generalized force is \(\tau (u) = u\), and the nodal reparametrization reads

$$ q_{n+1} = F_{d}(q_{n}, v_{n+1}) = \begin{bmatrix} \cos (v_{n+1}) & -\sin (v_{n+1}) \\ \sin (v_{n+1}) & \cos (v_{n+1}) \end{bmatrix} q_{n}. $$
(30)

The input variable can be interpreted as the physical torque and the variable \(v\) as the incremental angle. Figures 4(a) and 4(b) show the convergence results for the pendulum in the constrained case. The adjoint variables are of minimum dimension \((n-m)\), just as the configuration variables. The error is calculated in the same way as for the unconstrained case in Sect. 2.3, as the infinity norm of the difference to the reference trajectory, using the same parameters. These errors are determined from solutions obtained via forward time stepping for the configuration and backward time stepping for the adjoint variables with fixed input. The figures show that the convergence rate is quadratic in the constrained case as well. Note, however, that the theoretical results in [2] only cover the case of minimal coordinates and not the constrained case.
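The following minimal sketch illustrates the pendulum quantities above numerically. It assumes NumPy; the function names, the pendulum length `l = 1.0`, the start configuration and the angle increment are illustrative choices, not values from the paper. It checks that the nodal reparametrization (30) keeps the configuration on the constraint manifold and that \(P(q)\) spans the null space of the constraint Jacobian.

```python
import numpy as np

def nodal_reparametrization(q, v):
    """F_d(q, v) of (30): rotate q in the plane by the incremental angle v."""
    c, s = np.cos(v), np.sin(v)
    return np.array([[c, -s], [s, c]]) @ q

def constraint(q, l):
    """g(q) = (x^2 + y^2 - l^2) / 2."""
    return 0.5 * (q @ q - l**2)

def null_space_matrix(q):
    """P(q) as a 2x1 column, so that P(q)^T = [-y, x]."""
    return np.array([[-q[1]], [q[0]]])

l = 1.0                               # pendulum length (illustrative assumption)
q0 = np.array([0.0, -l])              # hanging position (illustrative assumption)
q1 = nodal_reparametrization(q0, 0.3)

G = q1.reshape(1, 2)                  # constraint Jacobian dg/dq = [x, y]
residual = constraint(q1, l)          # stays zero: q1 is still on the circle
tangency = (G @ null_space_matrix(q1)).item()  # zero: P spans the null space of G
```

Because the update is a pure rotation, no constraint projection is needed after the step; this is the property the reparametrization is designed to provide.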

Fig. 4
figure 4

Error of the configuration \(q_{d}\) and adjoint variable \(\lambda _{d}\)

The optimized motion of the pendulum is depicted in Figs. 5(a), 5(b), 5(c), and 5(d). The input \(u\) and the kinetic energy \(T\) are close to zero at the end of the simulation with the optimized input acting on the pendulum. The end configuration is weighted with \(S_{q}=10^{3}\), the end momentum weight is \(S_{p}=10^{-2}\). The weight for the input is \(R=10^{-5} h\). This low input weight is chosen so that the pendulum can reach the upper equilibrium position. The input term still reduces the input relative to the constant initial guess of 1 and regularizes the optimization problem.

Fig. 5
figure 5

Optimization results for the pendulum in the constrained setting, using the discrete adjoint method and single shooting

The results are similar to those obtained previously for the pendulum in minimal coordinates. Small differences in the solution are visible, but the optimized results are similar.

4 Discrete adjoint method for geometrically exact beam dynamics

In this section, the discrete adjoint method is applied to an optimal control problem involving the dynamics of a geometrically exact beam, approximated by the multisymplectic integrator of [22].

4.1 Geometrically exact beam model

The geometrically exact beam [26] models a rod-like deformable body as a curve \(x(t,s) \in \mathbb{R}^{3}\), with a rigid cross section attached to each of its points. Here, \(t \in [0,T]\) is used again to parametrize time, while \(s \in [0, \ell ]\) parametrizes the longitudinal position along the curve. The orientation of the cross section at \(s\) is described by a rotation \(R(t,s) \in SO(3)\). When \(R\) is written column-wise as \(R(t,s) = [d_{1}(t,s), d_{2}(t,s), d_{3}(t,s)]\), the vectors of this triad are known as the directors of the cross section (see Fig. 6).

Fig. 6
figure 6

Configuration of a geometrically exact beam

This can be considered as a Lagrangian field theory with configuration space \(\mathcal{Q} = \mathbb{R}^{3} \times SO(3)\). This space is diffeomorphic to the group of special Euclidean transformations in 3D, \(SE(3)\), from which it differs only in terms of group structure. In [22, 27], the authors claim it to be numerically more advantageous to consider this latter space.

If \(g(t,s) = (R(t,s), x(t,s)) \in SE(3)\) denotes the configuration of a cross section, its derivatives with respect to \(t\) and \(s\) are related to velocities and strains respectively. More specifically,

$$ \begin{aligned} (\Omega , V) &= g^{-1} \dot{g} = (R^{-1} \dot{R}, R^{-1} \dot{x}) & \; \text{body angular and linear time derivatives}, \\ (K, W) &= g^{-1} g^{\prime }= (R^{-1} R^{\prime}, R^{-1} x^{\prime}) & \; \text{body angular and linear space derivatives}, \end{aligned} $$

where we have used \(\dot{X} = \frac{\partial X}{\partial t}\) and \({X}^{\prime} = \frac{\partial X}{\partial s}\) and “body” is meant to signify “in the reference frame of the section itself”. Considering a reference configuration \(g_{\mathrm{ref}}(s) \in SE(3)\), we also define the strains

$$ (\Lambda , \Gamma ) = (K - K_{\mathrm{ref}}, W - W_{\mathrm{ref}}). $$

The simple case of a straight initial configuration along the \(e_{1}\) axis, \(g_{\mathrm{ref}}(s) = (I, s e_{1})\), where \(I\) is the identity matrix, leads to \(\Lambda = K\) and \(\Gamma = W - e_{1}\). One can see that \(\Lambda \) measures the curvature (bending and torsion) and \(\Gamma \) measures the difference between \(d_{1}\) and \(x^{\prime}\) (elongation and shear).

Considering a hyperelastic material model with moderate strains, the Lagrangian density of the system can be written as

$$ \mathcal{L}(g,\dot{g},g^{\prime}) = \frac{1}{2}\left ( \Omega ^{T} \mathbb{J} \Omega + \rho A V^{T} V - \Lambda ^{T} \mathbb{C}_{1} \Lambda - \Gamma ^{T} \mathbb{C}_{2} \Gamma \right ) - U_{ \mathrm{ext}}(R, x), $$

where \(\rho > 0\) is the linear density of the beam, \(U_{\mathrm{ext}}: SE(3) \to \mathbb{R}\) is an external potential function, and \(\mathbb{J} = \rho \,\text{diag}([J_{1}, J_{2}, J_{3}])\) is the matrix of moments of inertia of the sections in the body frame. Assuming uniform cross sections and directors \(d_{2}\) and \(d_{3}\) coincident with the principal moments of area \(I_{2}\) and \(I_{3}\), one gets that \(J_{1} = \rho \,(I_{2} + I_{3})\), \(J_{2} = \rho I_{2}\) and \(J_{3} = \rho I_{3}\), and \(\mathbb{C}_{1} = \text{diag}([G (I_{2} + I_{3}), E I_{2}, E I_{3}])\), \(\mathbb{C}_{2} = \text{diag}([E A, \kappa _{2} G A, \kappa _{3} G A])\), which are the matrices representing the corresponding stiffness parameters of the sections. \(\kappa _{2}\) and \(\kappa _{3}\) are possible shear correction factors.
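As a small worked example, the stiffness matrices \(\mathbb{C}_{1}\) and \(\mathbb{C}_{2}\) can be assembled for a square cross section. The sketch below assumes NumPy, the standard isotropic relation \(G = E/(2(1+\nu))\), second moments of area \(I_{2} = I_{3} = l_{s}^{4}/12\) for a square of side \(l_{s}\), and shear correction factors \(\kappa _{2} = \kappa _{3} = 1\); none of these are stated in the text here, and the function name is hypothetical. The numerical values are those quoted for the fairly rigid beam in Sect. 4.5.

```python
import numpy as np

def stiffness_matrices(E, nu, side, kappa2=1.0, kappa3=1.0):
    """Return (C1, C2) for a square cross section of the given side length.

    Assumptions (not from the paper): isotropic shear modulus G = E / (2 (1 + nu)),
    I2 = I3 = side^4 / 12, and shear correction factors defaulting to 1.
    """
    G = E / (2.0 * (1.0 + nu))
    A = side**2
    I2 = I3 = side**4 / 12.0
    C1 = np.diag([G * (I2 + I3), E * I2, E * I3])          # torsion, bending, bending
    C2 = np.diag([E * A, kappa2 * G * A, kappa3 * G * A])  # extension, shear, shear
    return C1, C2

# Values quoted for the fairly rigid beam: E = 210000, nu = 0.3, side length 0.1
C1, C2 = stiffness_matrices(E=210_000.0, nu=0.3, side=0.1)
```

For these values the axial stiffness \(EA\) is three orders of magnitude larger than the bending stiffness \(E I_{2}\), which is consistent with axial deformations being much faster than bending deformations.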

4.2 Unit dual quaternion formulation

Working on \(SE(3)\) is difficult. In [22] the authors propose the use of a constrained approach where the space of dual quaternions \(\widetilde{\mathbb{H}}\), which is a vector space, is considered as ambient manifold and the unit dual quaternions \(\widetilde{\mathbb{H}}_{1}\) as constraint submanifold since it is well known that this latter space provides a double covering of \(SE(3)\).

The space of dual quaternions is defined by

$$ \widetilde{\mathbb{H}} := \left \lbrace \tilde{q} = q_{r} + q_{\epsilon} \boldsymbol{\epsilon} \, \vert \, q_{r}, q_{\epsilon} \in \mathbb{H}, \boldsymbol{\epsilon}^{2} = 0 \right \rbrace , $$

where \(\boldsymbol{\epsilon}\) is the so-called dual unit and

$$ \mathbb{H} := \left \lbrace q = q_{0} + q_{1} \boldsymbol{i} + q_{2} \boldsymbol{j} + q_{3} \boldsymbol{k} \, \vert \, q_{i} \in \mathbb{R}, \boldsymbol{i}^{2} = \boldsymbol{j}^{2} = \boldsymbol{k}^{2} = \boldsymbol{i j k} = -1 \right \rbrace $$

is the space of quaternions. Both of these are vector spaces, so working with them is quite simple.

Similar to complex numbers, a conjugation operation is defined on the space of quaternions, namely, if \(p = p_{0} + p_{1} \boldsymbol{i} + p_{2} \boldsymbol{j} + p_{3} \boldsymbol{k}\), then \(\bar{p} = p_{0} - p_{1} \boldsymbol{i} - p_{2} \boldsymbol{j} - p_{3} \boldsymbol{k}\), and this operation is inherited by dual quaternions. This defines a norm on \(\mathbb{H}\), \(\Vert p \Vert = \sqrt{\bar{p} p}\), and lets us write the inverse of \(p\) as \(p^{-1} = \bar{p}/\Vert p \Vert ^{2}\). This also defines a seminorm on \(\widetilde{\mathbb{H}}\) by \(\Vert \tilde{p} \Vert = \sqrt{\bar{\tilde{p}} \tilde{p}} = \Vert p_{r} \Vert + \frac{p_{r}^{T} p_{\epsilon}}{\Vert p_{r} \Vert} \boldsymbol{\epsilon} = \sqrt{p_{r}^{T} p_{r}} + \frac{p_{r}^{T} p_{\epsilon}}{\sqrt{p_{r}^{T} p_{r}}} \boldsymbol{\epsilon}\), where in the last equality we consider the quaternions \(p_{r}\), \(p_{\epsilon}\) as vectors in \(\mathbb{R}^{4}\). The sets of unit quaternions and unit dual quaternions are thus \(\mathbb{H}_{1} := \left \lbrace q \in \mathbb{H} \, \vert \, \Vert q \Vert = 1 \right \rbrace \) and \(\widetilde{\mathbb{H}}_{1} := \left \lbrace \tilde{q} \in \widetilde{\mathbb{H}} \, \vert \, \Vert \tilde{q} \Vert = 1 \right \rbrace \) respectively. More explicitly, the latter implies

$$\begin{aligned} q_{0,r}^{2} + q_{1,r}^{2} + q_{2,r}^{2} + q_{3,r}^{2} &= 1 , \end{aligned}$$
(31a)
$$\begin{aligned} q_{0,r} q_{0,\epsilon} + q_{1,r} q_{1,\epsilon} + q_{2,r} q_{2, \epsilon} + q_{3,r} q_{3,\epsilon} &= 0. \end{aligned}$$
(31b)

As stated before, an element \(\tilde{q} \in \widetilde{\mathbb{H}}_{1}\) can be put into correspondence with an element of \(SE(3)\). In particular, we can parametrize \(\tilde{q}\) by a rotation angle \(\theta \) and two purely imaginary quaternions \(n\), \(x\), i.e., \(n_{0} = x_{0} = 0\), with \(\Vert n \Vert = 1\), representing a rotation axis and a three-dimensional translation respectively. This way \(q = \cos (\theta /2) + n \sin (\theta /2)\) and \(\tilde{q} = q + \frac{1}{2} x q \boldsymbol{\epsilon}\). If \(\tilde{q}(t,s) \in \widetilde{\mathbb{H}}_{1}\), then

$$ \widetilde{\Omega} := 2 \bar{\tilde{q}}\dot{\tilde{q}} = \Omega + V \boldsymbol{\epsilon}\,, \qquad \widetilde{K} := 2 \bar{\tilde{q}} \tilde{q}^{\prime} = K + W \boldsymbol{\epsilon}. $$
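The parametrization of a unit dual quaternion by a rotation angle, a rotation axis and a translation can be checked numerically. The sketch below assumes NumPy and stores a quaternion as the array `[q0, q1, q2, q3]`; all function names and the sample values are illustrative, not taken from [22]. It builds \(\tilde{q} = q + \frac{1}{2} x q \boldsymbol{\epsilon}\), verifies the constraints (31a) and (31b), and recovers the translation.

```python
import numpy as np

def qmul(p, q):
    """Hamilton product of two quaternions stored as [q0, q1, q2, q3]."""
    p0, pv = p[0], p[1:]
    q0, qv = q[0], q[1:]
    return np.concatenate(([p0 * q0 - pv @ qv],
                           p0 * qv + q0 * pv + np.cross(pv, qv)))

def qconj(q):
    """Quaternion conjugate: flip the sign of the imaginary part."""
    return np.concatenate(([q[0]], -q[1:]))

def unit_dual_quaternion(theta, axis, x):
    """Return (q_r, q_eps) with q_r = cos(theta/2) + n sin(theta/2), q_eps = x q_r / 2."""
    n = np.asarray(axis, dtype=float)
    n = n / np.linalg.norm(n)                 # enforce ||n|| = 1
    q_r = np.concatenate(([np.cos(theta / 2)], np.sin(theta / 2) * n))
    x_quat = np.concatenate(([0.0], np.asarray(x, dtype=float)))  # purely imaginary
    q_eps = 0.5 * qmul(x_quat, q_r)
    return q_r, q_eps

# Illustrative values: rotation of 0.7 rad about e_3, translation (1, 2, 3)
q_r, q_eps = unit_dual_quaternion(0.7, [0.0, 0.0, 1.0], [1.0, 2.0, 3.0])
c31a = q_r @ q_r                              # left-hand side of (31a)
c31b = q_r @ q_eps                            # left-hand side of (31b)
x_back = 2.0 * qmul(q_eps, qconj(q_r))        # recovers [0, x1, x2, x3]
```

The translation is recovered as \(x = 2 q_{\epsilon} \bar{q}_{r}\), which only works because \(q_{r}\) has unit norm.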

One can thus define an ambient Lagrangian in the dual quaternions

$$ \widetilde{\mathcal{L}}\left (\tilde{q},\dot{\tilde{q}},\tilde{q}^{ \prime}\right ) = 2 \widetilde{M}\left (\bar{\tilde{q}} \dot{\tilde{q}}, \bar{\tilde{q}}\dot{\tilde{q}}\right ) - 2 \widetilde{C}\left ( \bar{\tilde{q}}\tilde{q}^{\prime} - \bar{\tilde{q}}_{\mathrm{ref}}\tilde{q}^{\prime}_{\mathrm{ref}}, \bar{\tilde{q}}\tilde{q}^{\prime} - \bar{\tilde{q}}_{\mathrm{ref}}\tilde{q}^{\prime}_{\mathrm{ref}}\right) - \widetilde{U}(\tilde{q}), $$
(32)

where \(\widetilde{M}(\tilde{q},\tilde{p}) = q_{r}^{T} \widetilde{\mathbb{J}} p_{r} + q_{\epsilon}^{T} \tilde{\rho} p_{ \epsilon}\), \(\widetilde{C}(\tilde{q},\tilde{p}) = q_{r}^{T} \widetilde{\mathbb{C}}_{1} p_{r} + q_{\epsilon}^{T} \widetilde{\mathbb{C}}_{2} p_{\epsilon}\), with \(\widetilde{\mathbb{J}} = \text{diag}([\alpha _{1},\mathbb{J}])\), \(\tilde{\rho} = \text{diag}([\alpha _{2},\rho A I])\), \(\widetilde{\mathbb{C}}_{1} = \text{diag}([\alpha _{3},\mathbb{C}_{1}])\), and \(\widetilde{\mathbb{C}}_{2} = \text{diag}([\alpha _{4},\mathbb{C}_{2}])\), and \(\alpha _{i} \in \mathbb{R}\). These \(\alpha \) can be chosen arbitrarily as they play no role in the dynamics once the unity constraints (31a)–(31b) are enforced.
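A short check of why the \(\alpha _{i}\) drop out can be given for the time derivatives, assuming the constraints (31a)–(31b) hold for all \(t\). This is a sketch added for illustration and is not taken from [22]. Differentiating the constraints with respect to \(t\) gives

$$ q_{r}^{T} \dot{q}_{r} = 0, \qquad q_{r}^{T} \dot{q}_{\epsilon} + q_{\epsilon}^{T} \dot{q}_{r} = 0. $$

Since the scalar part of \(\bar{p} q\) equals \(p^{T} q\) for quaternions viewed as vectors in \(\mathbb{R}^{4}\), these two expressions are exactly the scalar parts of the real and dual components of \(\bar{\tilde{q}} \dot{\tilde{q}} = \bar{q}_{r} \dot{q}_{r} + (\bar{q}_{r} \dot{q}_{\epsilon} + \bar{q}_{\epsilon} \dot{q}_{r}) \boldsymbol{\epsilon}\). The entries of \(\widetilde{\mathbb{J}}\) and \(\tilde{\rho}\) weighted by \(\alpha _{1}\) and \(\alpha _{2}\) therefore multiply zero. The same argument with \(s\)-derivatives, applied to \(\bar{\tilde{q}} \tilde{q}^{\prime}\) and to the reference configuration, covers \(\alpha _{3}\) and \(\alpha _{4}\).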

4.3 Discrete Lagrangian

To discretize the beam, the spacetime \([0,T] \times [0, \ell ]\) is discretized into a regular grid (see Fig. 7) with constant space and time steps \(\Delta s\) and \(\Delta t\) respectively.

Fig. 7
figure 7

Spacetime grid for the multisymplectic variational integrator with relative indices marked in red

We discretize the ambient Lagrangian density (32) applying the trapezoidal rule in both space and time

$$\begin{aligned} &\widetilde{L}_{d}(\tilde{q}_{a}^{n}, \tilde{q}_{a+1}^{n}, \tilde{q}_{a}^{n+1}, \tilde{q}_{a+1}^{n+1}) = \frac{\Delta t \Delta s}{4} \\ &\quad \times \left [ \widetilde{\mathcal{L}}\left ( \tilde{q}_{a}^{n}, \frac{\tilde{q}_{a}^{n+1} - \tilde{q}_{a}^{n}}{\Delta t}, \frac{\tilde{q}_{a+1}^{n} - \tilde{q}_{a}^{n}}{\Delta s}\right ) + \widetilde{\mathcal{L}}\left ( \tilde{q}_{a+1}^{n}, \frac{\tilde{q}_{a+1}^{n+1} - \tilde{q}_{a+1}^{n}}{\Delta t}, \frac{\tilde{q}_{a+1}^{n} - \tilde{q}_{a}^{n}}{\Delta s}\right ) \right . \\ &\quad \left . + \,\widetilde{\mathcal{L}}\left ( \tilde{q}_{a}^{n+1}, \frac{\tilde{q}_{a}^{n+1} - \tilde{q}_{a}^{n}}{\Delta t}, \frac{\tilde{q}_{a+1}^{n+1} - \tilde{q}_{a}^{n+1}}{\Delta s}\right ) + \widetilde{\mathcal{L}}\left ( \tilde{q}_{a+1}^{n+1}, \frac{\tilde{q}_{a+1}^{n+1} - \tilde{q}_{a+1}^{n}}{\Delta t}, \frac{\tilde{q}_{a+1}^{n+1} - \tilde{q}_{a}^{n+1}}{\Delta s}\right ) \right ] \end{aligned}$$

and introduce the notation \((\widetilde{L}_{d})_{a}^{n} := \widetilde{L}_{d}(\tilde{q}_{a}^{n}, \tilde{q}_{a+1}^{n}, \tilde{q}_{a}^{n+1}, \tilde{q}_{a+1}^{n+1})\) to simplify the formulas.

As shown in [22], the discrete constrained Euler–Lagrange field equations are derived via a discrete variational principle in space and time and subsequent rearrangement of terms in space index \(a\) and time index \(n\). As also shown there, a natural choice of null space matrix is

$$ \widetilde{P}(\tilde{q}) = \left [ \textstyle\begin{array}{c@{\quad}c} P(q_{r}) & 0 \\ P(q_{\epsilon}) & P(q_{r}) \end{array}\displaystyle \right ]\,, \qquad P(q) = \frac{1}{2} \left [ \textstyle\begin{array}{r@{\quad}r@{\quad}r} -q_{1} & -q_{2} & -q_{3} \\ q_{0} & -q_{3} & q_{2} \\ q_{3} & q_{0} & -q_{1} \\ -q_{2} & q_{1} & q_{0} \end{array}\displaystyle \right ]\,. $$
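The null space property of \(\widetilde{P}\) can be verified numerically. The sketch below assumes NumPy and writes the constraints (31a)–(31b) as \(g_{1} = q_{r}^{T} q_{r} - 1\) and \(g_{2} = q_{r}^{T} q_{\epsilon}\); this scaling and all function names are assumptions made for the example. It checks that the constraint Jacobian annihilates \(\widetilde{P}\) and that \(\widetilde{P}\) has full column rank.

```python
import numpy as np

def P(q):
    """4x3 null space matrix P(q) for a quaternion q = [q0, q1, q2, q3]."""
    q0, q1, q2, q3 = q
    return 0.5 * np.array([[-q1, -q2, -q3],
                           [ q0, -q3,  q2],
                           [ q3,  q0, -q1],
                           [-q2,  q1,  q0]])

def P_tilde(q_r, q_eps):
    """8x6 null space matrix for the dual quaternion (q_r, q_eps)."""
    return np.block([[P(q_r), np.zeros((4, 3))],
                     [P(q_eps), P(q_r)]])

def constraint_jacobian(q_r, q_eps):
    """Jacobian of (q_r.q_r - 1, q_r.q_eps) with respect to (q_r, q_eps); assumed scaling."""
    return np.block([[2.0 * q_r.reshape(1, 4), np.zeros((1, 4))],
                     [q_eps.reshape(1, 4), q_r.reshape(1, 4)]])

# Random unit dual quaternion (illustrative): normalize q_r, make q_eps orthogonal to it
rng = np.random.default_rng(0)
q_r = rng.normal(size=4)
q_r /= np.linalg.norm(q_r)
q_eps = rng.normal(size=4)
q_eps -= (q_eps @ q_r) * q_r

GP = constraint_jacobian(q_r, q_eps) @ P_tilde(q_r, q_eps)   # 2x6, numerically zero
rank = np.linalg.matrix_rank(P_tilde(q_r, q_eps))            # 6 = 8 - 2 constraints
```

Premultiplying the eight-dimensional ambient equations by \(\widetilde{P}^{T}\) thus leaves six independent equations per node, matching the dimension of \(SE(3)\).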

The forced version of these equations results from the application of the discrete Lagrange–d’Alembert principle, similar to (5),

$$ \sum _{a} \sum _{n} \left [ \delta (\widetilde{L}_{d})_{a}^{n} + (f_{d}^{1})_{a}^{n} \delta \tilde{q}_{a}^{n} + (f_{d}^{2})_{a}^{n} \delta \tilde{q}_{a+1}^{n} + (f_{d}^{3})_{a}^{n} \delta \tilde{q}_{a}^{n+1} + (f_{d}^{4})_{a}^{n} \delta \tilde{q}_{a+1}^{n+1} \right ] = 0 $$

with \((f_{d}^{i})_{a}^{n} := f_{d}^{i}(\tilde{q}_{a}^{n}, \tilde{q}_{a+1}^{n}, \tilde{q}_{a}^{n+1}, \tilde{q}_{a+1}^{n+1},u_{a}^{n})\) denoting all external and control forces, and \(i = 1,\ldots,4\) coinciding with the corresponding relative node on which they are applied, as in Fig. 7. This leads to DEL with a force contribution from each adjacent spacetime rectangle sharing the node under consideration:

$$ \begin{aligned} \widetilde{P}(\tilde{q}_{a}^{n})^{T} \,\Big[ D_{1} (\widetilde{L}_{d})_{a}^{n} + D_{2} (\widetilde{L}_{d})_{a-1}^{n} + D_{3} (\widetilde{L}_{d})_{a}^{n-1} + D_{4} (\widetilde{L}_{d})_{a-1}^{n-1} & \\ + (f_{d}^{1})_{a}^{n} + (f_{d}^{2})_{a-1}^{n} + (f_{d}^{3})_{a}^{n-1} + (f_{d}^{4})_{a-1}^{n-1} \Big] &= 0. \end{aligned} $$
(33)

Suitable boundary conditions in space and time as well as at the spacetime corners are directly derived via the discrete variational principle.

Kelvin–Voigt type viscous damping is included as external forces that are proportional to the discrete approximation of the strain rate [23] with bulk viscosity \(\zeta \) and shear viscosity \(\eta \). In the moderate strain regime these result in a damping matrix \(\widetilde{\mathbb{D}} = \mathrm{diag}([0,\eta (I_{2} + I_{3}), \chi I_{2}, \chi I_{3}, 0, \chi A, \eta A, \eta A])\), where \(\chi = \zeta (3 - E/G)^{2} + \eta (E/G)^{2}/3\) is the extensional viscosity. The corresponding discrete force is

$$\begin{aligned} \mathrm{L}_{(\tilde{q}^{n}_{a})}^{T} {(f^{\mathrm{KV, 1}}_{d})^{n}_{a}} = \mathrm{L}_{(\tilde{q}^{n+1}_{a})}^{T} {(f^{\mathrm{KV, 3}}_{d})^{n}_{a}} &= \frac{\Delta t \Delta s}{4}\, \widetilde{\mathbb{D}}\left ( \frac{\widetilde{K}^{n+1}_{a} - \widetilde{K}^{n}_{a}}{\Delta t} \right ), \\ \mathrm{L}_{(\tilde{q}^{n}_{a+1})}^{T} {(f^{\mathrm{KV, 2}}_{d})^{n}_{a}} = \mathrm{L}_{(\tilde{q}^{n+1}_{a+1})}^{T} {(f^{\mathrm{KV, 4}}_{d})^{n}_{a}} &= \frac{\Delta t \Delta s\,}{4} \widetilde{\mathbb{D}}\left ( \frac{\widetilde{K}^{n+1}_{a+1} - \widetilde{K}^{n}_{a+1}}{\Delta t} \right ), \end{aligned}$$

where by \(\mathrm{L}_{\tilde{q}}^{T}\) we denote the transpose of the dual quaternion left multiplication operation by \(\tilde{q}\), and \(\widetilde{K}^{n}_{a} = \widetilde{K}(\tilde{q}^{n}_{a},(\tilde{q}')^{n}_{a})\) with \((\tilde{q}')^{n}_{a} = (\tilde{q}')^{n}_{a+1} = (\tilde{q}^{n}_{a+1} - \tilde{q}^{n}_{a})/\Delta s\) and \((\tilde{q}')^{n+1}_{a} = (\tilde{q}')^{n+1}_{a+1} = (\tilde{q}^{n+1}_{a+1} - \tilde{q}^{n+1}_{a})/\Delta s\). Figure 8 shows the position of the tip of a cantilever beam with fixed-free boundary conditions that is initially straight under gravity. The strain-rate proportional damping leads to reduced high-frequency oscillations.

Fig. 8
figure 8

Viscous damping of a cantilever beam

4.4 The discrete adjoint method in spacetime

The discrete adjoint method for the geometrically exact beam treats the configuration variables as well as the adjoint variables in space and time, which yields discrete adjoint equations in space and time. The optimal control problem is solved by single shooting in time while simultaneously solving the equations in space. The Barzilai–Borwein gradient method [24, 25] is used for the update. Here, a pendulum-like beam subject to gravity and fixed-free translation and free-free rotation boundary conditions is considered with a torque \(u\) applied at the fixed end as discrete redundant control forces

$$ \mathrm{L}_{(\tilde{q}^{n}_{0})}^{T} (f^{\mathrm{C, 1}}_{d})_{0}^{n} = \mathrm{L}_{(\tilde{q}^{n+1}_{0})}^{T} (f^{\mathrm{C, 3}}_{d})_{0}^{n} = 2 \Delta t \Delta s \, u_{0}^{n} \, \boldsymbol{k}\,. $$

Since our control is only applied at the boundary, this is a boundary control problem for a PDE.
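The gradient update mentioned above can be sketched as follows. This is a generic Barzilai–Borwein iteration, assuming NumPy and the "long" step \(\alpha _{k} = s^{T} s / s^{T} y\); the text does not state which step variant, initial step or safeguard was used, so these are assumptions. The callable `grad` is a hypothetical stand-in for the gradient with respect to the inputs, which in the present setting would be obtained from a forward simulation followed by a backward adjoint solve. A toy quadratic replaces the beam problem here.

```python
import numpy as np

def barzilai_borwein(grad, u0, alpha0=1e-3, iters=500, tol=1e-10):
    """Minimize a smooth function given its gradient, using Barzilai-Borwein steps.

    grad   : callable returning the gradient at u (stand-in for the adjoint gradient)
    alpha0 : initial step size, also used as fallback (assumed safeguard)
    """
    u = np.asarray(u0, dtype=float)
    g = grad(u)
    alpha = alpha0
    for _ in range(iters):
        if np.linalg.norm(g) < tol:
            break
        u_new = u - alpha * g
        g_new = grad(u_new)
        s, y = u_new - u, g_new - g
        sy = s @ y
        # Long BB step; fall back to alpha0 if the curvature estimate is not positive
        alpha = (s @ s) / sy if sy > 0 else alpha0
        u, g = u_new, g_new
    return u

# Toy quadratic J(u) = u^T H u / 2 - b^T u with gradient H u - b (not the beam problem)
H = np.diag([1.0, 10.0, 100.0])
b = np.array([1.0, 2.0, 3.0])
u_opt = barzilai_borwein(lambda u: H @ u - b, np.zeros(3))
```

The method needs only one gradient evaluation per iteration and no line search, which matters when each gradient costs a full forward and backward sweep. The price is a non-monotone decrease of the objective.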

The desired configuration is the upright rotated position of the beam, specified for each node in space. The final position considered is undeformed. The desired maneuver is from the lower position to the upright position in such a way that the inertial terms cancel the strains in the end configuration. As the system is heavily underactuated, the chosen input does not allow us to control the motion in the axial direction and does not lead to a stationary upright position. Hence, no end momentum is imposed. Nonetheless, the control task should demonstrate the presented method in an academic example that resembles the previous pendulum examples sufficiently.

Our optimal control problem is of the form

$$ \underset{\tilde{q}_{d}, u_{d}}{\min}\, J_{d}(\tilde{q}_{d}, u_{d}) = \sum _{a = 0}^{A} [\tilde{q}_{a}^{N} - (\tilde{q}_{a}^{N})_{*}]^{T} S_{q} [\tilde{q}_{a}^{N} - (\tilde{q}_{a}^{N})_{*}] + \sum _{n=0}^{N-1} \frac{1}{2} (u_{0}^{n})^{T} R (u_{0}^{n}) $$
(34a)

subject to:

$$\begin{aligned} \tilde{q}_{a}^{0} &= (\tilde{q}_{a}^{0})_{*}, \hspace{6.4cm} \text{for}~a=0,~\ldots,~A, \end{aligned}$$
(34b)
$$\begin{aligned} u_{0}^{0} &= (u_{0}^{0})_{*}, \end{aligned}$$
(34c)
$$\begin{aligned} 0 &= \widetilde{P}^{T}(\tilde{q}_{0}^{n}) \left [ D_{1} ( \widetilde{L}_{d})_{0}^{n} + D_{3} (\widetilde{L}_{d})_{0}^{n-1} + (f_{d}^{1})_{0}^{n} + (f_{d}^{3})_{0}^{n-1}\right ], \\ & \hspace{8.1cm} \text{for}~n=1,~\ldots,~N-1, \end{aligned}$$
(34d)
$$\begin{aligned} 0 &= \widetilde{P}^{T}(\tilde{q}_{a}^{n}) \left [ D_{1} ( \widetilde{L}_{d})_{a}^{n} + D_{2} (\widetilde{L}_{d})_{a-1}^{n} + D_{3} (\widetilde{L}_{d})_{a}^{n-1} + D_{4} (\widetilde{L}_{d})_{a-1}^{n-1} \right . \\ &+\left . (f_{d}^{1})_{a}^{n} + (f_{d}^{2})_{a-1}^{n} + (f_{d}^{3})_{a}^{n-1} + (f_{d}^{4})_{a-1}^{n-1}\right ], \\ & \hspace{4.8cm} \text{for}~a=1,~\ldots,~A-1,\,\text{for}~n=1,~\ldots,~N-1, \end{aligned}$$
(34e)

where \((\tilde{q}_{a}^{0})_{*}\), \((u_{0}^{0})_{*}\) are given initial discrete values and \((\tilde{q}_{a}^{N})_{*}\) denotes the discretized desired end configuration.
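Evaluating the objective (34a) is straightforward once the end configuration and the input sequence are available. The sketch below assumes NumPy, scalar weights \(S_{q}\) and \(R\), a scalar boundary torque per time step, and dual quaternions stored as rows of length 8; the array layout, the function name and the sample numbers are illustrative assumptions.

```python
import numpy as np

def discrete_objective(q_end, q_target, u, S_q, R):
    """Evaluate J_d of (34a) for scalar weights S_q and R.

    q_end, q_target : arrays of shape (A + 1, 8), one dual quaternion per
                      spatial node at the final time step n = N
    u               : array of shape (N,), boundary torque u_0^n, n = 0..N-1
    """
    diff = q_end - q_target
    terminal = S_q * np.sum(diff * diff)      # sum over nodes of diff^T S_q diff
    effort = 0.5 * R * np.sum(u * u)          # sum over time of u^T R u / 2
    return terminal + effort

# Illustrative data: 3 spatial nodes, 4 time steps
q_target = np.tile(np.array([1.0, 0, 0, 0, 0, 0, 0, 0]), (3, 1))
q_end = q_target.copy()
q_end[:, 5] += 0.1                            # offset of 0.1 in one component per node
J = discrete_objective(q_end, q_target, u=np.ones(4), S_q=100.0, R=2.0)
```

Here the terminal term contributes \(100 \cdot 3 \cdot 0.1^{2} = 3\) and the control effort contributes \(\frac{1}{2} \cdot 2 \cdot 4 = 4\), so \(J_{d} = 7\).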

The adjoint equations are obtained similarly to the constrained temporal case by applying discrete variational calculus and nodal reparametrization, but now in space and time. The resulting equations are quite long, so they are not reproduced here in their entirety. For instance, the equations obtained by taking variations of the inputs at the fixed boundary \(a=0\) are

$$\begin{aligned} &0= (\lambda _{0}^{n})^{T} \widetilde{P}^{T}(\tilde{q}_{0}^{n}) (D_{5} f_{d}^{1})_{0}^{n} + (\lambda _{0}^{n+1})^{T} \widetilde{P}^{T}( \tilde{q}_{0}^{n+1}) (D_{5} f_{d}^{3})_{0}^{n}\,, \\ & \hspace{8.85cm} \mathrm{for}~n\in [1,N-2], \end{aligned}$$
(35a)
$$\begin{aligned} &0= (\lambda _{0}^{N-1})^{T} \widetilde{P}^{T}(\tilde{q}_{0}^{N-1}) (D_{5} f_{d}^{1})_{0}^{N-1}. \end{aligned}$$
(35b)

These are used to update the torque. If instead of boundary control we had controls over the bulk, then these equations would generalize to all nodes as follows:

$$ \begin{aligned} 0&=(\lambda _{a}^{n})^{T} \widetilde{P}^{T}(\tilde{q}_{a}^{n}) (D_{5} f_{d}^{1})_{a}^{n} + (\lambda _{a+1}^{n})^{T} \widetilde{P}^{T}(\tilde{q}_{a+1}^{n}) (D_{5} f_{d}^{2})_{a}^{n} \\ &+ (\lambda _{a}^{n+1})^{T} \widetilde{P}^{T}(\tilde{q}_{a}^{n+1}) (D_{5} f_{d}^{3})_{a}^{n} + (\lambda _{a+1}^{n+1})^{T} \widetilde{P}^{T}( \tilde{q}_{a+1}^{n+1}) (D_{5} f_{d}^{4})_{a}^{n}, \\ & \hspace{5.5cm} \mathrm{for}~a=1,~\ldots,~A-2,~n=1,~\ldots,~N-2, \\ 0&=(\lambda _{A-1}^{n})^{T} \widetilde{P}^{T}(\tilde{q}_{A-1}^{n}) (D_{5} f_{d}^{1})_{A-1}^{n} + (\lambda _{A-1}^{n+1})^{T} \widetilde{P}^{T}( \tilde{q}_{A-1}^{n+1}) (D_{5} f_{d}^{3})_{A-1}^{n}, \\ & \hspace{8.85cm} \mathrm{for}~n\in [1,N-2], \\ 0&=(\lambda _{a}^{N-1})^{T} \widetilde{P}^{T}(\tilde{q}_{a}^{N-1}) (D_{5} f_{d}^{1})_{a}^{N-1} + (\lambda _{a+1}^{N-1})^{T} \widetilde{P}^{T}( \tilde{q}_{a+1}^{N-1}) (D_{5} f_{d}^{2})_{a}^{N-1}, \\ & \hspace{8.95cm} \mathrm{for}~a\in [1,A-2], \\ 0&=(\lambda _{A-1}^{N-1})^{T} \widetilde{P}^{T}(\tilde{q}_{A-1}^{N-1}) (D_{5} f_{d}^{1})_{A-1}^{N-1}. \end{aligned} $$

4.5 Fairly rigid beam

The fairly rigid beam demonstrates the sequential optimization of the beam dynamics, where the objective includes the minimization of the control effort. The simulation of the beam dynamics uses \(A=10\) nodes in space and \(N=3000\) nodes in time. The beam has a length of \(L=1\). The simulation duration is \(T=1\). The resulting step sizes are \(h = \frac{1}{3000}\) in time and \(\Delta s = \frac{1}{10}\) in space. A constant initial guess of \(u^{0}=1500\) is used. The beam has a square cross-section of \(A_{\mathrm{cross}}=0.01\) with a side length of \(l_{s}=0.1\). The chosen weighting for the end term is \(S_{q}=10^{8}\), and \(R = 10^{-2}\) for the input (see footnote 1). The material of the beam is fairly rigid with a Young’s modulus of \(E=210{,}000\) and a Poisson ratio of \(\nu =0.3\). The mass density is \(\rho = 7.85\). The beam is damped with \(\eta = 1\cdot 10^{-1}\) and \(\zeta =1\cdot 10^{-2}\).

Figure 9(a) shows snapshots of the motion of the beam. Figure 9(b) shows the total energy \(H\) as well as all its contributions over time. The deformation energy is the difference between the total potential energy of the system \(U\) and the gravitational potential energy \(U_{\mathrm{grav}}\). The main contribution to the kinetic energy \(T\) is due to translation. At the end of the simulation, the kinetic energy decreases due to the input weight. The optimized input is depicted in Fig. 10(a); it decreases to zero at the end of the simulation time. The optimized quantities, namely the distance of the beam to the desired end configuration and the control effort, are depicted in Figs. 10(b) and 10(c), respectively. The gradient depicted in Fig. 10(d) shows heavy oscillations.

Fig. 9
figure 9

Motion of a fairly rigid beam and its energy

Fig. 10
figure 10

Optimization results of beam dynamics using single shooting for a fairly rigid beam

4.6 Very flexible beam

A very flexible beam demonstrates the sequential optimization for more flexible beams that show larger deformations and are therefore harder to control. The simulation of the beam and adjoint dynamics uses \(A=5\) nodes in space for a length of \(L=1\). This results in a space step width of \(\Delta s = \frac{1}{5}\). The simulation time is \(T=0.5\) using \(N=600\) nodes in time and a time step of \(h = \frac{1}{1200}\). The initial guess for the input is \(u^{0} = 50\) for all time intervals. The beam has a square cross-section of \(A_{\mathrm{cross}}=0.0025\) with a side length of \(l_{s}=0.05\). The Young’s modulus is \(E=50{,}000\) and the mass density \(\rho = 1000\). The Poisson ratio is \(\nu =0.35\). Kelvin–Voigt type damping is applied with \(\eta = 1\cdot 10^{-1}\) and \(\zeta =1\cdot 10^{-2}\).

The weighting for the end configuration is \(S_{q}=10^{2}\). For this numerical experiment, the input weight was set to \(R=0\) since the chosen end configuration gets increasingly harder to reach for more flexible beams.

The optimization results are depicted in Fig. 11. The input in Fig. 12(a) is increased compared to the initial guess. In addition, oscillations are present. The gradient depicted in Fig. 12(b) shows high-frequency oscillations that are likely caused by the dynamics of the beam in the normal direction, as these deformations are of much higher frequency than bending deformations due to the difference in stiffness. The objective is depicted in Fig. 12(c). The largest decrease happens at the start of the optimization. Figure 12(d) depicts the total energy and its parts. During the optimization, mainly the translational part of the kinetic energy increases, as does the gravitational potential energy.

Fig. 11
figure 11

Snapshots of a very flexible beam at \(n=0\) and during the upward movement

Fig. 12
figure 12

Optimization results of beam dynamics of a very flexible beam

5 Summary

The discrete adjoint method for variational integration of (constrained) ODEs is derived, and its convergence properties are demonstrated with the help of numerical examples. Quadratic convergence of the configuration variables as well as of the adjoint variables is observed in simulations of a mathematical pendulum. The discrete adjoint method is also applied to the multisymplectic Galerkin Lie group integrator for geometrically exact beam dynamics, in particular to the optimal control of the upward motion of a pendulum-like beam. The discrete adjoint method directly yields fitting equations at the boundary based on the discretization chosen for the variational integrator. For constrained systems with null space projection and nodal reparametrization, it also directly results in the null space projection of the discrete adjoint equations. The properties of the discrete adjoint method applied to structure-preserving integrators have to be analyzed further in order to understand the connection in a more general setting.