1 Introduction

Simulators for mechanical systems are widely used, for example, in testing and verification [1, 2], in model-based control strategies [3, 4], or in learning-based methods [5, 6]. However, many common simulators have numerical difficulties with more complex mechanical systems involving constraints [7]. Such constraints can represent joints connecting rigid bodies, which may form kinematic loops, for example, in exoskeletons. Constraints can also be used to confine the movement of bodies, for example, to model joint limits in robotic arms or to describe contact with other bodies or the environment in walking and grasping. Exactly enforcing such constraints can cause numerical issues, for example, due to the stiff nature of contact interactions. To alleviate these numerical issues, simulators often allow small constraint violations by representing all constraints as spring-damper elements, as in MuJoCo [8] and Brax [9], or by accepting interpenetration of bodies, as in Drake [10] and Bullet [11]. Small violations can sometimes be acceptable, for example, contact interpenetration in the order of micrometers for meter-scale walking robots. But millimeter or centimeter violations, for example in MuJoCo, can be considered too large. Employing these methods and accepting larger constraint violations for stable simulations contributes to the sim-to-real gap, which is a major issue in robotics [12].

Therefore, this article addresses physically accurate simulations of mechanical systems. Specifically, we focus on good modeling of the energy behavior of such systems, as this quantity is important for the stability of dynamical systems from a control theory perspective. Moreover, we treat the correct enforcement of constraints, since allowing for constraint violations can lead to problematic results. For example, allowing softness in contact interactions can lead to wrong contact points, which can cause a robot to fail to properly grasp an object when the simulation results are directly transferred to a real-world application.

Most simulators, including the ones listed above except Brax, use minimal coordinates (generalized coordinates) as a mechanism’s state representation. Given a system with \(m\) degrees of freedom and a set of constraints that removes \(c\) degrees of freedom, the system is parameterized by \(n=m-c\) independent coordinates. In minimal coordinates, the constraints are implicitly incorporated into the dynamics equations, and only the smallest possible number of variables is retained, which leads to dimensionally small problem sizes. However, eliminating constraints requires specialized treatment of each constraint type and is not possible for nonholonomic constraints such as contacts. Furthermore, the algorithms typically used for minimal coordinates can have numerical issues such as ill-conditioning [13, 14], which may be increased by their recursive nature, although a rigorous study of the numerical effects and stability of minimal-coordinate dynamics algorithms remains to be done. We provide some empirical evidence for these numerical issues in Sect. 5.

It is also possible to explicitly retain all or part of the constraints on a mechanical system by introducing constraint forces to adhere to these constraints. With this approach, redundant coordinates are obtained since the constraints make part of the coordinates mathematically dependent. There are many ways to parameterize mechanical systems in redundant coordinates. Natural coordinates use three points in Cartesian coordinates to parameterize the configuration of each body [15]. Alternatively, the position of the center of mass of a body and a director frame for the orientation can be used [16, 17]. These methods have in common that they aim at circumventing the inherent intricacies of parameterizing rotations. In contrast, we use so-called maximal coordinates [18], which represent each body in a mechanism with its three degrees of freedom for the position of the center of mass and three degrees of freedom for the orientation by purposefully building on the group structure of rotations. All kinematic relations, such as joints or contacts, are formulated as explicit constraints. The use of maximal coordinates allows for very modular modeling of dynamics, which makes it possible to describe many different kinds of constraints in a unified and non-specialized manner. Additionally, resorting to maximal and redundant coordinates enables the use of different algorithms than for minimal coordinates, specifically direct matrix methods, which generally have well-investigated numerical properties and enable the formulation of stable and well-conditioned algorithms [19]. The core idea of direct matrix methods is to modify the underlying matrix before, during, and after it is used in an algorithm to improve the stability and accuracy of an algorithm. Examples of such modifications are scaling certain matrix entries or iterative solution refinement. The few existing works on simulation, specifically in maximal coordinates, have explored algorithms for continuous-time forward dynamics [18] and discrete-time algorithms for mechanisms without kinematic loops or contacts [20]. Besides their application in simulation, maximal coordinates have shown promising results in control applications [21, 22]. The simulator Brax is also based on maximal coordinates, but suffers from the aforementioned issue of soft constraint handling.

Besides the numerical issues related to constraints, all of the packaged simulators above use integrators for the dynamics that are not well suited for mechanical systems due to the unrealistic energy behavior of certain explicit or implicit Runge–Kutta methods. Deriving a perfect integrator is generally not trivial [23, 24], but certain classes of integrators exhibit excellent properties, among them symmetric and symplectic methods [25]. An elegant way to derive symplectic integrators for mechanical systems is through a variational perspective, leading to variational integrators. These integrators are derived by discretizing the derivation of the mechanism’s differential equations instead of the differential equations themselves. Thus, certain properties such as energy and momentum conservation are maintained [26]. Compared to classical discretizations such as explicit Runge–Kutta methods, larger time steps can be taken due to the increased physical accuracy, and constraint drift is avoided entirely. Computationally efficient variational integrators have been theoretically investigated in minimal coordinates [2729]. However, the implementations are limited to simple mechanical systems with few joints types and no contact interactions, potentially due to numerical difficulties. Variational and other structure-preserving integrators have also been investigated in redundant coordinates. Here, much effort has been put into variational integrators for systems with explicit constraints [17, 26, 30]. Another investigated approach is nullspace methods which eliminate constraints while remaining structure-preserving [31, 32]. Redundant coordinates are also well suited for structure-preserving integration of flexible multibody systems [3335]. These presented works deal with advancing the theoretical properties of structure-preserving integrators. In this article, we derive a unified and modular integrator and develop methods for the numerically efficient implementation of such an integrator.

More specifically, the contribution of this article to maximal-coordinate simulators is as follows. Firstly, we derive a variational integrator in maximal coordinates for physically accurate simulations, including good energy behavior and rigid, non-drifting constraints. While variational integrators are generally not new, we derive this integrator in a unified framework for typical dynamics components, including rigid bodies, joints, contacts, friction, actuators and external forces, springs, and dampers. Secondly, we provide an efficient graph-based solver algorithm for the system of nonlinear equations that form the integrator. The solver exploits the sparsity of the nonlinear system of equations to account for the increased number of variables in maximal coordinates. It achieves linear computational complexity in the number of links and joints for mechanisms without kinematic loops and reduces the complexity for mechanisms with kinematic loops. For environment contacts and friction, the solver also achieves linear computational complexity in the number of links and contact points, while reducing the complexity for inter-mechanism contacts. Besides these theoretical contributions for efficient maximal-coordinate simulators based on variational integrators, we also provide an open-source implementation of such a simulator (see Sect. 5), which achieves competitive timing results compared to state-of-the-art simulators.

This article is structured as follows. In Sect. 2, the desired simulator components are mathematically formalized. Based on these components, the variational integrator is derived in Sect. 3, resulting in a system of nonlinear equations. The solver for this system of equations is presented in Sect. 4. An evaluation of the theoretical and numerical properties and application examples are given in Sect. 5, and conclusions are drawn in Sect. 6. The appendices provide background information, including our quaternion notation in Appendix A.

2 Dynamics components

This section formulates the dynamics components for which the variational integrator is derived in Sect. 3. Unit quaternions are used for rotations and orientations due to their computational efficiency (see Appendix A for our quaternion notation). However, other representations, such as rotation matrices, of which the group of unit quaternions is a double cover [36], could be used as well. Figure 1 shows a mechanism with the treated components.

Fig. 1
figure 1

Exemplary depiction of the simulator components. (a) A two-link mechanism with joints, a spring-damper, and a contact subject to friction. (b) A four-sided linearized friction cone

Rigid body

In maximal coordinates, each of the \(n_{\mathrm{b}}\) rigid bodies in a mechanism has a position \(\boldsymbol{x}\in \mathbb{R}^{3}\) and orientation \(\boldsymbol{q}\in \mathbb{H}\) (\(\boldsymbol{q}^{ \mathrm{T}}\boldsymbol{q}=1\)), as well as a translational velocity \(\boldsymbol{v}\in \mathbb{R}^{3}\) and angular velocity \(\boldsymbol{\omega }\in \mathbb{R}^{3}\). The configuration of a body is denoted as \(\boldsymbol{z} = [\boldsymbol{x}^{\mathsf{T}}\ \boldsymbol{q}^{ \mathsf{T}}]^{\mathsf{T}}\), and the velocity is denoted as \(\dot{\boldsymbol{z}} = [\boldsymbol{v}^{\mathsf{T}}\ \boldsymbol{\omega }^{\mathsf{T}}]^{\mathsf{T}}\). Each body has a mass \(m\in \mathbb{R}\) and a symmetric moment of inertia matrix \(\boldsymbol{J}\in \mathbb{R}^{3\times 3}\). All quantities refer to the center of mass of a body. Given \(\boldsymbol{M}=m\boldsymbol{I}_{3\times 3}\), the kinetic energy of each body is \(\mathcal{T}(\dot{\boldsymbol{z}}) = \frac{1}{2}\boldsymbol{v}^{ \mathsf{T}}\boldsymbol{M}\boldsymbol{v} + \frac{1}{2} \boldsymbol{\omega }^{\mathsf{T}}\boldsymbol{J}\boldsymbol{\omega }\).

Conservative forces, potentials, and springs

Conservative forces in the dynamics are derived from potential functions \(\mathcal{V}(\boldsymbol{z}_{\mathrm{a}}, \boldsymbol{z}_{\mathrm{b}}, \boldsymbol{z}_{\mathrm{c}}, \ldots )\in \mathbb{R}\) involving one or multiple bodies. Such potentials can represent, for example, gravity or springs.

Non-conservative forces, actuators, and dampers

Non-conservative forces, including actuators and dampers, are directly added to the dynamics as external forces \(\boldsymbol{\mathrm{f}}\in \mathbb{R}^{3}\) or torques \(\boldsymbol{\tau }\in \mathbb{R}^{3}\). Forces are described in the global frame and torques in the body frame. The wrench on a body is denoted as \(\boldsymbol{\mathrm{w}} = [\boldsymbol{\mathrm{f}}^{\mathsf{T}}\ \boldsymbol{\tau }^{\mathsf{T}}]^{\mathsf{T}}\). These wrenches are added for each body individually. Actuators fixed at joints are formulated by expressing their wrench in the frames of the connected bodies.

As an example, for an actuator with wrench \(\boldsymbol{\mathrm{w}}_{\mathrm{act}} = [\boldsymbol{\mathrm{f}}_{ \mathrm{act}}^{\mathsf{T}}\ \boldsymbol{\tau }_{\mathrm{act}}^{ \mathsf{T}}]^{\mathsf{T}}\) at a joint between bodies a (parent) and b (child), the resulting wrenches for the bodies are

(1a)
(1b)

where \(\mathrm{N}\) is the global frame, \(\mathrm{A}\) and \(\mathrm{B}\) are reference frames of bodies a and b, respectively, and \(\boldsymbol{p}\) is a vector from center of mass to actuator.

Joints and equality constraints

The joints of a mechanism consisting of one or multiple rigid bodies are represented by differentiable equality constraints \(\boldsymbol{g}(\boldsymbol{z}_{\mathrm{a}}, \boldsymbol{z}_{ \mathrm{b}}, \boldsymbol{z}_{\mathrm{c}}, \ldots )=\boldsymbol{0}\in \mathbb{R}^{n_{\mathrm{e}}}\), where \(n_{\mathrm{e}}\) is the number of equality constraints on the mechanism. Typically, equality constraints are formulated for all \(i\) kinematically connected pairs of rigid bodies independently and subsequently stacked into one constraint function \(\boldsymbol{g} = [\boldsymbol{g}_{1}^{\mathsf{T}}\cdots \boldsymbol{g}_{i}^{\mathsf{T}}]^{\mathsf{T}}\). In maximal coordinates, two generic equality constraint functions, one for the translational and one for the rotational movement of the two connected bodies, can be combined to create most of the common joints encountered in mechanisms. This insight greatly simplifies analytic gradient calculations for fast computations and gives direct access to minimal coordinates as well (see Appendix B for a list of possible joints and the recovery of minimal coordinates).

Contacts and inequality constraints

Rigid contacts between multiple bodies and with the environment are represented by differentiable inequality constraints \(\boldsymbol{\phi }(\boldsymbol{z}_{\mathrm{a}}, \boldsymbol{z}_{ \mathrm{b}}, \boldsymbol{z}_{\mathrm{c}}, \ldots )\geq \boldsymbol{0} \in \mathbb{R}^{n_{\mathrm{i}}}\), where \(n_{\mathrm{i}}\) is the number of inequality constraints on the mechanism. As for joints, inequality constraints are typically formulated independently for all \(j\) pairs of rigid bodies from their signed distance function and subsequently stacked into one constraint function \(\boldsymbol{\phi } = [\phi _{1}^{\mathsf{T}}\cdots \phi _{j}^{ \mathsf{T}}]^{\mathsf{T}}\).

As an example, ground contact of a single point contact on a body in maximal coordinates always has the form \(\phi (\boldsymbol{z}) = \boldsymbol{e}_{\mathrm{z}}^{\mathsf{T}}( \boldsymbol{x} + \boldsymbol{q}\cdot \boldsymbol{p}\cdot \boldsymbol{q}^{-1}) \geq 0\), where \(\boldsymbol{e}_{\mathrm{z}}\) is the z-axis unit vector in the global frame and \(\boldsymbol{p}\in \mathbb{R}^{3}\) points from center of mass to contact point in the body’s frame. As before, this consistent structure allows for the calculation of analytic gradients to reduce computation time.

Static and sliding friction

In the dynamics, static and sliding friction are added as external forces on the bodies. These forces are calculated by solving an optimization problem derived from the maximum dissipation principle [37]. We are using a linearized friction cone [38], although a nonlinear friction cone could be used as well. The maximum dissipation principle states that the energy dissipation rate of the bodies in contact is maximized by a friction force \(\boldsymbol{\beta }\in \mathbb{R}^{n_{\mathrm{c}}n_{\mathrm{f}}}\), where \(n_{\mathrm{c}}\) is the number of contacts and \(n_{\mathrm{f}}\) is the even number of basis vectors of the friction cone. The basis vectors \(\boldsymbol{b}_{i}\in \mathbb{R}^{3}\) of the linearized friction cone are depicted in Fig. 1 (b). A detailed derivation of the optimization problem is stated in Appendix C. The optimization problem resulting from the maximum dissipation principle for the bodies of a mechanism is

min β [ z ˙ a T z ˙ b T z ˙ c T ] B ( z a , z b , z c , ) T β,
(2a)
s.t. E T β C f γ,
(2b)
β0,
(2c)

where \(\boldsymbol{B}\in \mathbb{R}^{6n_{\mathrm{b}}\times n_{\mathrm{c}}n_{ \mathrm{f}}}\) maps the \(n_{\mathrm{f}}\)-dimensional friction forces at each of the \(n_{\mathrm{c}}\) contact point to a six-dimensional wrench on the respective bodies. The constraint \(\boldsymbol{E}^{\mathsf{T}}\boldsymbol{\beta } \leq \boldsymbol{C}_{ \mathrm{f}}\boldsymbol{\gamma }\) describes the limit on the friction forces with \(\boldsymbol{E}=\mathrm{diag}(\pmb{1}, \ldots , \pmb{1})\in \mathbb{R}^{n_{\mathrm{c}}n_{\mathrm{f}}\times n_{\mathrm{c}}}\), \(\pmb{1} = [1 \ \cdots \ 1]^{\mathsf{T}}\in \mathbb{R}^{n_{\mathrm{f}}}\), normal forces \(\boldsymbol{\gamma }\in \mathbb{R}^{n_{\mathrm{c}}}\), and the friction coefficient matrix \(\boldsymbol{C}_{\mathrm{f}} = \mathrm{diag}(c_{\mathrm{f},1},\ldots ,c_{ \mathrm{f},n_{\mathrm{c}}})\in \mathbb{R}^{n_{\mathrm{c}}\times n_{ \mathrm{c}}}\) containing the friction coefficients \(c_{\mathrm{f}}\) for each contact point. The friction wrenches for each body resulting from the optimization are added as external wrenches to the dynamics.

As an example, the optimization problem for ground contact of a single point contact on a body in maximal coordinates always has the form

min β z ˙ T B T β,
(3a)
(3b)
β0,
(3c)

with \(\boldsymbol{B}^{\mathsf{T}}=[\boldsymbol{B}_{\boldsymbol{x}} \ \boldsymbol{B}_{\boldsymbol{q}}]^{\mathsf{T}}\) consisting of

$$\begin{aligned} \boldsymbol{B}_{\boldsymbol{x}}^{\mathsf{T}}&= \begin{bmatrix} \boldsymbol{b}_{1} ~ -\boldsymbol{b}_{1} ~ \cdots ~ \boldsymbol{b}_{ \frac{n_{\mathrm{f}}}{2}} ~ -\boldsymbol{b}_{\frac{n_{\mathrm{f}}}{2}} \end{bmatrix}\in \mathbb{R}^{3\times n_{\mathrm{f}}}, \end{aligned}$$
(4a)
$$\begin{aligned} \boldsymbol{B}_{\boldsymbol{q}}^{\mathsf{T}}&= \boldsymbol{p}^{ \times }\boldsymbol{Q}(\boldsymbol{q})\boldsymbol{B}_{\boldsymbol{x}}^{ \mathsf{T}}\in \mathbb{R}^{3\times n_{\mathrm{f}}}, \end{aligned}$$
(4b)

where the vector from the body’s center of mass to the contact point is denoted \(\boldsymbol{p}\), the × operator creates the skew-symmetric matrix from this vector, and \(\boldsymbol{Q}(\boldsymbol{q})\) is the rotation matrix for quaternion \(\boldsymbol{q}\).

3 Mathematical integrator

A first-order variational (symplectic) integrator [26] is derived for the simulator components described in the previous section. This integrator discretizes the rigid-body dynamics while maintaining energy and momentum conservation properties as well as constraint satisfaction. Higher-order variational integrators are possible [39, 40], and we restrict the derivation to the first order for clarity. First, the derivation for unconstrained dynamics is provided. Afterward, equality and inequality constraints are added to the integrator, and finally, friction dynamics are incorporated.

3.1 Unconstrained dynamics

The derivation of the integrator for unconstrained dynamics is split into translational and rotational components for clarity. Note that the derivation also holds for coupled translational and rotational dynamics. Variational integrators are based on the principle of least action, which states that a mechanical system takes the path of least action when going from a fixed starting point to a fixed ending point. External forces and torques are incorporated with the Lagrange-d’Alembert principle. Action has the dimensions \([\text{Energy}]\times [\text{Time}]\) and the unconstrained action integral \(S_{0}\) with is defined as

$$ S_{0}(\boldsymbol{z},\dot{\boldsymbol{z}}) = \int _{t_{0}}^{t_{N}} \mathcal{L}(\boldsymbol{z},\dot{\boldsymbol{z}}) ~ \mathrm{d}t + \int _{t_{0}}^{t_{N}} \begin{bmatrix} \boldsymbol{\mathrm{f}}^{\mathsf{T}}& 2\boldsymbol{L}(\boldsymbol{q}) \boldsymbol{V}^{\mathsf{T}}\boldsymbol{\tau }\end{bmatrix} ^{\mathsf{T}}\boldsymbol{z} ~ \mathrm{d}t, $$
(5)

where \(\mathcal{L} = \mathcal{T}-\mathcal{V}\) is the Lagrangian with kinetic energy \(\mathcal{T}\) and potential energy \(\mathcal{V}\). A brief explanation of the external force and torque components in (5), as well as our quaternion notation, are given in Appendix A. Further details on virtual work for quaternions can be found in [41, 42].

Translational component

The translational component of the action integral (5) is

$$ S_{0,\mathrm{T}}(\boldsymbol{x},\boldsymbol{v}) = \int _{t_{0}}^{t_{N}} \mathcal{L}_{\mathrm{T}}(\boldsymbol{x},\boldsymbol{v}) ~ \mathrm{d}t + \int _{t_{0}}^{t_{N}}\boldsymbol{\mathrm{f}}^{\mathsf{T}} \boldsymbol{x} ~ \mathrm{d}t. $$
(6)

For numerical integration, (6) is discretized. A first-order discretization of the integral and a first-order approximation of the velocity,

$$ \boldsymbol{v}_{k} = \frac{\boldsymbol{x}_{k+1}-\boldsymbol{x}_{k}}{\Delta t}, $$
(7)

with step size \(\Delta t\) is used to obtain the discrete action sum

$$ S_{\mathrm{d}, 0, \mathrm{T}}(\boldsymbol{x}_{k},\boldsymbol{v}_{k}) = \sum _{k=0}^{N-1}\left ( \mathcal{L}_{\mathrm{T}}(\boldsymbol{x}_{k}, \boldsymbol{v}_{k}) + \boldsymbol{\mathrm{f}}_{k}^{\mathsf{T}} \boldsymbol{x}_{k}\right ) \Delta t. $$
(8)

For a first-order integrator, the principle of least action must be fulfilled for trajectories consisting of three knot points, i.e., from 0 to \(N=2\). Since \(\boldsymbol{x}_{0}\) and \(\boldsymbol{x}_{2}\) are fixed start and end points of the trajectory, only \(\boldsymbol{x}_{1}\) can vary. Therefore, the action sum is minimized with respect to the position \(\boldsymbol{x}_{1}\):

$$ \nabla _{\boldsymbol{x}_{1}}S_{\mathrm{d}, 0, \mathrm{T}} = - \boldsymbol{d}_{\mathrm{0,T}}\Delta t = \boldsymbol{0}, $$
(9)

where \(\boldsymbol{d}_{\mathrm{0,T}}\) are the resulting implicit discretized translational dynamics. In other words, if Equation (9) is fulfilled, we have found the discrete approximation of the physically correct trajectory consisting of the knot points \(\boldsymbol{x}_{0}\), \(\boldsymbol{x}_{1}\), and \(\boldsymbol{x}_{2}\). Note that the derivative with respect to \(\boldsymbol{x}_{1}\) is taken for each body in a mechanism.

An example for a single body with potential function \(\mathcal{V}(\boldsymbol{x})\) yields

$$\begin{aligned} \boldsymbol{d}_{\mathrm{0,T}}(\boldsymbol{v}_{1}) &= -\nabla _{ \boldsymbol{x}_{1}}\left (\mathcal{L}_{\mathrm{T}}(\boldsymbol{x}_{0}, \boldsymbol{v}_{0}) + \boldsymbol{\mathrm{f}}_{0}^{\mathsf{T}} \boldsymbol{x}_{0} + \mathcal{L}_{\mathrm{T}}(\boldsymbol{x}_{1}, \boldsymbol{v}_{1}) + \boldsymbol{\mathrm{f}}_{1}^{\mathsf{T}} \boldsymbol{x}_{1}\right ) \\ &= -\left (\boldsymbol{M} \frac{\boldsymbol{x}_{1}-\boldsymbol{x}_{0}}{\Delta t} - \boldsymbol{M}\frac{\boldsymbol{x}_{2}-\boldsymbol{x}_{1}}{\Delta t} - \nabla _{\boldsymbol{x}_{1}}\mathcal{V}(\boldsymbol{x}_{1}) + \boldsymbol{\mathrm{f}}_{1}\right ) \\ &= \boldsymbol{M} \frac{\boldsymbol{v}_{1}-\boldsymbol{v}_{0}}{\Delta t} + \nabla _{ \boldsymbol{x}_{1}}\mathcal{V}(\boldsymbol{x}_{1}) - \boldsymbol{\mathrm{f}}_{1} = \boldsymbol{0}, \end{aligned}$$
(10)

which resembles the discretized version of Newton’s second law, \(\boldsymbol{M}\dot{\boldsymbol{v}}-\boldsymbol{\mathrm{f}}= \boldsymbol{0}\).

The physically accurate dynamics are obtained by varying \(\boldsymbol{x}_{1}\) with fixed \(\boldsymbol{x}_{0}\) and \(\boldsymbol{x}_{2}\). However, when integrating dynamics forward in time, we start from a known initial state \(\boldsymbol{x}_{0}\) and \(\boldsymbol{v}_{0}\). From (7), the position at the next time step, \(\boldsymbol{x}_{1}\), is calculated as

$$ \boldsymbol{x}_{1} = \boldsymbol{x}_{0} + \boldsymbol{v}_{0}\Delta t. $$
(11)

Then, the implicit dynamics equations (9) are solved to obtain the velocity \(\boldsymbol{v}_{1}\). This resulting integration scheme, consisting of (11) and (9), is the symplectic Euler method [25].

Rotational component

The rotational component of the integrator can be derived similarly to the translation case. A description for an unconstrained floating single rigid body is presented in [43], and we later extend the derivation to constrained multibody systems.

The action integral for the rotational component is

$$ S_{0,\mathrm{R}}(\boldsymbol{q},\boldsymbol{\omega }) = \int _{t_{0}}^{t_{N}} \mathcal{L}_{\mathrm{R}}(\boldsymbol{q},\boldsymbol{\omega })~ \mathrm{d}t + \int _{t_{0}}^{t_{N}}2\boldsymbol{\tau }^{\mathsf{T}} \boldsymbol{V} \boldsymbol{L}(\boldsymbol{q})^{\mathsf{T}} \boldsymbol{q}~\mathrm{d}t. $$
(12)

To maintain unit norm in the quaternion update (see Appendix A), the discrete quaternion angular velocity is defined as

$$ \bar{\boldsymbol{\omega }}_{k} = \begin{bmatrix} \sqrt{\left (\frac{2}{\Delta t}\right )^{2} - \boldsymbol{\omega }_{k}^{ \mathsf{T}}\boldsymbol{\omega }_{k}}~ \\ \boldsymbol{\omega }_{k}\end{bmatrix} = \frac{2}{\Delta t}\boldsymbol{L}(\boldsymbol{q}_{k})^{ \mathsf{T}}\boldsymbol{q}_{k+1}. $$
(13)

As before, the action integral (12) is discretized to obtain the action sum

$$ S_{\mathrm{d},0,\mathrm{R}}(\boldsymbol{q}_{k},\boldsymbol{\omega }_{k}) = \sum _{k=0}^{N-1}\left (\mathcal{L}_{\mathrm{R}}(\boldsymbol{q}_{k}, \boldsymbol{\omega }_{k}) + 2\boldsymbol{\tau }_{k}^{\mathsf{T}} \boldsymbol{V} \boldsymbol{L}(\boldsymbol{q}_{k})^{\mathsf{T}} \boldsymbol{q}_{k}\right ) \Delta t. $$
(14)

The principle of least action is fulfilled by minimizing the discrete action sum from 0 to \(N=2\) over the orientation \(\boldsymbol{q}_{1}\):

$$ \nabla ^{\mathrm{r}}_{\boldsymbol{q}_{1}}S_{\mathrm{d},0,\mathrm{R}} = -\boldsymbol{d}_{\mathrm{0,R}}\Delta t = \boldsymbol{0}, $$
(15)

where \(\boldsymbol{d}_{\mathrm{0,R}}\) are the implicit discretized rotational dynamics. Note that we have used the rotational gradient \(\nabla ^{\mathrm{r}}\) (see Appendix A) and that the derivative with respect to \(\boldsymbol{q}_{1}\) is taken for each body in a mechanism.

An example for a single body with potential function \(\mathcal{V}(\boldsymbol{q})\) yields

d 0 , R ( ω 1 ) = J ω 1 4 Δ t 2 ω 1 T ω 1 + ω 1 × J ω 1 J ω 0 4 Δ t 2 ω 0 T ω 0 + ω 0 × J ω 0 + q 1 r V ( q 1 ) 2 τ 2 = 0 ,
(16)

which—analogous to the translational case—bears resemblance to Euler’s equations for rotations \(J\dot{\boldsymbol{\omega }}+\boldsymbol{\omega }^{\times}J \boldsymbol{\omega } - \boldsymbol{\tau } = \boldsymbol{0}\).

An integration step given \(\boldsymbol{q}_{0}\) and \(\boldsymbol{\omega }_{0}\) is performed by first calculating

$$ \boldsymbol{q}_{1} = \frac{\Delta t}{2}\boldsymbol{L}(\boldsymbol{q}_{0}) \bar{\boldsymbol{\omega }}_{0}, $$
(17)

and subsequently solving (15) for the angular velocity \(\boldsymbol{\omega }_{1}\).

3.2 Equality constrained dynamics

Equality constraint functions \(\boldsymbol{g}(\boldsymbol{z})\) with Lagrange multiplier \(\boldsymbol{\lambda }\in \mathbb{R}^{n_{\mathrm{e}}}\) are added to the integrator by appending them to the action integral:

$$ S(\boldsymbol{z},\dot{\boldsymbol{z}},\boldsymbol{\lambda }) = S_{0}( \boldsymbol{z},\dot{\boldsymbol{z}}) + \int _{t_{0}}^{t_{N}} \boldsymbol{\lambda }^{\mathsf{T}}\boldsymbol{g}(\boldsymbol{z}) ~ \mathrm{d}t. $$
(18)

Accordingly, the discrete action sum changes to

$$ S_{\mathrm{d}}(\boldsymbol{z}_{k},\dot{\boldsymbol{z}}_{k}, \boldsymbol{\lambda }_{k}) = S_{\mathrm{d},0}(\boldsymbol{z}_{k}, \dot{\boldsymbol{z}}_{k})+\sum _{k=0}^{N-1}\boldsymbol{\lambda }_{k}^{ \mathsf{T}}\boldsymbol{g}(\boldsymbol{z}_{k}) \Delta t. $$
(19)

Taking the gradient of (19) with respect to \(\boldsymbol{x}_{1}\) and \(\boldsymbol{q}_{1}\) yields the constrained implicit discretized dynamics

$$\begin{aligned} \boldsymbol{d}(\dot{\boldsymbol{z}}_{1},\boldsymbol{\lambda }_{1}) = \boldsymbol{d}_{0}(\dot{\boldsymbol{z}}_{1}) - \boldsymbol{G}( \boldsymbol{z}_{1})^{\mathsf{T}}\boldsymbol{\lambda }_{1} &= \boldsymbol{0}, \end{aligned}$$
(20a)
$$\begin{aligned} \boldsymbol{g}\left (\boldsymbol{z}_{2}\left (\dot{\boldsymbol{z}}_{1} \right )\right ) &= \boldsymbol{0}, \end{aligned}$$
(20b)

where

$$ \boldsymbol{G}(\boldsymbol{z}) = \begin{bmatrix} \frac{\partial \boldsymbol{g}(\boldsymbol{z})}{\partial \boldsymbol{x}} & \frac{\partial \boldsymbol{g}(\boldsymbol{z})}{\partial ^{\mathrm{r}} \boldsymbol{q}} \end{bmatrix} . $$
(21)

Physically, the constraint forces \(\boldsymbol{G}(\boldsymbol{z}_{1})^{\mathsf{T}}\boldsymbol{\lambda }_{1}\) act on the rigid bodies to guarantee satisfaction of constraints \(\boldsymbol{g}=\boldsymbol{0}\). Mathematically, \(\boldsymbol{\lambda }\) serves a similar purpose as Lagrange multipliers in constrained optimization.

The integration step starting from \(\boldsymbol{z}_{0}\) and \(\dot{\boldsymbol{z}}_{0}\) is calculated as follows. First, \(\boldsymbol{z}_{1}\) is calculated from the update rules (11) and (17). Then, the nonlinear system of equations (20a)–(20b) is solved for \(\dot{\boldsymbol{z}}_{1}\) and \(\boldsymbol{\lambda }_{1}\). Note that the constraints are fulfilled for \(\boldsymbol{z}_{2}\), which depends on \(\dot{\boldsymbol{z}}_{1}\) through update rules (11) and (17). Therefore, the resulting velocity \(\dot{\boldsymbol{z}}_{1}\) always ensures constraint satisfaction for the next position \(\boldsymbol{z}_{2}\).

3.3 Inequality constrained dynamics

Inequality constraint functions \(\boldsymbol{\phi }(\boldsymbol{z})\) with Lagrange multipliers \(\boldsymbol{\gamma }\in \mathbb{R}^{n_{\mathrm{i}}}\) are added to the integrator in a similar fashion. Physically, the multipliers \(\boldsymbol{\gamma }\) are the magnitudes of the normal forces at the contacts. To add the constraints to the dynamics, they are discretized and formulated as a nonlinear complementarity problem (NCP)

$$\begin{aligned} \boldsymbol{\phi }(\boldsymbol{z}_{k}) &\geq \boldsymbol{0}, \end{aligned}$$
(22a)
$$\begin{aligned} \boldsymbol{\gamma }_{k} &\geq \boldsymbol{0}, \end{aligned}$$
(22b)
$$\begin{aligned} \boldsymbol{\phi }(\boldsymbol{z}_{k})^{\mathsf{T}} \boldsymbol{\gamma }_{k} &= \boldsymbol{0}, \end{aligned}$$
(22c)

with element-wise ≥, for which we use the standard shorthand notation

$$ \boldsymbol{0}\leq \boldsymbol{\phi }(\boldsymbol{z}_{k})\perp \boldsymbol{\gamma }_{k}\geq \boldsymbol{0}. $$
(23)

The resulting dynamics are

$$\begin{aligned} \boldsymbol{d}(\dot{\boldsymbol{z}}_{1},\boldsymbol{\gamma }_{1}) = \boldsymbol{d}_{0}(\dot{\boldsymbol{z}}_{1}) - \boldsymbol{N}( \boldsymbol{z}_{1})^{\mathsf{T}}\boldsymbol{\gamma }_{1} &= \boldsymbol{0}, \end{aligned}$$
(24a)
$$\begin{aligned} \boldsymbol{0}\leq \boldsymbol{\phi }(\boldsymbol{z}_{2}( \dot{\boldsymbol{z}}_{1}))\perp \boldsymbol{\gamma }_{1}&\geq \boldsymbol{0}, \end{aligned}$$
(24b)

where

$$ \boldsymbol{N}(\boldsymbol{z}) = \begin{bmatrix} \frac{\partial \boldsymbol{\phi }(\boldsymbol{z})}{\partial \boldsymbol{x}} & \frac{\partial \boldsymbol{\phi }(\boldsymbol{z})}{\partial ^{\mathrm{r}} \boldsymbol{q}} \end{bmatrix} . $$
(25)

The integration step starting from \(\boldsymbol{z}_{0}\) and \(\dot{\boldsymbol{z}}_{0}\) is again performed by first calculating \(\boldsymbol{z}_{1}\) from (11) and (17), and then solving (24a)–(24b) for \(\dot{\boldsymbol{z}}_{1}\) and \(\boldsymbol{\gamma }_{1}\).

3.4 Friction dynamics

The friction dynamics are included in the variational integrator by discretizing the maximum dissipation principle (2a)–(2c):

min β z ˙ k T B ( z k ) T β k ,
(26a)
s.t. E T β k C f γ k ,
(26b)
β k 0.
(26c)

Note that the normal force multipliers \(\boldsymbol{\gamma }\) result from contact inequality constraints in the form of (22a)–(22c).

As with the contact constraint, we formulate (26a)–(26c) as an NCP with Lagrange multipliers \(\boldsymbol{\psi }_{k}\in \mathrm{R}^{n_{\mathrm{c}}}\)—the tangential velocities at the contact points—and \(\boldsymbol{\eta }_{k}\in \mathbb{R}^{n_{\mathrm{c}}n_{\mathrm{f}}}\):

$$\begin{aligned} \boldsymbol{B}(\boldsymbol{z}_{k})\dot{\boldsymbol{z}}_{k} + \boldsymbol{E}\boldsymbol{\psi }_{k} - \boldsymbol{\eta }_{k} &= \boldsymbol{0}, \end{aligned}$$
(27a)
$$\begin{aligned} \boldsymbol{0} \leq \boldsymbol{C}_{\mathrm{f}}\boldsymbol{\gamma }_{k} - \boldsymbol{E}^{\mathsf{T}}\boldsymbol{\beta }_{k} \perp \boldsymbol{\psi }_{k} &\geq \boldsymbol{0}, \end{aligned}$$
(27b)
$$\begin{aligned} \boldsymbol{0} \leq \boldsymbol{\beta }_{k} \perp \boldsymbol{\eta }_{k} &\geq \boldsymbol{0}. \end{aligned}$$
(27c)

Given a mechanism with \(n_{\mathrm{c}}\) contact inequality constraints with friction, the resulting dynamics are

$$\begin{aligned} \boldsymbol{d}(\dot{\boldsymbol{z}}_{1},\boldsymbol{\gamma }_{1}, \boldsymbol{\beta }_{1},\boldsymbol{\psi }_{1},\boldsymbol{\eta }_{1}) = \boldsymbol{d}_{0}(\dot{\boldsymbol{z}}_{1}) - \boldsymbol{N}( \boldsymbol{z}_{1})^{\mathsf{T}}\boldsymbol{\gamma }_{1} - \boldsymbol{B}(\boldsymbol{z}_{1})^{\mathsf{T}}\boldsymbol{\beta }_{1} &= \boldsymbol{0}, \end{aligned}$$
(28a)
$$\begin{aligned} \boldsymbol{0}\leq \boldsymbol{\phi }(\boldsymbol{z}_{2}( \dot{\boldsymbol{z}}_{1}))\perp \boldsymbol{\gamma }_{1}&\geq \boldsymbol{0}, \end{aligned}$$
(28b)
$$\begin{aligned} \boldsymbol{B}(\boldsymbol{z}_{1})\dot{\boldsymbol{z}}_{1} + \boldsymbol{E}\boldsymbol{\psi }_{1} - \boldsymbol{\eta }_{1} &= \boldsymbol{0}, \end{aligned}$$
(28c)
$$\begin{aligned} \boldsymbol{0} \leq \boldsymbol{C}_{\mathrm{f}}\boldsymbol{\gamma }_{1} - \boldsymbol{E}^{\mathsf{T}}\boldsymbol{\beta }_{1} \perp \boldsymbol{\psi }_{1} &\geq \boldsymbol{0}, \end{aligned}$$
(28d)
$$\begin{aligned} \boldsymbol{0} \leq \boldsymbol{\beta }_{1} \perp \boldsymbol{\eta }_{1} &\geq \boldsymbol{0}, \end{aligned}$$
(28e)

and the integration step starting from \(\boldsymbol{z}_{0}\) and \(\dot{\boldsymbol{z}}_{0}\) is performed by first calculating \(\boldsymbol{z}_{1}\) from (11) and (17), and then solving (28a)–(28e) for \(\dot{\boldsymbol{z}}_{1}\), \(\boldsymbol{\gamma }_{1}\), \(\boldsymbol{\beta }_{1}\), \(\boldsymbol{\psi }_{1}\), and \(\boldsymbol{\eta }_{1}\).

3.5 Complete integrator

Putting all components together, the simulation of one time step given \(\boldsymbol{z}_{0}\) and \(\dot{\boldsymbol{z}}_{0}\) is performed by first calculating \(\boldsymbol{z}_{1}\) from (11) and (17). Subsequently, the constrained implicit dynamics must be solved:

$$\begin{aligned} \boldsymbol{d}(\dot{\boldsymbol{z}}_{1},\boldsymbol{\lambda }_{1}, \boldsymbol{\gamma }_{1},\boldsymbol{\beta }_{1} ,\boldsymbol{\psi }_{1},\boldsymbol{\eta }_{1}) = \boldsymbol{d}_{0}(\dot{\boldsymbol{z}}_{1}) - \boldsymbol{G}^{ \mathsf{T}}\boldsymbol{\lambda }_{1} - \boldsymbol{N}^{\mathsf{T}} \boldsymbol{\gamma }_{1} - \boldsymbol{B}^{\mathsf{T}} \boldsymbol{\beta }_{1} &= \boldsymbol{0}, \end{aligned}$$
(29a)
$$\begin{aligned} \boldsymbol{g}\left (\boldsymbol{z}_{2}(\dot{\boldsymbol{z}}_{1}) \right ) &= \boldsymbol{0}, \end{aligned}$$
(29b)
$$\begin{aligned} \boldsymbol{0}\leq \boldsymbol{\phi }\left (\boldsymbol{z}_{2}( \dot{\boldsymbol{z}}_{1})\right )\perp \boldsymbol{\gamma }_{1}&\geq \boldsymbol{0}, \end{aligned}$$
(29c)
$$\begin{aligned} \boldsymbol{B}\dot{\boldsymbol{z}}_{1} + \boldsymbol{E} \boldsymbol{\psi }_{1} - \boldsymbol{\eta }_{1} &= \boldsymbol{0}, \end{aligned}$$
(29d)
$$\begin{aligned} \boldsymbol{0} \leq \boldsymbol{C}_{\mathrm{f}} \boldsymbol{\gamma }_{1} - \boldsymbol{E}^{\mathsf{T}} \boldsymbol{\beta }_{1} \perp \boldsymbol{\psi }_{1} & \geq \boldsymbol{0}, \end{aligned}$$
(29e)
$$\begin{aligned} \boldsymbol{0} \leq \boldsymbol{\beta }_{1} \perp \boldsymbol{\eta }_{1} &\geq \boldsymbol{0} . \end{aligned}$$
(29f)

Note that (29a)–(29f) contains the equations and constraints for all bodies of a mechanism.

If the initial velocity \(\dot{\boldsymbol{z}}_{0}\) is unknown, and a non-constraint-fulfilling one is chosen, an error occurs at the first time step. The magnitude of the error depends on the magnitude of the initial constraint violation. The discrete Legendre transform can be used to determine a constraint-fulfilling initial velocity. We refer to the extensive explanation in [30] for details on initializing a simulation with the Legendre transform.

4 Numerical solver

The variational integrator (29a)–(29f) can be summarized as a system of nonlinear equations with inequality constraints:

$$\begin{aligned} \boldsymbol{f}(\boldsymbol{s})&=\boldsymbol{0}, \end{aligned}$$
(30a)
$$\begin{aligned} \boldsymbol{h}(\boldsymbol{s})&\geq \boldsymbol{0}, \end{aligned}$$
(30b)

with solution vector \(\boldsymbol{s} = [\dot{\boldsymbol{z}}_{1}^{\mathsf{T}}\ \boldsymbol{\lambda }_{1}^{\mathsf{T}}\ \boldsymbol{\gamma }_{1}^{ \mathsf{T}}\ \boldsymbol{\beta }_{1}^{\mathsf{T}}\ \boldsymbol{\psi }_{1}^{ \mathsf{T}}\ \boldsymbol{\eta }_{1}^{\mathsf{T}}]^{\mathsf{T}}\). In this section, the algorithms derived for solving the system (30a)–(30b) are applicable to the class of Newton-based root-finding methods. Certain methods of this class, for example, interior-point methods [44], introduce slack variables \(\boldsymbol{\sigma }\) and additional constraints \(\boldsymbol{h}(\boldsymbol{s})=\boldsymbol{\sigma }\) to facilitate the numerical treatment of the inequality constraints. Since the slack variables match the inequality constraints up to the desired solution tolerance, the theoretical properties of the integrator still hold. The solution vector \(\boldsymbol{s}\) and the system \(\boldsymbol{f}(\boldsymbol{s})\) would be extended by these slack variables and constraints, but the computational complexity does not change if each inequality constraint and its associated slack constraint are considered as a single node in a graph. Moreover, the graph-based argument is not limited to mechanical systems and can be implemented for any graph-based system, but we will restrict the discussion to mechanical systems.

At the core, Newton-based methods iteratively produce solution approximations for (30a)–(30b) with the procedure

$$ \boldsymbol{s}^{(i+1)} = \boldsymbol{s}^{(i)} - \boldsymbol{F}( \boldsymbol{s}^{(i)})^{-1}\boldsymbol{f}(\boldsymbol{s}^{(i)}), $$
(31)

where

$$ \boldsymbol{F}(\boldsymbol{s}) = \frac{\partial \boldsymbol{f}(\boldsymbol{s})}{\partial \boldsymbol{s}}. $$
(32)

Numerically, (31) is formulated as a linear system of equations

$$ \boldsymbol{F}(\boldsymbol{s}^{(i)})\Delta \boldsymbol{s}^{(i)} = - \boldsymbol{f}(\boldsymbol{s}^{(i)}), $$
(33)

where the result \(\Delta \boldsymbol{s}^{(i)}\) is used to obtain \(\boldsymbol{s}^{(i+1)} = \boldsymbol{s}^{(i)} + \Delta \boldsymbol{s}^{(i)}\).

Linear systems of the form (33) are solved with decomposition and backsubstitution with overall cubic computational complexity \(\mathcal{O}(n^{3})\). For general integrators, including the variational integrator derived in the previous section, (33) is neither symmetric nor block symmetric. Therefore, the LDU decomposition [45] for asymmetric systems is chosen as the foundation of the algorithms. While (33) is asymmetric, its sparsity pattern, i.e., the zero and non-zero entries, is block-symmetric, and the following algorithms exploit this sparsity to improve complexity.

4.1 Linear-complexity algorithm

For mechanisms without kinematic loops, the LDU decomposition can be modified to obtain decomposition and backsubstitution with linear computational complexity \(\mathcal{O}(n)\), where \(n\) is the number of nodes in the corresponding graph. This modification is achieved by taking into account the graph representing the components of a mechanism and its associated \(\boldsymbol{F}\) matrix. Consider, for example, the mechanism and graph in Fig. 2.

Fig. 2
figure 2

(a) A mechanism with five links and four joints. (b) A graph representing the mechanism and its matrix. Squares represent links, circles represent joints. Coloring represents different levels of the tree-shaped graph

According to [46], the \(\boldsymbol{F}\) matrix corresponding to an acyclic graph, i.e., a mechanism without kinematic loops, can be decomposed with linear complexity by traversing the graph from leaves to root.

A depth-first search (DFS) starting from the (arbitrary) root is performed to find the correct processing order. The found nodes are stored in a list with the root as the last element and the last-found node as the first element. This list is then used in the modified LDU decomposition (Algorithm 1) and backsubstitution (Algorithm 2). Note that in the algorithms, vector indices \(i\) stand for the respective rows of node \(i\), and matrix indices \(i\), \(j\) stand for the respective rows of node \(i\) and columns of node \(j\).

Algorithm 1
figure 3

Sparse in-place LDU decomposition, complexity \(\mathcal{O}(n)\)

Algorithm 2
figure 4

Sparse in-place LDU backsubstitution, complexity \(\mathcal{O}(n)\)

The decomposition in Algorithm 1 processes the matrix \(\boldsymbol{F}\) according to the graph structure from leaves to root. Additionally, for each node, computations are only performed for connected components, since computations for disconnected components are zero in the LDU decomposition. The linear complexity is a direct result.

Decomposition complexity: In an acyclic graph with \(n\) nodes, each node has at most one parent, so there are \(\mathcal{O}(n)\) children (and \(\mathcal{O}(n)\) parents). Therefore, a total of \(\mathcal{O}(n)\) evaluations of the for-loop on line 2 of Algorithm 1 are required. The result is a linear complexity \(\mathcal{O}(n)\).

Backsubstitution complexity: In an acyclic graph with \(n\) nodes, there are \(\mathcal{O}(n)\) children and \(\mathcal{O}(n)\) parents. Therefore, a total of \(\mathcal{O}(n)\) evaluations of the for-loops on lines 3 and 9 of Algorithm 2 are required. The result is a linear complexity \(\mathcal{O}(n)\).

Articulated mechanism example

The comparison of dense and sparse LDU decomposition for the example in Fig. 2 is shown in Fig. 3.

Fig. 3
figure 5

Matrices for the mechanism in Fig. 2 with matching color scheme. Fill-in indicated with “•”. (a) Unordered matrix with fill-in after LDU decomposition. (b) Rearranged matrix without fill-in after LDU decomposition

The matrices in Fig. 3 have off-diagonal entries only at the intersection of directly connected nodes. For the example mechanism in Fig. 2, the diagonal entries \(\boldsymbol{D}_{1}\) to \(\boldsymbol{D}_{5}\) are the derivatives of the dynamics \(\boldsymbol{d}\) of each body, and the off-diagonal entries \(\boldsymbol{c}_{ij}\) are the equality constraint derivatives representing joints between two bodies. Note that when processing the matrix in the wrong order, a so-called fill-in is created. This fill-in occurs at the off-diagonals of nodes indirectly connected through a node that is processed before it becomes a leaf. Since fill-in must also be processed in the decomposition and backsubstitution, linear complexity is no longer achieved.

Environment contact example

A direct result of the linear-complexity property for acyclic graphs is the following. Mechanisms without kinematic loops and only environment contact, i.e., no contact between bodies, correspond to acyclic graphs and, therefore, have linear complexity in the number of bodies and contact points. Consider the exemplary mechanism, graph, and matrix in Fig. 4.

Fig. 4
figure 6

(a) A walking mechanism with four links, three joints, and two contact points. (b) A graph representing the mechanism. (c) The corresponding rearranged matrix without fill-in after decomposition

Since all environment contacts are leaves in the graph, adding contact points leads to linear scaling with the correct processing order. This result is especially interesting for bipedal or quadrupedal walking robots.

4.2 Reduced-fill-in algorithm

If the graph of a mechanism has cycles, fill-in can generally no longer be avoided entirely, since nodes of a cycle are never leaves. However, by processing the leaves attached to a cycle first, the amount of fill-in is reduced. As an example, the mechanism from Fig. 2 is modified to contain a kinematic loop, displayed in Fig. 5.

Fig. 5
figure 7

(a) A mechanism with five links and four joints containing a kinematic loop. (b) A cyclic graph representing the mechanism and its matrix

Note that for computational reasons, two joints are combined into node 6, since these two joints form the beginning of the kinematic loop. Such loop-openers are also found with the depth-first search, since they are simply the first and last joints in a detected loop. In the algorithms, a distinction is made between nodes that are part of a cycle and nodes that are not. This distinction is also determined by the depth-first search.

Since any square matrix can be represented by a (potentially cyclic) graph, the following algorithms are applicable to all such matrices. In the case of mechanical systems, they can be used for all components defined in Sect. 2. The sparse decomposition for such systems is formulated in Algorithm 3 and the backsubstitution in Algorithm 4.

Algorithm 3
figure 8

Sparse in-place LDU decomposition, complexity \(\mathcal{O}(n+kn^{2})\)

Algorithm 4
figure 9

Sparse in-place LDU backsubstitution, complexity \(\mathcal{O}(n+nk)\)

Note that all lists in Algorithms 3 and 4 are sorted in the same order as the depth-first search list.

Decomposition complexity: In a cyclic graph with \(n\) nodes and \(k\) cycles, there are \(\mathcal{O}(n)\) acyclic children and \(\mathcal{O}(n)\) acyclic parents. Additionally, there are \(\mathcal{O}(n)\) cyclic children and \(\mathcal{O}(n)\) loop-opening parents per cycle, since each cycle contains at most all nodes and each of these nodes has the same loop-opening parent. Therefore, a total of \(\mathcal{O}(n)\) evaluations of the for-loop on line 2 of Algorithm 3 and a total of \(\mathcal{O}(kn^{2})\) evaluations of the for-loop on line 8 are required. Resulting is a complexity \(\mathcal{O}(n+kn^{2})\)—quadratic in \(n\) and linear in \(k\).

Backsubstitution complexity: In a cyclic graph with \(n\) nodes and \(k\) cycles, there are \(\mathcal{O}(n)\) acyclic children, \(\mathcal{O}(n)\) acyclic parents, \(\mathcal{O}(n)\) cyclic children per cycle, and \(\mathcal{O}(n)\) loop-opening parents per cycle. Therefore, a total of \(\mathcal{O}(n+nk)\) evaluations of the for-loop on line 3 of Algorithm 4 and a total of \(\mathcal{O}(n+nk)\) evaluations of the for-loop on line 9 are required. This results in complexity \(\mathcal{O}(n+nk)\)—linear in \(n\) and linear in \(k\).

In the worst case of a fully connected graph, i.e., a fully dense matrix, each node has \(\mathcal{O}(n)\) cyclic children and parents, resulting in a complexity of \(\mathcal{O}(n+n^{3})\). However, for real systems, the theoretical complexities are often too conservative. For a system without intersecting cycles, i.e., each node belongs to at most a single cycle, there are a total of \(\mathcal{O}(n)\) cyclic children and parents and, therefore, the overall complexity is \(\mathcal{O}(n+n^{2})\). In the case of a constant cycle size, there is a total of \(\mathcal{O}(k)\) cyclic children and parents, and a linear complexity \(\mathcal{O}(n + k)\) is obtained. In a combined setting with non-intersecting cycles of fixed size, for example, a mechanism with disconnected identical legs made of kinematic loops, there are \(\mathcal{O}(n)\) cycles with \(\mathcal{O}(1)\) cyclic children and parents each, resulting in a linear complexity \(\mathcal{O}(n)\). Improvements in the complexity are theoretically possible, but finding a processing order that creates the minimum fill-in is generally NP-hard [46].

The comparison of dense and sparse decomposition for the example in Fig. 4 with additional dampers at each joint is shown in Fig. 6.

Fig. 6
figure 10

Matrices for the mechanism in Fig. 5. (a) Disordered, almost fully dense matrix due to fill-in after LDU decomposition. (b) Rearranged matrix with minimal fill-in after LDU decomposition

The dampers depend on the relative velocity between the two connected bodies, leading to additional off-diagonal entries. As Fig. 6 shows, with a bad processing order, an almost fully dense matrix is obtained due to fill-in, while the correct processing order creates fill-in only at the off-diagonals of nodes that are part of cycles and the loop-openers.

5 Evaluation

The evaluation of the simulator is comprised of four parts. First, the physical accuracy of the variational integrator is analyzed. Then, the runtime and computational complexity of the graph-based algorithms are investigated. Next, the numerical robustness of the simulator is tested. Lastly, two application examples for the simulator are given. Comparisons are made to different simulators and integrators. We compare to the dynamics simulator RigidBodyDynamics [47] as it is written in the same programming language as our integrator, Julia [48], and to the widely used simulator MuJoCo, which is representative for soft constraint handling. Variational integrator comparisons are made with two minimal-coordinate integrators [28, 29] and an integrator in redundant coordinates for constrained systems [17]. In order to solve the integrator equations (29a)–(29f), a basic interior-point method is used and described in Appendix D. Note that the focus of the implementation is on the theoretical properties and not runtime optimization. Nonetheless, even this basic implementation achieves reasonable timing results, benefitting from the easy and modular implementation in maximal coordinates. Code for the simulatorFootnote 1 and the graph-based system solverFootnote 2 including all experiments and additional examples is available in Julia. All experiments are carried out on an Asus ZenBook with an i7-8565U CPU and 16 GB RAM.

5.1 Physical accuracy

The physical accuracy of the simulator is examined in four scenarios: constraint drift, energy conservation, energy dissipation, and contact violation. A comparison to commonly used alternative implementations is provided for reference. The results are displayed in Fig. 7.

Fig. 7
figure 11

Evaluation of the physical accuracy of the variational integrator (blue) with comparison (red). (a) Zero constraint drift with variational integrator, drift resulting from explicit 4th-order Runge–Kutta. (b) Bounded energy error with variational integrator for conservative system, increasing error with 2nd-order Runge–Kutta. (c) Small energy error with variational integrator for dissipative system, large error with 1st-order Runge–Kutta. (d) No contact violation for box drop with rigid contact formulation, violation with MuJoCo’s soft formulation. (Color figure online)

For the constraint drift in Fig. 7 (a), a three-link mechanism forming a kinematic loop is used with link lengths \(l_{1}=1\text{ m}\), \(l_{2}={\frac{\sqrt{2}}{2}}\text{ m}\), \(l_{3}=1\text{ m}\), masses \(m_{1}=1\text{ kg}\), \(m_{2}={\frac{\sqrt{2}}{2}}\text{ kg}\), \(m_{3}=1\text{ kg}\), and a step size \(\Delta t=0.01\text{ s}\). In minimal coordinates, the loop-closure constraints must be explicitly enforced. In non-variational integrators, the dynamics and constraints are formulated on an acceleration level. Without constraint stabilization [49], i.e., a spring-damper connection, the links of the mechanism start to drift apart. The explicit 4th-order Runge–Kutta–Munte–Kaas integrator [25] exhibits constraint drift without such stabilization. The 1st-order variational integrator prevents constraint drift entirely and does not require constraint stabilization.

The energy conservation in Fig. 7 (b) is evaluated on a frictionless double pendulum with link lengths \(l=1\text{ m}\), masses \(m=1\text{ kg}\), and a step size \(\Delta t=0.01\text{ s}\). Unlike variational integrators, strictly explicit or implicit Runge–Kutta integrators do not generally have tight bounds on the energy error for conservative mechanical systems, although certain methods, for example symmetric ones, do [25]. The explicit 2nd-order Runge–Kutta method (Heun’s method) injects energy into the system, while the energy error stays bounded for the variational integrator.

Accurate energy dissipation is an important property, for example, in passivity-based control approaches [50]. The dissipation behavior in Fig. 7 (c) is evaluated on a damped pendulum with link length \(l=1\text{ m}\), mass \(m=1\text{ kg}\), joint damping \(d=\frac{1}{2}\text{N}\frac{\text{s}}{\text{m}}\) and a step size \(\Delta t=0.1\text{ s}\). Euler’s method used for comparison shows poor dissipation behavior in drastically underdamping the pendulum, whereas the variational integrator demonstrates good dissipation performance after a small initial error.

Correct simulation of rigid contacts is crucial for transferring learned or optimized control policies from simulation to real systems. To compare rigid and soft contacts, a cube with edge length \(l=0.5\text{ m}\), mass \(m=1\text{ kg}\), and step size \(\Delta t=0.01\text{ s}\) is dropped from a height (bottom-to-ground) \(h=0.4\text{ m}\). MuJoCo’s default solver parameters (\(\mathrm{solimp} = (0.9, 0.95, 0.001, 0.5, 2)\), \(\mathrm{solref} = (0.02, 1)\)) and default Euler integrator are used and result in a ground violation of \(2.7\text{ cm}\). We also analyzed drops from other heights up to 1 m, which resulted in similar violations. In contrast, the rigid contact formulation with the variational integrator in maximal coordinates stops 43\(\mathrm{\mu}\)m above the ground due to the interior-point formulation, which practically satisfies the constraint.

5.2 Computational complexity

The evaluation of the computational complexity serves two purposes. We show that the complexity of the graph-based algorithms holds in practice, and by using these algorithms, maximal coordinates can achieve competitive timing results compared to minimal coordinates, despite their larger dimension. For all timings, the best result of 100 samples is used to diminish right-skewing computer noise. The linear complexity of the simulator and a comparison to minimal coordinates are shown in Fig. 8. The performance for systems with kinematic loops and comparisons to a dense solver are displayed in Fig. 9.

Fig. 8
figure 12

Computation time comparison for simulating 1000 time steps with our simulator (blue) compared to RigidBodyDynamics (red). (a) n-link pendulum with revolute (solid) and spherical (dashed) joints. (b) n-sphere chain with contacts and spherical joints. (Color figure online)

Fig. 9
figure 13

Evaluation and comparison of our methods (blue) for cyclic structures compared to standard dense implementations (red). (a) Simulation of 1000 time steps of a chain consisting of 4-link segments. (b) Solving of linear system from ladder graph. (c) Solving of linear system from net graph. (d) Solving of linear system from crystal graph. (Color figure online)

Figure 8 (a) compares the computation time of our simulator with the RigidBodyDynamics simulator for 1000 time steps of \(n\)-link pendulums with link length \(l=1\text{ m}\), mass \(m=1\text{ kg}\), and step size \(\Delta t=0.01\text{ s}\). The comparison is made for revolute and spherical (ball-and-socket) joints between the links. The main result is that despite the higher dimension of maximal coordinates, comparable computation times are achieved. Minimal coordinates are naturally faster for revolute joints, as they have the smallest number of degrees of freedom in minimal coordinates and the most constraints in maximal coordinates. On the other hand, spherical joints perform worse in minimal coordinates as these joints increase their state dimension and reduce the number of constraints in maximal coordinates. The linear complexity for both minimal and maximal coordinates becomes clearly visible.

In Fig. 8 (b), the linear complexity for contacts is demonstrated for a chain of spheres with radius \(r=0.25\text{ m}\), mass \(m=1\text{ kg}\), and a step size \(\Delta t=0.01\text{ s}\). The spheres are connected by spherical joints. A comparison to RigidBodyDynamics is not possible due to the limited support of contacts. Besides the complexity, this experiment demonstrates the numerical robustness of the maximal-coordinate approach for treating contacts. We also implemented the experiment in MuJoCo. While faster, MuJoCo with default solver parameters consistently has contact violations of 10 cm or more when simulating more than 20 spheres and fails to compute chains of more than 44 spheres. Therefore, a fair comparison is difficult.

The computation time for simulating 1000 time steps of a chain of 4-link segments is shown in Fig. 9 (a). The links have a length \(l=1\text{ m}\), mass \(m=1\text{ kg}\), and the step size is \(\Delta t=0.01\text{ s}\). While no longer fully linear, the increase in computation time for this system is modest due to the reduction of fill-in. A simulation of this system with the RigidBodyDynamics simulator failed for more than one 4-link segment, even for smaller step sizes. Numerically, MuJoCo can simulate this system successfully, but with the default solver parameters, the explicit loop-closure constraints introduce high damping into the system, resulting in bad energy conservation behavior.

Comparisons of the sparse graph-based system solver with a dense one are shown in Figs. 9 (c)–(d). The ladder graph consists of cycles with four nodes each, and two nodes are shared by two cycles, except for the first and last two nodes. The net graph is a square net of nodes, where inner nodes are part of four cycles, edge nodes part of two, and corner nodes part of one. The crystal graph is a three-dimensional cubic structure of nodes, where inner nodes are part of twelve cycles, surface nodes part of eight, edge nodes part of five, and corner nodes part of three. The graphs with \(n\) nodes are represented by matrices with \(6n\) entries. The sparse algorithms outperform the dense ones in most cases, often by more than two orders of magnitude. The crystal graph relates to a rather dense matrix, resulting in decreasing advantage of the sparse approach over the dense one and slightly better performance for \(6^{3} = 216\) nodes. This last example is an extreme case. The algorithms are aimed at robotic systems, which typically have significantly fewer nodes and cycles, and the sparse performance is convincing for such systems.

5.3 Numerical robustness

We investigate three different scenarios regarding the numerical robustness of the simulator. A comparison with two state-of-the-art variational integrators in minimal coordinates is made to demonstrate the ability to simulate systems with decreasing solution tolerance. The results are displayed in Fig. 10. Since constrained mechanical systems are known to become ill-conditioned with decreasing step sizes [51, 52], we show the ability to simulate systems for reasonable step sizes in Fig. 11. A double-four-bar linkage—a commonly-used benchmark problem that exhibits singular constraint configurations—is simulated, and the results are compared to another variational integrator for constrained systems in Fig. 12.

Fig. 10
figure 14

Performance comparison simulating 1000 time steps of n-link pendulums with varying solution tolerances. Comparison of our 1st-order algorithm (blue) to a 2nd-order (red) and a 3rd-order variational integrator (yellow). (a) Tolerance of \(10^{-6}\). (b) Tolerance of \(10^{-8}\). (c) Tolerance of \(10^{-10}\). (Color figure online)

Fig. 11
figure 15

Evaluation of the solver performance for pendulums of scale \(s=1\) (solid pink), \(s=0.5\) (dashed yellow), and \(s=0.1\) (dotted green) for scaled step sizes. (a) Mean iteration number. (b) Error of final angle. (Color figure online)

Fig. 12
figure 16

Comparison of the double four-bar linkage benchmark system for our algorithm (blue) and the variational GGL method (red). (a) Simulation trajectory. (b) Energy error. (c) Constraint violation. (d) Energy components. (Color figure online)

In Fig. 10, we compare our algorithm to state-of-the-art second- and third-order variational integrators with the same n-link pendulum we use for the timing comparison in Sect. 5.2. Only revolute joints and loop-free structures are tested as ball-and-socket joints, and loop-closures were, to the best of our knowledge, not part of the implementations of these integrators. To demonstrate the robustness of our algorithm, we run simulations with solution tolerances of \(10^{-6}\), \(10^{-8}\), and \(10^{-10}\) for the Newton methods in all algorithms. All three algorithms display the theoretical linear computational complexity, and despite the higher dimensionality of the maximal-coordinate algorithm, we achieve comparable timing results. However, while our algorithm successfully simulates the n-link pendulums for all tolerances, the minimal-coordinate integrators fail for smaller tolerances and an increasing number of links, hinting at potential numerical issues in minimal coordinates.

Generally, large simulation step sizes are desirable for fast simulations. However, smaller step sizes are required for physically smaller systems to capture their high-frequency motions. Moreover, smaller step sizes may be necessary to achieve highly accurate results. By simulating a physically scaled pendulum with different step sizes, we determine the time-step range for which we can successfully simulate the systems. The pendulum of scale \(s\in [1,0.5,0.1]\) has a mass \(m=s\), length \(l=s^{2}\), joint inertia \(J=\frac{1}{3}ml^{2}\), and gravitational acceleration \(g=9.81\). Accordingly, the period of the pendulum is \(T=2\pi \sqrt{\frac{2}{3}\frac{l}{g}}=2\pi \sqrt{\frac{2}{3} \frac{1}{g}}s\), i.e., proportional to the scale \(s\). The pendulum is initialized at an angle and angular velocity \([\theta _{0},\dot{\theta }_{0}]=[\frac{\pi }{2},0]\) and the simulation time is \(\frac{T}{2}\), i.e., a half swing. The results in Fig. 11 are plotted for a step size scaled by \(s\) to account for the different frequencies of the systems. An increase in the average iteration number per time step for smaller step sizes can be seen in Fig. 11 (a), which is consistent with the increased ill-conditioning of the systems. Nonetheless, the simulation is successful and achieves accurate results even for small step sizes. The accuracy is measured as \(\mathrm{norm}(-\frac{\pi }{2}-\theta _{T})\), i.e., the error of the angle at the last time step.

We simulate the double four-bar linkage, a common benchmark system [7], to evaluate the performance of our simulator since the system encounters configuration singularities during simulation when the bars are in parallel. We compare our algorithm to the GGL method in [17] and use the same notation and parameters for the system. The simulation step size is \(\Delta t=0.01\text{ s}\). As Fig. 12 (a) and (b) show, our method simulates the system with a bounded error on the energy, whereas the GGL method incurs an increasing energy error, which also leads to an increasingly wrong trajectory. Regarding the constraint satisfaction shown in Fig. 12 (c), both algorithms achieve very low constraint violations, although GGL performs better, potentially because the GGL method enforces the constraints also on a velocity level. Figure 12 (d) shows the kinetic, potential, and total energy of the system as a reference for other benchmarks. We also ran the benchmark for a step size \(\Delta t=0.001\text{ s}\), where almost no difference exists between the two methods. A computation time comparison between uncompiled Matlab and compiled Julia code is difficult, and the GGL method is not designed for numerical efficiency, but it is still worth mentioning that our algorithm was more than 100 times faster than the GGL method.

5.4 Application examples

This article focuses on the theoretical foundation for building efficient and physically accurate maximal-coordinate simulators. However, an implementation with a basic interior-point method can already be used for real-world applications. As such application examples, we show that the control parameters for a quadrupedal robot can be learned with a simple sampling-based approach and how controller gains of an exoskeleton can be personalized to impaired patients in simulation to avoid injury. The quadrupedal robot and learning progress are displayed in Fig. 13. The exoskeleton and resulting torques for original and adapted controller gains are visualized in Fig. 14.

Fig. 13
figure 17

(a) The trained walking quadruped. (b) Progress for learning to walk

Fig. 14
figure 18

(a) The donned exoskeleton in simulation. (b) Schematics of the exoskeleton kinematics (not to scale) showing the joints and kinematic loops. Offset between exoskeleton and body exaggerated. (c) Elbow torques during rehabilitation task for high gains. Within limits for healthy patient, exceeding limits for spastic patient. (d) Elbow torques for adapted gains within the limits for both patients. (Color figure online)

We use the Unitree A1 quadrupedal robot for sampling-based learning. The simulation step size is \(\Delta t = 0.001\text{ s}\). The learning algorithm is stated in Appendix E. For each episode, a random set of control parameters in the vicinity of the current set of parameters is chosen, and a simulation rollout is performed for 10 seconds. In case of progress, i.e., the walking distance increased, this set of parameters is selected as the starting point for new sample draws. The robot learns to walk a distance of more than 12 m (average velocity of \(1.2~\frac{\text{m}}{\text{s}}\)) in 100 episodes. Due to the rigid contacts of the simulation, a successful transfer of the learned control parameters to a real system is more likely than with an incorrectly soft contact model.

For the gain tuning, a 4-degrees-of-freedom (DoF) exoskeleton for an arm is used. Three DoF actuate the shoulder and upper arm, and one DoF the elbow and lower arm. The attachments of the upper and lower arm are modeled as two 3-DoF spring-damper joints with two translational and one rotational DoF. As a result, there are two connected kinematic loops in the mechanism. Commonly, attachments between the exoskeleton and the human body are modeled as pure spring-damper connections without joints to avoid such kinematic loops, for example, in [2]. However, such models are not necessarily correct. For the exoskeleton application, we assume a healthy and a spastic patient that perform a rehabilitation routine in the exoskeleton. The shoulder and elbow flexion/extension joints are supposed to follow sinusoidal trajectories (see Appendix E). A limit of 5 Nm is assumed to be a comfortable elbow torque for the patients. While the original gains stay within these limits for the healthy patient, they are exceeded for the spastic patient. By tuning the gains in simulation, an adapted set of gains adhering to the torque limits for both patients is found. With this approach, uncomfortable and potentially harmful tuning on a real patient can be avoided or at least reduced. The modeling accuracy, including proper kinematic loops, should provide closer estimates of the controller gains for the real system.

6 Conclusions

This article introduces a maximal-coordinate variational integrator and efficient graph-based solver for simulating mechanical systems with common components such as springs and dampers, actuated joints, and contacts with friction. Besides the theoretical formulation of the integrator and solver algorithms, an application-ready implementation of the simulator is provided as open-source code.

Building maximal-coordinate simulators on variational integrators is useful not just for the conservation properties and physical accuracy, but also for avoiding constraint drift in the naturally constrained formulation. The increased state dimension is treated with efficient numerical solver algorithms, which reduce the computational complexity in theory and render the simulator useable in practice for various applications. Additionally, it appears that the formulation in maximal coordinates increases the numerical robustness and allows the simulation of systems with contacts or kinematic loops that other simulators in minimal coordinates fail to compute.

Because of the simple modular formulation in maximal coordinates, additional components can be added to the integrator, such as a nonlinear friction model, joint limits, or physically correct elastic contacts. The graph-based nature of the solver algorithms also opens up the possibility of parallelizing computations on different branches of a graph.